In this section, the results for the Ped-Cross GAN are introduced and analyzed. To deliver insight into the generated results, they are validated in a number of ways, which are outlined in Section 6.1. The validated results themselves are presented in Section 6.2, while selected generated results are shown in visual form in Section 6.3.
6.1. Validation Method
When training and testing GANs, it is important to avoid a self-fulfilling prophecy, in which the generated results are tested against the very same data on which the discriminator was trained. To avoid this, the subset of the Pedestrian Scenario dataset was split so that 20% of the samples were held aside for testing and validation. This yielded 1802 test samples that the Ped-Cross GAN had never seen.
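A minimal sketch of such a hold-out split is shown below. The function name is illustrative, not from the paper, and the total of 9010 samples is an assumption inferred from the stated figures (1802 is 20% of 9010); the paper does not give the exact splitting procedure.

```python
# Hedged sketch: an 80/20 hold-out split, assuming the samples are held in an
# indexable collection. Names and the seeding scheme are illustrative only.
import random

def hold_out_split(samples, test_fraction=0.2, seed=0):
    """Shuffle indices and keep test_fraction of samples aside for validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = int(len(samples) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

# Assuming a subset of 9010 samples, consistent with the 20% / 1802 figures:
train, test = hold_out_split(list(range(9010)))
print(len(train), len(test))  # 7208 1802
```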
The fully trained Ped-Cross GAN can generate as many, or as few, new samples as required. For that reason, several different numbers of samples were generated to test the success of the GAN. While 5000 samples per class provided the most favorable and balanced results, validation was also carried out with both fewer and more samples per class.
The validation method took the form of a simple LSTM classifier network. This network receives a pose sequence of length five and outputs a classification score, indicating whether the pose sequence shows a pedestrian crossing from the left or crossing from the right.
The architecture of the validation LSTM was very similar to that of the LSTM in the Ped-Cross GAN discriminator, because both networks perform essentially the same kind of classification task. The discriminator in Ped-Cross GAN classifies between two classes, real or fake, whereas the validation LSTM also classifies between two classes, the two movement classes previously defined.
Therefore, the classifier was constructed as a single LSTM block containing 400 hidden units, accepting an input dimension of 34, a sequence dimension of 5, and an output dimension of 1. The only slight difference from the discriminator lay in the final layer. In the discriminator, a non-integer classification value was acceptable, as the Generator could use it to learn. In this LSTM, a firmer decision on the classification of a sample was desired, so the final layer was a fully connected layer with a softmax activation function, making the classification of any particular sample mutually exclusive of any other class. The classifier was trained with a batch size of 16 for 20 epochs.
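The architecture described above can be sketched as follows. This is a hedged reconstruction, assuming a PyTorch implementation: the paper specifies only the single LSTM block with 400 hidden units, the input dimension of 34, the sequence length of 5, and a fully connected softmax output; the class name and the use of the final hidden state are assumptions.

```python
# Hedged sketch of the validation classifier, assuming PyTorch; only the
# dimensions (34-dim input, sequence length 5, 400 hidden units, softmax
# output over two classes) are taken from the paper.
import torch
import torch.nn as nn

class PoseSequenceClassifier(nn.Module):
    def __init__(self, input_dim=34, hidden_dim=400, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len=5, input_dim=34)
        _, (h_n, _) = self.lstm(x)
        logits = self.fc(h_n[-1])            # classify from final hidden state
        return torch.softmax(logits, dim=1)  # mutually exclusive class scores

model = PoseSequenceClassifier()
probs = model(torch.randn(16, 5, 34))  # one batch of 16, as in training
print(probs.shape)  # torch.Size([16, 2])
```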
This validation using the classifier was carried out in two ways. The two methods sound very similar in practice; however, they yield different results and, importantly, different insights into the training of Ped-Cross GAN.
The first validation method was to train the classifier on the 1802 samples held back from the Pedestrian Scenario dataset. The trained classifier was then used to test the 10,000 newly generated samples from Ped-Cross GAN. This is referred to as normal validation throughout the discussion.
The second validation method was to train the classifier on the 10,000 newly generated samples from Ped-Cross GAN. The trained classifier was then used to test how well it could classify the 1802 real samples held aside from the Pedestrian Scenario dataset. This is referred to as reverse validation throughout the discussion.
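The two validation directions can be summarized in a short sketch. The helpers `train_clf` and `evaluate` are hypothetical placeholders for the LSTM training loop and an accuracy computation; only the direction of the train/test swap reflects the paper.

```python
# Hedged sketch of the two validation directions; `train_clf` and `evaluate`
# are hypothetical helpers, not functions defined in the paper.

def normal_validation(real_held_out, generated, train_clf, evaluate):
    """Train on the real held-out samples, test on the generated samples."""
    clf = train_clf(real_held_out)
    return evaluate(clf, generated)

def reverse_validation(real_held_out, generated, train_clf, evaluate):
    """Train on the generated samples, test on the real held-out samples."""
    clf = train_clf(generated)
    return evaluate(clf, real_held_out)
```

The symmetry makes clear that only the roles of the two datasets are exchanged; the classifier and its training regime are identical in both directions.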
6.3. Visual Results
The results presented in Section 6.2.1 and Section 6.2.2 show a disparity when validating Ped-Cross GAN in two very similar ways. One way to illuminate the reasons behind these two results is to visually inspect some of the generated samples and to try to understand why they led to the results seen in Section 6.2.2.
The generated pose sequences in Figure 10 and Figure 11 were chosen by the authors of this paper. After viewing hundreds of samples, these pose sequences were hand-picked to give a representative impression of what was observed in the generated pose sequences.
It is clear that the Generator in the Ped-Cross GAN has produced some errors, especially in Class 1 (crossing from the left).
Figure 10 shows a selection of generated pose sequences. In the first four rows, the results are promising: each clearly resembles a human crossing from the left. However, in the final four rows, several issues are apparent. Specifically, row 6 appears to start the generated sequence well before encountering an error that causes the human-like form to disappear.
The results in Figure 11 are far better. Unlike Class 1, Class 4 (crossing from the right) did not show any erratic visual errors; in all the visual trials, not one observed Class 4 sample appeared to show any significant error. On the one hand, this is a good result: it gives some confidence in Ped-Cross GAN and in how it applied its learning to its Generator. On the other hand, it is noted that there is a strong similarity between the generated sequences for crossing from the right.