Journal of Space Technology, Vol 7, No 1, July 2017

Automatic Target Detection in Satellite Images using Deep Learning

Muhammad Jaleed Khan, Adeel Yousaf, Nizwa Javed, Shifa Nadeem, and Khurram Khurshid

Muhammad Jaleed Khan is with the Department of Electrical Engineering, Institute of Space Technology, Islamabad, Pakistan (Phone: +92316-4101087; mjk093@gmail.com). Adeel Yousaf is with the Department of Avionics Engineering, Institute of Space Technology, Islamabad, Pakistan (adeelyousaf1993@gmail.com). Nizwa Javed is with the Department of Electrical Engineering, Institute of Space Technology, Islamabad, Pakistan (niz.jvd@gmail.com). Shifa Nadeem is with the Department of Electrical Engineering, Institute of Space Technology, Islamabad, Pakistan (shifanadeem93@gmail.com). Khurram Khurshid is with the Department of Electrical Engineering, Institute of Space Technology, Islamabad, Pakistan (khurram.khurshid@ist.edu.pk).

Abstract— Automatic target detection in satellite images is a challenging problem due to the varying size, orientation and background of the target object. Traditionally engineered features such as HOG, Gabor features and the Hough transform do not scale well to large volumes of high-resolution imagery. Robust and computationally efficient systems are required that can learn representations directly from massive satellite imagery. In this paper, a target detection system for satellite imagery is proposed which uses EdgeBoxes and a Convolutional Neural Network (CNN) to classify target and non-target objects in a scene. The edge information of targets in satellite imagery contains very prominent and concise attributes, and EdgeBoxes uses this edge information to filter the set of target proposals. The CNN is a deep learning classifier with a high learning capacity that automatically learns optimum features from the training data; moreover, it is invariant to minor rotations and shifts of the target object. Encouraging experimental results have been obtained on a large dataset, demonstrating the performance and robustness of our system in complex scenes.

Index Terms— Convolutional Neural Networks, Deep Learning, EdgeBoxes, Satellite Images, Target Detection.

I. INTRODUCTION

Automatic detection of military targets such as oil tanks, aircraft and artillery in high-resolution satellite imagery has great significance in military applications. With the rapid development of satellite imaging and geographic information systems, a large amount of high-resolution imagery can be acquired effortlessly from Google Earth. Such non-hyperspectral image data has been used in many civil and military applications. Various techniques and features have been proposed so far for automatic target detection in satellite imagery. Zhang et al. [1] developed a hierarchical algorithm based on an AdaBoost classifier and HOG features for the detection of oil tanks. Han et al. [2] proposed a method based on a graph search strategy and an improved Hough transform for the detection of oil tanks in satellite imagery. Yildiz et al. [3] employed Gabor features and an SVM classifier to detect different aircraft. Gabor filters are also employed by the authors in [4, 5] for road crack detection in aerial images and settlement zone detection in satellite images, respectively. Hsieh et al. [6] employed Zernike moments, aircraft contours and wavelets with an SVM classifier for the detection of aircraft in satellite images. Most of the methods discussed above use hand-crafted features and work effectively only in their own scenes.
Deep learning is a very effective method for automatically learning optimum features directly from large training datasets. Nowadays, computer vision systems based on deep learning have matched or outperformed humans in numerous applications. Furthermore, the use of Graphics Processing Units (GPUs) has decreased the training time of deep learning methods, and large databases of labelled data and pre-trained networks are now publicly available. Two popular deep learning models are the Deep Belief Network (DBN) [7] and the Convolutional Neural Network (CNN) [8]. The CNN is a modern deep learning method that is widely used for image recognition because it is invariant to small rotations and shifts [9]. The DBN is a probabilistic generative model which is pre-trained layer by layer as Restricted Boltzmann Machines and then fine-tuned with the back-propagation algorithm to become a classifier [9]. Chen et al. [10] employed an object locating method along with a DBN for aircraft detection in satellite images. Saliency has also been used for image classification by various researchers: Li et al. [11] applied visual symmetry detection and saliency computation for aircraft detection in satellite images, while Zhang et al. [12] and Sattar et al. [13] employed saliency together with unsupervised learning for image classification. The primary goals of saliency are the identification of fixation points, the detection of image regions representing the scene and the detection of dominant objects in a scene. Nevertheless, satellite images often contain several targets and correct localization of each target is required, so saliency cannot be directly employed for automatic target detection in satellite images; it needs the help of other methods such as symmetry detection.

CNNs have been used in various computer vision applications for the last two decades. However, the sliding-window approach for target detection using a CNN is very slow and contrasts with the mechanisms of the human visual system. Many objectness detection methods have therefore been proposed to increase the computational efficiency of target detection, such as EdgeBoxes [14], Binarized Normed Gradients (BING) [15] and Selective Search [16]. Selective Search greedily merges low-level superpixels to generate object proposals; Girshick et al. [17] proposed the use of Selective Search together with a CNN instead of a sliding-window detector and achieved outstanding results on the ILSVRC2013 detection dataset. BING generates object proposals based on binarized normed gradients. EdgeBoxes uses the object boundaries in the image as a feature for proposing candidate objects [14] and is robust to varying object sizes.

In satellite images, scale and orientation changes are the main characteristics of targets, and the edge information of targets contains very prominent and concise attributes. The major challenges in target detection in satellite imagery include the presence of targets of different sizes, in different orientations and at very close locations. BING generates very loosely fitting proposals and is thus only suitable at low IoU thresholds. Selective Search is relatively good for general object detection, but it is considerably slower and does not perform well when the objects are rather small. EdgeBoxes provides the best trade-off between speed and quality.

Figure 1. Conceptual-level block diagram of the proposed target detection system.

An automatic target detection method based on EdgeBoxes and a Convolutional Neural Network (CNN) is proposed in this paper. Fig. 1 illustrates the conceptual-level block diagram of the proposed system.
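As a conceptual summary of this pipeline (proposal generation, geometric filtering, CNN classification), the following Python sketch shows how the stages fit together. The function and parameter names, the geometric thresholds and the use of OpenCV for patch resizing are our illustrative assumptions and do not come from the paper; the actual EdgeBoxes proposal generator and the trained CNN are passed in as callables.

```python
import cv2  # assumed here only for patch resizing


def detect_targets(image, propose_boxes, classify_patch,
                   min_area=100, max_area=10000, max_aspect=3.0, threshold=0.5):
    """Conceptual sketch of the proposed pipeline. `propose_boxes` stands in for
    EdgeBoxes and `classify_patch` for the trained CNN; the geometric thresholds
    are illustrative values, not values reported in the paper. Boxes are
    (x, y, w, h) tuples in pixel coordinates."""
    detections = []
    for (x, y, w, h) in propose_boxes(image):
        area = w * h
        aspect = max(w, h) / max(min(w, h), 1)
        # Geometric checks: aircraft patches are small and roughly square,
        # so very large, very small or elongated boxes are discarded.
        if not (min_area <= area <= max_area and aspect <= max_aspect):
            continue
        patch = cv2.resize(image[y:y + h, x:x + w], (32, 32))  # CNN input size
        if classify_patch(patch) >= threshold:  # CNN aircraft probability
            detections.append((x, y, w, h))
    return detections
```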
We use EdgeBoxes to produce object proposals in the initial stage. The candidate objects proposed by EdgeBoxes are filtered using geometric checks while maintaining a high recall rate. We then feed the potential object proposals to the CNN for automatic feature extraction and classification. Finally, the performance of our method is evaluated on a large military target dataset which contains aircraft and non-aircraft patches for training, along with test satellite images from Google Earth, as shown in Fig. 2. The target detection results using EdgeBoxes and the CNN are illustrated in Fig. 3. These methods operate on panchromatic data. The proposed algorithm can be used for the detection of any type of target; however, we have only detected aircraft so far owing to the availability of the aircraft dataset, and the literature related to the detection of any military target is of interest here. The proposed target detection system is explained in detail in Section II. The experimental analysis is presented in Section III, followed by the conclusion and future prospects in Section IV.

Figure 2. Examples from the aircraft dataset: (a) aircraft patches, (b) non-aircraft patches, (c) satellite images (airport scenes).

II. THE PROPOSED TARGET DETECTION SYSTEM

The proposed framework for the target detection system is shown in Fig. 1. We detect objects in the input image using EdgeBoxes and apply geometric checks to select likely military targets among the object proposals. A well-trained CNN is then used to extract features of the proposed objects and classify them as aircraft or non-aircraft objects.

A. Candidate Objects Proposal using EdgeBoxes

The edge information of objects is very useful in remote sensing because it contains very prominent and concise attributes. The EdgeBoxes technique presented in [14] leverages this edge information to detect objects. In EdgeBoxes, edges with high affinity are grouped together using a greedy approach and a single score is computed from the contours wholly enclosed by the bounding box of a candidate object. The affinity between two edge groups is given by

a(s_i, s_j) = |\cos(\theta_i - \theta_{ij}) \cos(\theta_j - \theta_{ij})|^{\gamma}        (1)

where s_i and s_j denote a pair of edge groups, \theta_i and \theta_j are their mean orientations, \theta_{ij} is the angle between the mean positions of groups s_i and s_j, and \gamma controls the sensitivity of the affinity to orientation variations. The score of the bounding box b of a candidate object is given by

h_b = \frac{\sum_i w_b(s_i)\, m_i}{2\,(b_w + b_h)^{\kappa}}        (2)

where b_w and b_h are the width and height of the box, m_i is the sum of the magnitudes of all edges in group s_i, and w_b(s_i) \in [0, 1] indicates whether s_i is wholly contained in b or not. To normalize the score, the magnitude of the edges in a box b^{in} centered in b is subtracted, which improves the accuracy of EdgeBoxes:

h_b^{in} = h_b - \frac{\sum_{p \in b^{in}} m_p}{2\,(b_w + b_h)^{\kappa}}        (3)

where b^{in} has height and width equal to b_h/2 and b_w/2, respectively. For further details on EdgeBoxes, the reader is referred to [14].
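To make Eqs. (1)–(3) concrete, the following Python sketch evaluates the affinity and the normalized box score on toy inputs. The function names are ours, and the default values of γ and κ are illustrative choices rather than values stated in this paper.

```python
import numpy as np


def affinity(theta_i, theta_j, theta_ij, gamma=2.0):
    """Affinity between two edge groups, Eq. (1)."""
    return abs(np.cos(theta_i - theta_ij) * np.cos(theta_j - theta_ij)) ** gamma


def box_score(magnitudes, weights, box_w, box_h, inner_magnitude=0.0, kappa=1.5):
    """Normalized box score of Eqs. (2)-(3). `magnitudes` holds the summed edge
    magnitudes m_i of each group, `weights` the containment weights w_b(s_i),
    and `inner_magnitude` the summed edge magnitude inside the central box b_in."""
    denom = 2.0 * (box_w + box_h) ** kappa
    h_b = sum(w * m for w, m in zip(weights, magnitudes)) / denom
    return h_b - inner_magnitude / denom


# Two nearly collinear edge groups receive an affinity close to 1, while a box
# whose interior contains many edges is penalized by the b_in subtraction.
print(affinity(0.10, 0.15, 0.12))                      # ~0.999
print(box_score([5.0, 3.0], [1.0, 0.4], 40, 40))       # small positive score
print(box_score([5.0, 3.0], [1.0, 0.4], 40, 40, 4.0))  # lower after subtraction
```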
B. Candidate Objects Selection

The candidate objects proposed by EdgeBoxes are too numerous to classify directly with the CNN. We therefore apply geometric checks to discard the objects which are unlikely to be aircraft. The patches of aircraft in satellite images are generally small and roughly square, so objects with a very large or very small area or a high aspect ratio are discarded. The objects that survive the geometric filtering are passed to the CNN for automatic feature extraction and classification.

C. Feature Extraction and Classification using CNN

The Convolutional Neural Network (CNN) is a modern deep learning method which is widely used for image analysis tasks such as image classification, object detection and segmentation. Krizhevsky et al. [18] achieved excellent recognition rates on the Large Scale Visual Recognition Challenge dataset using standard back-propagation to train a deep CNN. A CNN consists of several layers: convolutional, activation and pooling layers in alternation, followed by a fully connected layer that produces the output. Unlike typical neural networks, only a small region of input neurons, known as the Local Receptive Field (LRF), is connected to each hidden neuron. The LRF is translated across the image using convolution to map the input to the hidden neurons. The hidden layers of the CNN learn to detect different features in an image. The weights and biases of all neurons in a hidden layer are shared, so all hidden neurons detect the same features, such as edges and blobs, in different regions of an image, making the CNN tolerant to translation of objects. The activation stage transforms the output of each neuron using an activation function such as the Rectified Linear Unit (ReLU), which passes positive values unchanged and maps negative values to zero. Pooling reduces the dimensionality of the feature map by condensing the output of small regions of neurons into a single output, thus simplifying the following layers and reducing the number of parameters to learn. The final layer connects the neurons from the last hidden layer to the output neurons, and the class probabilities are determined by the values of the nodes in this final layer.

Figure 3. Simulation results of the proposed target detection system on a test image from the aircraft dataset: (a) satellite image, (b) object proposals by EdgeBoxes, (c) target detection results.

Figure 4. CNN framework used for feature extraction and classification in the proposed system.

Fig. 4 represents the proposed CNN architecture. There are a total of five layers: two convolutional and two pooling layers in alternation, followed by a fully connected layer. The first convolutional layer has 6 filters of mask size 5×5 and the second convolutional layer has 12 filters; each pooling layer has a field size of 2×2. A sigmoid activation function processes the output of the last layer to generate the class labels.
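A minimal PyTorch sketch of the architecture just described is shown below. The framework choice, the single-channel 32×32 input, the 5×5 filters in the second convolutional layer, the ReLU activations between stages and the single sigmoid output unit are our assumptions, since the paper does not specify these details.

```python
import torch
import torch.nn as nn


class AircraftCNN(nn.Module):
    """Sketch of the described network: two conv/pool stages followed by a
    fully connected layer with a sigmoid output."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 6 filters of size 5x5 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 2x2 pooling -> 6x14x14
            nn.Conv2d(6, 12, kernel_size=5),  # 12 filters (5x5 assumed) -> 12x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 2x2 pooling -> 12x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(12 * 5 * 5, 1),         # fully connected layer
            nn.Sigmoid(),                     # aircraft probability in (0, 1)
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Example: classify a batch of 32x32 panchromatic patches.
model = AircraftCNN()
patches = torch.rand(4, 1, 32, 32)
probs = model(patches)  # threshold (e.g. at 0.5) for aircraft / non-aircraft
```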
III. EXPERIMENTAL RESULTS

A. Dataset Specifications

We use the publicly available military target dataset consisting of panchromatic images [20] to evaluate the proposed system. The dataset contains 500 aircraft patches, 5000 non-aircraft patches and 26 test images taken from Google Earth; some patches and test images are shown in Fig. 2. We resize the patches to 32×32 and use them to train the CNN. Hyperspectral images of the same scenes, if available, would improve the classification results and also allow multi-target detection.

TABLE I
EXPERIMENTAL RESULTS ON THE AIRCRAFT DATASET

Image No. | No. of Targets | No. of Detected Objects | No. of Detected Targets | Precision (%) | Recall (%)
1         | 18 | 26 | 16 | 62  | 89
2         | 7  | 7  | 7  | 100 | 100
3         | 19 | 26 | 17 | 65  | 89
4         | 56 | 65 | 49 | 75  | 88
5         | 30 | 33 | 28 | 85  | 93
6         | 21 | 25 | 17 | 68  | 81
7         | 21 | 28 | 15 | 54  | 71
8         | 15 | 21 | 11 | 52  | 73
9         | 27 | 34 | 24 | 71  | 89
10        | 37 | 40 | 35 | 88  | 95
11        | 7  | 8  | 7  | 88  | 100
12        | 15 | 21 | 12 | 57  | 80
13        | 12 | 14 | 12 | 86  | 100
14        | 23 | 27 | 22 | 81  | 96
15        | 15 | 19 | 12 | 63  | 80
16        | 16 | 17 | 16 | 94  | 100
17        | 16 | 20 | 14 | 70  | 88
18        | 8  | 8  | 8  | 100 | 100
19        | 15 | 17 | 15 | 88  | 100
20        | 7  | 8  | 6  | 75  | 86
21        | 5  | 5  | 5  | 100 | 100
22        | 11 | 12 | 11 | 92  | 100
23        | 12 | 15 | 12 | 80  | 100
24        | 8  | 12 | 8  | 67  | 100
25        | 12 | 13 | 11 | 85  | 92
26        | 13 | 14 | 11 | 79  | 85
Average   |    |    |    | 77.9 | 91.3

Figure 5. Results of the proposed target detection system on some test images from the aircraft dataset.

B. Evaluation Metrics

We use precision and recall as our evaluation metrics. They are defined as

\text{Precision} = \frac{\text{No. of correctly detected targets}}{\text{No. of detected objects}} \times 100\%        (4)

\text{Recall} = \frac{\text{No. of correctly detected targets}}{\text{No. of targets in the image}} \times 100\%        (5)
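The short Python sketch below (function name ours) applies Eqs. (4) and (5) to per-image counts of the kind reported in Table I.

```python
def precision_recall(num_targets, num_detected_objects, num_detected_targets):
    """Per-image precision and recall (in %) as defined in Eqs. (4)-(5)."""
    precision = 100.0 * num_detected_targets / num_detected_objects
    recall = 100.0 * num_detected_targets / num_targets
    return precision, recall


# Image no. 1 of Table I: 18 targets, 26 detected objects, 16 correct detections.
print(precision_recall(18, 26, 16))  # (61.5..., 88.8...), reported as 62% and 89%
```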
C. Results

The airplane dataset [21] contains 26 test images taken from Google Earth; some examples are shown in Fig. 2(c). The EdgeBoxes method is used to generate object proposals, which are filtered using the geometric checks discussed in Section II. For each remaining object in the input image, the trained CNN predicts whether it is an aircraft or a non-aircraft object. The results of the proposed system on each test image are presented in detail in Table I. The proposed system reaches a maximum precision and recall of 100% and achieves an average precision of 77.9% and an average recall of 91.3% on the complete dataset. The DBN-based method with object locating at two fixed scales [10] could not detect all targets because it is sensitive to scale variations; the DBN-based methods reach a maximum recall rate of only 77.7%. The HOG- and SVM-based method [1] performs the worst on the aircraft dataset because it is sensitive to orientation changes. The BING- and CNN-based method [1, 2] performs reasonably only when the targets are far away. R-CNN [17] reaches a maximum recall rate of 81.68%. The orientation and scale of aircraft in satellite images vary significantly, and the proposed method performs much better than the other comparative methods because EdgeBoxes is robust to varying object sizes and the CNN is robust to minor rotations and shifts. Excellent precision and recall rates are obtained in images with uniform illumination and little noise; for example, 100% precision and 100% recall are obtained on images 2, 18 and 21. Relatively lower recall rates are obtained in images where the aircraft coincide or overlap with other objects, while lower precision rates with high recall rates are observed in images containing many objects similar in appearance to aircraft. The experimental results clearly show the effectiveness of the proposed target detection system in complex scenes, and Fig. 5 shows that the proposed system can correctly detect aircraft in satellite images.

IV. CONCLUSION

Automatic target detection in satellite imagery has great significance in military applications. In this work, we propose the use of the EdgeBoxes algorithm for object proposal generation and a CNN for classification in satellite images. EdgeBoxes leverages edge information to detect objects and is robust to varying object sizes, while the CNN effectively learns optimum features directly from large amounts of data and is invariant to minor rotations and shifts of the target object. Encouraging experimental results have been obtained on a large dataset; the high precision and recall rates show the performance and robustness of our system in complex scenes. In future work, we will try to improve the performance of our system and lower its computational cost, and we will also apply it to other areas where target detection is used.

REFERENCES

[1] L. Zhang, Z.W. Shi, X.R. Yu, "A hierarchical oil depot detector in high-resolution images with false detection control", 7th International Congress on Image and Signal Processing (CISP), 2014, pp. 530-535.
[2] X.W. Han, Y.L. Fu, G. Li, "Oil Depots Recognition Based on Improved Hough Transform and Graph Search", Journal of Electronics & Information Technology, Vol. 33, No. 1, pp. 66-72, 2011.
[3] C. Yildiz, E. Polat, "Detection of stationary aircrafts from satellite images", IEEE 19th Conference on Signal Processing and Communications Applications (SIU), 2011, pp. 518-521.
[4] H.A. Khan, M. Salman, S. Hussain, K. Khurshid, "Automation of Optimized Gabor Filter Parameter Selection for Road Cracks Detection", International Journal of Advanced Computer Science and Applications, Vol. 7, No. 3, pp. 269-275, 2016.
[5] H. Iftikhar, K. Khurshid, "Fusion of Gabor Filter and Morphological Operators for the Detection of Settlement Zones in Google Earth Satellite Images", IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Nov 2011.
[6] J.W. Hsieh, J.M. Chen, C.H. Chuang, K.C. Fan, "Aircraft type recognition in satellite images", IEE Proceedings - Vision, Image and Signal Processing, Vol. 152, No. 3, pp. 307-315, 2005.
[7] G.E. Hinton, S. Osindero, Y.W. Teh, "A fast learning algorithm for deep belief networks", Neural Computation, Vol. 18, No. 7, pp. 1527-1554, 2006.
[8] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278-2324, Nov 1998.
[9] G.E. Hinton, "A practical guide to training restricted Boltzmann machines", Neural Networks: Tricks of the Trade, Springer, 2012, pp. 599-619.
[10] X.Y. Chen, S.M. Xiang, C.L. Liu, C.H. Pan, "Aircraft Detection by Deep Belief Networks", The 2nd IAPR Asian Conference on Pattern Recognition, 2013, pp. 54-58.
[11] W. Li, S.M. Xiang, H.B. Wang, C.H. Pan, "Robust airplane detection in satellite images", 18th IEEE International Conference on Image Processing (ICIP), pp. 2821-2824, 2011.
[12] F. Zhang, B. Du, L. Zhang, "Saliency-Guided Unsupervised Feature Learning for Scene Classification", IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, No. 4, pp. 2175-2184, April 2015.
[13] S. Sattar, H.A. Khan, K. Khurshid, "Optimized Class-Separability in HyperSpectral Images", 36th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2016.
[14] C.L. Zitnick, P. Dollar, "EdgeBoxes: Locating object proposals from edges", European Conference on Computer Vision (ECCV), Zurich, pp. 391-405, September 2014.
[15] M.M. Cheng, Z.M. Zhang, W.Y. Lin, P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3286-3293, 2014.
[16] J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, A.W.M. Smeulders, "Selective search for object recognition", International Journal of Computer Vision, Vol. 104, No. 2, pp. 154-171, 2013.
[17] R. Girshick, J. Donahue, T. Darrell, J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.
[18] A. Krizhevsky, I. Sutskever, G.E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", The 26th Annual Conference on Neural Information Processing Systems (NIPS), pp. 1097-1105, 2012.
[19] A.J. Chen, J.Z. Li, "Automatic recognition method for quasi-circular oil depots in satellite remote sensing images", Opto-Electronic Engineering, Vol. 33, No. 9, pp. 96-100, 2006.
[20] P. Dollar, C.L. Zitnick, "Fast edge detection using structured forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PP, No. 99, p. 1, 2014.
[21] H. Wu, H. Zhang, J. Zhang, F. Xu, "Fast aircraft detection in satellite images based on convolutional neural networks", 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, pp. 4210-4214, 2015.