Abstract
Remotely sensed hyperspectral imaging makes it possible to collect hundreds of images, at different wavelength channels, of the same area on the surface of the Earth. Hyperspectral images are characterized by their large volume and dimensionality, which make their processing and storage difficult. As a result, several techniques have been developed in recent years to perform hyperspectral image analysis on high-performance computing architectures. However, the application of cloud computing techniques has not been as widespread, even though there are many potential advantages in exploiting cloud computing architectures for distributed hyperspectral image analysis. In this paper, we present a cloud implementation (developed using Apache Spark) of the popular K-means algorithm for unsupervised hyperspectral image clustering. The experimental results suggest that cloud architectures allow for the efficient distributed processing of large hyperspectral image data sets.
Notes
Since K-means may converge to a local optimum rather than the global one, it is recommended to run it several times, each time from different initial centroids, to reach a better final solution. Thus, if \(runs>1\), each iteration processes as many sets of centroids as there are runs.
After iterating, the algorithm takes only the best solution reached.
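The multiple-run strategy described in these notes can be sketched as follows. This is a minimal, single-machine illustration in plain Python, not the Spark implementation presented in the paper. The function names (`kmeans_once`, `best_of_runs`), the random initialization, and the use of the within-cluster sum of squared distances as the criterion for the "best" solution are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra (sequences of band values)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(pixels, centroids):
    """Label each pixel with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in pixels
    ]


def kmeans_once(pixels, k, max_iter, rng):
    """One K-means run from a random initialization (assumed here).

    Returns (cost, centroids, labels), where cost is the within-cluster
    sum of squared distances.
    """
    centroids = [list(p) for p in rng.sample(pixels, k)]
    for _ in range(max_iter):
        labels = assign(pixels, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Empty cluster: keep the previous centroid unchanged.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments are stable, so the run has converged
        centroids = new_centroids
    labels = assign(pixels, centroids)
    cost = sum(squared_distance(p, centroids[lab]) for p, lab in zip(pixels, labels))
    return cost, centroids, labels


def best_of_runs(pixels, k, runs=5, max_iter=50, seed=0):
    """Run K-means `runs` times and keep only the lowest-cost solution."""
    rng = random.Random(seed)
    results = [kmeans_once(pixels, k, max_iter, rng) for _ in range(runs)]
    return min(results, key=lambda r: r[0])


# Toy data: six "pixels" with three spectral bands each, forming two groups.
pixels = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.1),
    (5.0, 5.1, 5.0), (5.1, 5.0, 5.0), (5.0, 5.0, 5.1),
]
cost, centroids, labels = best_of_runs(pixels, k=2, runs=5)
```

In this sketch the runs execute one after another. The note above describes a different schedule, in which all sets of centroids are advanced together within each iteration, but the selection step at the end is the same: only the best solution is kept.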
Available online: https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.
Acknowledgments
The authors would like to take this opportunity to gratefully thank the Editors and Anonymous Reviewers for their outstanding comments and suggestions, which greatly helped us improve the technical quality and presentation of the manuscript. This work has been supported by the Spanish Ministry of Science and Education (FPU grants). This work has also been supported by Junta de Extremadura (GR15005 grant). We acknowledge the use of the computing facilities at Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF), and particularly the system administrators Abel Francisco Paz Gallardo and Alfonso Pardo Diaz.
Cite this article
Haut, J.M., Paoletti, M., Plaza, J. et al. Cloud implementation of the K-means algorithm for hyperspectral image analysis. J Supercomput 73, 514–529 (2017). https://doi.org/10.1007/s11227-016-1896-3
DOI: https://doi.org/10.1007/s11227-016-1896-3