Cloud implementation of the K-means algorithm for hyperspectral image analysis

The Journal of Supercomputing

Abstract

Remotely sensed hyperspectral imaging makes it possible to collect hundreds of images, at different wavelength channels, of the same area on the surface of the Earth. Hyperspectral images are characterized by their large volume and dimensionality, which make them difficult to process and store. As a result, several techniques have been developed in recent years to perform hyperspectral image analysis on high-performance computing architectures. However, cloud computing techniques have not been applied as widely, even though cloud computing architectures offer many potential advantages for distributed hyperspectral image analysis. In this paper, we present a cloud implementation (developed using Apache Spark) of the popular K-means algorithm for unsupervised hyperspectral image clustering. The experimental results suggest that cloud architectures allow for the efficient distributed processing of large hyperspectral image data sets.
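To make the idea of distributed K-means clustering concrete, the following is a minimal sketch of a single K-means iteration written in a map/reduce style. It is not the authors' Spark implementation: it uses plain Python, the lists in `partitions` merely stand in for blocks of pixel vectors held on different workers, and the names `closest` and `kmeans_step` are hypothetical.

```python
from collections import defaultdict


def closest(pixel, centroids):
    """Return the index of the centroid nearest to `pixel` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(pixel, centroids[k])),
    )


def kmeans_step(partitions, centroids):
    """Run one K-means iteration over pixels that are split into partitions."""
    dim = len(centroids[0])

    # "Map" phase: each partition independently accumulates a partial
    # (vector sum, pixel count) for every cluster it sees.
    partials = []
    for part in partitions:
        acc = defaultdict(lambda: [[0.0] * dim, 0])
        for pixel in part:
            k = closest(pixel, centroids)
            acc[k][0] = [s + p for s, p in zip(acc[k][0], pixel)]
            acc[k][1] += 1
        partials.append(acc)

    # "Reduce" phase: merge the partial sums and counts across partitions.
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for acc in partials:
        for k, (vec, n) in acc.items():
            sums[k] = [s + v for s, v in zip(sums[k], vec)]
            counts[k] += n

    # New centroid = mean of its pixels; an empty cluster keeps its old centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]
```

In this sketch only the small per-cluster sums and counts cross partition boundaries, never the pixels themselves, which is the property that makes the update step a natural fit for a distributed framework.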

Notes

  1. http://www.enmap.org/.

  2. http://hadoop.apache.org.

  3. http://spark.apache.org/.

  4. https://wiki.openstack.org/wiki/Main_Page.

  5. https://spark.apache.org/docs/latest/mllib-guide.html.

  6. Since K-means may not find the globally optimal solution, it is recommended to run it several times in order to reach a better final solution. Thus, if \(runs>1\), each iteration processes as many sets of centroids as there are runs.

  7. After iterating, the algorithm takes only the best solution reached.

  8. Available online: https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.
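The restart strategy described in notes 6 and 7 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: it runs the restarts one after another in plain Python, initializes each run by sampling random points, and scores each run by its within-cluster sum of squared distances. The names `lloyd` and `best_of_runs` and the `seed` parameter are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, iters=20):
    """Plain K-means from the given initial centroids; returns (centroids, cost)."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[k].append(p)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, cost


def best_of_runs(points, k, runs=5, seed=0):
    """Repeat K-means `runs` times and keep only the lowest-cost solution."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        init = rng.sample(points, k)  # assumed initialization: k random points
        result = lloyd(points, init)
        if best is None or result[1] < best[1]:
            best = result
    return best
```

Because every run starts from a different set of centroids, the runs are independent of one another and only their final costs need to be compared when selecting the solution to keep.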

References

  1. Arthur D, Vassilvitskii S (2007) K-means++: The Advantages of Careful Seeding. In: ACM (ed.) Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics, New Orleans, Louisiana

  2. Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable K-means++. Proc VLDB Endow (PVLDB) 5(7):622–633

  3. Chang CI (2003) Hyperspectral imaging: techniques for spectral detection and classification. Kluwer Academic/Plenum Publishers, New York

  4. Green RO, Eastwood ML, Sarture CM, Chrien TG, Aronsson M, Chippendale BJ, Faust JA, Pavri BE, Chovit CJ, Solis M, Olah MR, Williams O (1998) Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sens Environ 65:227–248

  5. Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892

  6. León G, Molero JM, Garzón EM, García I, Plaza A, Quintana-Ortí ES (2015) Exploring the performance-power-energy balance of low-power multicore and manycore architectures for anomaly detection in remote sensing. J Supercomput 71(5):1893–1906

  7. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  8. Martínez JA, Garzón EM, Plaza A, García I (2011) Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE. J Supercomput 58(2):151–159

  9. Molero JM, Paz A, Garzón EM, Martínez JA, Plaza A, García I (2011) Fast anomaly detection in hyperspectral images with RX method on heterogeneous clusters. J Supercomput 58(3):411–419

  10. Plaza A, Plaza J, Martin G, Sanchez S (2011) Hyperspectral data processing algorithms. In: Thenkabail PS, Lyon JG, Huete A (eds) Hyperspectral remote sensing of vegetation, chap. 5, Taylor and Francis, Abingdon, United Kingdom, pp 121–137

  11. Plaza A, Plaza J, Paz A, Sanchez S (2011) Parallel hyperspectral image and signal processing. IEEE Signal Process Mag 28:196–218

  12. Plaza A, Plaza J, Valencia D (2007) Impact of platform heterogeneity on the design of parallel algorithms for morphological processing of high-dimensional image data. J Supercomput 40(1):81–107

  13. Sevilla J, Bernabe S, Plaza A (2014) Unmixing-based content retrieval system for remotely sensed hyperspectral imagery on GPUs. J Supercomput 70(2):588–599

  14. Stehman SV (1997) Selecting and interpreting measures of thematic classification accuracy. Remote Sens Environ 62(1):77–89

  15. Wu Z, Li Y, Plaza A, Li J, Xiao F, Wei Z (2016) Parallel and distributed dimensionality reduction of hyperspectral data on cloud computing architectures. IEEE J Sel Topics Appl Earth Obs Remote Sens 9(6):2270–2278

Acknowledgments

The authors would like to take this opportunity to gratefully thank the Editors and Anonymous Reviewers for their outstanding comments and suggestions, which greatly helped us improve the technical quality and presentation of the manuscript. This work has been supported by the Spanish Ministry of Science and Education (FPU grants). This work has also been supported by Junta de Extremadura (GR15005 grant). We acknowledge the use of the computing facilities at Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF), and particularly the system administrators Abel Francisco Paz Gallardo and Alfonso Pardo Diaz.

Corresponding author

Correspondence to Javier Plaza.

About this article

Cite this article

Haut, J.M., Paoletti, M., Plaza, J. et al. Cloud implementation of the K-means algorithm for hyperspectral image analysis. J Supercomput 73, 514–529 (2017). https://doi.org/10.1007/s11227-016-1896-3

  • DOI: https://doi.org/10.1007/s11227-016-1896-3
