Abstract
Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with the well-known distributed data processing framework Apache Spark. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.
Acknowledgments
The research was supported by Siemens LLC.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Korkhov, V. et al. (2017). Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. Lecture Notes in Computer Science, vol 10408. Springer, Cham. https://doi.org/10.1007/978-3-319-62404-4_28
DOI: https://doi.org/10.1007/978-3-319-62404-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62403-7
Online ISBN: 978-3-319-62404-4
eBook Packages: Computer Science, Computer Science (R0)