Abstract
This paper describes a novel method for coupling a standalone database management system (DBMS) with a highly scalable key-value store. The system employs Apache Cassandra as data storage and the extensible DBMS Secondo as the query processing engine. The resulting system is a distributed, general-purpose DBMS that is highly scalable and fault-tolerant. The logical ring of Cassandra is used to split the input data into smaller units of work (UOWs), which can be processed independently. A decentralized algorithm assigns the UOWs to query processing nodes. In case of a node failure, UOWs are recalculated on a different node. All the data models (e.g. relational, spatial and spatio-temporal) and functions (e.g. filter, aggregates, joins and spatial joins) implemented in Secondo can be used in a scalable way without changing their implementation. Many aspects of the distribution are hidden from the user, and existing sequential queries can easily be converted into parallel ones.
Notes
Some special functions, like the interaction with other distributed systems, are excluded.
In Secondo, nested lists are used at some points to interchange structured data. For example: ((value1 value2) (value3)).
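The nested-list format can be illustrated with a small parser. The tokenizer and parser below are an illustrative sketch, not Secondo's actual implementation; they only show how a string such as `((value1 value2) (value3))` maps onto a nested structure.

```python
# Minimal sketch of parsing a Secondo-style nested list into Python lists.
# This is illustrative code, not part of Secondo itself.

def parse_nested_list(text):
    """Parse a nested list such as '((value1 value2) (value3))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing ')'
            return items
        return token  # an atom

    return parse()

print(parse_nested_list("((value1 value2) (value3))"))
# → [['value1', 'value2'], ['value3']]
```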
The two cases \(begin = p_0\) and \(end = p_n\) are ignored in the description to keep the examples clear.
Phase 3 is influenced by the speculative task execution of Hadoop [8, p. 3]. The table system_pending prevents all idle QPNs from processing the same UOW at the same time. This would lead to hot spots (parts of the logical ring that are read or written by many nodes simultaneously) and to longer query processing times.
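The role of the system_pending table can be sketched as follows. This is a simplified illustration, not the paper's implementation: a dictionary stands in for the table, and `claim_uow` is a hypothetical helper that lets an idle QPN register a claim so that other nodes pick different UOWs.

```python
# Illustrative sketch of how a table of pending UOWs keeps idle query
# processing nodes (QPNs) from all selecting the same unit of work.
# The dictionary stands in for the system_pending table.

import random

system_pending = {}  # uow_id -> node_id that claimed it

def claim_uow(node_id, unprocessed_uows):
    """Pick a UOW that no other node has registered as pending."""
    candidates = [u for u in unprocessed_uows if u not in system_pending]
    if not candidates:
        return None  # all UOWs claimed; speculative re-execution could start here
    uow = random.choice(candidates)  # randomization spreads nodes over the ring
    system_pending[uow] = node_id
    return uow

first = claim_uow("qpn-1", ["uow-1", "uow-2"])
second = claim_uow("qpn-2", ["uow-1", "uow-2"])
assert first != second  # the two nodes never work on the same UOW concurrently
```

Without such a table, every idle QPN would independently choose among the same remaining UOWs, producing the hot spots described above.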
The part of the logical ring that is read is determined by the UOW currently being processed.
Each line contains 5000 characters + 4 field separators (e.g. ',') + 1 newline character (e.g. '\n') = 5005 bytes per line. By creating 10,000,000 lines of 5005 bytes each, 46.61 GB of data is generated in total.
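The arithmetic of this note can be checked directly. The snippet below assumes that "GB" in the note denotes GiB (2^30 bytes), which is the interpretation that reproduces the stated 46.61:

```python
# Verifying the data-size note: 10,000,000 lines of 5005 bytes each.
# "GB" is interpreted as GiB (2**30 bytes), which matches the stated 46.61.

bytes_per_line = 5000 + 4 + 1            # characters + separators + newline
total_bytes = 10_000_000 * bytes_per_line
print(round(total_bytes / 2**30, 2))     # → 46.61
```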
The data generator creates 46.61 GB of data, of which 38.84 GB needs to be transferred. With a 1 Gbit/s network link, the transfer takes 333.63 s.
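The transfer-time estimate follows the same arithmetic; again the sketch assumes "GB" means GiB (2^30 bytes) and a nominal link rate of 10^9 bits per second:

```python
# Verifying the transfer-time note: 38.84 GB over a 1 Gbit/s link.
# "GB" is interpreted as GiB (2**30 bytes); 1 Gbit/s = 1e9 bits/s.

transfer_bytes = 38.84 * 2**30
seconds = transfer_bytes * 8 / 1e9
print(round(seconds, 2))                 # → 333.63
```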
The parallel version executes multiple Secondo threads on one hardware node. This is why the parallel version cannot use 6 GB of memory for each thread. However, UOWs are small, and with 1.5 GB of memory only one MMR-tree needs to be created. As a consequence, the second relation needs to be analyzed only once.
References
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: Hadoopdb: an architectural hybrid of mapreduce and dbms technologies for analytical workloads. Proc. VLDB Endow. 2(1), 922–933 (2009)
Apache license, version 2.0. http://www.apache.org/licenses/ (2004). Accessed 30 Jul 2015
Ceri, S., Pelagatti, G.: Distributed Databases Principles and Systems. McGraw-Hill Inc, New York (1984)
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, OSDI’06, vol. 7, pp. 15–15. USENIX Association, Berkeley (2006)
Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., Woodford, D.: Spanner: Google’s globally-distributed database. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 251–264. USENIX Association, Berkeley (2012)
Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, OSDI’04, vol. 6, p. 10. USENIX Association, Berkeley (2004)
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Syst. Rev. 41(6), 205–220 (2007)
Dinun, F., Ng, T.S.E.: Understanding the effects and implications of compute node related failures in hadoop. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC’12, pp. 187–198. ACM, New York (2012)
Dittrich, J.P., Seeger, B.: Data redundancy and duplicate detection in spatial join processing. In: ICDE, pp. 535–546 (2000)
Düntgen, C., Behr, T., Güting, R.H.: Berlinmod: a benchmark for moving object databases. VLDB J. 18(6), 1335–1368 (2009)
Eldawy, A., Mokbel, M.F.: Pigeon: a spatial mapreduce language. In: IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, March 31–April 4, 2014, pp. 1242–1245 (2014)
Eldawy, A., Mokbel, M.F.: SpatialHadoop: a mapreduce framework for spatial data. In: 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, pp. 1352–1363, 13–17 April 2015
Gantz, J.F., Reinsel, D.: The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. In: IDC (2012)
George, L.: HBase: The Definitive Guide. O’Reilly Media Inc, Sebastopol (2011)
Ghemawat, S., Gobioff, H., Leung, S.T.: The google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. SOSP’03, pp. 29–43. ACM, New York (2003)
Güting, R.H.: Operator Based Query Progress Estimation. FernUniversität in Hagen, Hagen (2008)
Güting, R.H., Behr, T., Düntgen, C.: Secondo: a platform for moving objects database research and for publishing and integrating research implementations. IEEE Data Eng. Bull. 33(2), 56–63 (2010)
Idreos, S., Liarou, E., Koubarakis, M.: Continuous multi-way joins over distributed hash tables. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology. EDBT’08, pp. 594–605. ACM, New York (2008)
Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., Lewin, D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, STOC’97, pp. 654–663. ACM, New York (1997)
Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
Leach, P., Mealling, M., Salz, R.: RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace (2005)
Lu, J., Güting, R.H.: Parallel secondo: boosting database engines with hadoop. In: 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp. 738–743 (2012)
Nidzwetzki, J.K.: Entwicklung eines skalierbaren und verteilten Datenbanksystems. Springer, Berlin (2016)
Nidzwetzki, J.K., Güting, R.H.: Distributed SECONDO: a highly available and scalable system for spatial data processing. In: Advances in spatial and temporal databases—14th international symposium, SSTD 2015, Hong Kong, China, pp. 491–496, 26–28 August 2015
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. SIGMOD’08, pp. 1099–1110. ACM, New York (2008)
Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 3rd edn. Springer, New York (2011)
Palma, W., Akbarinia, R., Pacitti, E., Valduriez, P.: Distributed processing of continuous join queries using DHT networks. In: Proceedings of the 2009 EDBT/ICDT Workshops. EDBT/ICDT’09, pp. 34–41. ACM, New York (2009)
Patel, J.M., DeWitt, D.J.: Partition based spatial-merge join. SIGMOD Rec. 25(2), 259–270 (1996)
Rothnie, J.B., Goodman, N.: A survey of research and development in distributed database management. In: Proceedings of the Third International Conference on Very Large Data Bases, VLDB’77, vol. 3, pp. 48–62. VLDB Endowment (1977)
Rothnie, J.B., Bernstein, P.A., Fox, S., Goodman, N., Hammer, M., Landers, T.A., Reeve, C., Shipman, D.W., Wong, E.: Introduction to a system for distributed databases (SDD-1). ACM Trans. Database Syst. 5(1), 1–17 (1980)
Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B., Vingralek, R., Whipkey, C., Chen, X., Jegerlehner, B., Littlefield, K., Tong, P.: F1: the fault-tolerant distributed RDBMS supporting Google’s ad business. In: SIGMOD 2012. Talk given at SIGMOD (2012)
Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev. 31(4), 149–160 (2001)
Tanenbaum, A.S., van Steen, M.: Distributed Systems: Principles and Paradigms, 2nd edn. Prentice-Hall Inc., Upper Saddle River (2006)
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2(2), 1626–1629 (2009)
Transaction Processing Performance Council: TPC Benchmark H (Decision Support) Standard Specification. http://www.tpc.org/tpch/. Accessed 15 May 2015
Vogels, W.: Eventually consistent. Commun. ACM 52(1), 40–44 (2009)
Website of Apache Drill. http://drill.apache.org (2015). Accessed 20 July 2015
Website of Apache Spark. http://spark.apache.org/ (2015). Accessed 20 Jul 2015
Website of cpp-driver for Cassandra. https://github.com/datastax/cpp-driver (2015). Accessed 15 Sept 2015
Website of Distributed Secondo. http://dna.fernuni-hagen.de/secondo/DSecondo/DSECONDO-Website/index.html (2015). Accessed 15 Nov 2015
Website of the Open Street Map Project. http://www.openstreetmap.org (2015). Accessed 09 July 2015
White, T.: Hadoop: The Definitive Guide, 1st edn. O’Reilly Media Inc., Sebastopol (2009)
Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In: Proceedings of the 2016 International Conference on Management of Data. SIGMOD’16, pp. 1071–1085. ACM, New York (2016)
You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. Technical Report http://www-cs.ccny.cuny.edu/~jzhang/papers/spatial_cc_tr.pdf (2016). Accessed 14 Mar 2017
Zhang, S., Han, J., Liu, Z., Wang, K., Xu, Z.: SJMR: parallelizing spatial join with mapreduce on clusters. In: Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31–September 4, 2009, New Orleans, Louisiana, USA, pp. 1–8 (2009)
Queries of the experiments
Cite this article
Nidzwetzki, J.K., Güting, R.H. Distributed secondo: an extensible and scalable database management system. Distrib Parallel Databases 35, 197–248 (2017). https://doi.org/10.1007/s10619-017-7198-9