Pangea: monolithic distributed storage for data analytics

Published: 01 February 2019

Abstract

Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper we propose a single system called Pangea that manages all data, both intermediate and long-lived, together with its buffering and caching, data placement optimization, and failure recovery, in one monolithic distributed storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.

Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 6
February 2019
100 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 February 2019
Published in PVLDB Volume 12, Issue 6

Qualifiers

  • Research-article

Cited By

  • (2023) MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying. Proceedings of the ACM on Management of Data, 1:4 (1-27). DOI: 10.1145/3626736. Online publication date: 12-Dec-2023
  • (2023) A Comparison of End-to-End Decision Forest Inference Pipelines. Proceedings of the 2023 ACM Symposium on Cloud Computing (200-215). DOI: 10.1145/3620678.3624656. Online publication date: 30-Oct-2023
  • (2023) In Support of Push-Based Streaming for the Computing Continuum. Intelligent Information and Database Systems (339-350). DOI: 10.1007/978-981-99-5837-5_28. Online publication date: 24-Jul-2023
  • (2022) Serving deep learning models with deduplication from relational databases. Proceedings of the VLDB Endowment, 15:10 (2230-2243). DOI: 10.14778/3547305.3547325. Online publication date: 1-Jun-2022
  • (2022) Benchmark of DNN Model Search at Deployment Time. Proceedings of the 34th International Conference on Scientific and Statistical Database Management (1-12). DOI: 10.1145/3538712.3538725. Online publication date: 6-Jul-2022
  • (2022) TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage. Proceedings of the 2022 International Conference on Management of Data (1418-1432). DOI: 10.1145/3514221.3526175. Online publication date: 10-Jun-2022
  • (2021) Lachesis. Proceedings of the VLDB Endowment, 14:8 (1262-1275). DOI: 10.14778/3457390.3457392. Online publication date: 1-Apr-2021
  • (2021) Constant 12 and reflexivity 472319 hahslm on the geography of the earth in the economic era of covid. IOP Conference Series: Earth and Environmental Science, 936:1 (012018). DOI: 10.1088/1755-1315/936/1/012018. Online publication date: 1-Dec-2021
  • (2019) A Unified Storage System for Whole-Time-Range Data Analytics over Unbounded Data. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (967-974). DOI: 10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00140. Online publication date: Dec-2019
