research-article

Pangea: monolithic distributed storage for data analytics

Authors:

Chris Jermaine

Proceedings of the VLDB Endowment, Volume 12, Issue 6

Pages 681 - 694

https://doi.org/10.14778/3311880.3311885

Published: 01 February 2019

Abstract

Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a single system called Pangea that manages all data, both intermediate and long-lived, together with its buffering and caching, data placement optimization, and failure recovery, in one monolithic distributed storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.





Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 6

February 2019

100 pages

ISSN: 2150-8097

Editors: Lei Chen (HKUST), Fatma Özcan (IBM Research - Almaden)

Publisher: VLDB Endowment


Cited By

Wang Z, Shao Z (2023). MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying. Proceedings of the ACM on Management of Data 1:4 (1-27). Online publication date: 12-Dec-2023.
https://dl.acm.org/doi/10.1145/3626736
Guan H, Masood S, Dwarampudi M, Gunda V, Min H, Yu L, Nag S, Zou J (2023). A Comparison of End-to-End Decision Forest Inference Pipelines. Proceedings of the 2023 ACM Symposium on Cloud Computing (200-215). Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624656
Marcu O, Bouvry P (2023). In Support of Push-Based Streaming for the Computing Continuum. Intelligent Information and Database Systems (339-350). Online publication date: 24-Jul-2023.
https://dl.acm.org/doi/10.1007/978-981-99-5837-5_28
Zhou L, Chen J, Das A, Min H, Yu L, Zhao M, Zou J (2022). Serving deep learning models with deduplication from relational databases. Proceedings of the VLDB Endowment 15:10 (2230-2243). Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.14778/3547305.3547325
Zhou L, Jain A, Wang Z, Das A, Yang Y, Zou J (2022). Benchmark of DNN Model Search at Deployment Time. Proceedings of the 34th International Conference on Scientific and Statistical Database Management (1-12). Online publication date: 6-Jul-2022.
https://dl.acm.org/doi/10.1145/3538712.3538725
Wang Z, Shao Z, Ives Z, Bonifati A, El Abbadi A (2022). TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage. Proceedings of the 2022 International Conference on Management of Data (1418-1432). Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1145/3514221.3526175
Zou J, Das A, Barhate P, Iyengar A, Yuan B, Jankov D, Jermaine C (2021). Lachesis. Proceedings of the VLDB Endowment 14:8 (1262-1275). Online publication date: 1-Apr-2021.
https://dl.acm.org/doi/10.14778/3457390.3457392
R Mochamad A (2021). Constant 12 and reflexivity 472319 hahslm on the geography of the earth in the economic era of covid. IOP Conference Series: Earth and Environmental Science 936:1 (012018). Online publication date: 1-Dec-2021.
https://doi.org/10.1088/1755-1315/936/1/012018
Shen Y, Yao G, Guo S, Xiong J, Jiang D (2019). A Unified Storage System for Whole-Time-Range Data Analytics over Unbounded Data. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (967-974). Online publication date: Dec-2019.
https://doi.org/10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00140
