research-article

Massively parallel data analysis with PACTs on Nephele

Published: 01 September 2010
    Abstract

    Large-scale data analysis applications require processing and analyzing terabytes or even petabytes of data, particularly in web analysis and scientific data management. This trend was discussed as "web-scale data management" in a panel at VLDB 2009. Traditionally, parallel data processing was the domain of parallel database systems. Today, new requirements, such as scaling out to thousands of machines, improved fault tolerance, and schema-free processing, have made a case for new approaches.

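    Cited By

    • (2017) From conceptual design to performance optimization of ETL workflows. The VLDB Journal 26(6): 777-801. doi:10.1007/s00778-017-0477-2. Online publication date: 1-Dec-2017.
    • (2016) Big Data 2.0 Processing Systems. Journal of Grid Computing 14(3): 379-405. doi:10.1007/s10723-016-9371-1. Online publication date: 1-Sep-2016.
    • (2014) A study of partitioning and parallel UDF execution with the SAP HANA database. Proceedings of the 26th International Conference on Scientific and Statistical Database Management, pp. 1-4. doi:10.1145/2618243.2618274. Online publication date: 30-Jun-2014.
    • (2014) Versatile optimization of UDF-heavy data flows with sofa. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 685-688. doi:10.1145/2588555.2594517. Online publication date: 18-Jun-2014.
    • (2014) The Stratosphere platform for big data analytics. The VLDB Journal 23(6): 939-964. doi:10.1007/s00778-014-0357-y. Online publication date: 1-Dec-2014.
    • (2013) Massively Parallel Databases and MapReduce Systems. Foundations and Trends in Databases 5(1): 1-104. doi:10.1561/1900000036. Online publication date: 20-Nov-2013.
    • (2013) The family of mapreduce and large-scale data processing systems. ACM Computing Surveys 46(1): 1-44. doi:10.1145/2522968.2522979. Online publication date: 11-Jul-2013.
    • (2013) Issues in big data testing and benchmarking. Proceedings of the Sixth International Workshop on Testing Database Systems, pp. 1-5. doi:10.1145/2479440.2482677. Online publication date: 24-Jun-2013.
    • (2013) A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics. Proceedings of the 16th International Conference on Extending Database Technology, pp. 613-624. doi:10.1145/2452376.2452448. Online publication date: 18-Mar-2013.
    • (2012) The DEBS 2012 grand challenge. Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, pp. 393-398. doi:10.1145/2335484.2335536. Online publication date: 16-Jul-2012.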

    Published In

    Proceedings of the VLDB Endowment, Volume 3, Issue 1-2
    September 2010
    1658 pages

    Publisher

    VLDB Endowment
