demonstration

Shark: fast data analysis using coarse-grained distributed memory

Published: 20 May 2012
    Abstract

    Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.

      Published In

      SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
      May 2012
      886 pages
      ISBN: 9781450312479
      DOI: 10.1145/2213836
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data warehouse
      2. databases
      3. machine learning
      4. resilient distributed dataset
      5. shark
      6. spark

      Qualifiers

      • Demonstration

      Conference

      SIGMOD/PODS '12

      Acceptance Rates

      SIGMOD '12 Paper Acceptance Rate 48 of 289 submissions, 17%;
      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Citations

      Cited By

      • (2024) IoT Data Stream Handling, Analysis, Communication and Security Issues: A Systematic Survey. Wireless Personal Communications. DOI: 10.1007/s11277-024-11177-1. Online publication date: 20-May-2024
      • (2023) Marketing Information and Marketing Intelligence. International Journal of Business Strategy and Automation, 3:1 (1-12). DOI: 10.4018/IJBSA.316235. Online publication date: 13-Jan-2023
      • (2022) Marketing Information and Marketing Intelligence. International Journal of Technology Diffusion, 13:1 (1-14). DOI: 10.4018/IJTD.300745. Online publication date: 20-May-2022
      • (2022) IoT Analytics and ERP Interoperability in Automotive SCM. International Journal of Fuzzy System Applications, 11:3 (1-19). DOI: 10.4018/IJFSA.306282. Online publication date: 29-Jul-2022
      • (2022) Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics, 12:1 (53). DOI: 10.3390/electronics12010053. Online publication date: 23-Dec-2022
      • (2022) Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark. Scientific Programming, 2022. DOI: 10.1155/2022/3362203. Online publication date: 1-Jan-2022
      • (2022) Recommender System: Analysing Products Using Data Stream Mining. 2022 6th International Conference On Computing, Communication, Control And Automation (ICCUBEA) (1-4). DOI: 10.1109/ICCUBEA54992.2022.10010897. Online publication date: 26-Aug-2022
      • (2021) Online Marketing Research - Roles in Generating Customer Insights. Studies in Business and Economics, 16:1 (147-161). DOI: 10.2478/sbe-2021-0012. Online publication date: 26-May-2021
      • (2021) TSCache. Proceedings of the VLDB Endowment, 14:13 (3253-3266). DOI: 10.14778/3484224.3484225. Online publication date: 28-Oct-2021
      • (2021) SLA-Based Profit Optimization Resource Scheduling for Big Data Analytics-as-a-Service Platforms in Cloud Computing Environments. IEEE Transactions on Cloud Computing, 9:3 (1236-1253). DOI: 10.1109/TCC.2018.2889956. Online publication date: 1-Jul-2021
