DOI: 10.1145/3284028.3284029
research-article

Serverless Data Analytics in the IBM Cloud

Published: 10 December 2018

    Abstract

    The rise of serverless computing has, as an unexpected side effect, started the "democratization" of massive-scale data parallelism. This trend, heralded by PyWren, aims to enable untrained users to execute single-machine code in the cloud at massive scale through platforms like AWS Lambda. Inspired by this vision, this industry paper presents IBM-PyWren, which continues the pioneering work begun by PyWren. IBM-PyWren is not, however, a mere reimplementation of PyWren's API atop IBM Cloud Functions. Rather, it should be viewed as an advanced extension of PyWren that runs broader MapReduce jobs. We describe the design, innovative features (API extensions, data discovery and partitioning, composability, etc.), and performance of IBM-PyWren, along with the challenges encountered during its implementation.
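
    As a rough illustration of the programming model the abstract describes (ordinary single-machine functions applied in parallel to data partitions, followed by a reduce step), the sketch below runs a small map-reduce job locally. It does not use the IBM-PyWren API: the helper `map_reduce`, the function `count_words`, and the sample partitions are hypothetical names chosen for this example, and a local thread pool stands in for remote function invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Ordinary single-machine function: count the words in one text chunk."""
    return len(chunk.split())


def map_reduce(map_fn, reduce_fn, partitions, max_workers=4):
    """Apply map_fn to every partition in parallel, then fold the results.

    Assumption: the local thread pool is only a stand-in for the remote
    serverless invocations a real framework would issue.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_fn, partitions))
    return reduce(reduce_fn, partials)


# Hypothetical input: three small text partitions.
partitions = ["serverless data analytics", "in the IBM Cloud", "with PyWren"]
total = map_reduce(count_words, lambda a, b: a + b, partitions)
print(total)
```

    The user-facing code stays a plain function plus a list of partitions; where the map calls actually run is hidden behind the helper, which is the part a serverless framework would replace.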

    Supplementary Material

    MP4 File (p1-sampe.mp4)



    Information

    Published In

    Middleware '18: Proceedings of the 19th International Middleware Conference Industry
    December 2018
    64 pages
    ISBN: 9781450360166
    DOI: 10.1145/3284028
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Distributed computing
    2. IBM Cloud Functions
    3. IBM Cloud Object Storage
    4. PyWren
    5. Serverless computing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Ministerio de Ciencia, Innovación y Universidades - España

    Conference

    Middleware '18
    Sponsor:
    • ACM
    • USENIX Assoc
    • IFIP

    Acceptance Rates

    Overall Acceptance Rate 203 of 948 submissions, 21%


    Article Metrics

    • Downloads (last 12 months): 101
    • Downloads (last 6 weeks): 7

    Cited By

    • (2024) Optimizing Big Data Insights with Serverless Architecture. International Journal of Advanced Research in Science, Communication and Technology (191-197). DOI: 10.48175/IJARSCT-15934. Online publication date: 22-Mar-2024
    • (2024) Serverless Computing Real-World Applications and Benefits in Cloud Environments. Emerging Trends in Cloud Computing Analytics, Scalability, and Service Models (268-290). DOI: 10.4018/979-8-3693-0900-1.ch014. Online publication date: 22-Mar-2024
    • (2024) Smart Healthcare System in Server-Less Environment: Concepts, Architecture, Challenges, Future Directions. Computers 13:4 (105). DOI: 10.3390/computers13040105. Online publication date: 19-Apr-2024
    • (2024) Data pipeline approaches in serverless computing: a taxonomy, review, and research trends. Journal of Big Data 11:1. DOI: 10.1186/s40537-024-00939-0. Online publication date: 11-Jun-2024
    • (2024) Demystifying the Cost of Serverless Computing: Towards a Win-Win Deal. IEEE Transactions on Parallel and Distributed Systems 35:1 (59-72). DOI: 10.1109/TPDS.2023.3330849. Online publication date: Jan-2024
    • (2024) Cloud-Operated Open Literate Educational Resources: The Case of the MyBinder. IEEE Transactions on Learning Technologies 17 (893-902). DOI: 10.1109/TLT.2023.3343690. Online publication date: 1-Jan-2024
    • (2024) ComFaaS Distributed: Edge Computing with Function-as-a-Service in Parallel Cloud Environments. 2024 7th International Conference on Information and Computer Technologies (ICICT) (133-138). DOI: 10.1109/ICICT62343.2024.00027. Online publication date: 15-Mar-2024
    • (2024) Exploiting Inherent Elasticity of Serverless in Algorithms with Unbalanced and Irregular Workloads. Journal of Parallel and Distributed Computing (104891). DOI: 10.1016/j.jpdc.2024.104891. Online publication date: Apr-2024
    • (2024) MLLess: Achieving cost efficiency in serverless machine learning training. Journal of Parallel and Distributed Computing 183 (104764). DOI: 10.1016/j.jpdc.2023.104764. Online publication date: Jan-2024
    • (2024) A Seer knows best. Journal of Parallel and Distributed Computing 183:C. DOI: 10.1016/j.jpdc.2023.104763. Online publication date: 1-Jan-2024
