DOI: 10.1145/3378679.3394532
research-article

Edge replication strategies for wide-area distributed processing

Published: 13 May 2020

Abstract

The rapid digitalization across industries comes with many challenges. One key problem is how to efficiently process the ever-growing, volatile data generated at distributed locations so that it can inform decision making and improve products. Unfortunately, wide-area network capacity cannot keep pace with the growth of data at the network edge. It is therefore imperative to decide which data should be processed in situ at the edge and which should be transferred to and analyzed in data centers.
In this paper, we study two families of proactive online data replication strategies, namely ski-rental and machine learning algorithms, to decide which data is processed at the edge, close to where it is generated, and which is transferred to a data center. Our analysis using real query traces from a Global 2000 company shows that such online replication strategies can significantly reduce data transfer volume, in many cases by up to 50% compared to naive approaches, and achieve close-to-optimal performance. After analyzing their shortcomings in ease of use and performance, we propose a hybrid strategy that combines the advantages of both competitive and machine learning algorithms.
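
To make the ski-rental idea concrete, here is a minimal sketch of the textbook deterministic break-even rule applied to a single data partition. It is not the strategy evaluated in the paper. The cost model is an assumption made for illustration: each query against a non-replicated partition ships its result over the WAN ("renting"), while replicating ships the whole partition once ("buying"). The names `BreakEvenReplicator`, `partition_bytes`, and `on_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BreakEvenReplicator:
    """Break-even ski-rental rule for one partition (illustrative only).

    Assumed cost model, not taken from the paper:
      - rent: a query on a non-replicated partition transfers `result_bytes`
      - buy:  replicating the partition transfers `partition_bytes` once
    """
    partition_bytes: int
    rented_bytes: int = 0      # WAN bytes spent on per-query transfers so far
    replicated: bool = False

    def on_query(self, result_bytes: int) -> int:
        """Serve one query and return the WAN bytes it caused."""
        if self.replicated:
            return 0  # a replica already exists, no further transfer
        if self.rented_bytes + result_bytes >= self.partition_bytes:
            # Renting once more would cost at least as much as buying,
            # so replicate the partition instead of shipping this result.
            self.replicated = True
            return self.partition_bytes
        self.rented_bytes += result_bytes
        return result_bytes


def total_transfer(partition_bytes: int, results: list[int]) -> int:
    """Total WAN bytes the rule spends on a sequence of query results."""
    rule = BreakEvenReplicator(partition_bytes)
    return sum(rule.on_query(r) for r in results)


if __name__ == "__main__":
    # Five queries of 30 bytes each against a 100-byte partition.
    print(total_transfer(100, [30] * 5))
```

Under this model the rule never transfers more than twice what an offline strategy with full knowledge of future queries would: it rents for less than `partition_bytes` in total and buys at most once. A learned predictor could replace the fixed break-even threshold, which is one way to read the hybrid direction the abstract describes, though the paper's actual design is not shown here.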

Cited By

  • (2022) Leveraging joint allocation of multidimensional resources for distributed task assignment. Journal of Optical Communications and Networking 14:5 (351). DOI: 10.1364/JOCN.446747. Online publication date: 1-Apr-2022

Published In

EdgeSys '20: Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking
April 2020
78 pages
ISBN:9781450371322
DOI:10.1145/3378679

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data replication
  2. distributed systems
  3. edge computing

Qualifiers

  • Research-article

Funding Sources

  • ERC

Conference

EuroSys '20: Fifteenth EuroSys Conference 2020
April 27, 2020
Heraklion, Greece

Acceptance Rates

Overall Acceptance Rate 10 of 23 submissions, 43%

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 01 Feb 2025
