DOI: 10.1145/3616131.3616141
research-article
Open access

A Serverless Architecture for Efficient and Scalable Monte Carlo Markov Chain Computation

Published: 02 October 2023

Abstract

Demand for computing power in scientific data analysis is constantly increasing, in particular when Markov Chain Monte Carlo (MCMC) methods are involved, for example when estimating integrals or Bayesian posterior probabilities. In this paper, we describe three benefits of running MCMC in parallel on a cloud-based, serverless architecture. First, the computation can be spread over thousands of processes, greatly reducing the time the user must wait for it to complete. Second, the overhead of running many processes in parallel is minor and grows logarithmically with the number of processes. Third, the serverless approach requires no time-consuming effort to maintain and update the computing infrastructure when the number of walkers increases, or to adapt the code to use the infrastructure optimally. We illustrate these benefits by computing the posterior probability distribution for a real astronomical analysis.
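The idea of spreading MCMC over many independent processes can be illustrated with a minimal sketch. It is not the paper's implementation: the toy target (a standard normal), the function names `log_target`, `run_chain` and `run_parallel`, and the use of a local thread pool in place of serverless function invocations are all assumptions made for illustration. Each worker runs its own random-walk Metropolis chain from its own seed, and the post-burn-in samples are pooled at the end.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_target(x):
    """Unnormalised log-density of a standard normal (toy posterior, assumed)."""
    return -0.5 * x * x


def run_chain(seed, n_steps=5000, step_size=1.0):
    """Run one random-walk Metropolis chain and return all of its samples."""
    rng = random.Random(seed)  # independent, reproducible stream per worker
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def run_parallel(n_chains=8, n_steps=5000, burn_in=500):
    """Map one chain per worker, then pool the post-burn-in samples."""
    seeds = range(n_chains)
    # A local thread pool stands in for the cloud workers here; it shows the
    # map-then-gather pattern but gives no real speed-up for pure-Python code.
    with ThreadPoolExecutor(max_workers=n_chains) as pool:
        chains = list(pool.map(lambda s: run_chain(s, n_steps), seeds))
    return [x for chain in chains for x in chain[burn_in:]]


pooled = run_parallel()
mean = sum(pooled) / len(pooled)
var = sum((x - mean) ** 2 for x in pooled) / len(pooled)
print(f"pooled samples: {len(pooled)}, mean: {mean:.3f}, variance: {var:.3f}")
```

Because the chains never communicate, the only coordination cost is dispatching the workers and gathering their results, which is the kind of overhead the abstract describes as minor. In a serverless setting the `pool.map` call would be replaced by whatever map primitive the chosen framework provides.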


Published In

ICCBDC '23: Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing
August 2023
101 pages
ISBN:9798400707339
DOI:10.1145/3616131
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCBDC 2023

