SLATE: Monitoring Distributed Kubernetes Clusters

Research article, Public Access
Published: 26 July 2020
DOI: 10.1145/3311790.3401777

Abstract

SLATE (Services Layer at the Edge) accelerates collaborative scientific computing through a secure container orchestration framework focused on the Science DMZ, enabling the creation of advanced multi-institution platforms and novel science gateways. The goal of the SLATE project is to provide a secure federation platform that simplifies the deployment and operation of the complex and often specialized services required by multi-institution scientific collaborations, using open-source, cloud-native tooling such as Kubernetes where applicable. This paper outlines the design and operation of a monitoring infrastructure, suitable for both application developers and resource providers, that gives visibility into resource utilization and service deployments across a network of independently managed Kubernetes clusters.
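The abstract does not spell out the monitoring stack. One common way to get a single view over independently managed clusters, assumed here purely for illustration, is to run a Prometheus server in each cluster and merge query results centrally, tagging every series with the cluster it came from (similar in spirit to Prometheus external labels or a Thanos-style global query layer). The Python sketch below assumes responses in the Prometheus HTTP API instant-query JSON format; the cluster names, the `cluster` label, the PromQL expression, and the helper functions are hypothetical and are not taken from the paper.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical example query: per-namespace CPU usage (cores), 5-minute rate
# of the cAdvisor counter exposed by each cluster's kubelets.
CPU_BY_NAMESPACE = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def merge_cluster_results(responses: dict[str, str]) -> list[dict]:
    """Merge instant-vector responses fetched from several clusters.

    `responses` maps a (hypothetical) cluster name to the raw JSON body
    returned by that cluster's Prometheus. Each series is copied with an
    extra `cluster` label so series from different clusters stay distinct.
    """
    merged = []
    for cluster, body in sorted(responses.items()):
        payload = json.loads(body)
        if payload.get("status") != "success":
            raise ValueError(f"query failed on cluster {cluster!r}")
        for series in payload["data"]["result"]:
            # Copy the series labels and attach the originating cluster.
            labels = dict(series["metric"], cluster=cluster)
            # Instant-vector samples are [unix_timestamp, "value-as-string"].
            _timestamp, value = series["value"]
            merged.append({"labels": labels, "value": float(value)})
    return merged


def total_by_label(series: list[dict], label: str) -> dict[str, float]:
    """Sum merged series by one label, e.g. `cluster` or `namespace`."""
    totals: defaultdict[str, float] = defaultdict(float)
    for s in series:
        totals[s["labels"].get(label, "")] += s["value"]
    return dict(totals)
```

Attaching the cluster label before aggregating keeps identically labelled series from different sites distinct, so `total_by_label(series, "cluster")` answers per-site (resource provider) questions, while `total_by_label(series, "namespace")` gives a cross-cluster view per application namespace (application developer questions).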

Supplemental Material

Presentation video (MP4 file)

Information

Published In

PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
July 2020
556 pages
ISBN: 9781450366892
DOI: 10.1145/3311790

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Containerization
  2. Distributed Monitoring
  3. Distributed computing
  4. Edge computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '20

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 189
  • Downloads (last 6 weeks): 31
Reflects downloads up to 16 Oct 2024.

Cited By

  • (2024) Securing Kubernetes: A Study on the Measures for Enhancing Control and Data Plane Security. AI Applications in Cyber Security and Communication Networks, 127-152. DOI: 10.1007/978-981-97-3973-8_9. Online publication date: 18-Sep-2024.
  • (2023) iMon: Network Function Virtualisation Monitoring Based on a Unique Agent. IEICE Transactions on Communications E106.B:3, 230-240. DOI: 10.1587/transcom.2022EBP3103. Online publication date: 1-Mar-2023.
  • (2023) AdapPF: Self-Adaptive Scrape Interval for Monitoring in Geo-Distributed Cluster Federations. 2023 IEEE Symposium on Computers and Communications (ISCC), 417-423. DOI: 10.1109/ISCC58397.2023.10218080. Online publication date: 9-Jul-2023.
  • (2023) Cost-Effective Automation: Cloud-Based Monitoring Combining HPA with VPA for Scalable Startups. 2023 9th International Conference on Wireless and Telematics (ICWT), 1-5. DOI: 10.1109/ICWT58823.2023.10335324. Online publication date: 6-Jul-2023.
  • (2023) Toward the Observability of Cloud-Native Applications: The Overview of the State-of-the-Art. IEEE Access 11, 73036-73052. DOI: 10.1109/ACCESS.2023.3281860.
  • (2023) MONCHi: MONitoring for Cloud-native Hyperconnected Islands. Computer Performance Engineering and Stochastic Modelling, 294-308. DOI: 10.1007/978-3-031-43185-2_20. Online publication date: 20-Jun-2023.
  • (2022) Real-Time Resource Monitoring Framework in a Heterogeneous Kubernetes Cluster. 2022 Muthanna International Conference on Engineering Science and Technology (MICEST), 184-189. DOI: 10.1109/MICEST54286.2022.9790264. Online publication date: 16-Mar-2022.
