The wisdom of virtual crowds: mining datacenter telemetry to collaboratively debug performance

Authors:

Dragos Ionescu,

Rean Griffith

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

Article No.: 33, Pages 1 - 2

https://doi.org/10.1145/2523616.2525941

Published: 01 October 2013

Abstract

Explaining the (mis)behavior of virtual machines in large-scale cloud environments presents a number of challenges with respect to both scale and making sense of torrents of datacenter telemetry emanating from multiple levels of the stack. In this paper we leverage VM-similarity to explain the behavior or performance of a VM using its cohort as a reference (or by contrasting it against groups of VMs outside of its cohort). The key insight is that virtual machines (VMs) running the same application (components or workloads), or VMs colocated within the same (logical) tier of a complex application exhibit similar telemetry patterns.

The power of similarity relationships stems from the additional context that similarity provides. The quantitative or qualitative "distance" between a VM and its expected cohort can be used to explain or diagnose any discrepancy. Similarly, the distance between a VM and one in another cohort can be used to explain why the VMs are dissimilar. As an example, we apply our data-mining techniques to debugging ViewPlanner performance. ViewPlanner is a tool used to emulate and evaluate large-scale deployments of virtual desktops. Using a ViewPlanner deployment of 175 VMs, we collect ~300 metrics per VM, sampled at 20-second intervals over multiple 1-hour epochs, from the PerformanceManager [4] on ESX. We automatically filter the metrics (using entropy measures [2]) and cluster the VMs using K-means [1]. We use the median value of each metric within an epoch to summarize the VM's behavior during that epoch.
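The pipeline described above (per-epoch median summaries, entropy-based filtering, K-means clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how entropy is computed or thresholded, so the equal-width binning, the `min_entropy` cutoff, and all function names here are hypothetical, and the K-means routine is a plain Lloyd's iteration written with the standard library only.

```python
import math
import random
from statistics import median

def summarize_epoch(samples):
    """Collapse one epoch of raw samples {metric: [values]} into {metric: median}."""
    return {name: median(values) for name, values in samples.items()}

def shannon_entropy(values, bins=10):
    """Entropy in bits of a metric across VMs, using equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant metric carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def filter_metrics(vm_summaries, min_entropy=0.5):
    """Keep metrics whose across-VM entropy reaches a hypothetical threshold."""
    names = list(next(iter(vm_summaries.values())))
    return [m for m in names
            if shannon_entropy([s[m] for s in vm_summaries.values()]) >= min_entropy]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm over tuples of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

In use, each VM would contribute one vector of filtered per-epoch medians, and `kmeans` would group those vectors into cohorts. In practice the metrics would likely need rescaling before clustering, since Euclidean distance is dominated by metrics with large numeric ranges.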

We introduce spread/diffusion metrics to explain the difference between VMs. Spread metrics are those for which the expected value of the order statistic (in our case the median) of a metric, m_i, differs between two clusters, i.e., the expected value is conditioned on the cluster: E[m_i|clusterA] ≠ E[m_i|clusterB]. Within a cluster of VMs, differences in the distribution of a particular metric, m_i, may be explained by conditioning m_i on other metrics, {c₀, ..., c_n}, where E[m_i] ≠ E[m_i|c₀, ..., c_n]. We automatically find potentially interesting m_i's using Silverman's test [3] for multi-modality, and we use Mutual Information [2] to find associated c_i's.
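As a rough illustration of the two comparisons above, the sketch below scores candidate spread metrics by how far the sample mean of the per-VM medians differs between two clusters, and estimates mutual information between two metrics from equal-width histograms. Everything here is an assumption made for illustration: the function names, the scaling by the standard deviation of the combined sample, and the histogram estimator are not taken from the paper, and Silverman's test itself (a bootstrap procedure over kernel density estimates) is not implemented.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def spread_scores(cluster_a, cluster_b):
    """Score each metric by |E[m|A] - E[m|B]|, scaled by the combined-sample std.

    Each cluster is a list of {metric: per-epoch median} dicts, one per VM.
    Sample means stand in for the conditional expectations.
    """
    scores = {}
    for m in cluster_a[0]:
        a = [vm[m] for vm in cluster_a]
        b = [vm[m] for vm in cluster_b]
        scale = pstdev(a + b) or 1.0  # avoid dividing by zero for constant metrics
        scores[m] = abs(mean(a) - mean(b)) / scale
    # Largest difference first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

def discretize(values, bins=4):
    """Map values to equal-width bin indices (bin count is a hypothetical choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

def mutual_information(xs, ys, bins=4):
    """Histogram estimate of I(X;Y) in bits for two equally long samples."""
    dx, dy = discretize(xs, bins), discretize(ys, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    # sum over observed cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())
```

A metric m_i flagged as multi-modal within a cluster could then be paired with each candidate c_i, ranking the candidates by `mutual_information(m_i_values, c_i_values)`. Histogram estimators are biased for small samples, so the ranking is more trustworthy than the absolute values.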

References

[1] T. Hastie, R. Tibshirani, and J. H. Friedman. The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations. New York: Springer-Verlag, 2001.

[2] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, Inc., 2001.

[3] B. W. Silverman. Using kernel density estimates to investigate multimodality. J. R. Statist. Soc. B, 43(1): 97-99, 1981.

[4] VMware Inc. vSphere performance. http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.wssdk.pg.doc_50%2FPG_Ch16_Performance.18.1.html.

Cited By

P. Delul, R. Griffith, A. Holler, K. Shankari, X. Zhu, R. Soundararajan, A. Jagadeeshwaran, and P. Padala. Crowdsourced Resource-Sizing of Virtual Appliances. Proceedings of the 2014 IEEE International Conference on Cloud Computing, pages 801-809, 2014. DOI: 10.1109/CLOUD.2014.111. Online publication date: 27-Jun-2014.
https://dl.acm.org/doi/10.1109/CLOUD.2014.111

Information & Contributors

Information

Published In

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

October 2013

427 pages

ISBN:9781450324281

DOI:10.1145/2523616

General Chair:
Guy Lohman

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SOCC '13

Sponsor:

SOCC '13: ACM Symposium on Cloud Computing

October 1 - 3, 2013

Santa Clara, California

Acceptance Rates

SOCC '13 Paper Acceptance Rate 23 of 114 submissions, 20%;

Overall Acceptance Rate 169 of 722 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 1

Total Downloads: 90

Downloads (Last 12 months): 0

Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Sep 2024
