Abstract
We consider the k-median clustering problem over distributed streams. In the distributed streaming setting, there are multiple computational nodes, each of which receives a data stream, and the goal is to maintain, at all times, an approximation of a function of interest over the union of the local data at all the nodes. The approximation is maintained at a coordinator node, which has bidirectional communication channels to all the nodes. This model is also known as the distributed functional monitoring model. A natural variant of this model is the distributed sliding window model, where we are interested only in maintaining an approximation over a recent period of time.
This paper gives new algorithms for the k-median clustering problem in the distributed streaming model and its sliding-window counterpart.
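To make the setting concrete, the following is a minimal sketch of the model only, not of the algorithms in this paper. It assumes Euclidean points, timestamps that never decrease across sites, and a deliberately naive protocol in which every site forwards every point to the coordinator. The names `k_median_cost` and `NaiveCoordinator`, and the `window` parameter, are hypothetical and introduced purely for illustration.

```python
from collections import deque
from math import dist


def k_median_cost(points, centers):
    """k-median objective: sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


class NaiveCoordinator:
    """Hypothetical baseline: sites forward every point, so communication is linear."""

    def __init__(self, window=None):
        self.window = window   # None means the whole stream; otherwise a time-based window
        self.items = deque()   # (timestamp, point) pairs in arrival order
        self.messages = 0      # number of site-to-coordinator messages so far

    def receive(self, timestamp, point):
        """Called whenever any site forwards a point (assumes nondecreasing timestamps)."""
        self.messages += 1
        self.items.append((timestamp, point))
        self._expire(timestamp)

    def _expire(self, now):
        """Drop points that have fallen out of the window (now - window, now]."""
        if self.window is None:
            return
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()

    def cost(self, centers):
        """Exact k-median cost of the given centers on the currently active points."""
        return k_median_cost([p for _, p in self.items], centers)
```

This baseline sends one message per stream element. Algorithms in the distributed functional monitoring model aim to keep a good approximation at the coordinator while communicating far less than that.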
Notes
1. Throughout this paper, "with high probability" means with probability at least \(1-\frac{1}{n}\).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gayen, S., Vinodchandran, N.V. (2016). Algorithms for k-median Clustering over Distributed Streams. In: Dinh, T., Thai, M. (eds) Computing and Combinatorics. COCOON 2016. Lecture Notes in Computer Science, vol 9797. Springer, Cham. https://doi.org/10.1007/978-3-319-42634-1_43
DOI: https://doi.org/10.1007/978-3-319-42634-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42633-4
Online ISBN: 978-3-319-42634-1
eBook Packages: Computer Science (R0)