Abstract
In the current digital era, the rapid development of internet and online technologies, such as large and powerful data servers, means that we face a huge and daily growing volume of information and data from many different resources and services, data that was not available to humankind just a few decades ago. This data comes from the online resources and services established to serve customers. Services and resources such as sensor networks, cloud storage, and social networks produce large volumes of data and also need to manage and reuse that data or some analytical aspects of it. Although this massive volume of data can be very useful to people and corporations, it can be problematic as well: big data needs large storage, and its volume makes analysis, processing, and retrieval operations difficult and hugely time-consuming. One way to overcome these problems is to summarize big data so that it needs less storage and much less time to process and retrieve. The summarized data is then in a “compact format” while remaining an informative version of the entire data. Data summarization techniques therefore aim to produce summaries of “good” quality. Such techniques would benefit everyone from ordinary users to researchers and the corporate world, as they provide an efficient tool for dealing with large data such as news (for news summarization).
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Hesabi, Z., Tari, Z., Goscinski, A., Fahad, A., Khalil, I., Queiroz, C. (2015). Data Summarization Techniques for Big Data—A Survey. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_38
DOI: https://doi.org/10.1007/978-1-4939-2092-1_38
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2091-4
Online ISBN: 978-1-4939-2092-1
eBook Packages: Computer Science (R0)