DOI: 10.1145/1982185.1982397
research-article

ClustCube: an OLAP-based framework for clustering and mining complex database objects

Published: 21 March 2011

Abstract

In this paper, we introduce and experimentally assess ClustCube, an innovative OLAP-based framework for clustering and mining complex database objects extracted from distributed database settings by means of complex SQL statements involving multiple JOIN queries across (distributed) relational tables. To this end, ClustCube combines conventional clustering techniques with well-consolidated OLAP methodologies in order to achieve higher expressive power and mining effectiveness than traditional methodologies for mining tuple-oriented information. A relevant challenge in our research is the efficient computation of ClustCube cubes, enriched by the respective cuboid lattices, which may represent a critical bottleneck for the proposed ClustCube framework. To address this drawback, we propose a collection of algorithms that implement an innovative distributive approach, taking advantage of both the structured nature of complex database objects within cuboids and the distributive nature of clustering across hierarchical domains, like those defined by conventional OLAP schemas.
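
The abstract does not give the algorithms themselves, so the following is only an illustrative sketch of the two ideas it names: a cuboid lattice and a distributive computation over it. It assumes, purely for illustration, that each cell stores a cluster summary as a (count, per-feature sum) pair, which can be merged by addition when rolling up to a coarser cuboid. The function names (`cuboid_lattice`, `merge_summaries`, `roll_up`, `centroid`) and the sample dimensions are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """List every cuboid (subset of dimensions), from the base cuboid to the apex."""
    dims = tuple(dimensions)
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]


def merge_summaries(a, b):
    """Merge two (count, per-feature sums) summaries by plain addition."""
    count_a, sums_a = a
    count_b, sums_b = b
    return (count_a + count_b, tuple(x + y for x, y in zip(sums_a, sums_b)))


def roll_up(cells, dims, keep):
    """Aggregate cell summaries of cuboid `dims` into the coarser cuboid `keep`.

    Only the already-computed summaries are read; the underlying objects
    are never rescanned, which is what makes the computation distributive.
    """
    positions = [dims.index(d) for d in keep]
    coarse_cells = {}
    for key, summary in cells.items():
        coarse_key = tuple(key[i] for i in positions)
        if coarse_key in coarse_cells:
            coarse_cells[coarse_key] = merge_summaries(coarse_cells[coarse_key], summary)
        else:
            coarse_cells[coarse_key] = summary
    return coarse_cells


def centroid(summary):
    """Recover the centroid of a summary as sums divided by count."""
    count, sums = summary
    return tuple(s / count for s in sums)


if __name__ == "__main__":
    # Hypothetical base cuboid over two made-up dimensions.
    dims = ("region", "year")
    cells = {
        ("EU", 2010): (2, (4.0, 6.0)),
        ("EU", 2011): (1, (5.0, 3.0)),
        ("US", 2010): (3, (9.0, 3.0)),
    }
    for cuboid in cuboid_lattice(dims):
        rolled = roll_up(cells, dims, cuboid)
        print(cuboid, {k: centroid(v) for k, v in rolled.items()})
```

Additive summaries are chosen here only because they make the roll-up trivially distributive; the paper's own clustering representation may differ.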

Published In

SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
March 2011
1868 pages
ISBN:9781450301138
DOI:10.1145/1982185
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC'11: The 2011 ACM Symposium on Applied Computing
March 21 - 24, 2011
TaiChung, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
