iDiff: Informative Summarization of Differences in Multidimensional Aggregates

Published: 01 October 2001

Abstract

Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that in a single step returns summarized reasons for drops or increases observed at an aggregated level.
We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass over the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
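
To make the problem statement concrete, the following is a minimal sketch of the input and the kind of question being asked. It is not the iDiff algorithm: it uses no information-theoretic scoring and no dynamic programming, and simply rolls two small cubes up to one dimension and ranks members by absolute change. The names `aggregate_diff` and `top_contributors`, the dictionary representation of a cube, and the toy sales figures are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]      # one coordinate per dimension (hypothetical layout)
Cube = Dict[Cell, float]    # sparse cube: cell coordinates -> measure value


def aggregate_diff(before: Cube, after: Cube, dim: int) -> Dict[str, float]:
    """Roll both cubes up to dimension `dim`; return (after - before) per member."""
    totals: Dict[str, float] = defaultdict(float)
    for cell, value in after.items():
        totals[cell[dim]] += value
    for cell, value in before.items():
        totals[cell[dim]] -= value
    return dict(totals)


def top_contributors(before: Cube, after: Cube, dim: int, k: int = 2) -> List[Tuple[str, float]]:
    """Naive ranking: the k members with the largest absolute change."""
    diffs = aggregate_diff(before, after, dim)
    # Sort by descending magnitude; break ties by member name for determinism.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:k]


# Toy two-dimensional cubes (region, product); the numbers are invented.
before = {
    ("East", "Laptops"): 100.0, ("East", "Phones"): 80.0,
    ("West", "Laptops"): 90.0, ("West", "Phones"): 70.0,
}
after = {
    ("East", "Laptops"): 60.0, ("East", "Phones"): 82.0,
    ("West", "Laptops"): 91.0, ("West", "Phones"): 69.0,
}

total_change = sum(after.values()) - sum(before.values())
by_region = top_contributors(before, after, dim=0)
by_product = top_contributors(before, after, dim=1)

print(total_change)  # -38.0
print(by_region)     # [('East', -38.0), ('West', 0.0)]
print(by_product)    # [('Laptops', -39.0), ('Phones', 1.0)]
```

In this toy data the aggregate drop of 38 is concentrated in a single cell (East, Laptops). A per-dimension ranking like this only hints at that, and the analyst would still have to drill down by hand, which is the manual exploration the abstract describes an operator for automating.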

Published In

Data Mining and Knowledge Discovery, Volume 5, Issue 4
October 2001
108 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. OLAP
2. OLAP-mining integration
3. advanced aggregates
4. data summarization
5. difference mining
6. multidimensional databases
