DOI: 10.1145/1065167.1065191

Multi-structural databases

Published: 13 June 2005

Abstract

We introduce the Multi-Structural Database, a new data framework to support efficient analysis of large, complex data sets. An instance of the model consists of a set of data objects, together with a schema that specifies segmentations of the set of data objects according to multiple distinct criteria (e.g., into a taxonomy based on a hierarchical attribute). Within this model, we develop a rich set of analytical operations and design highly efficient algorithms for these operations. Our operations are formulated as optimization problems, and allow the user to analyze the underlying data in terms of the allowed segmentations.
Our algorithms and results extend those of Fagin et al. [8] who studied composition of mappings given by several kinds of constraints. In particular, they proved that full source-to-target tuple-generating dependencies (tgds) are closed under composition, but embedded source-to-target tgds are not. They introduced a class of second-order constraints, SO tgds, that is closed under composition and has desirable properties for data exchange.
We study constraints that need not be source-to-target and we concentrate on obtaining (first-order) embedded dependencies. As part of this study, we also consider full dependencies and second-order constraints that arise from Skolemizing embedded dependencies. For each of the three classes of mappings that we study, we provide (a) an algorithm that attempts to compute the composition and (b) sufficient conditions on the input mappings that guarantee that the algorithm will succeed.
In addition, we give several negative results. In particular, we show that full dependencies are not closed under composition, and that second-order dependencies that are not limited to be source-to-target are not closed under restricted composition. Furthermore, we show that determining whether the composition can be given by these kinds of dependencies is undecidable.

References

[1]
R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 13th Intl. Conference on Data Engineering, pages 232--243, 1997.
[2]
D. Barbará, Y. Li, and J. Couto. COOLCAT: An entropy-based algorithm for categorical clustering. In Proc. 11th Intl. Conference on Information and Knowledge Management, pages 582--589, 2002.
[3]
L. Cabibbo and R. Torlone. A logical framework for querying multidimensional data. In Intl. Seminar on New Techniques and Technologies for Statistics, pages 155--162, 1998.
[4]
E. F. Codd, S. B. Codd, and C. T. Salley. Providing OLAP (on-line analytical processing) to user analysts: An IT mandate, 1993. Arbor Software, now Hyperion Solutions Corp., White Paper.
[5]
W. F. Cody, J. T. Kreulen, V. Krishna, and W. S. Spangler. The integration of business intelligence and knowledge management. IBM Systems Journal, 41(4):697--713, 2002.
[6]
U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634--652, 1998.
[7]
R. Feldman and I. Dagan. Knowledge discovery in textual databases (KDT). In Knowledge Discovery and Data Mining, pages 112--117, 1995.
[8]
V. Ganti, J. Gehrke, and R. Ramakrishnan. Cactus: clustering categorical data using summaries. In Proc. 5th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, pages 73--83, 1999.
[9]
S. Gollapudi and D. Sivakumar. Framework and algorithms for trend analysis in massive temporal data sets. In Proc. 13th Intl. Conference on Information and Knowledge Management, pages 168--177, 2004.
[10]
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proc. 12th Intl. Conference on Data Engineering, pages 152--159, 1996.
[11]
M. Grigni and F. Manne. On the complexity of the generalized block distribution. In Proc. 3rd Intl. Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 319--326, 1996.
[12]
D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien. How to build a WebFountain: An architecture for very large-scale text analytics. IBM Systems Journal, 43(1):64--77, 2004.
[13]
S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. 15th Intl. Conference on Data Engineering, page 512, 1999.
[14]
M. Gyssens and L. Lakshmanan. A foundation for multi-dimensional databases. In Proc. 23rd Intl. Conference on Very Large Data Bases, pages 106--115, 1997.
[15]
J. Han. Towards on-line analytical mining in large databases. SIGMOD Record, 27(1):97--107, 1998.
[16]
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. ACM SIGMOD Intl. Conference on Management of Data, pages 205--216, 1996.
[17]
J. Håstad. Clique is hard to approximate within n^{1-ε}. Acta Mathematica, pages 105--142, 1999.
[18]
S. Khanna, S. Muthukrishnan, and S. Skiena. Efficient array partitioning. In Proc. 24th Intl. Colloquium on Automata, Languages and Programming, pages 616--626, 1997.
[19]
R. Kimball. The Data Warehouse Toolkit. J. Wiley and Sons, Inc, 1996.
[20]
L. Lakshmanan, J. Pei, and J. Han. Quotient cube: How to summarize the semantics of a data cube. In Proc. 28th Intl. Conference on Very Large Data Bases, pages 778--789, 2002.
[21]
B. Lent, R. Agrawal, and R. Srikant. Discovering trends in text databases. In Proc. 3rd Intl. Conference on Knowledge Discovery in Databases and Data Mining, August 1997.
[22]
C. Lund and M. Yannakakis. On the hardness of approximating minimization problems. J. ACM, 41(5):960--981, 1994.
[23]
K. E. Paluch. A 2(1/8)-approximation algorithm for rectangle tiling. In Proc. 31st Intl. Colloquium on Automata, Languages and Programming, pages 1054--1065, 2004.
[24]
S. Sarawagi. User-adaptive exploration of multidimensional data. In Proc. 26th Intl. Conference on Very Large Data Bases, pages 307--316, 2000.
[25]
S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proc. 6th Intl. Conference on Extending Database Technology, pages 168--182, 1998.
[26]
S. Sarawagi and G. Sathe. i3: Intelligent, interactive investigation of OLAP data cubes. In Proc. ACM SIGMOD Intl. Conference on Management of Data, page 589, 2000.
[27]
J. Tremblay and R. Manohar. Discrete Mathematical Structures with Applications to Computer Science. McGraw Hill Book Company, 1975.
[28]
P. Vassiliadis and T. Sellis. A survey of logical models for OLAP databases. SIGMOD Record, 28(4):64--69, 1999.

Published In

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2005
388 pages
ISBN:1595930620
DOI:10.1145/1065167
General Chair: Georg Gottlob
Program Chair: Foto Afrati
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

SIGMOD/PODS05

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%


Cited By

  • (2022) Sommelier: Curating DNN Models for the Masses. Proceedings of the 2022 International Conference on Management of Data, pages 1876-1890. DOI: 10.1145/3514221.3526173. Online publication date: 10-Jun-2022.
  • (2021) Efficient Exploration of Interesting Aggregates in RDF Graphs. Proceedings of the 2021 International Conference on Management of Data, pages 392-404. DOI: 10.1145/3448016.3457307. Online publication date: 9-Jun-2021.
  • (2020) DIFF: a relational interface for large-scale data explanation. The VLDB Journal (The International Journal on Very Large Data Bases), 30(1):45-70. DOI: 10.1007/s00778-020-00633-6. Online publication date: 30-Sep-2020.
  • (2018) DIFF. Proceedings of the VLDB Endowment, 12(4):419-432. DOI: 10.14778/3297753.3297761. Online publication date: 1-Dec-2018.
  • (2018) The Cascading Analysts Algorithm. Proceedings of the 2018 International Conference on Management of Data, pages 1083-1096. DOI: 10.1145/3183713.3183745. Online publication date: 27-May-2018.
  • (2018) Mining Tours and Paths in Activity Networks. Proceedings of the 2018 World Wide Web Conference, pages 459-468. DOI: 10.1145/3178876.3186112. Online publication date: 10-Apr-2018.
  • (2017) Multimedia, Similarity, and Preferences: Adding Flexibility to Your Information Needs. A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, pages 127-141. DOI: 10.1007/978-3-319-61893-7_8. Online publication date: 31-May-2017.
  • (2015) Multimedia Queries in Digital Libraries. Data Management in Pervasive Systems, pages 311-325. DOI: 10.1007/978-3-319-20062-0_15. Online publication date: 2015.
  • (2014) Taxonomy-based relaxation of query answering in relational databases. The VLDB Journal (The International Journal on Very Large Data Bases), 23(5):747-769. DOI: 10.1007/s00778-013-0350-x. Online publication date: 1-Oct-2014.
  • (2013) SHIATSU. Multimedia Tools and Applications, 63(2):357-385. DOI: 10.1007/s11042-011-0948-1. Online publication date: 1-Mar-2013.
