DOI: 10.1145/956750.956768
Article

Fragments of order

Published: 24 August 2003

Abstract

High-dimensional collections of 0–1 data occur in many applications. The attributes in such data sets are typically considered to be unordered. However, in many cases there is a natural total or partial order $\prec$ underlying the variables of the data set. Examples of variables for which such orders exist include terms in documents, courses in enrollment data, and paleontological sites in fossil data collections. The observations in such applications are flat, unordered sets; however, the data sets respect the underlying ordering of the variables. By this we mean that if $A \prec B \prec C$ are three variables respecting the underlying ordering $\prec$, and both variables $A$ and $C$ appear in an observation, then, up to noise levels, variable $B$ also appears in this observation. Similarly, if $A_1 \prec A_2 \prec \cdots \prec A_{l-1} \prec A_l$ is a longer sequence of variables, we do not expect to see many observations for which there are indices $i < j < k$ such that $A_i$ and $A_k$ occur in the observation but $A_j$ does not.

In this paper we study the problem of discovering fragments of orders of variables implicit in collections of unordered observations. We define measures that capture how well a given order agrees with the observed data. We describe a simple and efficient algorithm for finding all the fragments that satisfy certain conditions. We also discuss the postprocessing that is sometimes necessary for selecting only the best fragments of order. In addition, we relate our method to a sequencing approach that uses a spectral algorithm, and to the consecutive ones problem. We present experimental results on several real data sets (author lists of database papers, exam results data, and paleontological data).
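The gap condition described in the abstract can be checked mechanically for a candidate fragment. The following is a minimal sketch, not the measure defined in the paper: it assumes each observation is a Python set of variable names and the fragment is a list in its proposed order, and it reports the fraction of observations containing a "gap" (indices $i < j < k$ with $A_i$ and $A_k$ present but $A_j$ absent). The function names `has_gap` and `violation_fraction` and the toy data are hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def has_gap(fragment: Sequence[str], observation: Set[str]) -> bool:
    """Return True if there are i < j < k with fragment[i] and fragment[k]
    in the observation but fragment[j] absent."""
    # Positions (in fragment order) of the fragment variables that occur.
    positions: List[int] = [i for i, var in enumerate(fragment) if var in observation]
    if len(positions) < 2:
        return False
    # The occurring variables are gap-free exactly when they fill a
    # contiguous block of positions between the first and the last one.
    return positions[-1] - positions[0] + 1 != len(positions)


def violation_fraction(fragment: Sequence[str], observations: Iterable[Set[str]]) -> float:
    """Fraction of observations that contain a gap with respect to the fragment.
    Variables that are not part of the fragment are ignored."""
    obs = list(observations)
    if not obs:
        return 0.0
    return sum(has_gap(fragment, o) for o in obs) / len(obs)


# Hypothetical toy data: a fragment A < B < C < D and five observations.
fragment = ["A", "B", "C", "D"]
observations = [
    {"A", "B", "C"},  # contiguous block A, B, C: consistent
    {"A", "C"},       # B missing between A and C: violation
    {"B", "C", "D"},  # consistent
    {"D"},            # single fragment variable: consistent
    {"A", "D", "X"},  # B and C missing; X is outside the fragment: violation
]
print(violation_fraction(fragment, observations))  # 0.4
```

Lower values indicate closer agreement between the fragment and the data. Requiring zero violations for every observation is the same as requiring the 0–1 matrix restricted to the fragment's columns, in the fragment's order, to have the consecutive ones property in its rows; a noise-tolerant score like this one relaxes that requirement.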

Published In

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. consecutive ones property
  2. discovering hidden orderings
  3. novel data mining algorithms
  4. spectral analysis of data

Qualifiers

  • Article

Conference

KDD03

Acceptance Rates

KDD '03 paper acceptance rate: 46 of 298 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Oct 2024

Cited By

  • (2015) Discovering Restricted Regular Expressions with Interleaving. Web Technologies and Applications, pp. 104–115. DOI: 10.1007/978-3-319-25255-1_9. Online publication date: 13-Nov-2015
  • (2007) Learning to order. Proceedings of the 3rd ECML/PKDD international conference on Mining complex data, pp. 209–223. DOI: 10.5555/1786154.1786177. Online publication date: 17-Sep-2007
  • (2007) Unsupervised pattern mining from symbolic temporal data. ACM SIGKDD Explorations Newsletter, 9(1), pp. 41–55. DOI: 10.1145/1294301.1294302. Online publication date: 1-Jun-2007
  • (2007) Finding low-entropy sets and trees from binary data. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 350–359. DOI: 10.1145/1281192.1281232. Online publication date: 12-Aug-2007
  • (2007) Frequent pattern mining. Data Mining and Knowledge Discovery, 15(1), pp. 55–86. DOI: 10.1007/s10618-006-0059-1. Online publication date: 1-Aug-2007
  • (2007) Learning to order. Proceedings of the Third International Conference on Mining Complex Data, pp. 209–223. DOI: 10.1007/978-3-540-68416-9_17. Online publication date: 17-Sep-2007
  • (2006) Finding trees from unordered 0–1 data. Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases, pp. 175–186. DOI: 10.5555/2089856.2089878. Online publication date: 18-Sep-2006
  • (2006) Algorithms for discovering bucket orders from data. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 561–566. DOI: 10.1145/1150402.1150468. Online publication date: 20-Aug-2006
  • (2006) Discovering Frequent Closed Partial Orders from Strings. IEEE Transactions on Knowledge and Data Engineering, 18(11), pp. 1467–1481. DOI: 10.1109/TKDE.2006.172. Online publication date: 1-Nov-2006
  • (2006) Discovering Partial Orders in Binary Data. Proceedings of the Sixth International Conference on Data Mining, pp. 510–521. DOI: 10.1109/ICDM.2006.57. Online publication date: 18-Dec-2006