research-article

Beyond itemsets: mining frequent featuresets over structured items

Editors: Chen Li, Volker Markl Authors:

Saravanan Thirumuruganathan,

Habibur Rahman,

Gautam DasAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 8, Issue 3

Pages 257 - 268

https://doi.org/10.14778/2735508.2735515

Published: 01 November 2014 Publication History

Abstract

We assume a dataset of transactions generated by a set of users over structured items where each item could be described through a set of features. In this paper, we are interested in identifying the frequent featuresets (set of features) by mining item transactions. For example, in a news website, items correspond to news articles, the features are the named-entities/topics in the articles and an item transaction would be the set of news articles read by a user within the same session. We show that mining frequent featuresets over structured item transactions is a novel problem and show that straightforward extensions of existing frequent itemset mining techniques provide unsatisfactory results. This is due to the fact that while users are drawn to each item in the transaction due to a subset of its features, the transaction by itself does not provide any information about such underlying preferred features of users. In order to overcome this hurdle, we propose a featureset uncertainty model where each item transaction could have been generated by various featuresets with different probabilities. We describe a novel approach to transform item transactions into uncertain transaction over featuresets and estimate their probabilities using constrained least squares based approach. We propose diverse algorithms to mine frequent featuresets. Our experimental evaluation provides a comparative analysis of the different approaches proposed.

References

[1]

C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In SIGKDD, pages 29--38. ACM, 2009.

Digital Library

[2]

C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE TKDE, 2009.

Digital Library

[3]

R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, New York, NY, USA, 1993.

Digital Library

[4]

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, 1994.

Digital Library

[5]

L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, pages 983--992. IEEE, 2008.

Digital Library

[6]

O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. Uldbs: Databases with uncertainty and lineage. In VLDB, 2006.

Digital Library

[7]

T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Zuefle. Probabilistic frequent itemset mining in uncertain databases. In SIGKDD, pages 119--128. ACM, 2009.

Digital Library

[8]

Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.

[9]

C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007.

Digital Library

[10]

N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDBJ, 16(4): 523--544, 2007.

Digital Library

[11]

P. Damaschke. The union of minimal hitting sets: parameterized combinatorial bounds and counting. In STACS, pages 332--343, 2007.

Digital Library

[12]

M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.

Digital Library

[13]

R. Gupta, G. Fang, B. Field, M. Steinbach, and V. Kumar. Quantitative evaluation of approximate frequent pattern mining algorithms. In KDD, pages 301--309, 2008.

Digital Library

[14]

T. Hofmann. Learning what people (don't) want. In EMCL, EMCL '01, pages 214--225, 2001.

Digital Library

[15]

C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In AKDDM. Springer, 2008.

Digital Library

[16]

R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32(1): 57--95, Apr. 1987.

Digital Library

[17]

P. Sen, A. Deshpande, and L. Getoor. Representing tuple and attribute uncertainty in probabilistic databases. In Data Mining Workshops, ICDM, pages 507--512. IEEE, 2007.

Digital Library

[18]

L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In SIGKDD, pages 273--282. ACM, 2010.

Digital Library

[19]

H. Toivonen. Sampling large databases for association rules. In VLDB, pages 134--145, 1996.

Digital Library

[20]

Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. VLDB, 2012.

Digital Library

[21]

P. Venetis, G. Koutrika, and H. Garcia-Molina. On the selection of tags for tag clouds. In WSDM, 2011.

Digital Library

[22]

Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. SIGMOD '08, 2008.

Digital Library

Cited By

Jansen BJung SSalminen J(2024)Finetuning Analytics Information Systems for a Better Understanding of Users: Evidence of Personification Bias on Multiple Digital ChannelsInformation Systems Frontiers10.1007/s10796-023-10395-526:2(775-798)Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1007/s10796-023-10395-5

Recommendations

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets may be unusually large when a low minimum support threshold is given. As such, the design of a compact representation to compress ...
Efficient algorithms for mining high-utility itemsets in uncertain databases

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Hiding co-occurring frequent itemsets
EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops

Knowledge hiding, hiding rules/patterns that are inferable from published data and attributed sensitive, is extensively studied in the literature in the context of frequent itemsets and association rules mining from transactional data. The research in ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 8, Issue 3

November 2014

144 pages

ISSN:2150-8097

Editors:
Chen Li
University of California, Irvine
,
Volker Markl
TU Berlin

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2014

Published in PVLDB Volume 8, Issue 3

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
60
Total Downloads

Downloads (Last 12 months)1
Downloads (Last 6 weeks)0

Reflects downloads up to 09 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Jansen BJung SSalminen J(2024)Finetuning Analytics Information Systems for a Better Understanding of Users: Evidence of Personification Bias on Multiple Digital ChannelsInformation Systems Frontiers10.1007/s10796-023-10395-526:2(775-798)Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1007/s10796-023-10395-5

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents