Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Beyond itemsets: mining frequent featuresets over structured items

Published: 01 November 2014 Publication History

Abstract

We assume a dataset of transactions generated by a set of users over structured items where each item could be described through a set of features. In this paper, we are interested in identifying the frequent featuresets (set of features) by mining item transactions. For example, in a news website, items correspond to news articles, the features are the named-entities/topics in the articles and an item transaction would be the set of news articles read by a user within the same session. We show that mining frequent featuresets over structured item transactions is a novel problem and show that straightforward extensions of existing frequent itemset mining techniques provide unsatisfactory results. This is due to the fact that while users are drawn to each item in the transaction due to a subset of its features, the transaction by itself does not provide any information about such underlying preferred features of users. In order to overcome this hurdle, we propose a featureset uncertainty model where each item transaction could have been generated by various featuresets with different probabilities. We describe a novel approach to transform item transactions into uncertain transaction over featuresets and estimate their probabilities using constrained least squares based approach. We propose diverse algorithms to mine frequent featuresets. Our experimental evaluation provides a comparative analysis of the different approaches proposed.

References

[1]
C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In SIGKDD, pages 29--38. ACM, 2009.
[2]
C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE TKDE, 2009.
[3]
R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, New York, NY, USA, 1993.
[4]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, 1994.
[5]
L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, pages 983--992. IEEE, 2008.
[6]
O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. Uldbs: Databases with uncertainty and lineage. In VLDB, 2006.
[7]
T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Zuefle. Probabilistic frequent itemset mining in uncertain databases. In SIGKDD, pages 119--128. ACM, 2009.
[8]
Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[9]
C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007.
[10]
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDBJ, 16(4): 523--544, 2007.
[11]
P. Damaschke. The union of minimal hitting sets: parameterized combinatorial bounds and counting. In STACS, pages 332--343, 2007.
[12]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.
[13]
R. Gupta, G. Fang, B. Field, M. Steinbach, and V. Kumar. Quantitative evaluation of approximate frequent pattern mining algorithms. In KDD, pages 301--309, 2008.
[14]
T. Hofmann. Learning what people (don't) want. In EMCL, EMCL '01, pages 214--225, 2001.
[15]
C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In AKDDM. Springer, 2008.
[16]
R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32(1): 57--95, Apr. 1987.
[17]
P. Sen, A. Deshpande, and L. Getoor. Representing tuple and attribute uncertainty in probabilistic databases. In Data Mining Workshops, ICDM, pages 507--512. IEEE, 2007.
[18]
L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In SIGKDD, pages 273--282. ACM, 2010.
[19]
H. Toivonen. Sampling large databases for association rules. In VLDB, pages 134--145, 1996.
[20]
Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. VLDB, 2012.
[21]
P. Venetis, G. Koutrika, and H. Garcia-Molina. On the selection of tags for tag clouds. In WSDM, 2011.
[22]
Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. SIGMOD '08, 2008.

Cited By

View all
  • (2024)Finetuning Analytics Information Systems for a Better Understanding of Users: Evidence of Personification Bias on Multiple Digital ChannelsInformation Systems Frontiers10.1007/s10796-023-10395-526:2(775-798)Online publication date: 1-Apr-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 8, Issue 3
November 2014
144 pages
ISSN:2150-8097
  • Editors:
  • Chen Li,
  • Volker Markl
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2014
Published in PVLDB Volume 8, Issue 3

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Finetuning Analytics Information Systems for a Better Understanding of Users: Evidence of Personification Bias on Multiple Digital ChannelsInformation Systems Frontiers10.1007/s10796-023-10395-526:2(775-798)Online publication date: 1-Apr-2024

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media