Abstract
The task of measuring the dependence between terms is computationally expensive for information retrieval (IR) systems, which must deal with large and sparse datasets. Current approaches to mining frequent term sets enumerate the term sets found in a set of documents and exploit monotonicity, the property that a term set is frequent only if all of its subsets are frequent, as implemented by Apriori. However, the computational time can be very large. An alternative approach is to store the dataset in a frequent pattern tree (FPT) and to visit and prune the tree recursively, as implemented by FPGrowth. However, the storage space can still be very large. We introduce the Bell-Wigner inequality (BWI) as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. We describe an empirical validation showing that the BWI can significantly reduce both the computational time of Apriori and the storage space of pattern tree-based algorithms such as FPGrowth. The validation was performed using runs produced by IR systems from the TIPSTER test collection.
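The monotonicity-based pruning that the abstract attributes to Apriori can be illustrated with a minimal sketch. This is a generic textbook-style version, not the paper's implementation and not the BWI itself; the function name `apriori`, the parameter `min_support` (an absolute document count), and the toy documents are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {term set: support count} for every frequent term set.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many documents contain each single term.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: build size-k candidates from frequent (k-1)-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (monotonicity): a candidate can be frequent only if
        # every one of its (k-1)-subsets is frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Toy documents (assumed data, not taken from the TIPSTER collection).
docs = [
    {"quantum", "probability", "logic"},
    {"quantum", "probability"},
    {"quantum", "logic"},
    {"probability", "retrieval"},
]
frequent_sets = apriori(docs, min_support=2)
```

In this toy run the three-term candidate is discarded without scanning the documents, because one of its two-term subsets is infrequent. The remaining cost lies in counting support for every surviving candidate, which is the computational time the abstract refers to.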
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Melucci, M. (2015). Efficient Term Set Prediction Using the Bell-Wigner Inequality. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_5
DOI: https://doi.org/10.1007/978-3-319-23826-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23825-8
Online ISBN: 978-3-319-23826-5
eBook Packages: Computer Science, Computer Science (R0)