Abstract
Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees, the balanced-optimal-search canonical form (BOCF), which handles the tree isomorphism problem efficiently. Using BOCF, we define an enumeration approach, based on a tree-structure-guided scheme, that systematically enumerates only valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm, built on BOCF and the proposed enumeration approach, for finding frequent induced subtrees in a database of labelled rooted unordered trees. Experiments on real datasets compare the efficiency of BOSTER against two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
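To illustrate why a canonical form helps with isomorphism of rooted labelled unordered trees, the following minimal sketch uses a generic sorted-encoding canonical string. It is not BOCF, whose definition is not given in this abstract. The `(label, children)` tuple representation and the function names are assumptions made for illustration, and labels are assumed to contain no parentheses.

```python
from typing import List, Tuple

# Hypothetical representation: a node is (label, list of child nodes).
Tree = Tuple[str, List["Tree"]]


def canonical_string(tree: Tree) -> str:
    """Encode a rooted labelled unordered tree so that sibling order is irrelevant.

    Children are encoded recursively and their encodings are sorted, so any two
    trees that differ only in the order of siblings get the same string.
    """
    label, children = tree
    encoded_children = sorted(canonical_string(child) for child in children)
    return "(" + label + "".join(encoded_children) + ")"


def is_isomorphic(a: Tree, b: Tree) -> bool:
    """Two unordered trees are isomorphic iff their canonical strings match."""
    return canonical_string(a) == canonical_string(b)


# t1 and t2 differ only in sibling order; t3 has a different structure.
t1: Tree = ("A", [("B", []), ("C", [("D", [])])])
t2: Tree = ("A", [("C", [("D", [])]), ("B", [])])
t3: Tree = ("A", [("B", [("D", [])]), ("C", [])])

print(canonical_string(t1))   # (A(B)(C(D)))
print(is_isomorphic(t1, t2))  # True
print(is_isomorphic(t1, t3))  # False
```

With such an encoding, a miner can test whether two candidate subtrees are the same pattern by comparing strings instead of searching for a node mapping, which is the role a canonical form plays in frequent subtree mining.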
References
Pei, J., Han, J., Mortazavi-asl, B., Zhu, H.: Mining Access Patterns Efficiently from Web Logs. In: Terano, T., Liu, H., Chen, A.L.P. (eds.) PAKDD 2000. LNCS, vol. 1805, pp. 396–407. Springer, Heidelberg (2000)
Zaki, M.J., Aggarwal, C.C.: XRules: An Effective Structural Classifier for XML Data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 316–325. ACM, Washington, D.C. (2003)
Wang, Y., DeWitt, D.J., Cai, J.-Y.: X-Diff: An Effective Change Detection Algorithm for XML Documents. In: Proceedings of the 19th International Conference on Data Engineering, pp. 519–530. IEEE, Vienna (2003)
Luccio, F., Enriquez, A.M., Rieumont, P.O., Pagli, L.: Exact Rooted Subtree Matching in Sublinear Time. Università di Pisa Technical Report TR-01 (2001)
Asai, T., Arimura, H., Uno, T., Nakano, S.-I.: Discovering Frequent Substructures in Large Unordered Trees. Springer, Heidelberg (2003)
Nijssen, S., Kok, J.N.: Efficient Discovery of Frequent Unordered Trees. In: First International Workshop on Mining Graphs, Trees and Sequences. Springer, Heidelberg (2003)
Chi, Y., Yang, Y., Muntz, R.R.: Canonical Forms for Labelled Trees and Their Applications in Frequent Subtree Mining. Knowledge and Information Systems 8(2), 203–234 (2005)
Chi, Y., Yang, Y., Muntz, R.R.: HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 11–20. IEEE, Santorini (2004)
Chehreghani, M.H.: Efficiently Mining Unordered Trees. In: Proceedings of the 11th IEEE International Conference on Data Mining, Vancouver, BC, pp. 111–120 (2011)
Hadzic, F., Tan, H., Dillon, T.S.: UNI3 - Efficient Algorithm for Mining Unordered Induced Subtrees Using TMG Candidate Generation. In: Proceedings of the 1st IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, Hawaii, pp. 568–575 (2007)
Chowdhury, I.J., Nayak, R.: A Novel Method for Finding Similarities between Unordered Trees Using Matrix Data Model. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds.) WISE 2013, Part I. LNCS, vol. 8180, pp. 421–430. Springer, Heidelberg (2013)
Valiente, G.: Algorithms on Trees and Graphs. Springer, Heidelberg (2002)
Scholl, A.: Balancing and Sequencing of Assembly Lines. Physica-Verlag, Heidelberg (1999)
Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering 17(8), 1021–1035 (2005)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chowdhury, I.J., Nayak, R. (2014). BOSTER: An Efficient Algorithm for Mining Frequent Unordered Induced Subtrees. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_12
DOI: https://doi.org/10.1007/978-3-319-11749-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)