Abstract
Information retrieval is shifting from the traditional paradigm, which returns an ordered list of responses, to the aggregated search paradigm, which groups the most comprehensive and relevant answers into one final aggregated document. The Extensible Markup Language (XML) is now an important standard for information exchange and representation. XML documents and queries are usually processed through their tree representations, which makes it possible to treat XML document retrieval as a tree matching problem between the document trees and the query tree. Several paradigms for retrieving XML documents have been proposed in the literature, but only a few of them try to aggregate a set of XML documents in order to provide more significant answers for a given query. In this paper, we propose and evaluate an aggregated search method to obtain the most accurate and richest answers in XML fragment search. Our search method is based on the Top-k Approximate Subtree Matching (TASM) algorithm, and a new similarity function is proposed to improve the returned fragments. An aggregation process is then presented to generate a single aggregated response containing the most relevant, exhaustive and non-redundant information given by the fragments. The method is evaluated on two real-world datasets. Experiments show that it generates good results in terms of relevance and quality.
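To make the tree matching view concrete, the following is a minimal Python sketch of ranking document subtrees (fragments) against a query tree and keeping the top k. It is an illustration only, not the TASM algorithm and not the similarity function proposed in the paper: the score is a simple Jaccard overlap of root-to-node label paths, chosen here as a stand-in for tree edit distance. The function names (`label_paths`, `similarity`, `top_k_fragments`) and the sample XML are assumptions made for this example.

```python
import heapq
import xml.etree.ElementTree as ET


def label_paths(node, prefix=()):
    """Collect the root-to-node paths of the subtree rooted at `node`.

    Each path is a tuple of tags; elements with non-blank text also
    contribute a path ending in their lowercased text content.
    """
    path = prefix + (node.tag,)
    paths = {path}
    text = (node.text or "").strip()
    if text:
        paths.add(path + (text.lower(),))
    for child in node:
        paths |= label_paths(child, path)
    return paths


def similarity(query, fragment):
    """Jaccard overlap of path sets (a stand-in, NOT tree edit distance)."""
    q, f = label_paths(query), label_paths(fragment)
    return len(q & f) / len(q | f)


def top_k_fragments(query_xml, document_xml, k=2):
    """Score every subtree of the document against the query; keep the best k."""
    query = ET.fromstring(query_xml)
    document = ET.fromstring(document_xml)
    scored = [
        (similarity(query, sub), index, sub)
        for index, sub in enumerate(document.iter())
    ]
    # Ties are broken in favour of the subtree met first in document order.
    best = heapq.nlargest(k, scored, key=lambda item: (item[0], -item[1]))
    return [
        (round(score, 3), ET.tostring(sub, encoding="unicode").strip())
        for score, _, sub in best
    ]


if __name__ == "__main__":
    # Hypothetical query and document, used only to exercise the sketch.
    QUERY = "<book><title>xml</title></book>"
    DOCUMENT = (
        "<lib>"
        "<book><title>XML</title><year>2017</year></book>"
        "<book><title>SQL</title></book>"
        "</lib>"
    )
    for score, fragment in top_k_fragments(QUERY, DOCUMENT, k=2):
        print(score, fragment)
```

This sketch scores every subtree exhaustively, which is only workable for small documents. An aggregation step would then merge the returned fragments into a single response, which is not attempted here.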
This work is partially funded by the French National Agency of Research project: Contextual and Aggregated Information Retrieval (ANR-14-CE23-0006).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Habi, A., Effantin, B., Kheddouci, H. (2017). Search and Aggregation in XML Documents. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_22
DOI: https://doi.org/10.1007/978-3-319-64468-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science (R0)