DOI: 10.1145/996350.996377
Article

The effectiveness of automatically structured queries in digital libraries

Published: 07 June 2004

Abstract

Structured or fielded metadata is the basis for many digital library (DL) services, including searching and browsing. Yet little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 subjects working with a DL containing a large collection of computer science literature clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.
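
To make the idea of inferring a structured query from plain keywords concrete, the following is a minimal sketch and not the system evaluated in the paper. It replaces the Bayesian network with a much cruder stand-in: each keyword-to-field assignment is scored by the product of the fractions of records whose field contains the keyword, and the highest-scoring assignment is kept. The toy records, the field names, and the functions `field_score` and `best_structured_query` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy collection: each record maps a field name to its text.
RECORDS = [
    {"author": "salton", "title": "automatic query formulations in information retrieval"},
    {"author": "pearl", "title": "probabilistic reasoning in intelligent systems"},
    {"author": "croft turtle", "title": "inference networks for document retrieval"},
]
FIELDS = ("author", "title")


def field_score(term, field):
    """Fraction of records whose `field` contains `term`.

    A crude stand-in for a learned probability; not the paper's model.
    """
    hits = sum(1 for record in RECORDS if term in record[field].split())
    return hits / len(RECORDS)


def best_structured_query(keywords):
    """Try every keyword-to-field assignment and keep the best-scoring one."""
    best, best_score = None, -1.0
    for assignment in product(FIELDS, repeat=len(keywords)):
        score = 1.0
        for term, field in zip(keywords, assignment):
            score *= field_score(term, field)
        if score > best_score:
            best, best_score = list(zip(assignment, keywords)), score
    return best, best_score


if __name__ == "__main__":
    query, score = best_structured_query(["pearl", "reasoning"])
    print(query, score)
```

Exhaustive enumeration is only workable for short keyword queries, since the number of candidate assignments grows as the number of fields raised to the number of keywords.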



      Published In

      JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
      June 2004
      440 pages
      ISBN: 1581138326
      DOI: 10.1145/996350
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. bayesian networks
      2. digital libraries
      3. structured queries

      Qualifiers

      • Article

      Conference

      JCDL04

      Acceptance Rates

      JCDL '04 Paper Acceptance Rate 61 of 249 submissions, 24%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%
