DOI: 10.1145/1242572.1242584
Article

Web object retrieval

Published: 08 May 2007

Abstract

The primary function of current Web search engines is essentially relevance ranking at the document level. However, a wealth of structured information about real-world objects is embedded in static Web pages and online Web databases. Unfortunately, document-level information retrieval can lead to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units, and the content of a document is considered reliable. However, this reliability assumption no longer holds in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because Web sites vary in quality and current information extraction techniques have limited performance. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not achieve satisfactory retrieval performance. We therefore propose several language models for Web object retrieval: an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.
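The abstract names three models but does not give their formulas. The Python sketch below is a minimal illustration under stated assumptions, not the paper's formulation. Here an object is a list of records extracted from different sources. Each record carries three things: a hypothetical record-level weight `alpha` (for example, an estimate of record-detection accuracy), a hypothetical attribute-extraction accuracy `beta`, and its extracted fields. `gamma` is a hypothetical per-field importance weighting. Every per-text model uses Jelinek-Mercer smoothing against a collection model, a choice made here only for simplicity. The unstructured model ignores field labels, the structured model trusts them, and the hybrid interpolates between the two by `beta`.

```python
from collections import Counter
import math


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


class Background:
    """Collection model P(w|C) over all extracted text, floored for unseen words."""

    def __init__(self, objects, floor=1e-6):
        self.counts = Counter(
            w
            for records in objects
            for rec in records
            for text in rec["fields"].values()
            for w in tokenize(text)
        )
        self.total = sum(self.counts.values()) or 1
        self.floor = floor

    def prob(self, w):
        return max(self.counts[w] / self.total, self.floor)


def smoothed(words, w, bg, lam=0.8):
    """Jelinek-Mercer: lam * P_ml(w|text) + (1 - lam) * P(w|C)."""
    ml = words.count(w) / len(words) if words else 0.0
    return lam * ml + (1 - lam) * bg.prob(w)


def p_unstructured(rec, w, bg, gamma):
    # Ignore field labels: the whole record is one bag of words (gamma unused).
    words = [t for text in rec["fields"].values() for t in tokenize(text)]
    return smoothed(words, w, bg)


def p_structured(rec, w, bg, gamma):
    # Trust field labels: mix per-field models by field weight (gamma sums to 1).
    return sum(g * smoothed(tokenize(rec["fields"].get(f, "")), w, bg)
               for f, g in gamma.items())


def p_hybrid(rec, w, bg, gamma):
    # Interpolate by the record's assumed attribute-extraction accuracy beta.
    b = rec["beta"]
    return (b * p_structured(rec, w, bg, gamma)
            + (1 - b) * p_unstructured(rec, w, bg, gamma))


def score(query, records, bg, record_model, gamma):
    """log P(q|O), where P(w|O) = sum_k (alpha_k / sum(alpha)) * P(w|R_k)."""
    z = sum(r["alpha"] for r in records)
    return sum(
        math.log(sum(r["alpha"] / z * record_model(r, w, bg, gamma) for r in records))
        for w in tokenize(query)
    )


# Hypothetical data: two noisy copies of one paper, plus an unrelated paper.
gamma = {"title": 0.7, "authors": 0.3}
paper_a = [
    {"alpha": 0.9, "beta": 0.9,
     "fields": {"title": "web object retrieval", "authors": "nie wen ma"}},
    {"alpha": 0.6, "beta": 0.4,  # title/author boundary mis-extracted
     "fields": {"title": "web object retrieval nie", "authors": "wen ma"}},
]
paper_b = [
    {"alpha": 1.0, "beta": 0.8,
     "fields": {"title": "block based web search", "authors": "cai yu wen ma"}},
]
bg = Background([paper_a, paper_b])

for model in (p_unstructured, p_structured, p_hybrid):
    sa = score("object retrieval", paper_a, bg, model, gamma)
    sb = score("object retrieval", paper_b, bg, model, gamma)
    print(f"{model.__name__:15s} paper_a={sa:.3f} paper_b={sb:.3f}")
```

Because `beta` is set per record in this sketch, a record whose labels are unreliable (low `beta`) contributes mostly through its unstructured bag of words. This is one simple way to let extraction errors at different levels influence ranking.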

Information

Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN:9781595936547
DOI:10.1145/1242572
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information extraction
  2. information retrieval
  3. language model
  4. web objects

Qualifiers

  • Article

Conference

WWW'07
WWW'07: 16th International World Wide Web Conference
May 8-12, 2007
Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Jan 2025

Citations

Cited By

  • (2023) A Light Touch Approach to Teaching Transformers Multi-view Geometry. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4958-4969. DOI: 10.1109/CVPR52729.2023.00480. Online publication date: Jun-2023.
  • (2021) Geographical Labeling of Web Objects Through Maximum Marginal Classification. Advances in Data Science and Information Engineering, pp. 713-724. DOI: 10.1007/978-3-030-71704-9_52. Online publication date: 30-Oct-2021.
  • (2018) Venue-Influence Language Models for Expert Finding in Bibliometric Networks. International Journal on Semantic Web & Information Systems, 14:3, pp. 184-201. DOI: 10.4018/IJSWIS.2018070109. Online publication date: 1-Jul-2018.
  • (2018) Deep Web Information Retrieval Process. The Dark Web, pp. 114-137. DOI: 10.4018/978-1-5225-3163-0.ch007. Online publication date: 2018.
  • (2018) Probabilistic classification techniques to perform geographical labeling of web objects. Cluster Computing. DOI: 10.1007/s10586-018-1822-y. Online publication date: 14-Feb-2018.
  • (2017) Entity Summarization Based on Entity Grouping in Multilingual Projected Entity Space. IEICE Transactions on Information and Systems, E100.D:9, pp. 2138-2146. DOI: 10.1587/transinf.2016EDP7235. Online publication date: 2017.
  • (2017) Geographical labeling of web objects through density estimator model. 2017 International Conference on Computing Methodologies and Communication (ICCMC), pp. 1130-1135. DOI: 10.1109/ICCMC.2017.8282649. Online publication date: Jul-2017.
  • (2017) Extracting and analyzing time-series HCI data from screen-captured task videos. Empirical Software Engineering, 22:1, pp. 134-174. DOI: 10.1007/s10664-015-9417-1. Online publication date: 1-Feb-2017.
  • (2016) Relationship Queries on Extended Knowledge Graphs. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 605-614. DOI: 10.1145/2835776.2835795. Online publication date: 8-Feb-2016.
  • (2016) A Value-Added Approach to Design BI Applications. Big Data Analytics and Knowledge Discovery, pp. 361-375. DOI: 10.1007/978-3-319-43946-4_24. Online publication date: 6-Aug-2016.
