Article

Web object retrieval

Authors:

Wei-Ying Ma

WWW '07: Proceedings of the 16th international conference on World Wide Web

Pages 81 - 90

https://doi.org/10.1145/1242572.1242584

Published: 08 May 2007

Abstract

The primary function of current Web search engines is essentially relevance ranking at the document level. However, a large amount of structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can lead to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption no longer holds in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because Web sites vary in quality and current information extraction techniques have limited accuracy. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. We therefore propose several language models for Web object retrieval: an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.
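To make the idea of a hybrid model concrete, the following is a minimal sketch of one way a query-likelihood score could blend structured (per-field) and unstructured (field-agnostic) evidence while discounting unreliable sources. It is an illustration under stated assumptions, not the formulation from the paper: the function names, the `record_acc` and `attr_acc` accuracy estimates, the field weights, and the Dirichlet smoothing parameter `mu` are all hypothetical.

```python
import math
from collections import Counter

def field_prob(term, text, coll_prob, mu=10.0):
    """Dirichlet-smoothed unigram probability of `term` in one extracted field."""
    tokens = text.lower().split()
    return (Counter(tokens)[term] + mu * coll_prob) / (len(tokens) + mu)

def hybrid_log_score(query, records, field_weights, coll_probs, mu=10.0):
    """Log query likelihood of one object built from several extracted records.

    Assumed (hypothetical) inputs:
      records       -- list of {"fields": {name: text}, "record_acc": float, "attr_acc": float}
      field_weights -- importance of each field, e.g. {"title": 0.7, "abstract": 0.3}
      coll_probs    -- background probability of each term in the whole collection
    """
    fields = list(field_weights)
    beta_total = sum(field_weights.values())
    alpha_total = sum(r["record_acc"] for r in records)
    log_score = 0.0
    for term in query.lower().split():
        p_coll = coll_probs.get(term, 1e-6)  # small floor for unseen terms
        p_term = 0.0
        for r in records:
            alpha = r["record_acc"] / alpha_total  # trust in this source record
            gamma = r["attr_acc"]                  # trust in its field labels, in [0, 1]
            for f in fields:
                beta = field_weights[f] / beta_total
                # Reliable labels -> use field weights; unreliable -> treat fields uniformly.
                w = gamma * beta + (1.0 - gamma) / len(fields)
                p_term += alpha * w * field_prob(term, r["fields"].get(f, ""), p_coll, mu)
        log_score += math.log(p_term)
    return log_score
```

In this sketch, a record with `attr_acc` close to 1 is scored almost purely by the weighted fields (the structured view), while a record with `attr_acc` close to 0 has its fields weighted uniformly, which approximates ignoring the field labels. Objects can then be ranked by `hybrid_log_score` for a given query.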

Index Terms

Web object retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web

May 2007

1382 pages

ISBN:9781595936547

DOI:10.1145/1242572

General Chairs: Carey Williamson (University of Calgary, Canada), Mary Ellen Zurko (IBM, USA)

Program Chairs: Peter Patel-Schneider (Bell Labs Research, USA), Prashant Shenoy (University of Massachusetts at Amherst, USA)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

WWW'07

Sponsor:

ACM

WWW'07: 16th International World Wide Web Conference

May 8 - 12, 2007

Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Total Citations: 92
Total Downloads: 1,318
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Jan 2025

Citations

Cited By

Bhalgat Y, Henriques J, Zisserman A (2023). A Light Touch Approach to Teaching Transformers Multi-view Geometry. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4958-4969. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.00480
Anjan Kumar K, Satish Kumar T, Reshma J (2021). Geographical Labeling of Web Objects Through Maximum Marginal Classification. Advances in Data Science and Information Engineering, pp. 713-724. Online publication date: 30-Oct-2021.
https://doi.org/10.1007/978-3-030-71704-9_52
Al-Barakati A, Daud A (2018). Venue-Influence Language Models for Expert Finding in Bibliometric Networks. International Journal on Semantic Web & Information Systems 14:3, pp. 184-201. Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.4018/IJSWIS.2018070109
Sharma D, Sharma A (2018). Deep Web Information Retrieval Process. The Dark Web, pp. 114-137. Online publication date: 2018.
https://doi.org/10.4018/978-1-5225-3163-0.ch007
AnjanKumar K, Chitra S, Satish Kumar T (2018). Probabilistic classification techniques to perform geographical labeling of web objects. Cluster Computing. Online publication date: 14-Feb-2018.
https://doi.org/10.1007/s10586-018-1822-y
KIM E, CHOI K (2017). Entity Summarization Based on Entity Grouping in Multilingual Projected Entity Space. IEICE Transactions on Information and Systems E100.D:9, pp. 2138-2146. Online publication date: 2017.
https://doi.org/10.1587/transinf.2016EDP7235
Sanjay P, Raju G, Reddy B (2017). Geographical labeling of web objects through density estimator model. 2017 International Conference on Computing Methodologies and Communication (ICCMC), pp. 1130-1135. Online publication date: Jul-2017.
https://doi.org/10.1109/ICCMC.2017.8282649
Bao L, Li J, Xing Z, Wang X, Xia X, Zhou B (2017). Extracting and analyzing time-series HCI data from screen-captured task videos. Empirical Software Engineering 22:1, pp. 134-174. Online publication date: 1-Feb-2017.
https://dl.acm.org/doi/10.1007/s10664-015-9417-1
Yahya M, Barbosa D, Berberich K, Wang Q, Weikum G, Bennett P, Josifovski V, Neville J, Radlinski F (2016). Relationship Queries on Extended Knowledge Graphs. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 605-614. Online publication date: 8-Feb-2016.
https://dl.acm.org/doi/10.1145/2835776.2835795
Berkani N, Bellatreche L, Benatallah B (2016). A Value-Added Approach to Design BI Applications. Big Data Analytics and Knowledge Discovery, pp. 361-375. Online publication date: 6-Aug-2016.
https://doi.org/10.1007/978-3-319-43946-4_24
