research-article

Searching Locally-Defined Entities

Authors:

Ariel Fuxman

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

Pages 1499 - 1508

https://doi.org/10.1145/2661829.2661954

Published: 03 November 2014

Abstract

When consuming content, users typically encounter entities that they are not familiar with. A common scenario is when users want to find information about entities directly within the content they are consuming. For example, when reading the book "Adventures of Huckleberry Finn", a user may lose track of the character Mary Jane and want to find some paragraph in the book that gives relevant information about her. The way this is achieved today is by invoking the ubiquitous Find function ("Ctrl-F"). However, this only returns exact-matching results without any relevance ranking, leading to a suboptimal user experience.
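To make the limitation concrete, the following minimal sketch mimics a Ctrl-F style lookup over paragraphs and contrasts it with a naive ordering by mention count. The function names `find_exact` and `rank_by_mentions`, the mention-count heuristic, and the sample passages are all hypothetical illustrations (the passages are invented, not quoted from the book); none of this is the method proposed in the paper.

```python
def find_exact(paragraphs, query):
    """Return indices of paragraphs containing the query, in document order.

    This mirrors Ctrl-F: every exact match is returned, with no ranking.
    """
    q = query.lower()
    return [i for i, p in enumerate(paragraphs) if q in p.lower()]


def rank_by_mentions(paragraphs, query):
    """Order matching paragraphs by how often they mention the query.

    Mention count is only an assumed stand-in for relevance, used here
    to show what "ranking" adds over plain matching.
    """
    q = query.lower()
    hits = [(p.lower().count(q), i) for i, p in enumerate(paragraphs)]
    hits = [(count, i) for count, i in hits if count > 0]
    # Higher count first; ties are broken by document order.
    return [i for count, i in sorted(hits, key=lambda t: (-t[0], t[1]))]


# Invented sample passages for illustration only.
paragraphs = [
    "Mary Jane came in.",
    "The river was wide that night.",
    "Mary Jane was the eldest sister; Mary Jane ran the household.",
]

exact_hits = find_exact(paragraphs, "Mary Jane")
ranked_hits = rank_by_mentions(paragraphs, "Mary Jane")
```

Here the exact-match lookup lists the passing mention first simply because it occurs earlier, while even a crude ranking promotes the more descriptive passage. Neither approach finds passages that refer to the character only by pronoun or alias, which is one reason semantic matching is needed.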

How can we go beyond the Ctrl-F function? To tackle this problem, we present algorithms for semantic matching and relevance ranking that enable users to effectively search for and understand entities that are defined in the content they are consuming, which we call locally-defined entities. We first analyze the limitations of standard information retrieval models when applied to searching locally-defined entities, and then propose a novel semantic entity retrieval model that addresses these limitations. We also present a ranking model that leverages multiple novel signals to model the relevance of a passage. A thorough experimental evaluation of the approach in the real-world application of searching characters within e-books shows that it outperforms the baselines by more than 60% in terms of NDCG.
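For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) can be sketched as below. This is a minimal sketch of one common formulation, using an exponential gain and a log2 rank discount; the abstract does not state which variant or cutoff the evaluation uses, so the gain function, the optional cutoff `k`, and the function names are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1.
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances)
    )


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal reordering."""
    k = len(relevances) if k is None else k
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    # With no relevant passages at all, define the score as 0.
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Example: the single relevant passage is ranked second instead of first.
score = ndcg([0, 1])
```

A score of 1.0 means the passages are returned in the best possible order, and placing relevant passages lower in the list reduces the score.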


Cited By

I. Ullah, S. Alam, Z. Ali, M. Khan, F. Jabeen, and S. Khusro. On the current state of query formulation for book search. Artificial Intelligence Review, 56(10):12085--12130, 2023. DOI: 10.1007/s10462-023-10483-7. Online publication date: 19-Apr-2023.
https://dl.acm.org/doi/10.1007/s10462-023-10483-7

Index Terms

Searching Locally-Defined Entities
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN:9781450325981

DOI:10.1145/2661829

General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)

Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '14

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
