Both the WWW research community and industry have shown increasing interest in finding not just relevant documents but specific objects or entities, in order to satisfy more sophisticated user information needs. TREC launched an Entity Track in 2009 to investigate the task of related entity finding (REF). This paper proposes two novel probabilistic models that integrate several components into a unified modeling process. In particular, the type matching component characterizes the degree of matching between the expected entity type, inferred from the query, and the candidate entity type, inferred from the entity profile. Another important component incorporates prior knowledge about entities into the retrieval process. The main difference between the two models is that the second model explicitly considers the effect of the source entity, while the first does not. A comprehensive and carefully designed set of experiments was conducted on the TREC Entity Track testbeds from 2009 to 2011 to show the contributions of the individual components. The results demonstrate that both the type matching component and the entity prior modeling component effectively boost entity retrieval performance. Furthermore, the second model performs better than the first in all settings, indicating the benefit of explicitly modeling the source entity in related entity finding. Both models produce results that are better than, or competitive with, the state-of-the-art results in the TREC REF tasks. In addition, the proposed unified probabilistic approach is applied to the TREC Entity List Completion task, where it also performs well.
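The abstract names the model components but not their estimators. The Python below is a minimal sketch under stated assumptions, not the authors' formulation. It multiplies a query-relevance term, a type-matching term and an entity prior (standing in for the first model), then multiplies in a source-entity association term (standing in for the second). The `Candidate` fields, the product combination, and the expected-overlap type matcher are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for the components named in the abstract.
    name: str
    relevance: float             # assumed P(q | e), e.g. from supporting documents
    type_dist: Dict[str, float]  # assumed P(type | entity profile)
    prior: float                 # assumed P(e), prior knowledge about the entity
    source_assoc: float          # assumed P(s | e), association with the source entity


def type_match(expected: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Probability that the query's expected type equals the candidate's type,
    assuming the two type distributions are independent."""
    return sum(p * candidate.get(t, 0.0) for t, p in expected.items())


def score_without_source(c: Candidate, expected: Dict[str, float]) -> float:
    # Sketch of the first model: relevance x type matching x entity prior.
    return c.relevance * type_match(expected, c.type_dist) * c.prior


def score_with_source(c: Candidate, expected: Dict[str, float]) -> float:
    # Sketch of the second model: additionally weights by the source-entity term.
    return score_without_source(c, expected) * c.source_assoc


def rank(cands: List[Candidate], expected: Dict[str, float],
         use_source: bool) -> List[Candidate]:
    scorer = score_with_source if use_source else score_without_source
    return sorted(cands, key=lambda c: scorer(c, expected), reverse=True)


# Toy inputs (invented numbers) showing how the source term can reorder results.
EXPECTED = {"organization": 0.8, "person": 0.2}
a = Candidate("A", 0.3, {"organization": 0.9, "person": 0.1}, 0.5, 0.2)
b = Candidate("B", 0.4, {"person": 1.0}, 0.5, 0.6)

print([c.name for c in rank([a, b], EXPECTED, use_source=False)])  # ['A', 'B']
print([c.name for c in rank([a, b], EXPECTED, use_source=True)])   # ['B', 'A']
```

With these made-up numbers, A wins on type matching alone (0.111 vs 0.04), but B's stronger association with the source entity lifts it to the top once that term is included (0.024 vs 0.0222). A real system would estimate each term from data and would typically score in log space to avoid underflow.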
About this article
Cite this article
Fang, Y., Si, L. Related entity finding by unified probabilistic models. World Wide Web 18, 521–543 (2015). https://doi.org/10.1007/s11280-013-0267-8