My research interests lie in the areas of Information Retrieval, Data Mining, Machine Learning, Natural Language Processing, and Human-Computer Interaction.

Address: Google Switzerland GmbH, Brandschenkestrasse 110, CH-8002 Zurich, Switzerland
Empirical modeling of the score distributions associated with retrieved documents is an essential task for many retrieval applications. In this work, we propose modeling the relevant documents' scores by a mixture of Gaussians and the non-relevant scores by a Gamma distribution. Applying variational inference, we automatically trade off goodness-of-fit against the complexity of the model. We test our model on traditional retrieval functions and on actual search engines submitted to TREC, and we demonstrate the utility of our model in inferring precision-recall curves. In all experiments our model outperforms the dominant exponential-Gaussian model.
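A minimal sketch of how such a model can be fit and used, assuming synthetic data and off-the-shelf estimators from scipy and scikit-learn (the mixture here is fit by EM rather than the paper's variational inference, and all variable names are illustrative):

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-ins: relevant scores from a two-mode mixture,
# non-relevant scores from a Gamma distribution.
rel = np.concatenate([rng.normal(6.0, 0.8, 300), rng.normal(8.5, 0.5, 200)])
nonrel = rng.gamma(shape=2.0, scale=1.5, size=5000)

gmm = GaussianMixture(n_components=2, random_state=0).fit(rel.reshape(-1, 1))
a, loc, scale = stats.gamma.fit(nonrel, floc=0)  # shape, location, scale

def p_rel_above(t):
    """P(score > t | relevant) under the fitted Gaussian mixture."""
    mus = gmm.means_.ravel()
    sigmas = np.sqrt(gmm.covariances_.ravel())
    return float(np.sum(gmm.weights_ * stats.norm.sf(t, mus, sigmas)))

# Infer a precision-recall curve from the two fitted densities.
pi = len(rel) / (len(rel) + len(nonrel))  # prior probability of relevance
for t in np.linspace(2.0, 10.0, 9):
    recall = p_rel_above(t)
    fallout = stats.gamma.sf(t, a, loc=loc, scale=scale)
    precision = pi * recall / (pi * recall + (1 - pi) * fallout + 1e-12)
    print(f"threshold {t:4.1f}: recall {recall:.3f}, precision {precision:.3f}")
```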
As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.
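A hedged sketch of the core reliability question behind this kind of analysis: how well does the system ranking produced by a small query sample agree with the ranking produced by the full query set? The score matrix below is random and merely stands in for per-topic effectiveness scores such as average precision; nothing here reproduces the paper's actual data or sampling scheme.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
n_systems, n_queries = 20, 150
# scores[s, q]: effectiveness (e.g. AP) of system s on query q.
scores = rng.beta(2.0, 5.0, size=(n_systems, n_queries))
full_means = scores.mean(axis=1)  # ranking induced by the full query set

for sample_size in (10, 25, 50, 100):
    taus = []
    for _ in range(200):
        sample = rng.choice(n_queries, size=sample_size, replace=False)
        sub_means = scores[:, sample].mean(axis=1)
        tau, _ = kendalltau(full_means, sub_means)
        taus.append(tau)
    print(f"{sample_size:3d} queries: mean Kendall tau {np.mean(taus):.3f}")
```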
Information Retrieval is a ubiquitous technology. The range of tasks and information collections IR systems are applied to is ever growing. However, the ability of researchers and developers to build high-quality search systems for new environments is not keeping up with the growth in potential application areas. The key reason for this failing is the profound lack of testing resources that capture the unique and complex environments different retrieval systems are deployed in. Building test resources is prohibitively expensive. Consequently, researchers are unable to assess the effectiveness of new ideas in such environments, and search engine vendors struggle to tune their systems to the particular needs of their customers.
In this talk I will discuss efficient and reliable resource construction methodologies for the purpose of evaluating retrieval systems and training new ranking functions. In the first part, I will focus on low-cost evaluation, presenting a framework for obtaining relevance judgments and discussing the effect of the choice of evaluation metrics on the cost and reliability of the constructed collections. In the second part, I will examine the duality between low-cost evaluation and low-cost learning to rank. Can techniques used for obtaining relevance judgments also be used to obtain labels for effective training of ranking functions? What is a good metric to optimise for? Finally, I will briefly introduce an extension of the current evaluation and learning paradigm to incorporate additional information about the different retrieval scenarios and users' activities.
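To make the low-cost evaluation idea concrete, here is a rough sketch in the spirit of sampling-based estimators such as inferred AP: judge each pooled document with a known probability and correct for the sampling rate when estimating average precision. This is an illustration under simplifying assumptions (uniform sampling, binary relevance), not the specific framework from the talk.

```python
import numpy as np

def estimated_ap(rels, judged, p):
    """Estimate average precision when each document is judged
    independently with known probability p.
    rels: 0/1 relevance by rank; judged: boolean mask of judged ranks."""
    obs = rels * judged                # relevance we actually get to see
    est_total_rel = obs.sum() / p      # Horvitz-Thompson estimate of R
    if est_total_rel == 0:
        return 0.0
    total, rel_above = 0.0, 0.0
    for k, r in enumerate(obs, start=1):
        if r:
            # Unbiased estimate of the precision term at this rank:
            # the current document counts with weight 1/p, documents
            # above it with weight 1/p each inside the 1/p envelope.
            total += (1.0 / p) * (1.0 + rel_above / p) / k
            rel_above += 1.0
    return total / est_total_rel

rng = np.random.default_rng(2)
rels = (rng.random(1000) < 0.05).astype(float)   # synthetic ranked list
true_ap = sum(rels[:k].sum() / k
              for k in range(1, 1001) if rels[k - 1]) / rels.sum()
for p in (0.1, 0.3, 1.0):
    est = [estimated_ap(rels, rng.random(1000) < p, p) for _ in range(300)]
    print(f"judging rate {p:.1f}: mean estimate {np.mean(est):.3f} "
          f"(true AP {true_ap:.3f})")
```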
Information retrieval systems assign scores to documents according to their relevance to a user's request and return documents in descending order of their scores. Given this ranked list of documents and their corresponding scores, inferring the score distributions of relevant and non-relevant documents is an essential task for numerous information retrieval applications, such as information filtering, topic detection, meta-search, and distributed IR. Modeling score distributions in an accurate manner is often the basis of such inferences. In this talk I will revisit the choice of distributions used to model documents' scores. First, I will discuss some assumptions and intuitions behind modeling score distributions. Then, by utilizing a richer class of density functions than the ones dominating the literature and applying Variational Bayes to automatically trade off goodness-of-fit against model complexity, I will present a better model for score distributions directly dictated by the data.
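As a concrete illustration of the variational trade-off mentioned above, scikit-learn's BayesianGaussianMixture fits a mixture by variational Bayes and shrinks the weights of unnecessary components toward zero, so the effective model complexity is chosen by the data. This is a generic sketch on synthetic scores, not the talk's exact model (which also involves a Gamma component for non-relevant scores).

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Synthetic relevant-document scores drawn from two Gaussian modes.
scores = np.concatenate([rng.normal(6.0, 0.8, 400),
                         rng.normal(8.5, 0.5, 200)]).reshape(-1, 1)

# Deliberately over-provision components; variational inference
# penalizes complexity and prunes the surplus.
vb = BayesianGaussianMixture(n_components=8,
                             weight_concentration_prior=1e-2,
                             max_iter=500, random_state=0).fit(scores)

active = vb.weights_ > 1e-2
print("effective number of components:", int(active.sum()))
print("weights:", np.round(vb.weights_[active], 3))
print("means:  ", np.round(vb.means_[active].ravel(), 2))
```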