Showing 1–50 of 68 results for author: Zamani, H

Searching in archive cs.
  1. arXiv:2407.12982  [pdf, other]

    cs.LG cs.CL cs.IR

    Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

    Authors: To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani

    Abstract: In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding, interpretability, and scalability. Despite the primary focus on NLP, we posit that the paradigm of retrieval-enhancement can be extended to a broader spectrum of machine…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12277  [pdf, other]

    cs.CL cs.AI

    Multimodal Reranking for Knowledge-Intensive Visual Question Answering

    Authors: Haoyang Wen, Honglei Zhuang, Hamed Zamani, Alexander Hauptmann, Michael Bendersky

    Abstract: Knowledge-intensive visual question answering requires models to effectively use external knowledge to help answer visual questions. A typical pipeline includes a knowledge retriever and an answer generator. However, a retriever that utilizes local information, such as an image patch, may not provide reliable question-candidate relevance scores. Besides, the two-tower architecture also limits the…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.11605  [pdf, other]

    cs.IR

    Interactions with Generative Information Retrieval Systems

    Authors: Mohammad Aliannejadi, Jacek Gwizdka, Hamed Zamani

    Abstract: At its core, information access and seeking is an interactive process. In existing search engines, interactions are limited to a few pre-defined actions, such as "requery", "click on a document", "scrolling up/down", "going to the next result page", "leaving the search engine", etc. A major benefit of moving towards generative IR systems is enabling users with a richer expression of information ne…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Draft of a chapter intended to appear in a forthcoming book on generative information retrieval, co-edited by Chirag Shah and Ryen White

  4. arXiv:2407.11016  [pdf, other]

    cs.CL cs.LG

    LongLaMP: A Benchmark for Personalized Long-form Text Generation

    Authors: Ishita Kumar, Snigdha Viswanathan, Sushrita Yerra, Alireza Salemi, Ryan A. Rossi, Franck Dernoncourt, Hanieh Deilamsalehy, Xiang Chen, Ruiyi Zhang, Shubham Agarwal, Nedim Lipka, Hamed Zamani

    Abstract: Long-text generation is seemingly ubiquitous in real-world applications of large language models such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on the generation of very short text. To overcome these limitations, we study the problem of pe…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures, 20 tables (including appendix); submitted to EMNLP

  5. arXiv:2406.19928  [pdf, other]

    cs.CL cs.HC cs.IR

    Interactive Topic Models with Optimal Transport

    Authors: Garima Dhanania, Sheshera Mysore, Chau Minh Pham, Mohit Iyyer, Hamed Zamani, Andrew McCallum

    Abstract: Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content present in a corpus. This may be through categories obtained from an initial pass over the corpus or a desire to analyze the corpus through a predefined set of…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Pre-print; Work in progress

  6. arXiv:2406.19546  [pdf]

    cs.HC

    Understanding Modality Preferences in Search Clarification

    Authors: Leila Tavakoli, Giovanni Castiglia, Federica Calo, Yashar Deldjoo, Hamed Zamani, Johanne R. Trippas

    Abstract: This study is the first attempt to explore the impact of clarification question modality on user preference in search engines. We introduce the multi-modal search clarification dataset, MIMICS-MM, containing clarification questions with associated expert-collected and model-generated images. We analyse user preferences over different clarification modes of text, image, and combination of both thro…

    Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  7. ProCIS: A Benchmark for Proactive Retrieval in Conversations

    Authors: Chris Samarinas, Hamed Zamani

    Abstract: The field of conversational information seeking, which is rapidly gaining interest in both academia and industry, is changing how we interact with search engines through natural language interactions. Existing datasets and methods are mostly evaluating reactive conversational information seeking systems that solely provide a response to every query from the user. We identify a gap in building and ev…

    Submitted 10 May, 2024; originally announced May 2024.

  8. arXiv:2405.02816  [pdf, other]

    cs.CL cs.IR cs.LG

    Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

    Authors: Hamed Zamani, Michael Bendersky

    Abstract: This paper introduces Stochastic RAG--a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence, made in most prior work. Stochastic RAG casts the retrieval process in RAG as a stochastic sampling without replacement process. Through this formulation, we employ straight-through G…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: To appear in the proceedings of SIGIR 2024

  9. arXiv:2405.00175  [pdf, other]

    cs.CL cs.IR

    Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models

    Authors: Alireza Salemi, Hamed Zamani

    Abstract: This paper introduces uRAG--a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems. Each RAG system consumes the retrieval results for a unique purpose, such as open-domain question answering, fact verification, entity linking, and relation extraction. We introduce a generic training guideline that standardizes the communication bet…

    Submitted 30 April, 2024; originally announced May 2024.

  10. arXiv:2404.14772  [pdf, other]

    cs.CL

    Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models

    Authors: Chris Samarinas, Pracha Promthaw, Atharva Nijasure, Hansi Zeng, Julian Killingback, Hamed Zamani

    Abstract: This paper explores SynTOD, a new synthetic data generation approach for developing end-to-end Task-Oriented Dialogue (TOD) Systems capable of handling complex tasks such as intent classification, slot filling, conversational question-answering, and retrieval-augmented response generation, without relying on crowdsourcing or real-world data. SynTOD utilizes a state transition graph to define the d…

    Submitted 23 April, 2024; originally announced April 2024.

  11. arXiv:2404.14600  [pdf, other]

    cs.IR cs.CL

    Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding

    Authors: Hansi Zeng, Chen Luo, Hamed Zamani

    Abstract: This paper introduces PAG--a novel optimization and decoding approach that guides autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this aim, PAG constructs a set-based and sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. Th…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to SIGIR 2024

  12. arXiv:2404.13781  [pdf, other]

    cs.CL cs.IR

    Evaluating Retrieval Quality in Retrieval-Augmented Generation

    Authors: Alireza Salemi, Hamed Zamani

    Abstract: Evaluating retrieval-augmented generation (RAG) presents challenges, particularly for retrieval models within these systems. Traditional end-to-end evaluation methods are computationally expensive. Furthermore, evaluation of the retrieval model's performance based on query-document relevance labels shows a small correlation with the RAG system's downstream performance. We propose a novel evaluatio…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.05970  [pdf, other]

    cs.CL cs.IR

    Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation

    Authors: Alireza Salemi, Surya Kallumadi, Hamed Zamani

    Abstract: This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which potentially have a substantial impact on various applications and domains. We propose the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms…

    Submitted 8 April, 2024; originally announced April 2024.

  14. arXiv:2403.09180  [pdf]

    cs.IR

    Online and Offline Evaluation in Search Clarification

    Authors: Leila Tavakoli, Johanne R. Trippas, Hamed Zamani, Falk Scholer, Mark Sanderson

    Abstract: The effectiveness of clarification question models in engaging users within search systems is currently constrained, casting doubt on their overall usefulness. To improve the performance of these models, it is crucial to employ assessment approaches that encompass both real-time feedback from users (online evaluation) and the characteristics of clarification questions evaluated through human asses…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 27 pages

  15. arXiv:2311.09649  [pdf, other]

    cs.LG cs.CL

    ICXML: An In-Context Learning Framework for Zero-Shot Extreme Multi-Label Classification

    Authors: Yaxin Zhu, Hamed Zamani

    Abstract: This paper focuses on the task of Extreme Multi-Label Classification (XMC) whose goal is to predict multiple labels for each instance from an extremely large label space. While existing research has primarily focused on fully supervised XMC, real-world scenarios often lack supervision signals, highlighting the importance of zero-shot settings. Given the large label space, utilizing in-context lear…

    Submitted 15 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  16. arXiv:2311.09134  [pdf, other]

    cs.IR

    Scalable and Effective Generative Information Retrieval

    Authors: Hansi Zeng, Chen Luo, Bowen Jin, Sheikh Muhammad Sarwar, Tianxin Wei, Hamed Zamani

    Abstract: Recent research has shown that transformer networks can be used as differentiable search indexes by representing each document as a sequence of document ID tokens. These generative retrieval models cast the retrieval problem to a document ID generation problem for each given query. Despite their elegant design, existing generative retrieval models only perform well on artificially-constructed and…

    Submitted 15 November, 2023; originally announced November 2023.

  17. arXiv:2306.16478  [pdf, other]

    cs.IR cs.CL cs.CV

    Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

    Authors: Alireza Salemi, Mahta Rafiee, Hamed Zamani

    Abstract: This paper studies a category of visual question answering tasks, in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. Current state-of-the-art asymmetric dense retrieval model for this…

    Submitted 28 June, 2023; originally announced June 2023.

  18. arXiv:2306.02250  [pdf, other]

    cs.IR cs.CL

    Large Language Model Augmented Narrative Driven Recommendations

    Authors: Sheshera Mysore, Andrew McCallum, Hamed Zamani

    Abstract: Narrative-driven recommendation (NDR) presents an information access problem where users solicit recommendations with verbose descriptions of their preferences and context, for example, travelers soliciting recommendations for points of interest while describing their likes/dislikes and travel circumstances. These requests are increasingly important with the rise of natural language-based conversa…

    Submitted 21 July, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: RecSys 2023 Camera-ready

  19. Soft Prompt Decoding for Multilingual Dense Retrieval

    Authors: Zhiqi Huang, Hansi Zeng, Hamed Zamani, James Allan

    Abstract: In this work, we explore a Multilingual Information Retrieval (MLIR) task, where the collection includes documents in multiple languages. We demonstrate that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance. This is due to the heterogeneous and imbalanced nature of multilingual collections -- some languages are be…

    Submitted 15 May, 2023; originally announced May 2023.

  20. arXiv:2304.14522  [pdf, other]

    cs.IR cs.CL cs.LG

    Multivariate Representation Learning for Information Retrieval

    Authors: Hamed Zamani, Michael Bendersky

    Abstract: Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations are often in the form of a vector representation and their similarities are often computed using the dot product function. In this paper, we propose a new representation learning framework for dense retrieval. Instead of learning a vector for each query and document, o…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted for publication at SIGIR 2023

  21. arXiv:2304.13654  [pdf, other]

    cs.IR

    A Personalized Dense Retrieval Framework for Unified Information Access

    Authors: Hansi Zeng, Surya Kallumadi, Zaid Alibadi, Rodrigo Nogueira, Hamed Zamani

    Abstract: Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-lasting goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by the recent development in dense retrieval and approximate nearest ne…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted to SIGIR 2023

  22. arXiv:2304.13649  [pdf, other]

    cs.CV cs.CL cs.IR

    A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering

    Authors: Alireza Salemi, Juan Altmayer Pizzorno, Hamed Zamani

    Abstract: Knowledge-Intensive Visual Question Answering (KI-VQA) refers to answering a question about an image whose answer does not lie in the image. This paper presents a new pipeline for KI-VQA tasks, consisting of a retriever and a reader. First, we introduce DEDR, a symmetric dual encoding dense retrieval framework in which documents and queries are encoded into a shared embedding space using uni-modal…

    Submitted 26 April, 2023; originally announced April 2023.

  23. arXiv:2304.11406  [pdf, other]

    cs.CL

    LaMP: When Large Language Models Meet Personalization

    Authors: Alireza Salemi, Sheshera Mysore, Michael Bendersky, Hamed Zamani

    Abstract: This paper highlights the importance of personalization in large language models and introduces the LaMP benchmark -- a novel benchmark for training and evaluating language models for producing personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks, spanning three text cl…

    Submitted 4 June, 2024; v1 submitted 22 April, 2023; originally announced April 2023.

  24. arXiv:2304.08912  [pdf, other]

    cs.IR

    Generalized Weak Supervision for Neural Information Retrieval

    Authors: Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft

    Abstract: Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this issue, one can train NRMs via weak supervision, where a large dataset is automatically generated using an existing ranking model (called the weak labeler) for tr…

    Submitted 18 April, 2023; originally announced April 2023.

  25. arXiv:2304.04250  [pdf, other]

    cs.IR cs.CL cs.HC cs.LG

    Editable User Profiles for Controllable Text Recommendation

    Authors: Sheshera Mysore, Mahmood Jasim, Andrew McCallum, Hamed Zamani

    Abstract: Methods for making high-quality recommendations often rely on learning latent representations from interaction data. These methods, while performant, do not provide ready mechanisms for users to control the recommendation they receive. Our work tackles this problem by proposing LACE, a novel concept value bottleneck model for controllable text recommendations. LACE represents each user with a succ…

    Submitted 16 October, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: SIGIR-2023 paper with extended results

  26. arXiv:2212.10764  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Learning List-Level Domain-Invariant Representations for Ranking

    Authors: Ruicheng Xian, Honglei Zhuang, Zhen Qin, Hamed Zamani, Jing Lu, Ji Ma, Kai Hui, Han Zhao, Xuanhui Wang, Michael Bendersky

    Abstract: Domain adaptation aims to transfer the knowledge learned on (data-rich) source domains to (low-resource) target domains, and a popular method is invariant representation learning, which matches and aligns the data distributions on the feature space. Although this method is studied extensively and applied on classification and regression problems, its adoption on ranking problems is sporadic, and t…

    Submitted 31 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023. Comparison to v1: revised presentation and proof of Corollary 4.9

  27. arXiv:2210.15859  [pdf, other]

    cs.CL cs.LG

    You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM

    Authors: Andrew Drozdov, Shufan Wang, Razieh Rahimi, Andrew McCallum, Hamed Zamani, Mohit Iyyer

    Abstract: Retrieval-enhanced language models (LMs), which condition their predictions on text retrieved from large external datastores, have recently shown significant perplexity improvements compared to standard LMs. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model and requires no additional training. In this paper, we explore the…

    Submitted 27 October, 2022; originally announced October 2022.

  28. arXiv:2209.14290  [pdf, other]

    cs.CL cs.IR

    FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and need to handle long inputs. In this work, we introduce FiD-Light to strongly increase the efficiency of the state-of-the-art retrieval-augmented…

    Submitted 28 September, 2022; originally announced September 2022.

  29. arXiv:2207.03030  [pdf, other]

    cs.CL cs.IR

    Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at the ICML 2022 Workshop on Knowledge Retrieval and Language Models (KRLM)

  30. arXiv:2206.12993  [pdf, other]

    cs.IR cs.CL

    Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

    Authors: Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury

    Abstract: Recently, several dense retrieval (DR) models have demonstrated competitive performance to term-based retrieval methods that are ubiquitous in search systems. In contrast to term-based matching, DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search. Deploying a new system, such as DR, inevitably involves tradeoffs in aspects of its perf…

    Submitted 26 June, 2022; originally announced June 2022.

  31. MIMICS-Duo: Offline & Online Evaluation of Search Clarification

    Authors: Leila Tavakoli, Johanne R. Trippas, Hamed Zamani, Falk Scholer, Mark Sanderson

    Abstract: Asking clarification questions is an active area of research; however, resources for training and evaluating search clarification methods are not sufficient. To address this issue, we describe MIMICS-Duo, a new freely available dataset of 306 search queries with multiple clarifications (a total of 1,034 query-clarification pairs). MIMICS-Duo contains fine-grained annotations on clarification quest…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 11 pages

    MSC Class: 68-06

  32. arXiv:2205.01230  [pdf, other]

    cs.LG cs.CL cs.IR

    Retrieval-Enhanced Machine Learning

    Authors: Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

    Abstract: Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization,…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: To appear in proceedings of ACM SIGIR 2022

  33. arXiv:2204.13679  [pdf, other]

    cs.IR cs.LG

    Curriculum Learning for Dense Retrieval Distillation

    Authors: Hansi Zeng, Hamed Zamani, Vishwa Vinay

    Abstract: Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (st…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: Accepted to SIGIR 2022

  34. arXiv:2201.08808  [pdf, other]

    cs.IR cs.CL cs.HC

    Conversational Information Seeking

    Authors: Hamed Zamani, Johanne R. Trippas, Jeff Dalton, Filip Radlinski

    Abstract: Conversational information seeking (CIS) is concerned with a sequence of interactions between one or more users and an information system. Interactions in CIS are primarily based on natural language dialogue, while they may include other types of interactions, such as click, touch, and body gestures. This monograph provides a thorough overview of CIS definitions, applications, interactions, interf…

    Submitted 25 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

    Comments: Draft Version 1.2

  35. arXiv:2111.01314  [pdf, other]

    cs.IR

    Explaining Documents' Relevance to Search Queries

    Authors: Razieh Rahimi, Youngwoo Kim, Hamed Zamani, James Allan

    Abstract: We present GenEx, a generative model to explain search results to users beyond just showing matches between query and document words. Adding GenEx explanations to search results greatly impacts user satisfaction and search performance. Search engines mostly provide document titles, URLs, and snippets for each result. Existing model-agnostic explanation methods similarly focus on word matching or c…

    Submitted 1 November, 2021; originally announced November 2021.

  36. DISAPERE: A Dataset for Discourse Structure in Peer Review Discussions

    Authors: Neha Kennard, Tim O'Gorman, Rajarshi Das, Akshay Sharma, Chhandak Bagchi, Matthew Clinton, Pranay Kumar Yelugam, Hamed Zamani, Andrew McCallum

    Abstract: At the foundation of scientific evaluation is the labor-intensive process of peer review. This critical task requires participants to consume vast amounts of highly technical text. Prior work has annotated different aspects of review argumentation, but discourse relations between reviews and rebuttals have yet to be examined. We present DISAPERE, a labeled dataset of 20k sentences contained in 506…

    Submitted 6 November, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

  37. arXiv:2109.05955  [pdf, other]

    cs.IR cs.CL cs.HC

    Analysing Mixed Initiatives and Search Strategies during Conversational Search

    Authors: Mohammad Aliannejadi, Leif Azzopardi, Hamed Zamani, Evangelos Kanoulas, Paul Thomas, Nick Craswell

    Abstract: Information seeking conversations between users and Conversational Search Agents (CSAs) consist of multiple turns of interaction. While users initiate a search session, ideally a CSA should sometimes take the lead in the conversation by obtaining feedback from the user by offering query suggestions or asking for query clarifications, i.e., mixed initiative. This creates the potential for more engagi…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted in CIKM 2021

  38. arXiv:2106.09227  [pdf, other]

    cs.IR

    Current Challenges and Future Directions in Podcast Information Access

    Authors: Rosie Jones, Hamed Zamani, Markus Schedl, Ching-Wei Chen, Sravana Reddy, Ann Clifton, Jussi Karlgren, Helia Hashemi, Aasish Pappu, Zahra Nazari, Longqi Yang, Oguz Semerci, Hugues Bouchard, Ben Carterette

    Abstract: Podcasts are spoken documents across a wide range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: SIGIR 2021

  39. arXiv:2105.09816  [pdf, other]

    cs.IR cs.CL

    Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

    Authors: Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

    Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the d…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at SIGIR 2021 (Full Paper Track)

  40. Passage Retrieval for Outside-Knowledge Visual Question Answering

    Authors: Chen Qu, Hamed Zamani, Liu Yang, W. Bruce Croft, Erik Learned-Miller

    Abstract: In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we define to be a large unstructured passage collection. We first conduct sparse retrieval with BM25 and study expanding the question with object names and im…

    Submitted 9 May, 2021; originally announced May 2021.

    Comments: Accepted to SIGIR'21 as a short paper

  41. arXiv:2104.09393  [pdf, other]

    cs.IR cs.AI cs.LG

    Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence

    Authors: Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark -- and can be considered to be an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) larger number of Tra…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.10434

  42. arXiv:2103.12906  [pdf, other]

    cs.IR cs.CL

    CSFCube -- A Test Collection of Computer Science Research Articles for Faceted Query by Example

    Authors: Sheshera Mysore, Tim O'Gorman, Andrew McCallum, Hamed Zamani

    Abstract: Query by Example is a well-known information retrieval task in which a document is chosen by the user as the search query and the goal is to retrieve relevant documents from a large collection. However, a document often covers multiple aspects of a topic. To address this scenario we introduce the task of faceted Query by Example in which users can also specify a finer grained aspect in addition to…

    Submitted 7 November, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted to the NeurIPS 2021 Track on Datasets and Benchmarks

  43. arXiv:2101.07124  [pdf, ps, other]

    cs.IR cs.HC

    Tip of the Tongue Known-Item Retrieval: A Case Study in Movie Identification

    Authors: Jaime Arguello, Adam Ferguson, Emery Fine, Bhaskar Mitra, Hamed Zamani, Fernando Diaz

    Abstract: While current information retrieval systems are effective for known-item retrieval where the searcher provides a precise name or identifier for the item being sought, systems tend to be much less effective for cases where the searcher is unable to express a precise name or identifier. We refer to this as tip of the tongue (TOT) known-item retrieval, named after the cognitive state of not being abl…

    Submitted 18 January, 2021; originally announced January 2021.

  44. arXiv:2101.03394  [pdf, other]

    cs.IR cs.AI cs.HC

    Context-Aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants

    Authors: Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, W. Bruce Croft

    Abstract: Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users' lives. This paper addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection an…

    Submitted 9 January, 2021; originally announced January 2021.

    Comments: Accepted to ACM TOIS, 30 pages

  45. arXiv:2011.07368  [pdf, other]

    cs.IR cs.AI cs.LG

    Conformer-Kernel with Query Term Independence at TREC 2020 Deep Learning Track

    Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

    Abstract: We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track. In particular, we study the impact of incorporating: (i) explicit term matching to complement matching based on learned representations (i.e., the "Duet principle"), (ii) query term independence (i.e., the "QTI assumption") to scale the model to the full retrieval setting, and (iii)…

    Submitted 11 February, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

  46. arXiv:2007.10434  [pdf, other]

    cs.IR cs.CL cs.LG

    Conformer-Kernel with Query Term Independence for Document Retrieval

    Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark---and can be considered to be an efficient (but slightly less effective) alternative to BERT-based ranking models. In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption. Furthermore, to reduce the memory comp…

    Submitted 20 July, 2020; originally announced July 2020.

  47. arXiv:2006.10174  [pdf, other]

    cs.IR cs.CL cs.LG

    MIMICS: A Large-Scale Data Collection for Search Clarification

    Authors: Hamed Zamani, Gord Lueck, Everest Chen, Rodolfo Quispe, Flint Luu, Nick Craswell

    Abstract: Search clarification has recently attracted much attention due to its applications in search engines. It has also been recognized as a major component in conversational information seeking systems. Despite its importance, the research community still lacks large-scale data for studying different aspects of search clarification. In this paper, we introduce MIMICS, a collection of sear…

    Submitted 17 June, 2020; originally announced June 2020.

  48. arXiv:2006.07548  [pdf, other]

    cs.IR cs.CL cs.LG

    Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search

    Authors: Helia Hashemi, Hamed Zamani, W. Bruce Croft

    Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively les…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of ACM SIGIR 2020. 10 pages

  49. arXiv:2006.00166  [pdf, other]

    cs.IR

    Analyzing and Learning from User Interactions for Search Clarification

    Authors: Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

    Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from the traditional web search interfaces to the limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying question…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: To appear in the Proceedings of SIGIR 2020

  50. arXiv:2005.04908  [pdf, other]

    cs.IR

    Local Self-Attention over Long Text for Efficient Document Retrieval

    Authors: Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

    Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: Accepted at SIGIR 2020 (short paper)