
An Attention-based Deep Relevance Model for Few-shot Document Filtering

Authors: Bulou Liu, Chenliang Li, Wei Zhou, Feng Ji, Yu Duan, Haiqing Chen

ACM Transactions on Information Systems (TOIS), Volume 39, Issue 1

Article No.: 6, Pages 1 - 35

https://doi.org/10.1145/3419972

Published: 06 October 2020


Abstract

With the large quantity of textual information produced on the Internet, it is critical to filter out irrelevant information and organize the rest into categories of interest (e.g., an emerging event). However, supervised document filtering methods rely heavily on a large number of labeled documents for model training. Manually identifying enough positive examples for each category is expensive and time-consuming. It is also unrealistic to cover every category in an evolving text source that spans diverse events, user opinions, and daily life activities. In this article, we propose a novel attention-based deep relevance model for few-shot document filtering (named ADRM), inspired by the relevance feedback methodology proposed for ad hoc retrieval. ADRM calculates the relevance score between a document and a category from a set of seed words and a few seed documents relevant to the category. It constructs a category-specific conceptual representation of the document based on the corresponding seed words and seed documents. Specifically, to filter out irrelevant and noisy information in the seed documents, ADRM employs two types of attention mechanisms (namely whole-match attention and max-match attention) and generates category-specific representations for the seed documents. ADRM then extracts relevance signals by modeling the hidden feature interactions in the word embedding space, using a gated convolutional process, a self-attention layer, and a relevance aggregation layer. Extensive experiments on three real-world datasets show that ADRM consistently outperforms existing alternatives for few-shot document filtering, including conventional classification and retrieval baselines and state-of-the-art deep relevance ranking models. We also perform an ablation study to demonstrate that each component in ADRM contributes to filtering performance. Further analysis shows that ADRM is robust under varying parameter settings.
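The abstract names the two attention mechanisms but does not define them. As a rough illustration only, the sketch below shows one plausible reading of those names: whole-match attention scores each seed-document word against the average of all seed-word embeddings, while max-match attention scores it against its single best-matching seed word. The use of cosine similarity, softmax normalization, and every function name here are assumptions made for illustration, not the formulation used in the article.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def whole_match_weights(doc_vecs, seed_vecs):
    # Assumption: compare each document word with the centroid of all seed words.
    centroid = mean_vector(seed_vecs)
    return softmax([cosine(w, centroid) for w in doc_vecs])


def max_match_weights(doc_vecs, seed_vecs):
    # Assumption: compare each document word with its closest seed word.
    return softmax([max(cosine(w, s) for s in seed_vecs) for w in doc_vecs])


def attend(doc_vecs, weights):
    """Weighted sum of document word vectors: a category-specific representation."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*doc_vecs)]


# Hypothetical 2-d embeddings: two seed words, one on-topic and one off-topic document word.
seed_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [-1.0, -1.0]]

whole = whole_match_weights(doc_vecs, seed_vecs)
best = max_match_weights(doc_vecs, seed_vecs)
doc_repr = attend(doc_vecs, best)
```

Under either weighting, the on-topic word receives the larger weight, so the off-topic word contributes less to the resulting representation. That down-weighting is the noise-filtering role the abstract attributes to the attention step.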

