Automatic prediction of news intent for search queries: an exploration of contextual and temporal features

X Zhang, S Han, W Lu - The Electronic Library, 2018 - emerald.com
Purpose
The purpose of this paper is to predict news intent by exploring contextual and temporal features directly mined from a general search engine query log.
Design/methodology/approach
First, a ground-truth data set of correctly labeled news and non-news queries was built. Second, the distribution of search goals and topics across news and non-news queries was analyzed in detail. Third, three new features were derived: the relationship between entities and contextual words extended from query sessions, the topical similarity among clicked results, and the temporal burst point. Finally, extensive prediction experiments on SogouQ (a Chinese search engine query log) were conducted to assess the utility of the new features and of previously proposed features.
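The abstract names two of the new features (topical similarity among clicked results and the temporal burst point) without defining them. The sketch below is a hypothetical illustration only, not the authors' method: it assumes each clicked result is represented as whitespace-tokenized text and scores topical similarity as the mean pairwise cosine similarity of bag-of-words vectors, and it assumes a burst point is any day whose query count exceeds the mean plus `k` standard deviations of the preceding `window` days. The function names `topical_similarity` and `burst_points` and the parameters `window` and `k` are illustrative choices.

```python
from collections import Counter
from itertools import combinations
import math
import statistics


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topical_similarity(clicked_docs: list[str]) -> float:
    """Mean pairwise cosine similarity among the texts of clicked results.

    Assumption (hypothetical): each clicked result is its title or snippet,
    tokenized on whitespace; the paper's representation may differ.
    """
    vectors = [Counter(doc.lower().split()) for doc in clicked_docs]
    pairs = list(combinations(vectors, 2))
    if not pairs:  # fewer than two clicks: no pairwise similarity to compute
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def burst_points(daily_counts: list[int], window: int = 7, k: float = 2.0) -> list[int]:
    """Indices of days whose count exceeds mean + k * stdev of the preceding
    `window` days (a simple threshold heuristic assumed for illustration)."""
    bursts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        if daily_counts[i] > mu + k * sigma:
            bursts.append(i)
    return bursts


# Example usage with made-up inputs.
print(topical_similarity(["earthquake hits city", "city earthquake death toll", "earthquake relief"]))
print(burst_points([10, 12, 11, 9, 10, 11, 12, 80, 15]))  # day 7 is flagged
```

Intuitively, a high similarity among clicked results and a sharp spike in query volume are both plausible signals that users are following a single unfolding event; a real system would likely use richer text representations and a more robust burst detector.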
Findings
News intent can be predicted with high accuracy using the proposed contextual and temporal features; the macro-averaged F1 score of the classification is around 0.8677. Contextual features are more effective than temporal features. All three new features are useful and significantly improve the accuracy of news intent prediction.
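For readers unfamiliar with the metric, the macro-averaged F1 reported above is the unweighted mean of the per-class F1 scores, so the news and non-news classes count equally regardless of their sizes. The minimal sketch below computes it from scratch; the label strings and example predictions are made up and are not data from the paper. It follows the same definition as scikit-learn's `f1_score(..., average="macro")`.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels: F1(news) = 2/3, F1(non-news) = 0.8, macro F1 ≈ 0.733.
y_true = ["news", "news", "non-news", "non-news"]
y_pred = ["news", "non-news", "non-news", "non-news"]
print(macro_f1(y_true, y_pred))
```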
Originality/value
This paper offers a new perspective on recognizing queries with news intent without relying on large external corpora such as social media (e.g. Wikipedia, Twitter and blogs) or news data sets. The research can help general-purpose search engines address search intents related to news events. In addition, the authors believe that the approaches described in this paper are general enough to apply to other verticals with dynamic content and interest, such as blogs or financial data.