Unsupervised Head-Modifier Detection in Search Queries

Published: 26 December 2016

Abstract

Interpreting user intent in search queries is a key task in query understanding, and query intent classification has been widely studied. In this article, we go one step further and analyze queries in terms of their head-modifier structure. For example, given the query “popular iphone 5 smart cover,” instead of assigning a coarse-grained semantic class (e.g., find electronic product), we identify “smart cover” as the head, or intent, of the query and “iphone 5” as its modifier. Query head-modifier detection helps search engines retrieve more relevant content, and it is also important for applications such as ads matching and query recommendation. We introduce an unsupervised semantic approach for query head-modifier detection. First, we mine a large number of instance-level head-modifier pairs from search logs. Then, we develop a conceptualization mechanism to generalize the instance-level pairs to the concept level. Finally, we derive weighted concept patterns that are concise, accurate, and have strong generalization power in head-modifier detection. The developed mechanism has been used in production for search relevance and ads matching. Extensive experimental results demonstrate the effectiveness of our approach.
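The three steps named in the abstract (mining instance-level pairs, conceptualizing them, and deriving weighted concept patterns) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the toy is-a table `ISA`, the "A for B" mining heuristic, the normalized-count weighting, and the function names `mine_pairs`, `concept_patterns`, `score`, and `detect_head`.

```python
from collections import defaultdict

# Hypothetical toy is-a taxonomy: instance -> {concept: P(concept | instance)}.
# A real system would use a large knowledge base; these entries are made up.
ISA = {
    "iphone 5": {"device": 1.0},
    "galaxy s3": {"device": 1.0},
    "ipad": {"device": 1.0},
    "smart cover": {"accessory": 1.0},
    "case": {"accessory": 1.0},
    "charger": {"accessory": 1.0},
}


def mine_pairs(queries):
    """Step 1 (assumed heuristic): read 'A for B' as head A, modifier B."""
    pairs = []
    for query in queries:
        head, sep, modifier = query.partition(" for ")
        if sep and head in ISA and modifier in ISA:
            pairs.append((head, modifier))
    return pairs


def concept_patterns(pairs):
    """Steps 2 and 3: lift pairs to concepts and accumulate normalized weights."""
    weights = defaultdict(float)
    for head, modifier in pairs:
        for head_concept, hp in ISA[head].items():
            for mod_concept, mp in ISA[modifier].items():
                weights[(head_concept, mod_concept)] += hp * mp
    total = sum(weights.values())
    if not total:
        return {}
    return {pattern: w / total for pattern, w in weights.items()}


def score(patterns, head, modifier):
    """Support for reading `head` as the head and `modifier` as its modifier."""
    return sum(
        hp * mp * patterns.get((hc, mc), 0.0)
        for hc, hp in ISA.get(head, {}).items()
        for mc, mp in ISA.get(modifier, {}).items()
    )


def detect_head(patterns, term_a, term_b):
    """Return (head, modifier) for the better-scoring orientation, or None on a tie."""
    ab = score(patterns, term_a, term_b)
    ba = score(patterns, term_b, term_a)
    if ab == ba:
        return None
    return (term_a, term_b) if ab > ba else (term_b, term_a)


# Usage: patterns learned from two logged queries apply to unseen instances.
log = ["case for iphone 5", "charger for galaxy s3", "weather today"]
patterns = concept_patterns(mine_pairs(log))
result = detect_head(patterns, "ipad", "smart cover")
```

In this toy run, neither "ipad" nor "smart cover" appears in the mined pairs, yet the concept-level pattern (accessory, device) still identifies "smart cover" as the head. This is the kind of generalization the abstract attributes to concept patterns, shown here only under the simplifying assumptions above.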

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 2
May 2017
419 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3017677

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 December 2016
Accepted: 01 August 2016
Revised: 01 July 2016
Received: 01 May 2015
Published in TKDD Volume 11, Issue 2

Author Tags

  1. Query intent
  2. concept pattern
  3. head and modifier
  4. knowledge modeling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Beijing Advanced Innovation Center for Imaging Technology
  • National Natural Science Foundation of China
  • National High Technology Research and Development Program of China
  • State Key Laboratory of Software Development Environment
  • National Key Basic Research Program (973 Program) of China
