DOI: 10.1145/2623330.2623739
research-article

Unifying learning to rank and domain adaptation: enabling cross-task document scoring

Published: 24 August 2014

Abstract

Previous work treats learning to rank and domain adaptation as two different document scoring problems, but we find that they share the same challenge: adapting keyword contribution across different queries or domains. In this paper, we study the cross-task document scoring problem, where a task is either a query to rank for or a domain to adapt to, as a first attempt to unify the two problems. Existing solutions for learning to rank and domain adaptation either leave the heavy burden of adapting keyword contribution to feature designers or are difficult to generalize. To resolve these limitations, we abstract the keyword scoring principle: the contribution of a keyword depends on, first, its importance to a task and, second, its importance to the document. To determine these two aspects of keyword importance, we propose the concept of feature decoupling, which uses two types of easy-to-design features: meta-features and intra-features. To learn a scorer based on the decoupled features, we require that our framework fulfill inferred sparsity, to eliminate the interference of noisy keywords, and employ distant supervision, to tackle the lack of keyword labels. We propose the Tree-structured Restricted Boltzmann Machine (T-RBM), a novel two-stage Markov network, as our solution. Experiments on three different applications confirm the effectiveness of T-RBM, which achieves significant improvement over four state-of-the-art baseline methods.
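
The keyword scoring principle in the abstract can be illustrated with a small sketch. This is not the T-RBM model and does not implement inferred sparsity or distant supervision. It only shows one possible way to combine two decoupled importance estimates per keyword. The function names, the sigmoid squashing, the product combination, and the pairing of meta-features with task importance and intra-features with document importance are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Squash a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(weights, features):
    """Plain dot product of two equal-length sequences."""
    return sum(w * f for w, f in zip(weights, features))

def score_document(keywords, meta_weights, intra_weights):
    """Score a document for a task from decoupled keyword features.

    keywords: list of (meta_features, intra_features) pairs, one per keyword.
    Assumption: meta-features describe a keyword's importance to the task,
    intra-features describe its importance within the document.
    """
    total = 0.0
    for meta, intra in keywords:
        task_importance = sigmoid(dot(meta_weights, meta))   # hypothetical form
        doc_importance = sigmoid(dot(intra_weights, intra))  # hypothetical form
        # A keyword contributes only when it matters to both task and document.
        total += task_importance * doc_importance
    return total

# Hypothetical toy input: two keywords with one meta- and one intra-feature each.
toy_keywords = [([1.0], [1.0]), ([2.0], [0.5])]
example_score = score_document(toy_keywords, meta_weights=[0.0], intra_weights=[0.0])
```

Because the weights on meta-features and intra-features are separate, the same weights could in principle be reused for a new query or domain, which is the kind of cross-task transfer the abstract argues for.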

Supplementary Material

MP4 File (p781-sidebyside.mp4)

Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain adaptation
  2. feature decoupling
  3. learning to rank
  4. tree-structured restricted boltzmann machine

Qualifiers

  • Research-article

Conference

KDD '14
Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 1
Reflects downloads up to 12 Sep 2024.

Citations

Cited By

  • (2022) Light Transport Induced Domain Adaptation for Semantic Segmentation in Thermal Infrared Urban Scenes. IEEE Transactions on Intelligent Transportation Systems 23:12 (23194-23211). DOI: 10.1109/TITS.2022.3194931. Online publication date: Dec-2022
  • (2018) Learning Distribution-Matched Landmarks for Unsupervised Domain Adaptation. Database Systems for Advanced Applications (491-508). DOI: 10.1007/978-3-319-91458-9_30. Online publication date: 12-May-2018
  • (2016) Which Doctor to Trust: A Recommender System for Identifying the Right Doctors. Journal of Medical Internet Research 18:7 (e186). DOI: 10.2196/jmir.6015. Online publication date: 7-Jul-2016
  • (2016) Domain Adaptation in the Absence of Source Domain Data. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (451-460). DOI: 10.1145/2939672.2939716. Online publication date: 13-Aug-2016
  • (2016) Unsupervised Domain Adaptation with Regularized Domain Instance Denoising. Computer Vision – ECCV 2016 Workshops (458-466). DOI: 10.1007/978-3-319-49409-8_37. Online publication date: 24-Nov-2016
