research-article

Unifying learning to rank and domain adaptation: enabling cross-task document scoring

Authors:

Kevin C. Chang

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 781 - 790

https://doi.org/10.1145/2623330.2623739

Published: 24 August 2014

Abstract

For document scoring, although learning to rank and domain adaptation are treated as two different problems in previous works, we discover that they actually share the same challenge of adapting keyword contribution across different queries or domains. In this paper, we propose to study the cross-task document scoring problem, where a task refers to a query to rank or a domain to adapt to, as the first attempt to unify these two problems. Existing solutions for learning to rank and domain adaptation either leave the heavy burden of adapting keyword contribution to feature designers, or are difficult to generalize. To resolve such limitations, we abstract the keyword scoring principle, pointing out that the contribution of a keyword essentially depends on, first, its importance to a task and, second, its importance to the document. For determining these two aspects of keyword importance, we further propose the concept of feature decoupling, suggesting the use of two types of easy-to-design features: meta-features and intra-features. Towards learning a scorer based on the decoupled features, we require that our framework fulfill inferred sparsity to eliminate the interference of noisy keywords, and employ distant supervision to tackle the lack of keyword labels. We propose the Tree-structured Boltzmann Machine (T-RBM), a novel two-stage Markov Network, as our solution. Experiments on three different applications confirm the effectiveness of T-RBM, which achieves significant improvement compared with four state-of-the-art baseline methods.
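The keyword scoring principle in the abstract can be illustrated with a minimal sketch. This is not the T-RBM model: it only assumes, for illustration, that each kind of importance is a logistic function of its features and that a keyword's contribution is the product of the two importances. The function names, the feature vectors, and the weights are all hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function, used here to squash a linear score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def keyword_importance(features: List[float], weights: List[float]) -> float:
    """Assumed form: importance is a logistic function of a linear feature score."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)))


def score_document(
    meta_features: Dict[str, List[float]],   # keyword -> features about the keyword and the task
    intra_features: Dict[str, List[float]],  # keyword -> features about the keyword inside the document
    meta_weights: List[float],
    intra_weights: List[float],
) -> float:
    """Sum keyword contributions; each contribution is the product of the
    keyword's importance to the task and its importance to the document
    (the product is an assumption made for this sketch)."""
    total = 0.0
    for keyword, meta in meta_features.items():
        if keyword not in intra_features:
            continue  # the keyword does not occur in this document
        task_importance = keyword_importance(meta, meta_weights)
        doc_importance = keyword_importance(intra_features[keyword], intra_weights)
        total += task_importance * doc_importance
    return total


if __name__ == "__main__":
    # Hypothetical feature values, chosen only to show the call shape.
    meta = {"ranking": [1.2, 0.3], "the": [0.1, 0.0]}
    intra = {"ranking": [0.8], "the": [0.9]}
    print(score_document(meta, intra, meta_weights=[1.0, 0.5], intra_weights=[2.0]))
```

Because the weights act on meta-features and intra-features rather than on individual keywords, the same weights can be reused for a new query or domain whose keywords were never seen in training, which is the point of decoupling the features.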

Supplementary Material

MP4 File (p781-sidebyside.mp4), 228.51 MB



Index Terms

Unifying learning to rank and domain adaptation: enabling cross-task document scoring
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2014

2028 pages

ISBN: 9781450329569

DOI: 10.1145/2623330

General Chairs: Sofus Macskassy (Facebook), Claudia Perlich (Dstillery)

Program Chairs: Jure Leskovec (Stanford University), Wei Wang (UCLA), Rayid Ghani (University of Chicago)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

KDD '14: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 24 - 27, 2014

New York, New York, USA

Acceptance Rates

KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 5

Total Downloads: 495

Downloads (Last 12 months): 8

Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Sep 2024

Cited By

Chen J, Liu Z, Jin D, Wang Y, Yang F, Bai X (2022) Light Transport Induced Domain Adaptation for Semantic Segmentation in Thermal Infrared Urban Scenes. IEEE Transactions on Intelligent Transportation Systems 23:12 (23194-23211). Online publication date: Dec-2022. https://doi.org/10.1109/TITS.2022.3194931

Jing M, Li J, Zhao J, Lu K (2018) Learning Distribution-Matched Landmarks for Unsupervised Domain Adaptation. Database Systems for Advanced Applications (491-508). Online publication date: 12-May-2018. https://doi.org/10.1007/978-3-319-91458-9_30

Guo L, Jin B, Yao C, Yang H, Huang D, Wang F (2016) Which Doctor to Trust: A Recommender System for Identifying the Right Doctors. Journal of Medical Internet Research 18:7 (e186). Online publication date: 7-Jul-2016. https://doi.org/10.2196/jmir.6015

Chidlovskii B, Clinchant S, Csurka G, Krishnapuram B, Shah M, Smola A, Aggarwal C, Shen D, Rastogi R (2016) Domain Adaptation in the Absence of Source Domain Data. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (451-460). Online publication date: 13-Aug-2016. https://dl.acm.org/doi/10.1145/2939672.2939716

Csurka G, Chidlowskii B, Clinchant S, Michel S (2016) Unsupervised Domain Adaptation with Regularized Domain Instance Denoising. Computer Vision – ECCV 2016 Workshops (458-466). Online publication date: 24-Nov-2016. https://doi.org/10.1007/978-3-319-49409-8_37
