research-article

Open access

Domain Adaptation for Enterprise Email Search

Authors:

Maryam Karimzadehgan,

Rama Kumar Pasumarthi,

Michael Bendersky,

Donald Metzler

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 25 - 34

https://doi.org/10.1145/3331184.3331204

Published: 18 July 2019

Abstract

In the enterprise email search setting, the same search engine often powers multiple enterprises from various industries: technology, education, manufacturing, etc. However, using the same global ranking model across different enterprises may result in suboptimal search quality, because corpora and information needs differ from one enterprise to another. On the other hand, training an individual ranking model for each enterprise may be infeasible, especially for smaller institutions with limited data. To address this data challenge, in this paper we propose a domain adaptation approach that fine-tunes the global model to each individual enterprise. In particular, we propose a novel application of the Maximum Mean Discrepancy (MMD) approach to information retrieval, which attempts to bridge the gap between the global data distribution and the data distribution for a given individual enterprise. We conduct a comprehensive set of experiments on a large-scale email search engine, and demonstrate that the MMD approach consistently improves the search quality for multiple individual domains, compared with both the global ranking model and several competitive domain adaptation baselines, including adversarial learning methods.
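For readers unfamiliar with MMD, the quantity named in the abstract can be illustrated with a minimal sketch. The abstract does not state the kernel, the bandwidth, or how the discrepancy is combined with the ranking loss, so everything below is an assumption made for illustration: a Gaussian (RBF) kernel, the standard biased estimator of squared MMD, and synthetic feature vectors standing in for "global" and "enterprise" data. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def rbf_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    """
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


def sample(rng, n, dim, mean):
    """Synthetic stand-in for feature vectors (assumption, not real data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
global_batch = sample(rng, 100, 4, 0.0)    # "global" distribution
same_batch = sample(rng, 100, 4, 0.0)      # drawn from the same distribution
shifted_batch = sample(rng, 100, 4, 1.5)   # a shifted "enterprise" distribution

mmd_same = mmd_squared(global_batch, same_batch)
mmd_shifted = mmd_squared(global_batch, shifted_batch)
print("MMD^2, same distribution:   ", mmd_same)
print("MMD^2, shifted distribution:", mmd_shifted)
```

In this sketch the estimate is small when both samples come from the same distribution and larger when one is shifted. A domain adaptation method could, under these assumptions, add such a term to its training loss so that the representations of the two domains are pulled closer together.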

Index Terms

Domain Adaptation for Enterprise Email Search
1. Information systems
  1. Information retrieval

Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2019

1512 pages

ISBN:9781450361729

DOI:10.1145/3331184

General Chairs:
Benjamin Piwowarski (CNRS - Sorbonne Universite, France)
Max Chevalier (Universite de Toulouse, CNRS, France)
Eric Gaussier (Universite Grenoble Alpes, CNRS, France)

Program Chairs:
Yoelle Maarek (Amazon Research, Israel)
Jian-Yun Nie (University of Montreal, Canada)
Falk Scholer (RMIT University, Australia)

Copyright © 2019 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '19

Sponsor:

SIGIR

SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 21 - 25, 2019

Paris, France

Acceptance Rates

SIGIR'19 Paper Acceptance Rate 84 of 426 submissions, 20%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 9
Total Downloads: 929
Downloads (Last 12 months): 138
Downloads (Last 6 weeks): 33

Reflects downloads up to 02 Sep 2024

Cited By

Ichiuji Y, Mabu S, Hatta S, Inai K, Higuchi S, Kido S (2024). Domain transformation using semi-supervised CycleGAN for improving performance of classifying thyroid tissue images. International Journal of Computer Assisted Radiology and Surgery. Online publication date: 18-Jan-2024.
https://doi.org/10.1007/s11548-024-03061-x

Liu W, Zheng X, Su J, Zheng L, Chen C, Hu M (2023). Contrastive Proxy Kernel Stein Path Alignment for Cross-Domain Cold-Start Recommendation. IEEE Transactions on Knowledge and Data Engineering 35:11 (11216-11230). Online publication date: 1-Nov-2023.
https://doi.org/10.1109/TKDE.2022.3233789

Yu W, Xu C, Xu J, Pang L, Wen J (2022). Distribution Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains. IEEE/ACM Transactions on Audio, Speech and Language Processing 30 (721-733). Online publication date: 1-Feb-2022.
https://dl.acm.org/doi/10.1109/TASLP.2022.3145289

Aso T, Amagasa T, Kitagawa H (2021). A Method for Searching Documents using Knowledge Bases. The 23rd International Conference on Information Integration and Web Intelligence (250-258). Online publication date: 29-Nov-2021.
https://dl.acm.org/doi/10.1145/3487664.3487699

Wang C, Liao Y, Kao M, Liang W, Hung S (2021). Toward accurate platform-aware performance modeling for deep neural networks. ACM SIGAPP Applied Computing Review 21:1 (50-61). Online publication date: 20-Jul-2021.
https://dl.acm.org/doi/10.1145/3477133.3477137

Bi K, Metrikov P, Li C, Byun B (2021). Leveraging User Behavior History for Personalized Email Search. Proceedings of the Web Conference 2021 (2858-2868). Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3450110

Jagerman R, Kong W, Pasumarthi R, Qin Z, Bendersky M, Najork M, Lewin-Eytan L, Carmel D, Yom-Tov E, Agichtein E, Gabrilovich E (2021). Improving Cloud Storage Search with User Activity. Proceedings of the 14th ACM International Conference on Web Search and Data Mining (508-516). Online publication date: 8-Mar-2021.
https://dl.acm.org/doi/10.1145/3437963.3441780

Chen L, Yuan F, Yang J, He X, Li C, Yang M (2021). User-specific Adaptive Fine-tuning for Cross-domain Recommendations. IEEE Transactions on Knowledge and Data Engineering (1-1). Online publication date: 2021.
https://doi.org/10.1109/TKDE.2021.3119619

Meng Y, Karimzadehgan M, Zhuang H, Metzler D, Caverlee J, Hu X, Lalmas M, Wang W (2020). Separate and Attend in Personal Email Search. Proceedings of the 13th International Conference on Web Search and Data Mining (429-437). Online publication date: 20-Jan-2020.
https://dl.acm.org/doi/10.1145/3336191.3371775
