DOI: 10.1145/3487553.3524207
Short paper
Open access

Multi-task Ranking with User Behaviors for Text-video Search

Published: 16 August 2022
  Abstract

    Text-video search has become an important demand in many industrial video-sharing platforms, e.g., YouTube, TikTok, and WeChat Channels, and has therefore attracted increasing research attention. Traditional relevance-based ranking methods for text-video search concentrate on exploiting the semantic relevance between the video and the query. However, relevance is no longer the principal issue in the ranking stage, because the candidate items retrieved in the matching stage already guarantee adequate relevance. Instead, we argue that boosting user satisfaction should be the ultimate goal of ranking, and that it is promising to mine cheap and abundant user behaviors for model training. To achieve this goal, we propose an effective Multi-Task Ranking pipeline with User Behaviors (MTRUB) for text-video search. Specifically, to exploit the multi-modal data effectively, we put forward a Heterogeneous Multi-modal Fusion Module (HMFM) that fuses the query and video features of different modalities in adaptive ways. Besides that, we design an Independent Multi-modal Input Scheme (IMIS) to alleviate the problem of competing task correlations in multi-task learning. Experiments on an offline dataset gathered from WeChat Search demonstrate that MTRUB outperforms the baseline by 12.0% in mean gAUC and 13.3% in mean nDCG@10. We also conduct live experiments on a large-scale mobile search engine, i.e., WeChat Search, where MTRUB obtains substantial improvements over the traditional relevance-based ranking model.
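
    To make the abstract's architecture concrete, the following is a minimal, hypothetical PyTorch sketch of the two ideas it names: an adaptive, gate-weighted fusion of query/video features from several modalities (a stand-in for HMFM) and a separate fusion-plus-tower path per user-behavior task (one plausible reading of IMIS). The modalities, task names (click, finish-play), dimensions, and gating form are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): multi-task ranking over fused
# text-video features, loosely following the abstract's HMFM / IMIS ideas.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Adaptive fusion: weight each modality embedding with a softmax gate
    computed from the concatenation of all modality embeddings."""

    def __init__(self, num_modalities: int, dim: int):
        super().__init__()
        self.gate = nn.Linear(num_modalities * dim, num_modalities)

    def forward(self, feats):                 # feats: list of [B, dim] tensors
        stacked = torch.stack(feats, dim=1)   # [B, M, dim]
        gates = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)
        return (gates.unsqueeze(-1) * stacked).sum(dim=1)   # [B, dim]


class MultiTaskRanker(nn.Module):
    """Each behavior task gets its own fusion module over the raw modality
    inputs (one reading of the Independent Multi-modal Input Scheme) plus an
    MLP tower that outputs a logit for that behavior."""

    def __init__(self, dim: int = 256, tasks=("click", "finish_play")):
        super().__init__()
        self.tasks = tasks
        self.fusions = nn.ModuleDict({t: GatedFusion(3, dim) for t in tasks})
        self.towers = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
            for t in tasks
        })

    def forward(self, query_emb, title_emb, visual_emb):
        feats = [query_emb, title_emb, visual_emb]           # each [B, dim]
        return {t: self.towers[t](self.fusions[t](feats)).squeeze(-1)
                for t in self.tasks}


# Toy usage: each head is supervised with a binary user-behavior label.
model = MultiTaskRanker()
q, title, vis = (torch.randn(8, 256) for _ in range(3))
logits = model(q, title, vis)                                # dict of [8] tensors
loss = sum(
    nn.functional.binary_cross_entropy_with_logits(
        logits[t], torch.randint(0, 2, (8,)).float())
    for t in logits
)
loss.backward()
```

    At serving time the per-task logits would typically be combined into a single ranking score. For evaluation, gAUC is commonly computed as the impression-weighted average of per-user AUC, which is consistent with the "mean gAUC" reported above.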


    Cited By

    • (2024) A holistic view on positive and negative implicit feedback for micro-video recommendation. Knowledge-Based Systems 284:C. DOI: 10.1016/j.knosys.2023.111299. Online publication date: 17 April 2024.
    • (2024) Understanding user intent modeling for conversational recommender systems: a systematic literature review. User Modeling and User-Adapted Interaction. DOI: 10.1007/s11257-024-09398-x. Online publication date: 6 June 2024.

      Published In

      WWW '22: Companion Proceedings of the Web Conference 2022
      April 2022
      1338 pages
      ISBN: 9781450391306
      DOI: 10.1145/3487553
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 August 2022

      Author Tags

      1. Multi-modal Fusion
      2. Multi-task Learning
      3. Ranking Model
      4. Text-video Search
      5. User Behaviors

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      WWW '22
      WWW '22: The ACM Web Conference 2022
      April 25 - 29, 2022
      Virtual Event, Lyon, France

      Acceptance Rates

      Overall acceptance rate: 1,899 of 8,196 submissions (23%)

      Article Metrics

      • Downloads (last 12 months): 138
      • Downloads (last 6 weeks): 17

      Reflects downloads up to 09 Aug 2024
