Building User-oriented Personalized Machine Translator based on User-Generated Textual Content

Authors:

Zhengqing Guan,

Xianghua (Sharon) Ding,

Ning Gu

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2

Article No.: 280, Pages 1 - 26

https://doi.org/10.1145/3555171

Published: 11 November 2022

Abstract

Machine Translation (MT) is a useful tool for supporting multilingual communication and collaboration. In recent years, advances in neural networks and deep learning have continuously improved its accuracy and speed. However, most machine translation methods and systems are data-driven: they tend to select a consensus response represented in the training data and ignore a user's preferred linguistic style, which is important for translation comprehension and user experience. To address this problem, we build a user-oriented personalized machine translation model. The model learns each user's linguistic style from the textual content that the user generates (User-Generated Textual Content, UGTC) in a social media context and generates personalized translation results using state-of-the-art deep learning techniques such as Transformer and pre-training. We also implemented a user-oriented personalized machine translator, using Weibo as a case source of UGTC, to provide a systematic implementation scheme for a personalized machine translation system based on our model. The translator was evaluated through automatic evaluation combined with human evaluation. The results suggest that our model generates more personalized, natural, and lively translations and enhances their comprehensibility, so that users prefer its output over general translation results.
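
As a rough illustration of the idea of learning a user's linguistic style from their own posts, the following Python sketch builds a unigram frequency profile from a user's texts and uses it to choose among candidate translations. It is not the Transformer-based model described here. The function names (`style_profile`, `style_score`, `pick_personalized`), the whitespace tokenization, the smoothing floor, and the re-ranking setup are all assumptions made for illustration only.

```python
from collections import Counter
import math


def style_profile(posts):
    """Build a unigram relative-frequency profile from a user's posts.

    Assumption: whitespace tokenization, which is only adequate for
    space-delimited languages; a real system would need a proper tokenizer.
    """
    counts = Counter(tok for post in posts for tok in post.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def style_score(candidate, profile, floor=1e-6):
    """Average log-probability of the candidate's tokens under the profile.

    Tokens the user has never written get a small floor probability
    (a hypothetical smoothing choice) instead of zero.
    """
    tokens = candidate.lower().split()
    if not tokens:
        return float("-inf")
    return sum(math.log(profile.get(t, floor)) for t in tokens) / len(tokens)


def pick_personalized(candidates, posts):
    """Return the candidate translation closest to the user's word choices."""
    profile = style_profile(posts)
    return max(candidates, key=lambda c: style_score(c, profile))


if __name__ == "__main__":
    # Hypothetical user posts and candidate translations of one source sentence.
    user_posts = ["omg this is so cool", "so cool lol", "omg lol"]
    candidates = ["this is very impressive", "omg this is so cool"]
    print(pick_personalized(candidates, user_posts))
```

Re-ranking fixed candidates is a deliberately simple stand-in: it shows how user-generated text can bias the choice of output without requiring any model training.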

Index Terms

Building User-oriented Personalized Machine Translator based on User-Generated Textual Content
1. Human-centered computing
  1. Collaborative and social computing

Published In

Proceedings of the ACM on Human-Computer Interaction Volume 6, Issue CSCW2

CSCW

November 2022

8205 pages

EISSN:2573-0142

DOI:10.1145/3571154

Editor:
Jeff Nichols
Google

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
