research-article

Building User-oriented Personalized Machine Translator based on User-Generated Textual Content

Published: 11 November 2022

    Abstract

    Machine translation (MT) is a valuable tool for multilingual communication and collaboration. In recent years, advances in neural networks and deep learning have steadily improved its accuracy and speed. However, most machine translation methods and systems are data-driven: they tend to select the consensus response represented in the training data and ignore a user's preferred linguistic style, which matters for translation comprehension and user experience. To address this problem, we build a user-oriented personalized machine translation model. The model learns each user's linguistic style from the textual content that the user generates on social media (User-Generated Textual Content, UGTC) and produces personalized translations using state-of-the-art deep learning techniques such as the Transformer and pre-training. We also implemented a user-oriented personalized machine translator, using Weibo as a case-study source of UGTC, to provide a systematic implementation scheme for a personalized machine translation system based on our model. We evaluated the translator with a combination of automatic and human evaluation. The results suggest that our model generates more personalized, natural, and lively translations and improves their comprehensibility, so that users prefer its output to general translation results.


      Published In

      Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2
      November 2022
      8205 pages
      EISSN:2573-0142
      DOI:10.1145/3571154
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Weibo
      2. linguistic style
      3. machine translation
      4. personalized
      5. user-generated textual content

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
