DOI: 10.1145/3485447.3512275
Research Article

Unsupervised Representation Learning of Player Behavioral Data with Confidence Guided Masking

Published: 25 April 2022
Abstract

Players of online games generate rich behavioral data during gameplay. Based on these data, game developers can build a range of data science applications, such as bot detection and social recommendation, to improve the gaming experience. However, developing such applications requires data cleansing, training-sample labeling, feature engineering, and model development, so they remain uncommon in small and medium-sized game studios. While acquiring supervised learning data is costly, unlabeled behavioral logs are continuously and automatically generated in games. We therefore turn to unsupervised representation learning of player behavioral data to power intelligent services in games. Behavioral data has several distinctive properties, including semantic complexity and excessive length. A noteworthy property of raw player behavioral data is that much of it is task-irrelevant. To address these characteristics, we introduce a BPE-enhanced compression method and propose a novel adaptive masking strategy, Masking by Token Confidence (MTC), for the Masked Language Modeling (MLM) pre-training task. MTC is designed to increase the masking probabilities of task-relevant tokens. Experiments on four downstream tasks and a successful deployment in a world-renowned Massively Multiplayer Online Role-Playing Game (MMORPG) demonstrate the effectiveness of the MTC strategy.
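
    To illustrate the confidence-guided masking idea sketched in the abstract, the following minimal Python sketch turns per-token confidence scores into per-token masking probabilities for MLM pre-training, up-weighting tokens the model is less certain about. This is not the authors' implementation: the function names, the inverse-confidence (softmax) weighting, the 15% base masking rate, and the example behavior tokens are all illustrative assumptions.

    # Minimal sketch (not the paper's code) of confidence-guided MLM masking.
    # Assumption: lower token confidence is treated as a proxy for task relevance,
    # so low-confidence tokens receive higher masking probability while the
    # expected masking budget stays near a fixed base rate.
    import numpy as np

    def mtc_masking_probs(token_confidences, base_rate=0.15, temperature=1.0):
        """Map per-token confidences (higher = model more certain) to masking probabilities."""
        conf = np.asarray(token_confidences, dtype=float)
        # Lower confidence -> larger weight, via a softmax over negative confidence.
        weights = np.exp(-conf / temperature)
        weights /= weights.sum()
        # Rescale so the expected number of masked tokens is about base_rate * sequence length.
        return np.clip(weights * base_rate * len(conf), 0.0, 1.0)

    def apply_mask(tokens, probs, mask_token="[MASK]", rng=None):
        """Independently replace each token with [MASK] according to its probability."""
        rng = rng or np.random.default_rng(0)
        return [mask_token if rng.random() < p else t for t, p in zip(tokens, probs)]

    # Hypothetical player behavior sequence: frequent "move" events are low-information,
    # while the rarer "trade_gold" event gets a higher masking probability.
    tokens = ["login", "move", "move", "trade_gold", "move", "logout"]
    confidences = [0.90, 0.95, 0.97, 0.30, 0.96, 0.90]
    probs = mtc_masking_probs(confidences)
    print(list(zip(tokens, probs.round(3))))
    print(apply_mask(tokens, probs))

    In a deployed pipeline, the per-token confidences would presumably come from the pre-training model itself (for example, its predicted probability for each token), and the resulting probabilities would drive dynamic masking during MLM training; the normalization above is just one simple way to keep the expected masking budget fixed.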


Cited By

    • (2024) Evolutionary game analysis of online game studios and online game companies participating in the virtual economy of online games. PLOS ONE 19(1), e0296374. https://doi.org/10.1371/journal.pone.0296374 (24 Jan 2024)
    • (2023) IEC-FOF: An Industrial Electricity Consumption Forecasting and Optimization Framework. In Edge Computing and IoT: Systems, Management and Security, 97–110. https://doi.org/10.1007/978-3-031-28990-3_8 (31 Mar 2023)

    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN:9781450390965
    DOI:10.1145/3485447


    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 25 April 2022


    Author Tags

    1. Masked Language Modeling
    2. Online Games
    3. Transformer Encoder
    4. User Modeling
5. Sequence Compression
    6. Unsupervised Pre-training

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22
    Sponsor:
    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
