Building a Personalized Model for Social Media Textual Content Censorship

Authors:

Zhengqing Guan,

Ning Gu

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2

Article No.: 499, Pages 1 - 31

https://doi.org/10.1145/3555657

Published: 11 November 2022

Abstract

Social media users often suffer from content over-disclosure. Most existing studies address this problem by recommending suitable audiences when users share content. However, an audience management strategy cannot filter sensitive information out of the post itself, and it narrows the scope of content permeation. In contrast, this paper approaches the problem from the content perspective and aims to design a content censorship model that helps users evaluate the publicity of a post and locate the sensitive information in it. Users can then revise the content accordingly, protecting sensitive information while allowing broader content permeation. To this end, we first built a dataset to explore the factors related to the public level of a post and its sensitive information. Based on the findings, we built a novel personalized multi-task content censorship model using several state-of-the-art deep learning techniques such as Seq2Seq and Co-training. We also implemented a prototype, a browser plugin-based content censorship tool, using Weibo as the research site. We evaluated the model and its prototype through automatic and human evaluations. The automatic evaluation suggests that our model outperforms the baseline methods on several metrics, including precision, recall, and F1-score. The human evaluation also indicates that our model and prototype play an important role in helping users identify sensitive information. Based on these results, we propose several insights for the future design of social media content censorship systems.

Index Terms

Building a Personalized Model for Social Media Textual Content Censorship
1. Human-centered computing
  1. Collaborative and social computing

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2

CSCW

November 2022

8205 pages

EISSN: 2573-0142

DOI: 10.1145/3571154

Editor: Jeff Nichols (Google)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Cited By

Zhang Q, Lan Y, Guo K, Wang D. (2024). Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:2 (1-29). Online publication date: 15-May-2024.
https://dl.acm.org/doi/10.1145/3659614
Stein B, Chang B, Sridharan M. (2024). Interactive Abstract Interpretation with Demanded Summarization. ACM Transactions on Programming Languages and Systems 46:1 (1-40). Online publication date: 29-Mar-2024.
https://dl.acm.org/doi/10.1145/3648441
He J, Xiong J, Hu W, Feng C, Yao E, Wang X, Liu C, Chen X, Okoshi T, Ko J, LiKamWa R. (2024). CW-AcousLen: A Configurable Wideband Acoustic Metasurface. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (29-41). Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3643832.3661882
Sheng B, Han R, Xiao F, Guo Z, Gui L. (2024). MetaFormer. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:1 (1-27). Online publication date: 6-Mar-2024.
https://dl.acm.org/doi/10.1145/3643550
Wang F, Lv Y, Zhu M, Ding H, Han J. (2024). XRF55. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:1 (1-34). Online publication date: 6-Mar-2024.
https://dl.acm.org/doi/10.1145/3643543
Liu H, Liu X, Xie X, Tong X, Li K. (2024). PmTrack. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:4 (1-30). Online publication date: 12-Jan-2024.
https://dl.acm.org/doi/10.1145/3631433
Li W, He T, Jing N, Wang L. (2023). mmHSV: In-Air Handwritten Signature Verification via Millimeter-Wave Radar. ACM Transactions on Internet of Things 4:4 (1-22). Online publication date: 22-Nov-2023.
https://dl.acm.org/doi/10.1145/3614443
Dai J, Moffatt K. (2023). Enriching Social Sharing for the Dementia Community: Insights from In-Person and Online Social Programs. ACM Transactions on Accessible Computing 16:1 (1-33). Online publication date: 29-Mar-2023.
https://dl.acm.org/doi/10.1145/3582558
Monjur M, Luo Y, Wang Z, Nirjon S, Hui P, Amiri Sani A, Nurmi P, Liu Y. (2023). SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems. Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services (28-41). Online publication date: 18-Jun-2023.
https://dl.acm.org/doi/10.1145/3581791.3596859
Deng K, Zhao D, Han Q, Zhang Z, Wang S, Zhou A, Ma H. (2023). Midas. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:1 (1-26). Online publication date: 28-Mar-2023.
https://dl.acm.org/doi/10.1145/3580872
