DOI: 10.1145/3551349.3561155

Unsupervised Summarization of Privacy Concerns in Mobile Application Reviews

Published: 05 January 2023

Abstract

The proliferation of mobile applications (apps) over the past decade has imposed unprecedented challenges on end-user privacy. Apps constantly demand access to sensitive user information in exchange for more personalized services. These mostly unjustifiable data collection tactics have raised major privacy concerns among mobile app users. Such concerns are commonly expressed in mobile app reviews; however, they are typically overshadowed by more generic categories of user feedback, such as app reliability and usability. This makes extracting user privacy concerns manually, or even using automated tools, a challenging and time-consuming task. To address these challenges, in this paper, we propose an effective unsupervised approach for summarizing user privacy concerns in mobile app reviews. Our analysis is conducted using a dataset of 2.6 million app reviews sampled from three different application domains. The results show that users in different application domains express their privacy concerns using domain-specific vocabulary. This domain knowledge can be leveraged to help unsupervised automated text summarization algorithms generate concise and comprehensive summaries of privacy concerns in app review collections. Our analysis is intended to help app developers quickly and accurately identify the most critical privacy concerns in their domain of operation and, ultimately, alter their data collection practices to address these concerns.
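To make the idea of vocabulary-guided summarization concrete, the following is a minimal sketch and not the approach proposed in the paper, whose details are not given in this abstract. It assumes a hand-picked list of privacy seed words for one domain and ranks reviews by the share of their tokens that match the list. The function names, the seed list, and the sample reviews are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(reviews, seed_words, k=2):
    """Return up to k reviews with the highest share of seed-word tokens.

    Assumption: seed_words is a small, domain-specific privacy vocabulary
    supplied by the caller; it is not learned here.
    """
    seeds = set(seed_words)
    scored = []
    for idx, review in enumerate(reviews):
        tokens = tokenize(review)
        if not tokens:
            continue
        hits = sum(1 for token in tokens if token in seeds)
        if hits:
            # Score is the fraction of tokens that are privacy seed words.
            scored.append((hits / len(tokens), idx, review))
    # Highest score first; ties keep the original review order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [review for _, _, review in scored[:k]]


# Hypothetical reviews and seed words for a food-delivery style domain.
reviews = [
    "Love the app, delivery was fast",
    "Why does it track my location all day",
    "Crashes every time I open it",
    "They sell my data and share my location",
]
seeds = ["location", "track", "data", "share", "sell"]
summary = summarize(reviews, seeds, k=2)
```

Reviews with no seed-word matches are dropped, which mirrors the observation that privacy concerns are easily overshadowed by generic feedback such as reliability complaints. A different domain would need its own seed list.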


Published In

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022
2006 pages
ISBN: 9781450394758
DOI: 10.1145/3551349
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Mobile Apps
  2. Privacy
  3. User Reviews

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASE '22

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Cited By

  • (2024) To Share or Not to Share: Understanding and Modeling Individual Disclosure Preferences in Recommender Systems for the Workplace. Proceedings of the ACM on Human-Computer Interaction 8, GROUP (1-28). DOI: 10.1145/3633074. Online publication date: 21-Feb-2024
  • (2024) Exploring Privacy Practices of Female mHealth Apps in a Post-Roe World. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-24). DOI: 10.1145/3613904.3642521. Online publication date: 11-May-2024
  • (2024) How to effectively mine app reviews concerning software ecosystem? A survey of review characteristics. Journal of Systems and Software 213 (112040). DOI: 10.1016/j.jss.2024.112040. Online publication date: Jul-2024
  • (2024) Prioritizing user requirements for digital products using explainable artificial intelligence. Future Generation Computer Systems 158, C (167-182). DOI: 10.1016/j.future.2024.04.037. Online publication date: 1-Sep-2024
  • (2023) A Study of Gender Discussions in Mobile Apps. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (598-610). DOI: 10.1109/MSR59073.2023.00086. Online publication date: May-2023
