research-article

Public Access

Unsupervised Summarization of Privacy Concerns in Mobile Application Reviews

Authors:

Fahimeh Ebrahimi,

Anas Mahmoud

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

Article No.: 112, Pages 1 - 12

https://doi.org/10.1145/3551349.3561155

Published: 05 January 2023


Abstract

The proliferation of mobile applications (apps) over the past decade has imposed unprecedented challenges on end users' privacy. Apps constantly demand access to sensitive user information in exchange for more personalized services. These largely unjustifiable data collection tactics have raised major privacy concerns among mobile app users. Such concerns are commonly expressed in mobile app reviews; however, they are typically overshadowed by more generic categories of user feedback, such as app reliability and usability. This makes extracting user privacy concerns, whether manually or with automated tools, a challenging and time-consuming task. To address these challenges, in this paper, we propose an effective unsupervised approach for summarizing user privacy concerns in mobile app reviews. Our analysis is conducted using a dataset of 2.6 million app reviews sampled from three different application domains. The results show that users in different application domains express their privacy concerns using domain-specific vocabulary. This domain knowledge can be leveraged to help unsupervised text summarization algorithms generate concise and comprehensive summaries of privacy concerns in app review collections. Our analysis is intended to help app developers quickly and accurately identify the most critical privacy concerns in their domain of operation and, ultimately, alter their data collection practices to address these concerns.
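The abstract does not describe the summarization algorithm itself. As a rough illustration only, the Python sketch below shows one generic way the two ideas it mentions could fit together: first rank terms that are over-represented in one domain's reviews relative to a background collection (here an add-one smoothed log-ratio of relative frequencies), then build a short extractive summary from review sentences that use privacy vocabulary, skipping near-duplicates with a Jaccard overlap check. The names `domain_terms`, `summarize`, and `PRIVACY_SEEDS`, the seed word list, the scoring formula, and the thresholds are all hypothetical choices made for this example; this is not the paper's method.

```python
import math
import re
from collections import Counter

# Hypothetical seed list of generic privacy words (an assumption, not the paper's lexicon).
PRIVACY_SEEDS = {
    "privacy", "data", "permission", "permissions", "location", "track",
    "tracking", "personal", "information", "sell", "share", "spying",
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (no stop-word removal)."""
    return TOKEN_RE.findall(text.lower())


def domain_terms(domain_reviews, background_reviews, top_k=10, min_count=2):
    """Rank terms over-represented in one domain's reviews relative to a background corpus.

    Uses an add-one smoothed log-ratio of relative frequencies (an illustrative choice).
    """
    dom = Counter(t for review in domain_reviews for t in tokenize(review))
    bg = Counter(t for review in background_reviews for t in tokenize(review))
    dom_total, bg_total = sum(dom.values()), sum(bg.values())
    vocab_size = len(set(dom) | set(bg))
    scores = {}
    for term, count in dom.items():
        if count < min_count:
            continue  # ignore rare terms whose ratios are unreliable
        p_dom = (count + 1) / (dom_total + vocab_size)
        p_bg = (bg[term] + 1) / (bg_total + vocab_size)
        scores[term] = math.log(p_dom / p_bg)
    # Highest log-ratio first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def summarize(reviews, keywords, max_sentences=3, max_overlap=0.6):
    """Greedy extractive summary built from sentences that mention any given keyword."""
    sentences = [
        part.strip()
        for review in reviews
        for part in re.split(r"[.!?]+", review)
        if part.strip()
    ]
    # Number of sentences each token appears in (sentence-level document frequency).
    df = Counter(t for s in sentences for t in set(tokenize(s)))
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        hits = sorted({t for t in tokens if t in keywords})
        if not hits:
            continue  # keep only sentences that use the privacy vocabulary
        # Widely repeated keywords count more; sqrt of length damps long sentences.
        score = sum(math.log(1 + df[t]) for t in hits) / math.sqrt(len(tokens))
        scored.append((score, idx, sentence))
    scored.sort(key=lambda item: (-item[0], item[1]))

    summary, chosen = [], []
    for _, _, sentence in scored:
        tokens = set(tokenize(sentence))
        # Skip sentences whose Jaccard overlap with an already chosen one is too high.
        if any(len(tokens & prev) / len(tokens | prev) > max_overlap for prev in chosen):
            continue
        summary.append(sentence)
        chosen.append(tokens)
        if len(summary) == max_sentences:
            break
    return summary
```

In use, one might call `summarize(reviews, PRIVACY_SEEDS | set(domain_terms(domain_reviews, background_reviews)))` so that the summary favors generic privacy words plus whatever terms the log-ratio step flags for that domain. Because the sketch does not filter stop words, frequent function words (such as "my") can survive the ranking; a practical pipeline would need stop-word handling and far larger review collections.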


Cited By

Musick G, Duan W, Najafian S, Sengupta S, Flathmann C, Knijnenburg B, McNeese N (2024). To Share or Not to Share: Understanding and Modeling Individual Disclosure Preferences in Recommender Systems for the Workplace. Proceedings of the ACM on Human-Computer Interaction 8:GROUP (1-28). DOI: 10.1145/3633074. Online publication date: 21-Feb-2024.
https://dl.acm.org/doi/10.1145/3633074
Malki L, Kaleva I, Patel D, Warner M, Abu-Salma R (2024). Exploring Privacy Practices of Female mHealth Apps in a Post-Roe World. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-24). DOI: 10.1145/3613904.3642521. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642521
Wang X, Zhang T, Tan Y, Shang W, Li Y (2024). How to effectively mine app reviews concerning software ecosystem? A survey of review characteristics. Journal of Systems and Software 213 (112040). DOI: 10.1016/j.jss.2024.112040. Online publication date: Jul-2024.
https://doi.org/10.1016/j.jss.2024.112040

Index Terms

Unsupervised Summarization of Privacy Concerns in Mobile Application Reviews
1. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections
2. Software and its engineering
  1. Software creation and management
    1. Designing software
      1. Requirements analysis


Information & Contributors

Information

Published In

ACM Other conferences

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

October 2022

2006 pages

ISBN:9781450394758

DOI:10.1145/3551349

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2023

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

ASE '22

ASE '22: 37th IEEE/ACM International Conference on Automated Software Engineering

October 10 - 14, 2022

Rochester, MI, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 1,426
Downloads (last 12 months): 1,257
Downloads (last 6 weeks): 154

Reflects downloads up to 09 Nov 2024

Citations

Bai S, Shi S, Han C, Yang M, Gupta B, Arya V (2024). Prioritizing user requirements for digital products using explainable artificial intelligence. Future Generation Computer Systems 158:C (167-182). DOI: 10.1016/j.future.2024.04.037. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1016/j.future.2024.04.037
Shahin M, Zahedi M, Khalajzadeh H, Rezaei Nasab A (2023). A Study of Gender Discussions in Mobile Apps. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (598-610). DOI: 10.1109/MSR59073.2023.00086. Online publication date: May-2023.
https://doi.org/10.1109/MSR59073.2023.00086
