DOI: 10.1145/3551349.3560436
research-article
Open access

Are they Toeing the Line? Diagnosing Privacy Compliance Violations among Browser Extensions

Published: 05 January 2023

Abstract

Browser extensions have become an integral feature of modern browsers, aiming to enhance the online browsing experience. Their privileged position between the user and the Internet gives them easy access to the user’s sensitive data, which has raised mounting privacy concerns among both legislators and extension users. In this work, we propose an end-to-end approach to automatically diagnosing privacy compliance violations among extensions. It checks each extension’s privacy policy against regulation requirements and against the extension’s actual privacy-related practices at runtime. The approach can serve extension users, developers, and store operators as an efficient and practical mechanism for detecting privacy compliance violations.
Our approach uses the state-of-the-art language model BERT to annotate policy texts, and a hybrid technique to analyze an extension’s source code and runtime behavior. To facilitate model training, we construct a corpus named PrivAud-100, which contains 100 manually annotated privacy policies. Our large-scale diagnostic evaluation reveals that the vast majority of existing extensions suffer from privacy non-compliance issues: around 92% of them have at least one violation in either their privacy policies or their data collection practices. Based on our findings, we further propose an index that facilitates filtering and identifying privacy-noncompliant extensions with high accuracy (over 90%). Our work should raise awareness among extension users, service providers, and platform operators, and encourage them to implement solutions toward better privacy compliance. To facilitate future research in this area, we have released our dataset, corpus, and analyzer.
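The two kinds of violation described above (a policy that falls short of regulation requirements, and a practice that the policy does not cover) can be illustrated with a minimal sketch. This is not the released analyzer: the requirement labels, the `Extension` fields, and the `diagnose` and `violation_rate` helpers are hypothetical names chosen for illustration, and the sketch assumes that policy annotation and code/runtime analysis have already produced the label sets.

```python
from dataclasses import dataclass, field

# Hypothetical disclosure labels a regulation might require in a policy.
# The actual annotation scheme is not given in the abstract.
REQUIRED_DISCLOSURES = frozenset(
    {"data_collected", "purpose", "retention", "user_rights"}
)


@dataclass
class Extension:
    name: str
    # Labels found in the policy text (e.g. by a text classifier).
    policy_disclosures: set = field(default_factory=set)
    # Data types the policy says the extension collects.
    declared_data: set = field(default_factory=set)
    # Data types observed in source code or at runtime.
    observed_data: set = field(default_factory=set)


def diagnose(ext):
    """Report policy gaps and practices the policy does not declare."""
    missing = REQUIRED_DISCLOSURES - ext.policy_disclosures
    undeclared = ext.observed_data - ext.declared_data
    return {
        "missing_disclosures": sorted(missing),
        "undeclared_collection": sorted(undeclared),
        "violation": bool(missing or undeclared),
    }


def violation_rate(extensions):
    """Fraction of extensions with at least one violation of either kind."""
    if not extensions:
        return 0.0
    flagged = sum(diagnose(e)["violation"] for e in extensions)
    return flagged / len(extensions)
```

An extension is flagged when either set difference is non-empty, which mirrors the "at least one violation of either" criterion in the abstract; how the real system weighs or ranks violations is not specified here.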

Published In

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022
2006 pages
ISBN: 9781450394758
DOI: 10.1145/3551349
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ASE '22

Acceptance Rates

Overall Acceptance Rate: 82 of 337 submissions, 24%

Article Metrics

• Downloads (last 12 months): 373
• Downloads (last 6 weeks): 40

Reflects downloads up to 30 Aug 2024.

Cited By

• (2024) Understanding GDPR Non-Compliance in Privacy Policies of Alexa Skills in European Marketplaces. Proceedings of the ACM Web Conference 2024, 1081–1091. DOI: 10.1145/3589334.3645409. Online publication date: 13-May-2024
• (2024) Essential or Excessive? MINDAEXT: Measuring Data Minimization Practices among Browser Extensions. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 964–975. DOI: 10.1109/SANER60148.2024.00104. Online publication date: 12-Mar-2024
• (2023) Investigating Users’ Understanding of Privacy Policies of Virtual Personal Assistant Applications. Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, 65–79. DOI: 10.1145/3579856.3590335. Online publication date: 10-Jul-2023
• (2023) Supervised Robustness-preserving Data-free Neural Network Pruning. 2023 27th International Conference on Engineering of Complex Computer Systems (ICECCS), 22–31. DOI: 10.1109/ICECCS59891.2023.00013. Online publication date: 14-Jun-2023
• (2022) Characterizing Cryptocurrency-themed Malicious Browser Extensions. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:3, 1–31. DOI: 10.1145/3570603. Online publication date: 8-Dec-2022
