Secure Featurization and Applications to Secure Phishing Detection

Authors:

Nishanth Chandran,

Arun Gururajan,

Huan Yu

CCSW '21: Proceedings of the 2021 on Cloud Computing Security Workshop

Pages 83 - 95

https://doi.org/10.1145/3474123.3486759

Published: 15 November 2021

Abstract

Secure inference allows a server holding a machine learning (ML) inference algorithm with private weights, and a client with a private input, to obtain the output of the inference algorithm, without revealing their respective private inputs to one another. While this problem has received plenty of attention, existing systems are not applicable to a large class of ML algorithms (such as in the domain of Natural Language Processing) that perform featurization as their first step. In this work, we address this gap and make the following contributions:

We initiate the formal study of secure featurization and its use in conjunction with secure inference protocols.

We build secure featurization protocols in the one/two/three-server settings that provide a tradeoff between security and efficiency.

Finally, we apply our algorithms in the context of secure phishing detection and evaluate our end-to-end protocol on models that are commonly used for phishing detection.
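To make the term "featurization" concrete, the following is a minimal sketch of a plaintext bag-of-words featurizer followed by two-party additive secret sharing of the resulting feature vector. It does not reproduce the protocols of this paper: the vocabulary `VOCAB`, the ring size `MODULUS`, the example URL, and all function names are assumptions chosen for illustration, and here the featurization itself runs in the clear on one party.

```python
import re
import secrets
from collections import Counter

# Ring size for the additive shares (an assumption for this sketch).
MODULUS = 2**32

# Hypothetical vocabulary; a real model would fix this at training time.
VOCAB = ["login", "secure", "bank", "update", "com"]


def featurize(text):
    """Bag-of-words: count how often each vocabulary token occurs in text."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in VOCAB]


def share(vector):
    """Split each entry into two additive shares modulo MODULUS."""
    share0 = [secrets.randbelow(MODULUS) for _ in vector]
    share1 = [(v - r) % MODULUS for v, r in zip(vector, share0)]
    return share0, share1


def reconstruct(share0, share1):
    """Recombine two additive shares into the original vector."""
    return [(a + b) % MODULUS for a, b in zip(share0, share1)]


# Illustrative input only; not taken from the paper's datasets.
features = featurize("http://secure-login.bank.example.com/update/login")
s0, s1 = share(features)
recovered = reconstruct(s0, s1)
```

Each share on its own is uniformly random in the ring, so neither share alone reveals the feature vector; only their sum does. A secure inference protocol would typically take such shares as its input, which is why the featurization step also has to be handled securely before inference can begin.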

Supplementary Material

MP4 File (CCSW-fp60.mp4)

We present our paper titled "Secure Featurization and Applications to Secure Phishing Detection". We initiate the formal study of secure featurization and its use in conjunction with secure inference protocols. We build secure featurization protocols in the one/two/three-server settings that provide a tradeoff between security and efficiency.


Index Terms

Secure Featurization and Applications to Secure Phishing Detection
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Social engineering attacks
      1. Phishing
  2. Security services
    1. Privacy-preserving protocols


Published In

CCSW '21: Proceedings of the 2021 on Cloud Computing Security Workshop

November 2021

161 pages

ISBN:9781450386531

DOI:10.1145/3474123

Program Chairs: Yinqian Zhang (Southern University of Science and Technology), Marten van Dijk (Centrum Wiskunde & Informatica)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CCS '21

Sponsor:

SIGSAC

CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security

November 15, 2021

Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 37 of 108 submissions, 34%
