DOI: 10.1145/3627673.3679888
Short paper
Open access

Compressed Models are NOT Miniature Versions of Large Models

Published: 21 October 2024

Abstract

Large neural models are often compressed before deployment. Model compression is necessary for many practical reasons, such as inference latency, memory footprint, and energy consumption. Compressed models are commonly assumed to be miniature versions of the corresponding large neural models, a belief we question in this work. We compare compressed models with their corresponding large neural models along four model characteristics: prediction errors, data representation, data distribution, and vulnerability to adversarial attacks. We perform experiments using the BERT-large model and five of its compressed versions. On all four characteristics, the compressed models differ significantly from the BERT-large model, and they also differ from one another. Beyond the expected loss in model performance, replacing large neural models with compressed models therefore has major side effects.
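
To make the comparison concrete, below is a minimal sketch of one way to contrast a large model with a compressed one on the data-representation axis. It is not the paper's protocol: the checkpoints (bert-large-uncased, distilbert-base-uncased), the example sentences, and the similarity-structure metric are illustrative assumptions. Because the two models have different hidden sizes (1024 vs. 768), the sketch compares each model's pairwise cosine-similarity matrix over [CLS] vectors rather than the vectors themselves.

import torch
from transformers import AutoModel, AutoTokenizer

SENTENCES = [
    "The movie was surprisingly good.",
    "The movie was a complete disappointment.",
    "Quarterly revenue rose by twelve percent.",
]

def cls_similarity_matrix(checkpoint: str) -> torch.Tensor:
    """Return the pairwise cosine-similarity matrix of [CLS] vectors for SENTENCES."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()
    with torch.no_grad():
        batch = tokenizer(SENTENCES, padding=True, return_tensors="pt")
        cls = model(**batch).last_hidden_state[:, 0]   # one [CLS] vector per sentence
    cls = torch.nn.functional.normalize(cls, dim=-1)
    return cls @ cls.T                                 # cosine similarities

# Illustrative checkpoints only; the paper evaluates BERT-large and five compressed variants.
sim_large = cls_similarity_matrix("bert-large-uncased")
sim_small = cls_similarity_matrix("distilbert-base-uncased")

# If the compressed model were a faithful miniature, the two similarity structures
# would be close; a large mean gap is one crude proxy for representational divergence.
print("mean |sim_large - sim_small| =", (sim_large - sim_small).abs().mean().item())

The same scaffolding extends to the other three characteristics, for example counting prediction disagreements on a labeled test set or measuring attack success rates with an adversarial-attack toolkit, again with the caveat that the concrete setup above is assumed rather than taken from the paper.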




    Published In

    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
    October 2024
    5705 pages
    ISBN: 9798400704369
    DOI: 10.1145/3627673
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. BERT
    2. model characteristics
    3. model compression

    Qualifiers

    • Short-paper

    Conference

    CIKM '24

    Acceptance Rates

    Overall acceptance rate: 1,861 of 8,427 submissions (22%)
