
Learning Reliable Neural Networks with Distributed Architecture Representations

Published: 25 March 2023

Abstract

Neural architecture search (NAS) has shown strong performance in learning neural models automatically in recent years. However, most NAS systems are unreliable because of the architecture gap introduced by discrete representations of atomic architectures. In this article, we improve the performance and robustness of NAS by narrowing the gap between architecture representations. More specifically, we apply a general contraction mapping to model neural networks with distributed representations (Neural Architecture Search with Distributed Architecture Representations, or ArchDAR). Moreover, for better search results, we present a joint learning approach that integrates distributed representations with advanced architecture search methods. We implement ArchDAR in a differentiable architecture search model and test the learned architectures on the language modeling task. On the Penn Treebank data, it outperforms a strong baseline by a significant 1.8 perplexity points. The search process with distributed representations is also more stable, yielding faster structural convergence when combined with the differentiable architecture search model.
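
To make the idea concrete, the sketch below shows one way a DARTS-style mixed operation could draw its per-edge operation weights from a dense (distributed) embedding passed through a contraction-like map, instead of storing independent discrete logits. This is a minimal illustration under our own assumptions: the names (ArchEmbedding, MixedOp, CANDIDATE_OPS), the candidate operation set, and the specific tanh-plus-rescaling map are hypothetical and are not the paper's actual ArchDAR formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operations for one edge of the search cell.
CANDIDATE_OPS = {
    "identity": lambda dim: nn.Identity(),
    "linear": lambda dim: nn.Linear(dim, dim),
    "tanh": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
}

class ArchEmbedding(nn.Module):
    """Distributed representation of one edge's architecture choice."""
    def __init__(self, num_ops: int, embed_dim: int = 16, scale: float = 0.5):
        super().__init__()
        self.vec = nn.Parameter(torch.randn(embed_dim) * 0.01)
        self.proj = nn.Linear(embed_dim, num_ops, bias=False)
        self.scale = scale  # |scale| < 1 keeps the map contraction-like

    def forward(self) -> torch.Tensor:
        # Nearby embeddings yield nearby operation distributions, which is
        # the intuition behind smoothing the discrete architecture gap.
        logits = self.scale * self.proj(torch.tanh(self.vec))
        return F.softmax(logits, dim=-1)

class MixedOp(nn.Module):
    """Continuous relaxation of one edge, weighted by the embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.ops = nn.ModuleList([f(dim) for f in CANDIDATE_OPS.values()])
        self.arch = ArchEmbedding(num_ops=len(self.ops))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.arch()
        # Weighted sum over candidate ops, as in differentiable NAS.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

if __name__ == "__main__":
    edge = MixedOp(dim=32)
    x = torch.randn(8, 32)
    y = edge(x)
    print(y.shape, edge.arch())  # torch.Size([8, 32]) and the op weights

In such a setup, both the network weights and the embedding parameters would be trained jointly (the abstract's joint learning), and the final discrete architecture would be read off from the converged operation distributions.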


Cited By

  • (2024) Technology of NiuTrans Open Source Statistical Machine Translation System. 2024 International Conference on Integrated Circuits and Communication Systems (ICICACS), 1–5. DOI: 10.1109/ICICACS60521.2024.10498323. Online publication date: 23-Feb-2024.
  • (2023) Modeling Extractive Question Answering Using Encoder-Decoder Models with Constrained Decoding and Evaluation-Based Reinforcement Learning. Mathematics 11(7), 1624. DOI: 10.3390/math11071624. Online publication date: 27-Mar-2023.
  • (2023) Multimodal Social Data Analytics on the Design and Implementation of an EEG-Mechatronic System Interface. Journal of Data and Information Quality 15(3), 1–25. DOI: 10.1145/3597306. Online publication date: 28-Sep-2023.



Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4
April 2023
682 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3588902

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2023
Online AM: 04 January 2023
Accepted: 25 December 2022
Revised: 25 October 2022
Received: 07 June 2022
Published in TALLIP Volume 22, Issue 4


Author Tags

  1. Neural architecture search
  2. neural networks
  3. language modeling
  4. natural language processing

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation of China
  • National Key R&D Project of China
  • China HTRD Center
  • Yunnan Provincial Major Science and Technology Special Plan Projects

Article Metrics

  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 10
Reflects downloads up to 22 Dec 2024.

