research-article

Learning Reliable Neural Networks with Distributed Architecture Representations

Authors:

Jingbo Zhu

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4

Article No.: 109, Pages 1 - 20

https://doi.org/10.1145/3578709

Published: 25 March 2023

Abstract

Neural architecture search (NAS) has shown strong performance in learning neural models automatically in recent years. However, most NAS systems are unreliable because of the architecture gap introduced by discrete representations of atomic architectures. In this article, we improve the performance and robustness of NAS by narrowing the gap between architecture representations. More specifically, we apply a general contraction mapping to model neural networks with distributed representations, an approach we call Neural Architecture Search with Distributed Architecture Representations (ArchDAR). Moreover, to obtain better search results, we present a joint learning approach that integrates distributed representations with advanced architecture search methods. We implement ArchDAR in a differentiable architecture search model and test the learned architectures on the language modeling task. On the Penn Treebank data, it significantly outperforms a strong baseline, by 1.8 perplexity points. In addition, the search process with distributed representations is more stable, which yields faster structural convergence when it works with the differentiable architecture search model.
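
For readers unfamiliar with the metric, the perplexity reported above is the exponential of the average per-token negative log-likelihood, so lower values are better. The following is a minimal sketch in Python of how that quantity is computed. The function name `perplexity` and the probabilities are hypothetical and chosen only for illustration; they are not taken from the article or its experiments.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_probs: probabilities a language model assigned to each
    reference token (hypothetical input format, assumed for this sketch).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Illustrative values only: a model that is uniformly unsure among four
# candidate tokens at every step has a perplexity of 4, up to
# floating-point rounding.
uniform_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_probs))
```

Because perplexity is an exponential of the average loss, a fixed gain in perplexity points corresponds to a larger relative improvement when the baseline perplexity is already low.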

Index Terms

Learning Reliable Neural Networks with Distributed Architecture Representations
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 22, Issue 4

April 2023

682 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3588902

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2023

Online AM: 04 January 2023

Accepted: 25 December 2022

Revised: 25 October 2022

Received: 07 June 2022

Published in TALLIP Volume 22, Issue 4

Qualifiers

Research-article

Funding Sources

National Science Foundation of China
National Key R&D Project of China
China HTRD Center
Yunnan Provincial Major Science and Technology Special Plan Projects
