DOI: 10.5555/2685048.2685094

Project Adam: Building an Efficient and Scalable Deep Learning Training System

Published: 06 October 2014

Abstract

Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require large amounts of compute cycles. We describe the design and implementation of a distributed system called Adam, built from commodity server machines, that trains such models with world-class performance, scaling, and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the accuracy of trained models. Adam is significantly more efficient and scalable than was previously thought possible: on the ImageNet 22,000-category image classification task, it used 30x fewer machines to train a large 2-billion-connection model to 2x higher accuracy in comparable time than the system that previously held the record for this benchmark. We also show that task accuracy improves with larger models. Our results provide compelling evidence that a distributed systems-driven approach to deep learning using current training algorithms is worth pursuing.

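The abstract's central systems claim is that asynchrony can be exploited throughout training without hurting model accuracy, in the lock-free, parameter-server style of update popularized by Hogwild! and DistBelief. As a rough illustration of that update pattern (a minimal single-machine toy, not Adam's actual implementation; all names are illustrative), the Python sketch below trains a small logistic-regression model with worker threads that read possibly stale shared weights and write gradient updates back without locks:

    # Minimal sketch of asynchronous, lock-free SGD (Hogwild-style).
    # Illustrates the update pattern only, NOT Adam's implementation: in the
    # real system the "workers" are machines and the shared state lives on
    # parameter-server shards, not in a numpy array.
    import threading

    import numpy as np

    N, D, WORKERS = 4096, 20, 4
    data_rng = np.random.default_rng(0)

    # Toy linearly separable binary-classification data, sharded across workers.
    true_w = data_rng.normal(size=D)
    X = data_rng.normal(size=(N, D))
    y = (X @ true_w > 0).astype(np.float64)
    shards = np.array_split(np.arange(N), WORKERS)

    # Shared "parameter server" state: all workers read and write it.
    w = np.zeros(D)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def worker(wid, shard, lr=0.05, epochs=30):
        rng = np.random.default_rng(wid)  # per-worker RNG; generators are not thread-safe
        for _ in range(epochs):
            for i in rng.permutation(shard):
                w_snapshot = w.copy()  # read a possibly stale copy of the weights
                grad = (sigmoid(X[i] @ w_snapshot) - y[i]) * X[i]  # logistic-loss gradient
                w[:] = w - lr * grad  # racy, lock-free update of the shared weights

    threads = [threading.Thread(target=worker, args=(k, shards[k]))
               for k in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    acc = np.mean((sigmoid(X @ w) > 0.5) == y)
    print(f"training accuracy after async updates: {acc:.3f}")

Python's GIL keeps this toy from running truly in parallel, but it preserves the property the paper exploits: stochastic gradient descent tolerates stale, racy weight updates, so the system can drop synchronization and scale. In Adam itself the workers are separate commodity machines pushing updates to shared model state rather than threads sharing an array.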

Published In

OSDI'14: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation
October 2014, 676 pages
ISBN: 9781931971164

Sponsor

USENIX Association

Publisher

USENIX Association, United States
