DOI: 10.1145/3149704.3149768
Research article · Public Access

Optimizing Word2Vec Performance on Multicore Systems

Published: 12 November 2017

Abstract

The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
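To make the SGNS objective concrete, the following is a minimal, illustrative sketch of a single skip-gram negative-sampling gradient step using NumPy. It is not the paper's optimized context-combining implementation; the function name, learning rate, and initialization here are hypothetical, chosen only to show the per-pair logistic-loss update that SGNS performs.

```python
import numpy as np

def sgns_step(W_in, W_out, center, context, negatives, lr=0.025):
    """One SGNS update for a (center, context) pair plus negative samples.

    W_in, W_out : (vocab, dim) input/output embedding matrices.
    center      : index of the center word.
    context     : index of a true co-occurring word (label 1).
    negatives   : indices of sampled noise words (label 0).
    """
    v = W_in[center]                               # (dim,) center embedding
    targets = [context] + list(negatives)
    labels = np.array([1.0] + [0.0] * len(negatives))
    U = W_out[targets]                             # (k+1, dim) output rows
    scores = U @ v                                 # dot products
    probs = 1.0 / (1.0 + np.exp(-scores))          # sigmoid
    g = probs - labels                             # logistic-loss gradient
    grad_v = g @ U                                 # gradient w.r.t. center
    W_out[targets] -= lr * np.outer(g, v)          # update output vectors
    W_in[center] -= lr * grad_v                    # update center vector

# Tiny demo with hypothetical sizes and indices.
rng = np.random.default_rng(0)
vocab, dim = 50, 8
W_in = rng.normal(scale=0.1, size=(vocab, dim))
W_out = np.zeros((vocab, dim))                     # word2vec-style zero init
sgns_step(W_in, W_out, center=3, context=7, negatives=[11, 19, 42])
```

In the original multithreaded Word2Vec, many threads apply updates like this concurrently without locks (HOGWILD-style); the paper's context-combining optimization restructures how such updates are batched across shared context words on multicore systems.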




Published In

IA3'17: Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms
November 2017
78 pages
ISBN:9781450351362
DOI:10.1145/3149704

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. SGD
  2. Word2Vec
  3. multicore
  4. word embeddings

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC '17

Acceptance Rates

IA3'17 paper acceptance rate: 6 of 22 submissions (27%)
Overall acceptance rate: 18 of 67 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 21
Reflects downloads up to 22 Sep 2024

Cited By

  • (2024) Performance evaluation of Word2vec accelerators exploiting spatial and temporal parallelism on DDR/HBM-based FPGAs. The Journal of Supercomputing 80(12), 17192-17211. DOI: 10.1007/s11227-024-06120-x. Published 24 Apr 2024.
  • (2023) Distributed Graph Embedding with Information-Oriented Random Walks. Proceedings of the VLDB Endowment 16(7), 1643-1656. DOI: 10.14778/3587136.3587140. Published 8 May 2023.
  • (2023) Word2Vec FPGA Accelerator Based on Spatial and Temporal Parallelism. Parallel and Distributed Computing, Applications and Technologies, 69-77. DOI: 10.1007/978-3-031-29927-8_6. Published 8 Apr 2023.
  • (2022) Data Partitioning and Asynchronous Processing to Improve the Embedded Software Performance on Multicore Processors. Informatics and Automation 21(2), 243-274. DOI: 10.15622/ia.21.2.2. Published 17 Feb 2022.
  • (2021) FULL-W2V. Proceedings of the 35th ACM International Conference on Supercomputing, 455-466. DOI: 10.1145/3447818.3460373. Published 3 Jun 2021.
  • (2021) Faster Parallel Training of Word Embeddings. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), 31-41. DOI: 10.1109/HiPC53243.2021.00017. Published Dec 2021.
  • (2020) Evaluating the performance and improving the usability of parallel and distributed Word Embeddings tools. 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 201-206. DOI: 10.1109/PDP50117.2020.00038. Published Mar 2020.
  • (2019) Generating Distributed Representation of User Movement for Extracting Detour Spots. Proceedings of the 11th International Conference on Management of Digital EcoSystems, 250-255. DOI: 10.1145/3297662.3365826. Published 12 Nov 2019.
  • (2019) Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings. 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), 44-55. DOI: 10.1109/MLHPC49564.2019.00010. Published Nov 2019.
  • (2019) FPGA-Based Acceleration of Word2vec using OpenCL. 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5. DOI: 10.1109/ISCAS.2019.8702700. Published May 2019.
