PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing
ACM Transactions on Intelligent Systems and Technology (TIST), 2011
Previous methods of distributed Gibbs sampling for LDA run into either memory or
communication bottlenecks. To improve scalability, we propose four strategies: data
placement, pipeline processing, word bundling, and priority-based scheduling. Experiments
show that our strategies significantly reduce the unparallelizable communication bottleneck
and achieve good load balancing, and hence improve scalability of LDA.
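The abstract only names the four strategies, so here is a rough, hypothetical illustration of the pipeline-processing idea: overlap the communication needed to fetch word-topic counts with the Gibbs sampling computation on the previous word bundle. This is a minimal single-process sketch; the names (`fetch_topic_counts`, `sample_bundle`, `BUNDLES`) and the thread/queue mechanism are stand-ins for PLDA+'s actual Pw-server communication, not the paper's implementation.

```python
# Sketch of pipeline processing: a background thread prefetches the topic
# counts for the next word bundle while the sampler works on the current one,
# hiding (simulated) communication latency behind computation.
import queue
import random
import threading
import time

BUNDLES = [["apple", "banana"], ["carrot", "daikon"], ["egg", "flour"]]
NUM_TOPICS = 4

def fetch_topic_counts(bundle):
    """Simulate fetching word-topic counts from a remote server."""
    time.sleep(0.05)  # stand-in for network latency
    return {w: [random.randint(0, 9) for _ in range(NUM_TOPICS)] for w in bundle}

def sample_bundle(bundle, counts):
    """Stand-in for Gibbs sampling over one word bundle: just pick the
    highest-count topic per word instead of drawing from the posterior."""
    time.sleep(0.05)  # stand-in for sampling cost
    return {w: max(range(NUM_TOPICS), key=lambda k: counts[w][k]) for w in bundle}

def prefetcher(bundles, out_q):
    """Fetch topic counts for each bundle ahead of the sampler."""
    for bundle in bundles:
        out_q.put((bundle, fetch_topic_counts(bundle)))

q = queue.Queue(maxsize=1)  # one bundle in flight: fetching overlaps sampling
threading.Thread(target=prefetcher, args=(BUNDLES, q), daemon=True).start()
for _ in BUNDLES:
    bundle, counts = q.get()  # counts were fetched during the prior iteration
    print(sample_bundle(bundle, counts))
```

With the queue bounded at one bundle, fetch and sample steps interleave, so total time approaches max(fetch, sample) per bundle rather than their sum; this is the latency-hiding effect the pipelining strategy targets.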