DOI: 10.1145/3062341.3062377
Research article, PLDI Conference Proceedings

Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction

Published: 14 June 2017

Abstract

Triangular Inequality (TI) has been used in many manual algorithm designs to solve distance-calculation-based problems efficiently. This paper generalizes the idea into a compiler optimization technique named TI-based strength reduction. The generalization consists of three parts. The first establishes the theoretical foundation of the new optimization by developing a new form of TI, named Angular Triangular Inequality, along with several fundamental theorems. The second reveals the properties of the new form of TI and proposes guided TI adaptation, a systematic method for addressing the difficulties in deploying TI optimizations effectively. The third integrates the new optimization technique into an open-source compiler. Experiments on a set of data mining and machine learning algorithms show that the new technique speeds up standard implementations by as much as 134X, and by 46X on average, on distance-related problems, outperforming previous TI-based optimizations by 2.35X on average. It also extends the applicability of TI-based optimizations to vector-related problems, producing tens-of-times speedups.
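As background, the core algorithmic trick the paper generalizes can be sketched in a few lines. The following is an illustrative example, not the paper's compiler transformation: for any metric d, the triangle inequality gives the lower bound d(q, p) >= |d(q, pivot) - d(p, pivot)|, so a cheap bound check can skip many exact distance computations in a nearest-neighbor query. The function names and the pivot choice here are hypothetical.

```python
import math

def dist(a, b):
    # Euclidean distance; a metric, so the triangle inequality holds
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_with_ti(query, points, pivot):
    # Precompute each point's distance to a fixed pivot once.
    pivot_d = [dist(p, pivot) for p in points]
    q_d = dist(query, pivot)
    best, best_d = None, float("inf")
    for p, pd in zip(points, pivot_d):
        # Triangle-inequality lower bound:
        #   dist(query, p) >= |dist(query, pivot) - dist(p, pivot)|
        # If even the lower bound cannot beat the best so far,
        # the expensive exact distance is skipped entirely.
        if abs(q_d - pd) >= best_d:
            continue
        d = dist(query, p)
        if d < best_d:
            best, best_d = p, d
    return best, best_d
```

The pruned search returns the same answer as a brute-force scan; the payoff grows with dimensionality, since each skipped candidate replaces a full distance computation with one subtraction and comparison. The paper's contribution is deriving and inserting such bounds automatically at the compiler level, rather than by hand as above.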




Published In

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017, 708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Compiler
    2. Deep Learning
    3. Machine Learning
    4. Optimization
    5. Strength Reduction
    6. Triangle Inequality

Conference

PLDI '17

Acceptance Rates

Overall Acceptance Rate: 406 of 2,067 submissions, 20%


Cited By

• What every scientific programmer should know about compiler optimizations? Proceedings of the 34th ACM International Conference on Supercomputing, pages 1–12 (June 2020). DOI: 10.1145/3392717.3392754
• Parallelization of Classical Numerical optimization in Quantum Variational Algorithms. 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), pages 309–320 (October 2020). DOI: 10.1109/ICST46399.2020.00039
• Redundant loads. Proceedings of the 41st International Conference on Software Engineering, pages 982–993 (May 2019). DOI: 10.1109/ICSE.2019.00103
• Efficient document analytics on compressed data. Proceedings of the VLDB Endowment, 11(11):1522–1535 (July 2018). DOI: 10.14778/3236187.3236203
• GLORE: generalized loop redundancy elimination upon LER-notation. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1–28 (October 2017). DOI: 10.1145/3133898
