DOI: 10.5555/3571885.3571998

Graph neural networks based memory inefficiency detection using selective sampling

Published: 18 November 2022

Abstract

Production software in data centers often suffers from unnecessary memory inefficiencies caused by inappropriate use of data structures, overly conservative compiler optimizations, and similar issues. However, whole-program monitoring tools typically incur prohibitively high overhead due to fine-grained memory-access instrumentation. Consequently, such fine-grained tools are not viable for long-running, large-scale data-center applications bound by strict latency criteria (e.g., service-level agreements, or SLAs).
To this end, this work presents Puffin, a novel learning-aided system that identifies three kinds of unnecessary memory operations, namely dead stores, silent loads, and silent stores, by applying gated graph neural networks to fused static and dynamic program semantics with relative positional embeddings. To make the system deployable in large-scale data centers, this work explores a sampling-based detection infrastructure with high efficacy and negligible overhead. We evaluate Puffin on the well-known SPEC CPU 2017 benchmark suite under four compilation options. Experimental results show that the proposed method captures the three kinds of memory inefficiencies with accuracy as high as 96% while reducing checking overhead by 5.66× compared with the state-of-the-art tool.
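For readers unfamiliar with the three inefficiency kinds, the following self-contained Python sketch illustrates their commonly used definitions on a synthetic memory-access trace. This is an illustrative assumption for exposition only, not Puffin's detection algorithm; the `classify` function, the event format, and the example trace are all hypothetical.

```python
# Toy detector (illustrative, not Puffin's method): classify dead stores,
# silent stores, and silent loads on a synthetic trace of (op, addr, value).
def classify(trace):
    mem = {}        # addr -> value currently stored
    last_load = {}  # addr -> value observed by the previous load of addr
    pending = {}    # addr -> index of the last store not yet read
    findings = []
    for i, (op, addr, val) in enumerate(trace):
        if op == "store":
            if addr in pending:                # overwritten before any read
                findings.append(("dead store", pending[addr]))
            if mem.get(addr) == val:           # rewrites the identical value
                findings.append(("silent store", i))
            mem[addr] = val
            pending[addr] = i
        else:  # load
            if addr in last_load and last_load[addr] == mem.get(addr):
                findings.append(("silent load", i))  # value unchanged since last load
            last_load[addr] = mem.get(addr)
            pending.pop(addr, None)            # the store has now been read
    return findings

trace = [
    ("store", 0x10, 1),  # dead: overwritten at index 1 before any read
    ("store", 0x10, 1),  # silent store: rewrites the same value
    ("load",  0x10, 1),
    ("load",  0x10, 1),  # silent load: value unchanged since previous load
]
```

In a real system these checks would run over sampled hardware memory events rather than a full trace, which is exactly the overhead problem the paper's selective sampling addresses.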

Supplementary Material

MP4 File (SC22_Presentation_Li_Pengcheng.mp4)
Presentation at SC '22


Published In

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2022
1277 pages
ISBN:9784665454445


In-Cooperation

  • IEEE CS

Publisher

IEEE Press


Author Tags

  1. graph neural network
  2. memory inefficiency detection
  3. program embedding
  4. sampling

Qualifiers

  • Research-article

Conference

SC '22

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
