DOI: 10.5555/3571885.3571998

Graph neural networks based memory inefficiency detection using selective sampling

Published: 18 November 2022

Abstract

Production software in data centers often suffers from unnecessary memory inefficiencies caused by inappropriate use of data structures, overly conservative compiler optimizations, and similar issues. However, whole-program monitoring tools typically incur prohibitively high overhead due to fine-grained memory-access instrumentation. Consequently, such fine-grained tools are not viable for long-running, large-scale data-center applications bound by strict latency criteria (e.g., service-level agreements, or SLAs).
To this end, this work presents Puffin, a novel learning-aided system that identifies three kinds of unnecessary memory operations, namely dead stores, silent loads, and silent stores, by applying gated graph neural networks to fused static and dynamic program semantics with relative positional embeddings. To make the system deployable in large-scale data centers, this work explores a sampling-based detection infrastructure with high efficacy and negligible overhead. We evaluate Puffin on the well-known SPEC CPU 2017 benchmark suite under four compilation options. Experimental results show that the proposed method captures the three kinds of memory inefficiencies with accuracy as high as 96% while reducing checking overhead by 5.66× compared with the state-of-the-art tool.
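For readers unfamiliar with the three inefficiency kinds, the following self-contained Python sketch illustrates their commonly used definitions on a synthetic memory-access trace. This is an illustrative assumption for exposition only, not Puffin's detection algorithm; the `classify` function, the event format, and the example trace are all hypothetical.

```python
# Toy detector (illustrative, not Puffin's method): classify dead stores,
# silent stores, and silent loads on a synthetic trace of (op, addr, value).
def classify(trace):
    mem = {}        # addr -> value currently stored
    last_load = {}  # addr -> value observed by the previous load of addr
    pending = {}    # addr -> index of the last store not yet read
    findings = []
    for i, (op, addr, val) in enumerate(trace):
        if op == "store":
            if addr in pending:                # overwritten before any read
                findings.append(("dead store", pending[addr]))
            if mem.get(addr) == val:           # rewrites the identical value
                findings.append(("silent store", i))
            mem[addr] = val
            pending[addr] = i
        else:  # load
            if addr in last_load and last_load[addr] == mem.get(addr):
                findings.append(("silent load", i))  # value unchanged since last load
            last_load[addr] = mem.get(addr)
            pending.pop(addr, None)            # the store has now been read
    return findings

trace = [
    ("store", 0x10, 1),  # dead: overwritten at index 1 before any read
    ("store", 0x10, 1),  # silent store: rewrites the same value
    ("load",  0x10, 1),
    ("load",  0x10, 1),  # silent load: value unchanged since previous load
]
```

In a real system these checks would run over sampled hardware memory events rather than a full trace, which is exactly the overhead problem the paper's selective sampling addresses.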

Supplementary Material

MP4 File (SC22_Presentation_Li_Pengcheng.mp4)
Presentation at SC '22


Published In

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2022
1277 pages
ISBN:9784665454445


In-Cooperation

  • IEEE CS

Publisher

IEEE Press


Author Tags

  1. graph neural network
  2. memory inefficiency detection
  3. program embedding
  4. sampling

Qualifiers

  • Research-article

Conference

SC '22

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
