Patel V, Biswas S and Chaudhuri M. (2024). Leveraging Cache Coherence to Detect and Repair False Sharing On-the-fly. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00066. 979-8-3503-5057-9. (823-839).
Zhao Y, Xiao L, Bondi A, Chen B and Liu Y. A Large-Scale Empirical Study of Real-Life Performance Issues in Open Source Projects. IEEE Transactions on Software Engineering. 10.1109/TSE.2022.3167628. 49:2. (924-946).
Yi Z and Yao Y. (2021). A stealing mechanism for delegation methods. The Journal of Supercomputing. 77:10. (10827-10849). Online publication date: 1-Oct-2021.
Srivatsa A, Mansour M, Rheindt S, Gabriel D, Wild T and Herkersdorf A. (2021). DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures. International Journal of Parallel Programming. 49:4. (570-599). Online publication date: 1-Aug-2021.
Khan T, Zhao Y, Pokam G, Mozafari B and Kasikci B. Huron: hybrid false sharing detection and repair. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. (453-468).
Li X and Gulila A. (2019). Optimised memory allocation for less false abortion and better performance in hardware transactional memory. International Journal of Parallel, Emergent and Distributed Systems. 10.1080/17445760.2019.1605605. (1-9).
Omar H, Shi Q, Ahmad M, Dogan H and Khan O. (2018). Declarative Resilience. ACM Transactions on Embedded Computing Systems. 17:4. (1-27). Online publication date: 29-Aug-2018.
Rawat T and Shrivastava A. Enabling multi-threaded applications on hybrid shared memory manycore architectures. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition. (742-747).
Asgharzadeh A, Gómez-Hernández E, Cebrian J, Kaxiras S and Ros A. (2024). Hardware Cache Locking for All Memory Updates. 2024 IEEE 42nd International Conference on Computer Design (ICCD). 10.1109/ICCD63220.2024.00092. 979-8-3503-8040-8. (566-574).
Tang S, Xiang M, Wang Y, Wu B, Chen J and Liu T. Scaler: Efficient and Effective Cross Flow Analysis. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. (907-918).
Tang W, Han Y, Ai T, Li G, Yu B and Yang X. Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory. Proceedings of the 53rd International Conference on Parallel Processing. (597-606).
Su J, Gu N and Qi D. (2024). ParaShareDetect: Dynamic Instrumentation and Runtime Analysis for False Sharing Detection in Parallel Computing. 2024 4th International Conference on Computer, Control and Robotics (ICCCR). 10.1109/ICCCR61138.2024.10585404. 979-8-3503-7314-1. (230-235).
Zhou J, Silvestro S, Tang S, Yang H, Liu H, Zeng G, Wu B, Liu C and Liu T. (2023). MemPerf: Profiling Allocator-Induced Performance Slowdowns. Proceedings of the ACM on Programming Languages. 7:OOPSLA2. (1418-1441). Online publication date: 16-Oct-2023.
Garg S, Moghaddam R, Clement C, Sundaresan N and Wu C. DeepDev-PERF: a deep learning-based approach for improving software performance. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. (948-958).
Haque Rafi M, Williams K and Qasem A. (2022). Raptor: Mitigating CPU-GPU False Sharing Under Unified Memory Systems. 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC). 10.1109/IGSC55832.2022.9969376. 978-1-6654-6550-2. (1-8).
Qin B, Tu T, Liu Z, Yu T and Song L. Algorithmic Profiling for Real-World Complexity Problems. IEEE Transactions on Software Engineering. 10.1109/TSE.2021.3067652. 48:7. (2680-2694).
Jia X, Zhang J, Yu B, Qian X, Qi Z and Guan H. (2022). GiantVM: A Novel Distributed Hypervisor for Resource Aggregation with DSM-aware Optimizations. ACM Transactions on Architecture and Code Optimization. 19:2. (1-27). Online publication date: 30-Jun-2022.
Alam M, Gottschlich J, Tatbul N, Turek J, Mattson T and Muzahid A. A zero-positive learning approach for diagnosing software performance regressions. Proceedings of the 33rd International Conference on Neural Information Processing Systems. (11627-11639).
Yoga A and Nagarakatte S. Parallelism-centric what-if and differential analyses. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. (485-501).
Helm C and Taura K. (2019). PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems. High Performance Computing. 10.1007/978-3-030-20656-7_11. (209-226).
Liu H, Silvestro S, Wang W, Tian C and Liu T. (2018). iReplayer: in-situ and identical record-and-replay for multithreaded applications. ACM SIGPLAN Notices. 53:4. (344-358). Online publication date: 2-Dec-2018.
Silvestro S, Liu H, Zhang T, Jung C, Lee D and Liu T. Sampler. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. (231-244).
Liu H, Silvestro S, Wang W, Tian C and Liu T. iReplayer: in-situ and identical record-and-replay for multithreaded applications. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. (344-358).
Chabbi M, Wen S and Liu X. Featherlight on-the-fly false-sharing detection. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. (152-167).
Hruby T, Giuffrida C, Sambuc L, Bos H and Tanenbaum A. A NEaT Design for Reliable and Scalable Network Stacks. Proceedings of the 12th International on Conference on emerging Networking EXperiments and Technologies. (359-373).
Jiang Y, Xu C, Li D, Ma X and Lu J. Online shared memory dependence reduction via bisectional coordination. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. (822-832).
Eizenberg A, Hu S, Pokam G and Devietti J. (2016). Remix: online detection and repair of cache contention for the JVM. ACM SIGPLAN Notices. 51:6. (251-265). Online publication date: 1-Aug-2016.
Eizenberg A, Hu S, Pokam G and Devietti J. Remix: online detection and repair of cache contention for the JVM. Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. (251-265).
Thalheim J, Bhatotia P and Fetzer C. (2016). INSPECTOR: Data Provenance Using Intel Processor Trace (PT). 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). 10.1109/ICDCS.2016.86. 978-1-5090-1483-5. (25-34).
Luo L, Sriraman A, Fugate B, Hu S, Pokam G, Newburn C and Devietti J. (2016). LASER: Light, Accurate Sharing dEtection and Repair. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). 10.1109/HPCA.2016.7446070. 978-1-4673-9211-2. (261-273).
Liu T and Liu X. Cheetah: detecting false sharing efficiently and effectively. Proceedings of the 2016 International Symposium on Code Generation and Optimization. (1-11).
Gu R, Jin G, Song L, Zhu L and Lu S. What change history tells us about thread synchronization. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. (426-438).
Nistor A, Chang P, Radoi C and Lu S. (2015). CARAMEL: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes. 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE). 10.1109/ICSE.2015.100. 978-1-4799-1934-5. (902-912).
Bhatotia P, Fonseca P, Acar U, Brandenburg B and Rodrigues R. iThreads. Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. (645-659).
Ghane M, Malik A, Chapman B and Qawasmeh A. (2015). False Sharing Detection in OpenMP Applications Using OMPT API. OpenMP: Heterogenous Execution and Data Movements. 10.1007/978-3-319-24595-9_8. (102-114).
Song L and Lu S. (2014). Statistical debugging for real-world performance problems. ACM SIGPLAN Notices. 49:10. (561-578). Online publication date: 31-Dec-2015.
Song L and Lu S. Statistical debugging for real-world performance problems. Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications. (561-578).
Nistor A, Song L, Marinov D and Lu S. Toddler: detecting performance problems via similar memory-access patterns. Proceedings of the 2013 International Conference on Software Engineering. (562-571).
Wicaksono B, Tolubaeva M and Chapman B. (2013). Detecting False Sharing in OpenMP Applications Using the DARWIN Framework. Languages and Compilers for Parallel Computing. 10.1007/978-3-642-36036-7_19. (283-297).
Kalibera T, Mole M, Jones R and Vitek J. (2012). A black-box approach to understanding concurrency in DaCapo. ACM SIGPLAN Notices. 47:10. (335-354). Online publication date: 15-Nov-2012.
Kalibera T, Mole M, Jones R and Vitek J. A black-box approach to understanding concurrency in DaCapo. Proceedings of the ACM international conference on Object oriented programming systems languages and applications. (335-354).
Jin G, Song L, Shi X, Scherpelz J and Lu S. (2012). Understanding and detecting real-world performance bugs. ACM SIGPLAN Notices. 47:6. (77-88). Online publication date: 6-Aug-2012.
Jin G, Song L, Shi X, Scherpelz J and Lu S. Understanding and detecting real-world performance bugs. Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. (77-88).