DOI: 10.1145/3524059.3532395

Beyond time complexity: data movement complexity analysis for matrix multiplication

Published: 28 June 2022

Abstract

Data movement is becoming the dominant contributor to the time and energy costs of computation across a wide range of application domains. However, time complexity is inadequate to analyze data movement. This work expands upon Data Movement Distance, a recently proposed framework for memory-aware algorithm analysis, by 1) demonstrating that its assumptions conform with microarchitectural trends, 2) applying it to four variants of matrix multiplication, and 3) showing it to be capable of asymptotically differentiating algorithms with the same time complexity but different memory behavior, as well as locality optimized vs. non-optimized versions of the same algorithm. In doing so, we attempt to bridge theory and practice by combining the operation count analysis used by asymptotic time complexity with per-operation data movement cost resulting from hierarchical memory structure. Additionally, this paper derives the first fully precise, fully analytical form of recursive matrix multiplication's miss ratio curve on LRU caching systems. Our results indicate that the Data Movement Distance framework is a powerful tool going forward for engineers and algorithm designers to understand the algorithmic implications of hierarchical memory.
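The miss ratio curve mentioned in the abstract can be illustrated empirically. The following is a minimal sketch, not the paper's analytical derivation: it assumes a fully associative LRU cache with one-element blocks, uses the naive i-j-k triple loop rather than the recursive algorithm, and measures LRU stack distances in the classic stack-simulation style. The function names (`matmul_trace`, `stack_distances`, `miss_ratio`, `miss_ratio_curve`) are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def matmul_trace(n):
    """Element-level access trace of the naive i-j-k loop for C = A * B (n x n)."""
    trace = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append(("A", i, k))
                trace.append(("B", k, j))
                trace.append(("C", i, j))
    return trace


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access).

    A distance of d means the address was the d-th most recently used one,
    so the access hits in any LRU cache holding at least d elements.
    """
    stack = OrderedDict()  # insertion order tracks recency; last key is most recent
    dists = []
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            dists.append(len(keys) - keys.index(addr))
            stack.move_to_end(addr)
        else:
            dists.append(None)  # cold (compulsory) miss
            stack[addr] = True
    return dists


def miss_ratio(dists, cache_size):
    """Fraction of accesses that miss in an LRU cache of `cache_size` elements."""
    misses = sum(1 for d in dists if d is None or d > cache_size)
    return misses / len(dists)


def miss_ratio_curve(dists, max_size):
    """Miss ratios for cache sizes 1..max_size."""
    return [miss_ratio(dists, c) for c in range(1, max_size + 1)]


if __name__ == "__main__":
    dists = stack_distances(matmul_trace(4))
    for size, ratio in enumerate(miss_ratio_curve(dists, 48), start=1):
        print(size, round(ratio, 3))
```

The linear scan inside `stack_distances` keeps the sketch short but makes it quadratic in the worst case, so it is only suitable for small `n`. A per-access cost function applied to these distances would give a data movement cost in the spirit of the framework, but the specific cost function is not stated in the abstract and is left out here.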

    Published In

    ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
    June 2022
    514 pages
    ISBN:9781450392815
    DOI:10.1145/3524059

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. algorithm analysis
    2. data movement
    3. hierarchical memory
    4. matrix multiplication

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%
