- research-article, September 2018
CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems
- Hyojong Kim,
- Ramyad Hadidi,
- Lifeng Nai,
- Hyesoon Kim,
- Nuwan Jayasena,
- Yasuko Eckert,
- Onur Kayiran,
- Gabriel Loh
ACM Transactions on Architecture and Code Optimization (TACO), Volume 15, Issue 3, Article No.: 32, Pages 1–23
https://doi.org/10.1145/3232521
To exploit parallelism and scalability of multiple GPUs in a system, it is critical to place compute and data together. However, two key techniques that have been used to hide memory latency and improve thread-level parallelism (TLP), memory ...
- research-article, December 2017
CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory
ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4, Article No.: 48, Pages 1–25
https://doi.org/10.1145/3155287
Three-dimensional (3D)-stacking technology and the memory-wall problem have popularized processing-in-memory (PIM) concepts again, which offers the benefits of bandwidth and energy savings by offloading computations to functional units inside the ...
- Article, October 2014
ALGOL 68 and Its Impact on the USSR and Russian Programming
SORUCOM '14: Proceedings of the 2014 Third International Conference on Computer Technology in Russia and in the Former Soviet Union, Pages 97–106
https://doi.org/10.1109/SoRuCom.2014.29
In this article, the history of Algol 68, the USSR research groups implementing this language, and its impact on programming as a whole in the USSR are described. An Algol 68 Working Group headed by academician Andrey Ershov was organized. This group had meetings 4-...
- poster, February 2012
Automatic datatype generation and optimization
PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 327–328
https://doi.org/10.1145/2145816.2145878
Many high performance applications spend considerable time packing noncontiguous data into contiguous communication buffers. MPI Datatypes provide an alternative by describing noncontiguous data layouts. This allows sophisticated hardware to retrieve ...
Also Published in:
ACM SIGPLAN Notices: Volume 47 Issue 8
- research-article, June 2010
B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems
SCOPES '10: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, Article No.: 2, Pages 1–10
https://doi.org/10.1145/1811212.1811215
High performance embedded processors are equipped with the Translation Look-aside Buffer (TLB), which forms the key ingredient to efficient and speedy virtual memory management. The TLB, though small, is frequently accessed, and therefore not only ...
- research-article, April 2010
Cache vulnerability equations for protecting data in embedded processor caches from soft errors
LCTES '10: Proceedings of the ACM SIGPLAN/SIGBED 2010 Conference on Languages, Compilers, and Tools for Embedded Systems, Pages 143–152
https://doi.org/10.1145/1755888.1755910
Continuous technology scaling has brought us to a point where transistors have become extremely susceptible to cosmic radiation strikes, or soft errors. Inside the processor, caches are most vulnerable to soft errors, and techniques at various levels ...
Also Published in:
ACM SIGPLAN Notices: Volume 45 Issue 4