Compilers


Searched The ACM Guide to Computing Literature (3,736,548 records)

Showing 1 - 20 of 27 Results


research-article
January 2024
Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations
- Jie Zhao,
- Jinchen Xu,
- Peng Di,
- Wang Nie,
- Jiahui Hu,
- Yanzhi Yi,
- Sijia Yang,
- Zhen Geng,
- Renwei Zhang,
- Bojie Li,
- Zhiliang Gan,
- Xuefeng Jin
ACM Transactions on Computer Systems (TOCS), Volume 41, Issue 1-4, Article No.: 5, Pages 1–45, https://doi.org/10.1145/3635305
Loop tiling and fusion are two essential transformations in optimizing compilers to enhance the data locality of programs. Existing heuristics either perform loop tiling and fusion in a particular order, missing some of their profitable compositions, or ...
Total Citations: 3
Total Downloads: 587
Last 12 Months: 587
Last 6 weeks: 59
research-article
May 2020
A Retargetable System-level DBT Hypervisor
ACM Transactions on Computer Systems (TOCS), Volume 36, Issue 4, Article No.: 14, Pages 1–24, https://doi.org/10.1145/3386161

System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different from that of the host machine. Due to their performance-critical ...
Total Citations: 3
Total Downloads: 396
Last 12 Months: 50
Last 6 weeks: 3
research-article
June 2019
Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective
- Sam Ainsworth,
- Timothy M. Jones
ACM Transactions on Computer Systems (TOCS), Volume 36, Issue 3, Article No.: 8, Pages 1–34, https://doi.org/10.1145/3319393

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. ...
Total Citations: 13
Total Downloads: 1,060
Last 12 Months: 145
Last 6 weeks: 9
research-article
January 2016
Assisting Static Compiler Vectorization with a Speculative Dynamic Vectorizer in an HW/SW Codesigned Environment
ACM Transactions on Computer Systems (TOCS), Volume 33, Issue 4, Article No.: 12, Pages 1–33, https://doi.org/10.1145/2807694

Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. However, compilers’ inability to do ...
Total Citations: 3
Total Downloads: 432
Last 12 Months: 7
Last 6 weeks: 0
research-article
August 2015
SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration
ACM Transactions on Computer Systems (TOCS), Volume 33, Issue 3, Article No.: 9, Pages 1–27, https://doi.org/10.1145/2798725

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code ...
Total Citations: 37
Total Downloads: 830
Last 12 Months: 27
Last 6 weeks: 1
research-article
August 2014
Scaling Performance via Self-Tuning Approximation for Graphics Engines
ACM Transactions on Computer Systems (TOCS), Volume 32, Issue 3, Article No.: 7, Pages 1–29, https://doi.org/10.1145/2631913

Approximate computing, where computation accuracy is traded off for better performance or higher data throughput, is one solution that can help data processing keep pace with the current and growing abundance of information. For particular domains, such ...
Total Citations: 8
Total Downloads: 499
Last 12 Months: 7
Last 6 weeks: 0
research-article
March 2008
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS), Volume 26, Issue 1, Article No.: 2, Pages 1–50, https://doi.org/10.1145/1328671.1328673

With the advent of chip multiprocessors, exploiting intratransaction parallelism in database systems is an attractive way of improving transaction performance. However, exploiting intratransaction parallelism is difficult for two reasons: first, ...
Total Citations: 4
Total Downloads: 1,139
Last 12 Months: 5
Last 6 weeks: 0
article
August 2005
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS), Volume 23, Issue 3, Pages 253–300, https://doi.org/10.1145/1082469.1082471

Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to ...
Total Citations: 164
Total Downloads: 1,371
Last 12 Months: 27
Last 6 weeks: 8
article
August 2004
A study of source-level compiler algorithms for automatic construction of pre-execution code
- Dongkeun Kim,
- Donald Yeung
ACM Transactions on Computer Systems (TOCS), Volume 22, Issue 3, Pages 326–379, https://doi.org/10.1145/1012268.1012270

Pre-execution is a promising latency tolerance technique that uses one or more helper threads running in spare hardware contexts ahead of the main computation to trigger long-latency memory operations early, hence absorbing their latency on behalf of ...
Total Citations: 24
Total Downloads: 1,191
Last 12 Months: 5
Last 6 weeks: 0
article
May 2004
A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching
ACM Transactions on Computer Systems (TOCS), Volume 22, Issue 2, Pages 214–280, https://doi.org/10.1145/986533.986536

Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent ...
Total Citations: 16
Total Downloads: 1,740
Last 12 Months: 29
Last 6 weeks: 1
article
February 2003
Run-time support for distributed sharing in safe languages
ACM Transactions on Computer Systems (TOCS), Volume 21, Issue 1, Pages 1–35, https://doi.org/10.1145/592637.592638

We present a new run-time system that supports object sharing in a distributed system. The key insight in this system is that a handle-based implementation of such a system enables efficient and transparent sharing of data with both fine- and coarse-...
Total Citations: 7
Total Downloads: 832
Last 12 Months: 4
Last 6 weeks: 0
article
August 2002
Secure program partitioning
ACM Transactions on Computer Systems (TOCS), Volume 20, Issue 3, Pages 283–328, https://doi.org/10.1145/566340.566343

This paper presents secure program partitioning, a language-based technique for protecting confidential data during computation in distributed systems containing mutually untrusted hosts. Confidentiality and integrity policies can be expressed by ...
Total Citations: 117
Total Downloads: 1,565
Last 12 Months: 30
Last 6 weeks: 1
article
May 2001
Compiler-based I/O prefetching for out-of-core applications
ACM Transactions on Computer Systems (TOCS), Volume 19, Issue 2, Pages 111–170, https://doi.org/10.1145/377769.377774

Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting ...
Total Citations: 53
Total Downloads: 1,321
Last 12 Months: 14
Last 6 weeks: 1
article
February 2001
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS), Volume 19, Issue 1, Pages 71–109, https://doi.org/10.1145/367742.367786

Instruction cache miss latency is becoming an increasingly important performance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing ...
Total Citations: 22
Total Downloads: 889
Last 12 Months: 6
Last 6 weeks: 0
article
Free
November 1999
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives
- Martin C. Rinard
ACM Transactions on Computer Systems (TOCS), Volume 17, Issue 4, Pages 337–371, https://doi.org/10.1145/329466.329486

This article presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based computations. Our experience shows that the synchronization ...
Total Citations: 30
Total Downloads: 581
Last 12 Months: 32
Last 6 weeks: 5
article
Free
August 1999
Ace: a language for parallel programming with customizable protocols
- Mukund Raghavachari,
- Anne Rogers
ACM Transactions on Computer Systems (TOCS), Volume 17, Issue 3, Pages 202–248, https://doi.org/10.1145/320656.320657

Customizing the protocols that manage accesses to different data structures within an application can improve the performance of software shared-memory programs substantially. Existing systems for using customizable protocols are hard to use directly ...
Total Citations: 3
Total Downloads: 628
Last 12 Months: 39
Last 6 weeks: 6
article
Free
May 1999
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
- Pedro C. Diniz,
- Martin C. Rinard
ACM Transactions on Computer Systems (TOCS), Volume 17, Issue 2, Pages 89–132, https://doi.org/10.1145/312203.312210

This article presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses ...
Total Citations: 9
Total Downloads: 586
Last 12 Months: 42
Last 6 weeks: 5
article
Free
May 1998
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 2, Pages 170–205, https://doi.org/10.1145/279227.279230

Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the ...
Total Citations: 45
Total Downloads: 849
Last 12 Months: 61
Last 6 weeks: 7
article
Free
February 1998
Tolerating latency in multiprocessors through compiler-inserted prefetching
- Todd C. Mowry
ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 1, Pages 55–92, https://doi.org/10.1145/273011.273021

The large latency of memory accesses in large-scale shared-memory multiprocessors is a key obstacle to achieving high processor utilization. Software-controlled prefetching is a technique for tolerating memory latency by explicitly executing ...
Total Citations: 59
Total Downloads: 1,036
Last 12 Months: 83
Last 6 weeks: 7
article
Free
February 1998
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 1, Pages 1–40, https://doi.org/10.1145/273011.273014

Orca is a portable, object-based distributed shared memory (DSM) system. This article studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The article gives a quantitative analysis of Orca's coherence ...
Total Citations: 103
Total Downloads: 808
Last 12 Months: 98
Last 6 weeks: 10
