FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 1, Article No.: 4, Pages 1–26. https://doi.org/10.1145/3614224
Coarse-grained reconfigurable architectures (CGRAs) have emerged as promising accelerators due to their high flexibility and energy efficiency. However, existing open source works often lack integration of CGRAs with CPU systems and corresponding ...
- Research article, February 2023
A Sound and Complete Algorithm for Code Generation in Distance-Based ISA
CC 2023: Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, February 2023, Pages 73–84. https://doi.org/10.1145/3578360.3580263
The single-thread performance of a processor core is essential even in the multicore era. However, increasing the processing width of a core to improve the single-thread performance leads to a super-linear increase in power consumption. To overcome ...
- Research article, March 2017
A mechanism for energy-efficient reuse of decoding and scheduling of x86 instruction streams
DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe, March 2017, Pages 1472–1477
Current superscalar x86 processors decompose each CISC instruction (variable-length and with multiple addressing modes) into multiple RISC-like μops at runtime so they can be pipelined and scheduled for concurrent execution. This challenging and power-...
- Research article, December 2015
Integer Linear Programming-Based Scheduling for Transport Triggered Architectures
ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4, Article No.: 59, Pages 1–22. https://doi.org/10.1145/2845082
Static multi-issue machines, such as traditional Very Long Instruction Word (VLIW) architectures, move complexity from the hardware to the compiler. This is motivated by the ability to support high degrees of instruction-level parallelism without ...
- Research article, August 2015
Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters
ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 3, Article No.: 28, Pages 1–22. https://doi.org/10.1145/2800787
During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this benefits ...
- Research article, May 2015
An instrumentation approach for hardware-agnostic software characterization
CF '15: Proceedings of the 12th ACM International Conference on Computing Frontiers, May 2015, Article No.: 3, Pages 1–8. https://doi.org/10.1145/2742854.2742859
Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing ...
- Article, November 2014
Potential of Using a Reconfigurable System on a Superscalar Core for ILP Improvements
SBESC '14: Proceedings of the 2014 Brazilian Symposium on Computing Systems Engineering, November 2014, Pages 43–48. https://doi.org/10.1109/SBESC.2014.19
As technology scaling slows and energy efficiency becomes an increasingly important design constraint, superscalar processor designs seem to be reaching their performance limits under area and power constraints. As a result, new architectural paradigms ...
- Research article, August 2014
Warp-aware trace scheduling for GPUs
PACT '14: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, August 2014, Pages 163–174. https://doi.org/10.1145/2628071.2628101
GPU performance depends not only on thread/warp-level parallelism (TLP) but also on instruction-level parallelism (ILP). It is not enough to schedule instructions within basic blocks; it is also necessary to exploit opportunities for ILP optimization ...
- Research article, December 2013
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, December 2013, Pages 37–48. https://doi.org/10.1145/2540708.2540713
It is difficult to improve the single-thread performance of a processor in memory-intensive programs because processors have hit the memory wall, i.e., the large speed discrepancy between the processors and the main memory. Exploiting memory-level ...
- Research article, October 2013
A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors
PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, October 2013, Pages 133–144
A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate to improve single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs comprised of non-monotonic core types where each core type is ...
- Research article, September 2013
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 1, Article No.: 8, Pages 1–23. https://doi.org/10.1145/2512466
Multimedia applications require a significantly higher level of performance than previous workloads of embedded systems. They have driven digital signal processor (DSP) makers to adopt high-performance architectures like VLIW (Very-Long Instruction Word) ...
- Article, December 2012
An Address-Based Compiling Optimization for FFT on Multi-cluster DSP
PAAP '12: Proceedings of the 2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming, December 2012, Pages 60–64. https://doi.org/10.1109/PAAP.2012.17
This paper presents a compiler optimization for FFT programs on multi-cluster DSPs based on analysis of memory addresses. We transform the loops to reduce the number of instructions in the innermost loop. The interrelationship between each two ...
- Article, May 2012
FabScalar: Automating Superscalar Core Design
- Niket Choudhary,
- Salil Wadhavkar,
- Tanmay Shah,
- Hiran Mayukh,
- Jayneel Gandhi,
- Brandon Dwiel,
- Sandeep Navada,
- Hashem Najaf-abadi,
- Eric Rotenberg
Providing multiple superscalar core types on a chip, each tailored to different classes of instruction-level behavior, is an exciting direction for increasing processor performance and energy efficiency. Unfortunately, processor design and verification ...
- Article, February 2012
On Optimizing the Longest Common Subsequence Problem by Loop Unrolling Along Wavefronts
PDP '12: Proceedings of the 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, February 2012, Pages 603–611. https://doi.org/10.1109/PDP.2012.49
Loop unrolling is a loop transformation that groups a few loop iterations into a super-iteration, exposing more independent instructions and decreasing the total loop overhead. This paper characterizes loop unrolling by the unrolling factor, ...
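As a generic illustration of the transformation this abstract describes (not the paper's wavefront-specific algorithm), a C loop unrolled by a factor of 4; the function names are hypothetical:

```c
#include <stddef.h>

/* Baseline: one add, one compare, and one branch per element. */
void sum_baseline(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Unrolled by a factor of 4: the four additions in the body are
 * independent, exposing instruction-level parallelism, and the loop
 * branch overhead is amortized over four elements. A scalar epilogue
 * handles the remaining n % 4 iterations. */
void sum_unrolled4(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)  /* epilogue for leftover iterations */
        out[i] = a[i] + b[i];
}
```

The unrolling factor (4 here) trades code size against exposed parallelism, which is exactly the parameter the paper above characterizes.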
- Research article, November 2011
Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors
ACM Transactions on Embedded Computing Systems (TECS), Volume 10, Issue 4, Article No.: 47, Pages 1–25. https://doi.org/10.1145/2043662.2043671
Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining in order to ensure that no register spill instructions are inserted by ...
- Research article, June 2011
Parallelism and data movement characterization of contemporary application classes
SPAA '11: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, June 2011, Pages 95–104. https://doi.org/10.1145/1989493.1989506
This paper presents a framework for characterizing the distribution of fine-grained parallelism, data movement, and communication-minimizing code partitions. Understanding the spectrum of parallelism available in applications, and how much data movement ...
- Research article, May 2011
Quantitative analysis of parallelism and data movement properties across the Berkeley computational motifs
CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers, May 2011, Article No.: 17, Pages 1–2. https://doi.org/10.1145/2016604.2016625
This work presents the first thorough quantitative study of the available instruction-level parallelism, basic-block-granularity thread parallelism, and data movement across the Berkeley dwarfs/computational motifs. Although this classification was ...
- Research article, February 2011
Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation
IEEE Transactions on Computers (ITCO), Volume 60, Issue 2, February 2011, Pages 214–227. https://doi.org/10.1109/TC.2010.152
In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation ...
- Article, January 2011
A scheduling approach for distributed resource architectures with scarce communication resources
International Journal of High Performance Systems Architecture (IJHPSA), Volume 3, Issue 1, January 2011, Pages 12–22. https://doi.org/10.1504/IJHPSA.2011.038054
Advances in semiconductor fabrication technology will continue to enable exponential increase in the number of transistors available. However, conventional architectures, such as superscalars or VLIWs, will not be able to use the abundant on-chip ...
- Research article, October 2010
Mighty-morphing power-SIMD
CASES '10: Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, October 2010, Pages 67–76. https://doi.org/10.1145/1878921.1878934
In modern wireless devices, two broad classes of compute-intensive applications are common: those with high amounts of data-level parallelism, such as signal processing used in wireless baseband applications, and those that have little data-level ...