Alex Nicolau

University of California, Irvine, Donald Bren School of Information and Computer Science, Faculty Member

Papers by Alex Nicolau

N-dimensional perfect pipelining

COPPER: Compiler-Controlled On-Demand Approach to Power-Efficient Computing

SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits

... DIGITAL CIRCUITS SUMIT GUPTA Center for Embedded Computer Systems University of California Sa...

Optimal loop parallelization

ACM SIGPLAN Notices

Compile time vs. runtime: scheduling parallelism on dataflow machines

Fourth IEEE Region 10 International Conference TENCON

A hypergraph-based model for port allocation on multiple-register-file VLIW architectures

International Journal of Parallel Programming, 1995

Multiple-functional-unit architectures allow one to boost performance by simultaneously executing many operations, but technological constraints limit the achievable register-file I/O bandwidth and prevent one from fully exploiting the benefits of a large number of units. Dividing the register set into multiple banks can improve the overall I/O bandwidth but results in a nonhomogeneous register space onto which variables must be allocated subject to register-file-port constraints. We propose a hypergraph-based paradigm for modeling competition among variables for port allocation on multiple-register-file VLIW architectures; by coloring such a hypergraph, we can identify legal allocations of variables to register banks and produce executable code.
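The coloring idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each hyperedge groups the variables accessed by one instruction, that each color is a register bank, and that an allocation is legal when no hyperedge places more than `ports_per_bank` of its variables in the same bank. The function name `allocate_banks` and all parameters are hypothetical.

```python
from typing import Dict, List, Optional, Set


def allocate_banks(
    variables: List[str],
    hyperedges: List[Set[str]],
    num_banks: int,
    ports_per_bank: int,
) -> Optional[Dict[str, int]]:
    """Backtracking search for a bank (color) per variable.

    Assumed legality rule (not taken from the paper): within every
    hyperedge, at most `ports_per_bank` variables share a bank.
    Returns None when no legal allocation exists.
    """
    assignment: Dict[str, int] = {}

    def fits(var: str, bank: int) -> bool:
        # Check every hyperedge containing `var` against the port limit.
        for edge in hyperedges:
            if var not in edge:
                continue
            used = sum(1 for v in edge if assignment.get(v) == bank)
            if used + 1 > ports_per_bank:
                return False
        return True

    def solve(index: int) -> bool:
        if index == len(variables):
            return True
        var = variables[index]
        for bank in range(num_banks):
            if fits(var, bank):
                assignment[var] = bank
                if solve(index + 1):
                    return True
                del assignment[var]  # undo and try the next bank
        return False

    return dict(assignment) if solve(0) else None


# Hypothetical example: two instructions competing for ports.
edges = [{"a", "b", "c"}, {"c", "d"}]
banks = allocate_banks(["a", "b", "c", "d"], edges, num_banks=2, ports_per_bank=2)
```

Exhaustive backtracking is used here only because it is short and easy to check; a realistic allocator would need a heuristic, since coloring problems of this kind are hard in general.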

SPARK: Implementation, Usage and Synthesis Scripts

SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits, 2000

Using an oracle to measure potential parallelism in single instruction stream programs

ACM SIGMICRO Newsletter, 1981

Horizontally microprogrammable CPUs belong to a class of machines having statically schedulable parallel instruction execution (SPIE machines). Several experiments have shown that within basic blocks, real code only gives a potential speed-up factor of 2 or 3 when compacted for SPIE machines, even in the presence of unlimited hardware. In this paper, similar experiments are described. However, these measure the

Speedup of band linear recurrences in the presence of resource constraints

Proceedings of the 6th international conference on Supercomputing - ICS '92, 1992

Incremental tree height reduction for high level synthesis

Proceedings of the 28th conference on ACM/IEEE design automation conference - DAC '91, 1991

A new local and incremental Tree Height Reduction (THR) technique for parallelization of application programs is presented. Although THR was introduced many years ago, it has not been widely used in HLS scheduling systems. The two main reasons for that were the inability of most systems to compact beyond basic blocks of the program, thus limiting the strength of THR and
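For readers unfamiliar with tree height reduction, the following is a minimal sketch of the classic idea on a single associative operator. It is not the local, incremental technique the paper presents; it simply rebalances a chain of additions so that independent sub-sums could be computed in parallel. The names `Add`, `height`, `operands`, and `reduce_height` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Add:
    """Binary addition node; leaves are plain variable names."""
    left: "Expr"
    right: "Expr"


Expr = Union[str, Add]


def height(expr: Expr) -> int:
    """Number of dependent addition steps on the longest path."""
    if isinstance(expr, str):
        return 0
    return 1 + max(height(expr.left), height(expr.right))


def operands(expr: Expr) -> List[str]:
    """Flatten the tree into its leaves, left to right."""
    if isinstance(expr, str):
        return [expr]
    return operands(expr.left) + operands(expr.right)


def balance(leaves: List[str]) -> Expr:
    """Build a balanced addition tree over the given leaves."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return Add(balance(leaves[:mid]), balance(leaves[mid:]))


def reduce_height(expr: Expr) -> Expr:
    """Rebalance a sum; valid only if addition is treated as associative."""
    return balance(operands(expr))


# ((a + b) + c) + d takes three dependent steps.
chain = Add(Add(Add("a", "b"), "c"), "d")
# (a + b) + (c + d) takes two, with the inner sums independent.
balanced = reduce_height(chain)
```

The rewrite relies on associativity, so it is exact for integers but can change rounding for floating-point sums.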

Network topology exploration of mesh-based coarse-grain reconfigurable architectures

Proceedings Design, Automation and Test in Europe Conference and Exhibition, 2000

Loop shifting and compaction for the high-level synthesis of designs with complex control flow

Proceedings Design, Automation and Test in Europe Conference and Exhibition, 2000

Dynamically increasing the scope of code motions during the high-level synthesis of digital circuits

IEE Proceedings - Computers and Digital Techniques, 2003

SPARK: a high-level synthesis framework for applying parallelizing compiler transformations

16th International Conference on VLSI Design, 2003. Proceedings., 2000

Interference analysis tools for parallelizing programs with recursive data structures

Proceedings of the 3rd international conference on Supercomputing - ICS '89, 1989

Caching values in the load store queue

The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2004. (MASCOTS 2004). Proceedings., 2004

Mutation scheduling: A unified approach to compiling for fine-grain parallelism

Lecture Notes in Computer Science, 1995

Trade-offs between code selection, register allocation, and instruction scheduling are inherently interdependent, especially when compiling for fine-grain parallel architectures. However, the conventional approach to compiling for such machines arbitrarily separates these phases so that decisions made during any one phase place unnecessary constraints on the remaining phases. Mutation Scheduling attempts to solve this problem by combining code selection, register allocation, and instruction...

Resource-Directed Loop Pipelining

Lecture Notes in Computer Science, 1997

A hierarchical approach to instruction-level parallelization

International Journal of Parallel Programming, 1995

In this paper we extend Percolation Scheduling (PS) to navigate through a hierarchical version of the Control Flow Graph (CFG) representation of a VLIW program. This extension retains the completeness of PS by allowing the “normal” PS transformations to be applied incrementally between adjacent instructions but also enables nonincremental code motions across arbitrarily large single-entry/single-exit regions of code in constant time. Such nonincremental transformations eliminate the useless code explosions that would otherwise be caused by using incremental transformations to move operations through regions containing multiple control paths and, in conjunction with the hierarchical representation of the CFG, provide a framework for trading off useful code explosions for increases in parallelism. Simulation results comparing nonincremental with incremental PS are presented.

Trailblazing: A Hierarchical Approach to Percolation Scheduling

1993 International Conference on Parallel Processing - ICPP'93 Vol2, 1993

Percolation Scheduling (PS) is a system for performing parallelizing transformations for the VLIW and superscalar computation models. PS has various useful properties, such as completeness with respect to local transformations, and appears to be an effective means of exploiting instruction-level parallelism. However, compilers based on PS typically suffer from inefficiencies caused by the incremental application of PS transformations and significant code explosion. In this paper we
