Scott Mahlke

Abstract: An apparatus and method capable of reducing idle resources in a multicore device and improving the use of available resources in the multicore device are provided. The apparatus includes a static scheduling unit configured to generate one or more task groups, and to allocate the task groups to virtual cores by dividing or combining the tasks included in the task groups based on the execution time estimates of the task groups.
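As a rough illustration only (the abstract does not give the algorithm; the function names, the greedy longest-processing-time heuristic, and the single-number execution-time estimates below are all assumptions), a Python sketch of allocating task groups to virtual cores by estimated execution time:

from heapq import heapify, heappop, heappush

def allocate_task_groups(task_groups, num_virtual_cores):
    """Greedily assign task groups to virtual cores by execution-time estimate.

    task_groups: dict mapping group name -> estimated execution time.
    Returns dict mapping virtual core index -> list of group names.
    This is a plain longest-processing-time heuristic, not the patented method.
    """
    # Min-heap of (accumulated load, core index) so the least-loaded core is chosen first.
    cores = [(0.0, core) for core in range(num_virtual_cores)]
    heapify(cores)
    assignment = {core: [] for core in range(num_virtual_cores)}

    # Place the largest estimates first to keep the final loads balanced.
    for group, estimate in sorted(task_groups.items(), key=lambda item: -item[1]):
        load, core = heappop(cores)
        assignment[core].append(group)
        heappush(cores, (load + estimate, core))
    return assignment

if __name__ == "__main__":
    groups = {"g0": 8.0, "g1": 3.0, "g2": 5.0, "g3": 2.0}
    print(allocate_task_groups(groups, num_virtual_cores=2))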
S. Abraham G. Adams A. Agarwal D. Agrawal G. Alvarez R. Alverson C. Amza C. Anderson M. Annavaram J. Archibald J. Arora K. Asanovic T. Austin D. Bacon S. Bagchi N. Bagerzadeh H. Bal S. Banerjia M. Banikazemi L. Barroso S. Basu J. Bennett D. Bhandarkar R. Bianchini A. Bilas B. Black D. Blough K. Bolding R. Boppana S. Breach M. Brorsson E. Bugnion D. Burger B. Calder C. Cascaval J. Cavallaro D. Chaiken PY. Chang P. Cao J. Chapin M. Charney B. Chen CH. Chen J. Chen TF. Chen W. Chen YK. Chen TC. Chiueh S. Cho F. Chong P. Chou
An integrated circuit is provided with latency detecting circuitry for detecting signal generation latency within one or more functional circuits and generating a wearout response in response thereto. The wearout response can take a variety of forms, such as reducing the operating frequency, increasing the operating voltage, altering task allocation within a multiprocessor system, manufacturing test binning, and other wearout responses.
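Purely as a software analogue of the idea (the patent describes hardware circuitry; the thresholds and response labels below are invented), a sketch of mapping a measured latency margin to a wearout response:

def wearout_response(measured_latency_ns, nominal_latency_ns,
                     warn_margin=0.05, critical_margin=0.15):
    """Map a measured signal-generation latency to a coarse wearout response.

    Thresholds are illustrative only. Returns one of several response labels
    loosely mirroring those listed in the abstract (frequency, voltage,
    task allocation).
    """
    slowdown = (measured_latency_ns - nominal_latency_ns) / nominal_latency_ns
    if slowdown < warn_margin:
        return "no_action"
    if slowdown < critical_margin:
        # Mild degradation: trade a little performance for timing slack.
        return "reduce_operating_frequency"
    # Severe degradation: raise voltage and steer work away from this core.
    return "increase_voltage_and_reallocate_tasks"

if __name__ == "__main__":
    print(wearout_response(1.12, 1.00))   # -> reduce_operating_frequency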
Abstract: A compiler includes a branch statistics data analyzer to analyze branch statistics data of a branch instruction to construct a branch predictor function for the branch instruction. A branch prediction instruction generator is coupled to the branch statistics data analyzer to generate at least one prediction instruction to implement the branch predictor function. A main compiling engine is coupled to the branch prediction instruction generator to insert the prediction instruction before the branch instruction.
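A hedged sketch of the overall flow in Python (the trivial bias-based predictor and the PREDICT pseudo-instruction are placeholders of my own, not the patented encoding):

def build_predictor(branch_stats):
    """Derive a very simple predictor function from branch statistics.

    branch_stats maps a branch id to (times_taken, times_not_taken).
    The 'predictor function' here is just a static taken/not-taken bias,
    far simpler than whatever a real implementation would construct.
    """
    predictors = {}
    for branch_id, (taken, not_taken) in branch_stats.items():
        predictors[branch_id] = taken >= not_taken
    return predictors

def insert_prediction_instructions(instructions, predictors):
    """Insert a hypothetical PREDICT pseudo-instruction before each branch."""
    out = []
    for instr in instructions:
        if instr["op"] == "branch" and instr["id"] in predictors:
            hint = "taken" if predictors[instr["id"]] else "not_taken"
            out.append({"op": "predict", "id": instr["id"], "hint": hint})
        out.append(instr)
    return out

if __name__ == "__main__":
    stats = {"b1": (90, 10)}
    code = [{"op": "add"}, {"op": "branch", "id": "b1", "target": "L2"}]
    for instr in insert_prediction_instructions(code, build_predictor(stats)):
        print(instr)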
We set a goal of five reviews for each paper and largely met this goal. Most papers received five reviews and none received fewer than four, for an average of 4.97 reviews per paper. Each paper was reviewed by three committee members and two external reviewers. For each paper, one of the external reviews was assigned by us and the other was assigned by a committee member. We made every effort to assign papers to committee members and external reviewers with matching interests and research areas.
A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set, replacing it with a functionally-equivalent scalar representation, and marking that functionally-equivalent scalar representation.
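To make the replace-and-mark step concrete, a toy Python example (the 4-lane vadd4 form, the lane-suffixed register names, and the scalarized_from tag are all invented for illustration):

def scalarize_simd_add(instr):
    """Rewrite a toy 4-lane SIMD add into marked, functionally-equivalent scalar adds.

    instr is a dict like {"op": "vadd4", "dst": "v0", "src1": "v1", "src2": "v2"}.
    Each lane becomes a scalar add, tagged so that later stages can recognise
    the substituted code, loosely mirroring the marking step in the abstract.
    """
    if instr["op"] != "vadd4":
        return [instr]
    scalar_ops = []
    for lane in range(4):
        scalar_ops.append({
            "op": "add",
            "dst": f'{instr["dst"]}.{lane}',
            "src1": f'{instr["src1"]}.{lane}',
            "src2": f'{instr["src2"]}.{lane}',
            "scalarized_from": "vadd4",   # the marking
        })
    return scalar_ops

if __name__ == "__main__":
    for op in scalarize_simd_add({"op": "vadd4", "dst": "v0", "src1": "v1", "src2": "v2"}):
        print(op)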
Abstract: Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or coprocessors), and the corresponding instructions are added to a baseline processor to meet the critical computational demands of a target application. In this paper, the design of a system to automate the instruction set customization process is presented.
An instruction scheduling method and a processor using an instruction scheduling method are provided. The instruction scheduling method includes selecting a first instruction that has the highest priority from a plurality of instructions, allocating the selected first instruction and a first time slot to one of the functional units, and allocating a second instruction and a second time slot to one of the functional units, wherein the second instruction is dependent on the first instruction.
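A minimal Python sketch of priority-driven list scheduling in this spirit (unit latencies, the priority values, and the data layout are assumptions; the patent does not specify them):

def list_schedule(instructions, num_functional_units):
    """Greedy list scheduler: highest-priority ready instruction first.

    instructions: dict id -> {"priority": int, "deps": [ids]} (assumed acyclic).
    Returns dict id -> (functional_unit, time_slot). Every instruction is
    assumed to take one cycle.
    """
    schedule = {}
    time = 0
    remaining = dict(instructions)
    while remaining:
        # Instructions whose dependences were scheduled in an earlier slot.
        ready = [i for i, info in remaining.items()
                 if all(d in schedule and schedule[d][1] < time for d in info["deps"])]
        if ready:
            # Fill the functional units for this slot in priority order.
            ready.sort(key=lambda i: -remaining[i]["priority"])
            for unit, instr in enumerate(ready[:num_functional_units]):
                schedule[instr] = (unit, time)
                del remaining[instr]
        time += 1
    return schedule

if __name__ == "__main__":
    prog = {
        "i1": {"priority": 10, "deps": []},
        "i2": {"priority": 5, "deps": ["i1"]},   # dependent on i1, gets a later slot
        "i3": {"priority": 7, "deps": []},
    }
    print(list_schedule(prog, num_functional_units=2))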
Abstract: Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, software solutions are rendered impractical because of high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date.
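For intuition only, a toy example of software-redundant execution with a comparison step; it shows the general idea of software fault detection, not RAFT's speculative, asynchronous mechanism:

import random

def run_redundantly(fn, *args):
    """Run fn twice and compare the results to flag a transient fault.

    Naive duplication like this is what makes plain software solutions slow;
    it carries none of the optimizations that give RAFT its low overhead.
    """
    leading = fn(*args)
    trailing = fn(*args)
    if leading != trailing:
        raise RuntimeError("transient fault detected: redundant results differ")
    return leading

def flaky_add(a, b, fault_rate=0.0):
    """Deterministic add, with an optional injected single-bit flip."""
    result = a + b
    if random.random() < fault_rate:
        result ^= 1 << random.randrange(8)   # simulate a bit flip
    return result

if __name__ == "__main__":
    print(run_redundantly(flaky_add, 2, 3))   # normally prints 5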
Provided are a computing apparatus and method based on SIMD architecture capable of supporting various SIMD widths without wasting resources. The computing apparatus includes a plurality of configurable execution cores (CECs) that have a plurality of execution modes, and a controller for detecting a loop region from a program, determining a Single Instruction Multiple Data (SIMD) width for the detected loop region, and determining an execution mode of the processor according to the determined SIMD width.
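A simple illustration of the width-selection step (the 128-bit datapath limit, the supported widths, and the divisibility rule are assumptions, not the patented method):

def choose_simd_width(trip_count, element_bits, supported_widths=(2, 4, 8, 16)):
    """Pick a SIMD width for a loop region.

    A deliberately simple rule: use the widest supported width that divides
    the loop trip count and fits a (made-up) 128-bit datapath. A controller
    like the one in the abstract would then put the configurable execution
    cores into the matching execution mode.
    """
    for width in sorted(supported_widths, reverse=True):
        if trip_count % width == 0 and width * element_bits <= 128:
            return width
    return 1   # fall back to scalar execution

if __name__ == "__main__":
    # 32-bit elements, 64 iterations -> 4 lanes on the assumed 128-bit datapath.
    print(choose_simd_width(trip_count=64, element_bits=32))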
Program Committee: Nader Bagherzadeh, University of California, Irvine; Luc Bouge, LIP, ENS Lyon, France; Bruce Childers, University of Pittsburgh; Jong Choi, IBM TJ Watson Research Center; Michel Cosnard, LORIA-INRIA, France; Jack W.
Abstract: In high-end embedded systems, coarse-grained reconfigurable architectures (CGRAs) continue to replace traditional ASIC designs. CGRAs offer high performance at low power consumption, yet provide flexibility through programmability. In this paper we introduce a recurrence cycle-aware scheduling technique for CGRAs. Our modulo scheduler groups operations belonging to a recurrence cycle into a clustered node and then computes a scheduling order for those clustered nodes.
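A sketch of the clustering step, assuming recurrence cycles correspond to strongly connected components of the data-flow graph; the ordering and modulo-scheduling machinery of the paper is left out:

def recurrence_clusters(dfg):
    """Group operations that lie on a recurrence cycle into clustered nodes.

    dfg maps an operation to the operations it feeds (a directed data-flow
    graph). Strongly connected components of size > 1 correspond to
    recurrence cycles; each becomes one cluster, everything else stays a
    singleton. Uses Kosaraju's algorithm with an iterative DFS.
    """
    order, seen = [], set()

    def dfs(node, graph, out):
        stack = [(node, iter(graph.get(node, ())))]
        seen.add(node)
        while stack:
            current, successors = stack[-1]
            advanced = False
            for succ in successors:
                if succ not in seen:
                    seen.add(succ)
                    stack.append((succ, iter(graph.get(succ, ()))))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                out.append(current)   # record finish order

    for node in dfg:
        if node not in seen:
            dfs(node, dfg, order)

    transpose = {n: [] for n in dfg}
    for node, succs in dfg.items():
        for succ in succs:
            transpose.setdefault(succ, []).append(node)

    clusters, seen = [], set()
    for node in reversed(order):
        if node not in seen:
            component = []
            dfs(node, transpose, component)
            clusters.append(sorted(component))
    return clusters

if __name__ == "__main__":
    # a -> b -> c -> a is a recurrence cycle; d is outside the cycle.
    dfg = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
    print(recurrence_clusters(dfg))   # [['a', 'b', 'c'], ['d']]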
An automated design system for VLIW processors explores a parameterized design space to assist in identifying candidate processor designs that satisfy desired design constraints, such as processor cost and performance. A VLIW synthesis process takes as input a specification of processor parameters and synthesizes a datapath specification, an instruction format design, and a control path specification. The synthesis process also extracts a machine description suitable to re-target a compiler.
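A much-simplified picture of the exploration loop (the parameters, cost/performance model, and constraints below are placeholders; the real system synthesizes datapaths, instruction formats, and machine descriptions):

from itertools import product

def explore_design_space(candidates, evaluate, max_cost, min_performance):
    """Enumerate a small parameterized design space and keep feasible points.

    candidates: dict of parameter name -> list of values (e.g. issue width,
    register count). evaluate: callable returning (cost, performance) for
    one configuration. Returns feasible designs sorted by cost.
    """
    feasible = []
    names = list(candidates)
    for values in product(*(candidates[n] for n in names)):
        config = dict(zip(names, values))
        cost, perf = evaluate(config)
        if cost <= max_cost and perf >= min_performance:
            feasible.append((config, cost, perf))
    # Prefer the cheapest design that still meets the performance bound.
    return sorted(feasible, key=lambda entry: entry[1])

def toy_model(config):
    """Invented cost/performance model: wider issue is faster but pricier."""
    cost = config["issue_width"] * 2 + config["registers"] / 16
    perf = config["issue_width"] * 0.8
    return cost, perf

if __name__ == "__main__":
    space = {"issue_width": [2, 4, 8], "registers": [32, 64, 128]}
    for config, cost, perf in explore_design_space(space, toy_model,
                                                   max_cost=12, min_performance=3.0):
        print(config, cost, perf)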
An information processor for executing a program comprising a plurality of separate program instructions is provided. The processor comprises processing logic operable to individually execute said separate program instructions of said program, an operand store operable to store operand values and an accelerator having a plurality of functional units.
An accelerator 120 is tightly coupled to the normal execution unit 110. The operand store, which could be a register file 130, a stack-based operand store, or another operand store, is shared by the execution unit and the accelerator unit. Operands may also be accessed as immediate values within the instructions themselves.
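A toy model of the shared operand store (class names and operations are invented; the reference numerals 110, 120, and 130 refer to figures not reproduced here):

class RegisterFile:
    """Shared operand store: both the execution unit and the accelerator use it."""
    def __init__(self, size=16):
        self.regs = [0] * size
    def read(self, index):
        return self.regs[index]
    def write(self, index, value):
        self.regs[index] = value

class ExecutionUnit:
    def __init__(self, rf):
        self.rf = rf
    def add(self, dst, src1, src2):
        self.rf.write(dst, self.rf.read(src1) + self.rf.read(src2))

class Accelerator:
    """Tightly coupled accelerator: no private copies, it operates on the same registers."""
    def __init__(self, rf):
        self.rf = rf
    def multiply_accumulate(self, dst, a, b):
        self.rf.write(dst, self.rf.read(dst) + self.rf.read(a) * self.rf.read(b))

if __name__ == "__main__":
    rf = RegisterFile()
    eu, acc = ExecutionUnit(rf), Accelerator(rf)
    rf.write(1, 3); rf.write(2, 4)
    eu.add(0, 1, 2)                   # r0 = 7, written by the execution unit
    acc.multiply_accumulate(0, 1, 2)  # r0 = 7 + 12 = 19, read and written by the accelerator
    print(rf.read(0))                 # 19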
Abstract: Coarse-Grained Reconfigurable Array (CGRA) processors accelerate inner loops of applications by exploiting instruction-level parallelism (ILP) and, in some cases, also data-level and task-level parallelism (DLP & TLP). The aim of this tutorial is to give insight into CGRA architectures and their compilation techniques for exploiting parallelism.
A system is provided which simplifies and speeds up the process of designing a computer system by evaluating the components of the memory hierarchy for any member of a broad family of processors in an application-specific manner. The system uses traces produced by a reference processor in the design space for a particular cache design and characterizes the differences in behavior between the reference processor and an arbitrarily chosen processor.
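A bare-bones example of the trace-driven evaluation step (a direct-mapped cache with made-up parameters; the system in the abstract additionally characterizes how behavior transfers from the reference processor to other members of the processor family):

def simulate_direct_mapped_cache(trace, num_lines, line_bytes=64):
    """Replay a memory-address trace through a direct-mapped cache model.

    trace: iterable of byte addresses, e.g. captured on a reference processor.
    Returns (hits, misses) for the chosen cache geometry.
    """
    tags = [None] * num_lines
    hits = misses = 0
    for address in trace:
        line = address // line_bytes
        index = line % num_lines
        tag = line // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag   # fill on miss
    return hits, misses

if __name__ == "__main__":
    # A tiny synthetic trace with some reuse.
    trace = [0, 64, 0, 128, 64, 4096, 0]
    print(simulate_direct_mapped_cache(trace, num_lines=8))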
Abstract: The rapid advancements in the computational capabilities of the graphics processing unit (GPU) and the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides TFLOPs of performance on scientific applications for the cost of a high-end laptop. While these devices have clearly changed the landscape of computing, there are two central problems that arise.
Abstract: To efficiently schedule superscalar and superpipelined processors, it is necessary to move instructions across branches. This requires increasing the scheduling scope beyond the basic block. Superblock scheduling, a static scheduling method, is a variant of trace scheduling that removes the bookkeeping complexity associated with branches into a trace (side entrances) by removing those entrances using a method called tail duplication.
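A structural sketch of tail duplication on a small control-flow graph (the block naming, the _dup suffix, and the single-pass handling of one side entrance are simplifications; a real compiler also copies instructions and patches branch targets):

def tail_duplicate(blocks, trace):
    """Remove a side entrance into a trace by duplicating its tail blocks.

    blocks: dict name -> {"succs": [...], "preds": [...]}.
    trace: list of block names forming the selected trace, in order.
    Finds the first block after the trace head with a predecessor outside
    the trace, duplicates that block and the rest of the trace, and
    retargets the outside predecessors to the duplicate chain.
    """
    for position in range(1, len(trace)):
        entry = trace[position]
        outside_preds = [p for p in blocks[entry]["preds"] if p not in trace]
        if not outside_preds:
            continue
        tail = trace[position:]
        in_tail = set(tail)
        # Create the duplicate chain; successors inside the tail point to copies.
        for name in tail:
            blocks[name + "_dup"] = {
                "succs": [s + "_dup" if s in in_tail else s
                          for s in blocks[name]["succs"]],
                "preds": [p + "_dup" for p in blocks[name]["preds"] if p in in_tail],
            }
        # Retarget every outside predecessor to the head of the duplicate chain.
        for pred in outside_preds:
            blocks[pred]["succs"] = [entry + "_dup" if s == entry else s
                                     for s in blocks[pred]["succs"]]
            blocks[entry]["preds"].remove(pred)
            blocks[entry + "_dup"]["preds"].append(pred)
        break   # one side entrance handled; real passes repeat as needed
    return blocks

if __name__ == "__main__":
    cfg = {
        "A": {"succs": ["B"], "preds": []},
        "B": {"succs": ["C"], "preds": ["A", "X"]},   # X is a side entrance into the trace
        "C": {"succs": [], "preds": ["B"]},
        "X": {"succs": ["B"], "preds": []},
    }
    result = tail_duplicate(cfg, trace=["A", "B", "C"])
    print(sorted(result))           # A, B, B_dup, C, C_dup, X
    print(result["X"]["succs"])     # ['B_dup']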
Alex Aletà Gheorghe Almasi Erik Altman David August Eduard Ayguadé Rosa M. Badía Ivan Baev Ron Barnes Rastislav Bodik Mike Boucher Ian Bratt Preston Briggs Brad Calder Steve Carr Calin Cascaval Deepak Chandra Ben Cheng Bruce Childers Michael Chu Nathan Clark Josep M.
