- Article, December 1997
Resource-sensitive profile-directed data flow analysis for code optimization
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 358–368
Instruction schedulers employ code motion as a means of instruction reordering to enable scheduling of instructions at points where the resources required for their execution are available. In addition, driven by the profiling data, schedulers take ...
- Article, December 1997
Cache sensitive modulo scheduling
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 338–348
This paper focuses on the interaction between software prefetching (both binding and nonbinding) and software pipelining for VLIW machines. First, it is shown that evaluating software pipelined schedules without considering memory effects can be rather ...
- Article, December 1997
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 330–335
Significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been conducted in the context of general-purpose computing, and more specifically ...
- Article, December 1997
Available parallelism in video applications
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 321–329
Most recent research in instruction-level parallelism has focused on general-purpose applications such as the SPEC benchmarks. Many quantitative experiments have been performed over the years measuring the impact of different execution models and ...
- Article, December 1997
Predicting data cache misses in non-numeric applications through correlation profiling
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 314–320
To maximize the benefit and minimize the overhead of software-based latency tolerance techniques, we would like to apply them precisely to the set of dynamic references that suffer cache misses. Unfortunately, the information provided by the state-of-...
- Article, December 1997
Procedure placement using temporal ordering information
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 303–313
Instruction cache performance is very important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate during execution. This means that the performance of an ...
- Article, December 1997
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 292–302
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware ...
- Article, December 1997
Highly accurate data value prediction using hybrid predictors
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 281–290
Data dependences (data flow constraints) present a major hurdle to the amount of instruction-level parallelism that can be exploited from a program. Recent work has suggested that the limits imposed by data dependences can be overcome to some extent ...
- Article, December 1997
Can program profiling support value prediction?
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 270–280
This paper explores the possibility of using program profiling to enhance the efficiency of value prediction. Value prediction attempts to eliminate true-data dependencies by predicting the outcome values of instructions at run-time and executing true-...
- Article, December 1997
Value profiling
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 259–269
Identifying variables as invariant or constant at compile-time allows the compiler to perform optimizations including constant folding, code specialization, and partial evaluation. Some variables, which cannot be labeled as constants, may exhibit semi-...
- Article, December 1997
The predictability of data values
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 248–258
The predictability of data values is studied at a fundamental level. Two basic predictor models are defined: Computational predictors perform an operation on previous values to yield predicted next values. Examples we study are stride value prediction (...
- Article, December 1997
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 235–245
We revisit memory hierarchy design viewing memory as an inter-operation communication agent. This perspective leads to the development of novel methods of performing inter-operation memory communication. We use data dependence prediction to identify and ...
- Article, December 1997
Microarchitecture support for improving the performance of load target prediction
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 228–234
Presents a load target prediction scheme that mitigates the impact of load latency for modern microprocessors. The scheme uses a cache-like buffer to provide the base address, offset and operand size at the instruction fetching stage of a pipeline so ...
- Article, December 1997
Procedure based program compression
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 204–213
Cost and power consumption are two of the most important design factors for many embedded systems, particularly consumer devices. Products such as personal digital assistants, pagers with integrated data services and smart phones have fixed performance ...
- Article, December 1997
Improving code density using compression techniques
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 194–203
We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces common sequences of instructions with a single instruction codeword. A ...
- Article, December 1997
The filter cache: an energy efficient memory structure
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 184–193
Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often ...
- Article, December 1997
Initial results on the performance and cost of vector microprocessors
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 171–182
Increasingly wider superscalar processors are experiencing diminishing performance returns while requiring larger portions of die area dedicated to control rather than datapath. As an alternative to using these processors to exploit parallelism ...
- Article, December 1997
Out-of-order vector architectures
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 160–170
Register renaming and out-of-order instruction issue are now commonly used in superscalar processors. These techniques can also be used to significant advantage in vector processors, as this paper shows. Performance is improved and available memory ...
- Article, December 1997
The multicluster architecture: reducing cycle time through partitioning
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 149–159
The multicluster architecture that we introduce offers a decentralized, dynamically-scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster ...
- Article, December 1997
Trace processors
MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, Pages 138–148
Traces are dynamic instruction sequences constructed and cached by hardware. A microarchitecture organized around traces is presented as a means for efficiently executing many instructions per cycle. Trace processors exploit both control flow and data ...