SIGOPS: Vol 28, No 5

Volume 28, Issue 5, Dec. 1994

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 0163-5980

Separating data and control transfer in distributed operating systems

Pages 2–11 https://doi.org/10.1145/381792.195481

Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throughput, much reduced latency, greater scalability, and ...

Scheduling and page migration for multiprocessor compute servers

Pages 12–24 https://doi.org/10.1145/381792.195485

Several cache-coherent shared-memory multiprocessors have been developed that are scalable and offer a very tight coupling between the processing resources. They are therefore quite attractive for use as compute servers for multiprogramming and parallel ...

Reactive synchronization algorithms for multiprocessors

Pages 25–35 https://doi.org/10.1145/381792.195490

Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable run-time factors. The designer of a synchronization algorithm has a choice ...

Integration of message passing and shared memory in the Stanford FLASH multiprocessor

Pages 38–50 https://doi.org/10.1145/381792.195494

The advantages of using message passing over shared memory for certain types of communication and synchronization have provided an incentive to integrate both models within a single architecture. A key goal of the FLASH (FLexible Architecture for SHared ...

Software overhead in messaging layers: where does the time go?

Pages 51–60 https://doi.org/10.1145/381792.195499

Despite improvements in network interfaces and software messaging layers, software communication overhead still dominates the hardware routing cost in most systems. In this study, we identify the sources of this overhead by analyzing software costs of ...

Where is time spent in message-passing and shared-memory programs?

Pages 61–73 https://doi.org/10.1145/381792.195501

Message passing and shared memory are two techniques parallel programs use for coordination and communication. This paper studies the strengths and weaknesses of these two mechanisms by comparing equivalent, well-written message-passing and shared-...

Performance of a hardware-assisted real-time garbage collector

Pages 76–85 https://doi.org/10.1145/381792.195504

Hardware-assisted real-time garbage collection offers high throughput and small worst-case bounds on the times required to allocate dynamic objects and to access the memory contained within previously allocated objects. Whether the proposed technology ...

eNVy: a non-volatile, main memory storage system

Pages 86–97 https://doi.org/10.1145/381792.195506

This paper describes the architecture of eNVy, a large non-volatile main memory storage system built primarily with Flash memory. eNVy presents its storage space as a linear, memory mapped array rather than as an emulated disk in order to provide an ...

Resource allocation in a high clock rate microprocessor

Pages 98–109 https://doi.org/10.1145/381792.195510

This paper discusses the design of a high clock rate (300MHz) processor. The architecture is described, and the goals for the design are explained. The performance of three processor models is evaluated using trace-driven simulation. A cost model is ...

Hardware and software support for efficient exception handling

Pages 110–119 https://doi.org/10.1145/381792.195515

Program-synchronous exceptions, for example, breakpoints, watchpoints, illegal opcodes, and memory access violations, provide information about exceptional conditions, interrupting the program and vectoring to an operating system handler. Over the last ...

A technique for monitoring run-time dynamics of an operating system and a microprocessor executing user applications

Pages 122–131 https://doi.org/10.1145/381792.195518

In this paper, we present a non-invasive and efficient technique for simulating applications complete with their operating system interaction. The technique involves booting and initiating an application on a hardware development system, capturing the ...

Trap-driven simulation with Tapeworm II

Pages 132–144 https://doi.org/10.1145/381792.195521

Tapeworm II is a software-based simulation tool that evaluates the cache and TLB performance of multiple-task and operating system intensive workloads. Tapeworm resides in an OS kernel and causes a host machine's hardware to drive simulations with ...

Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Pages 145–156 https://doi.org/10.1145/381792.195524

Experience has shown that many widely used benchmarks are poor predictors of the performance of systems running commercial applications. Research into this anomaly has long been hampered by a lack of address traces from representative multi-user ...

Avoiding conflict misses dynamically in large direct-mapped caches

Pages 158–170 https://doi.org/10.1145/381792.195527

This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer that ...

Surpassing the TLB performance of superpages with less operating system support

Pages 171–182 https://doi.org/10.1145/381792.195531

Many commercial microprocessor architectures have added translation lookaside buffer (TLB) support for superpages. Superpages differ from segments because their size must be a power of two multiple of the base page size and they must be aligned in both ...

Dynamic memory disambiguation using the memory conflict buffer

Pages 183–193 https://doi.org/10.1145/381792.195534

To exploit instruction level parallelism, compilers for VLIW and superscalar processors often employ static code scheduling. However, the available code reordering may be severely restricted due to ambiguous dependences between memory instructions. This ...

AP1000+: architectural support of PUT/GET interface for parallelizing compiler

Pages 196–207 https://doi.org/10.1145/381792.195538

The scalability of distributed-memory parallel computers makes them attractive candidates for solving large-scale problems. New languages, such as HPF, FortranD, and VPP Fortran, have been developed to enable existing software to be easily ported to ...

LCM: memory system support for parallel language implementation

Pages 208–218 https://doi.org/10.1145/381792.195545

Higher-level parallel programming languages can be difficult to implement efficiently on parallel machines. This paper shows how a flexible, compiler-controlled memory system can help achieve good performance for language constructs that previously ...

The performance advantages of integrating block data transfer in cache-coherent multiprocessors

Pages 219–229 https://doi.org/10.1145/381792.195547

Integrating support for block data transfer has become an important emphasis in recent cache-coherent shared address space multiprocessors. This paper examines the potential performance benefits of adding this support. A set of ambitious hardware ...

Improving the accuracy of static branch prediction using branch correlation

Pages 232–241 https://doi.org/10.1145/381792.195549

Recent work in history-based branch prediction uses novel hardware structures to capture branch correlation and increase branch prediction accuracy. We present a profile-based code transformation that exploits branch correlation to improve the accuracy ...

Reducing branch costs via branch alignment

Pages 242–251 https://doi.org/10.1145/381792.195553

Several researchers have proposed algorithms for basic block reordering. We call these branch alignment algorithms. The primary emphasis of these algorithms has been on improving instruction cache locality, and the few studies concerned with branch ...

Compiler optimizations for improving data locality

Pages 252–262 https://doi.org/10.1145/381792.195557

In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this paper, we present ...

DCG: an efficient, retargetable dynamic code generation system

Pages 263–272 https://doi.org/10.1145/381792.195567

Dynamic code generation allows aggressive optimization through the use of runtime information. Previous systems typically relied on ad hoc code generators that were not designed for retargetability, and did not shield the client from machine-specific ...

The performance impact of flexibility in the Stanford FLASH multiprocessor

Pages 274–285 https://doi.org/10.1145/381792.195569

A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford ...

Simple compiler algorithms to reduce ownership overhead in cache coherence protocols

Pages 286–296 https://doi.org/10.1145/381792.195572

We study in this paper the design and efficiency of compiler algorithms that remove ownership overhead in shared-memory multiprocessors with write-invalidate protocols. These algorithms detect loads followed by stores to the same address. Such loads are ...

Fine-grain access control for distributed shared memory

Pages 297–306 https://doi.org/10.1145/381792.195575

This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of efficient cache-coherent shared memory. This paper ...

Interleaving: a multithreading technique targeting multiprocessors and workstations

Pages 308–318 https://doi.org/10.1145/381792.195576

There is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, given that the majority of the microprocessors are sold in the workstation market, not in the multiprocessor market, it is only ...

Hardware support for fast capability-based addressing

Pages 319–327 https://doi.org/10.1145/381792.195579

Traditional methods of providing protection in memory systems do so at the cost of increased context switch time and/or increased storage to record access permissions for processes. With the advent of computers that supported cycle-by-cycle ...

The effectiveness of multiple hardware contexts

Pages 328–337 https://doi.org/10.1145/381792.195583

Multithreaded processors are used to tolerate long memory latencies. By executing threads loaded in multiple hardware contexts, an otherwise idle processor can keep busy, thus increasing its utilization. However, the larger size of a multi-thread ...
