Traditionally, cache coherence in multiprocessors has been maintained in hardware. However, the cost-effectiveness of hardware protocols for Distributed Shared Memory (DSM) systems is questionable. Virtual Shared Memory systems have highlighted the many advantages of software-implemented protocols, albeit at a performance price. The performance gap is narrowed by hybrid systems with software-implemented coherence protocols and hardware support for fine-grain access control.
This work contains the first proposal and evaluation of a hybrid COMA (Cache-Only Memory Architecture). The system is called SC-COMA, for Software-Controlled COMA, to emphasize that the protocol engine is emulated by software executed on the main processor. Unlike user-level protocols, the software that handles coherence events in SC-COMA runs in sub-kernel mode, providing applications transparently and efficiently with the same services as a hardware counterpart. SC-COMA employs a novel coherence protocol, optimized for a hybrid implementation, which has been fully implemented. The support for fine-grain access control is embedded in the memory controller.
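In a hybrid design, hardware in the memory controller enforces access control, while software on the main processor carries out the coherence actions. The sketch below is a toy Python model of that division of labour and does not describe SC-COMA's actual design. The three tag states, the 64-byte block size, and the names `FineGrainController` and `toy_handler` are all hypothetical. The handler simply grants the requested permission and does not run a real protocol.

```python
from enum import Enum


class Access(Enum):
    # Hypothetical per-block access tags kept by the memory controller.
    INVALID = 0
    READ_ONLY = 1
    READ_WRITE = 2


BLOCK_SIZE = 64  # hypothetical coherence-block size in bytes


class FineGrainController:
    """Toy memory-controller model: every load/store checks the block's tag,
    and an access the tag does not permit traps to a software handler."""

    def __init__(self, num_blocks, handler):
        self.tags = [Access.INVALID] * num_blocks
        self.data = [0] * num_blocks  # one word per block, for brevity
        self.handler = handler        # software coherence handler (main CPU)
        self.traps = 0                # number of traps into software

    def _block(self, addr):
        return addr // BLOCK_SIZE

    def load(self, addr):
        b = self._block(addr)
        if self.tags[b] is Access.INVALID:  # no read permission: trap
            self.traps += 1
            self.handler(self, b, write=False)
        return self.data[b]

    def store(self, addr, value):
        b = self._block(addr)
        if self.tags[b] is not Access.READ_WRITE:  # no write permission: trap
            self.traps += 1
            self.handler(self, b, write=True)
        self.data[b] = value


def toy_handler(ctrl, block, write):
    """Hypothetical handler: stands in for fetching the block and obtaining
    the requested permission; it sends no real protocol messages."""
    ctrl.tags[block] = Access.READ_WRITE if write else Access.READ_ONLY


if __name__ == "__main__":
    ctrl = FineGrainController(num_blocks=4, handler=toy_handler)
    ctrl.load(0)       # miss: traps, block 0 becomes READ_ONLY
    ctrl.load(8)       # same block, permitted: no trap
    ctrl.store(16, 7)  # write to a READ_ONLY block: traps, upgrades tag
    ctrl.store(20, 9)  # permitted: no trap
    print(ctrl.traps)  # 2
```

Accesses that the tag permits complete without software involvement. Only permission violations pay the cost of running the handler on the main processor, and this is where the performance price of software protocols comes from.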
The evaluation methodology is based on execution-driven simulation of complete applications from the SPLASH-2 suite. Results show that SC-COMA is competitive and offers a viable way to easily transform networks of workstations into powerful multiprocessors. On systems with 32 processors, it incurs a slowdown of 11-56% relative to an aggressive hardware counterpart, across a range of applications and memory overheads. Scalability is good, and faster processors improve performance. An investigation of the impact of memory organization on the performance of hybrid systems reveals that, in most cases across a wide range, COMA outperforms the alternatives (CC-NUMA, Simple COMA, and RC-NUMA) owing to its lower node miss ratio.
The performance of SC-COMA is further improved by three techniques: relaxed inclusion, mastership hints, and replacement hints. Even more significant improvements are obtained by adapting the SC-COMA approach to other hardware platforms: symmetric multiprocessor (SMP) nodes and processors with non-blocking stores.