Empirical evaluation of the CRAY-T3D: A compiler perspective
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995 • dl.acm.org
Most recent MPP systems employ a fast microprocessor surrounded by a shell of communication and synchronization logic. The CRAY-T3D provides an elaborate shell to support global-memory access, prefetch, atomic operations, barriers, and block transfers. We provide a detailed empirical performance characterization of these primitives using micro-benchmarks and evaluate their utility in compiling for a parallel language. We have found that the raw performance of the machine is quite impressive and the most effective forms of communication are prefetch and write. Other shell provisions, such as the bulk transfer engine and the external Annex register set, are cumbersome and of little use. By evaluating the system in the context of a language implementation, we shed light on important trade-offs and pitfalls in the machine architecture.