2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021
The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
Proceedings of the 14th International Symposium on Systems Synthesis (ISSS '01), 2001
Retargetable Static Timing Analysis for Embedded Software. Kaiyu Chen (Department of Electrical Engineering, Princeton University), Sharad Malik (Department of Electrical Engineering, Princeton University), David I. August (Princeton University) ...
9th International Symposium on Quality Electronic Design (ISQED 2008), 2008
Transactional Memory (TM) has been proposed as a promising solution to effectively harness the increasing processing power of emerging multi/many-core systems. While there has been considerable research on the design and implementation of TM systems, it remains to be shown how to address the validation challenge of such systems in the face of increasing design bugs and dynamic errors. This
2008 IEEE 14th International Symposium on High Performance Computer Architecture, 2008
An important correctness issue for emerging multi/many-core shared memory systems is to ensure that the inter-processor communication through shared memory conforms to the memory ordering rules, as specified by the architecture's memory consistency model (1). This presents a significant validation challenge. Growing system complexity makes it increasingly hard to identify all deep-state logic bugs in pre-silicon verification. Further, aggressive technology
9th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT'05), 2005
In this paper we demonstrate that effective structure optimization is essential to improve code quality and reduce compilation overhead for object-oriented programs. We propose to address this problem by using an effective representation of structure operation, folding indirect memory accesses to structure fields, flattening structures judiciously, and allowing more aggressive procedure inlining. These techniques enable the existing scalar optimizations, which
Papers by Kaiyu Chen