Zhao W, Yuan L, Yan B, Ma P, Zhang Y, Wang L and Wang Z. Stencil Computation with Vector Outer Product. Proceedings of the 38th ACM International Conference on Supercomputing. (247-258).
Tao X, Pang J, Xu J and Zhu Y. (2021). Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture. The Journal of Supercomputing. 77:12. (14502-14524). Online publication date: 1-Dec-2021.
Li Y, Sun H and Pang J. (2021). Revisiting split tiling for stencil computations in polyhedral compilation. The Journal of Supercomputing. 10.1007/s11227-021-03835-z.
Loffeld J and Hittinger J. (2019). On the arithmetic intensity of high-order finite-volume discretizations for hyperbolic systems of conservation laws. International Journal of High Performance Computing Applications. 33:1. (25-52). Online publication date: 1-Jan-2019.
Kruse M and Grosser T. DeLICM: scalar dependence removal at zero memory cost. Proceedings of the 2018 International Symposium on Code Generation and Optimization. (241-253).
Yuan L, Zhang Y, Guo P and Huang S. Tessellating stencils. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. (1-13).
Bondhugula U, Bandishti V and Pananilath I. (2017). Diamond Tiling. IEEE Transactions on Parallel and Distributed Systems. 28:5. (1285-1298). Online publication date: 1-May-2017.
Doerfert J, Grosser T and Hack S. Optimistic loop optimization. Proceedings of the 2017 International Symposium on Code Generation and Optimization. (292-304).
Doerfert J, Grosser T and Hack S. (2017). Optimistic loop optimization. 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 10.1109/CGO.2017.7863748. 978-1-5090-4931-8. (292-304).
Bondhugula U, Acharya A and Cohen A. (2016). The Pluto+ Algorithm. ACM Transactions on Programming Languages and Systems. 38:3. (1-32). Online publication date: 2-May-2016.
Bhaskaracharya S, Bondhugula U and Cohen A. (2016). Automatic Storage Optimization for Arrays. ACM Transactions on Programming Languages and Systems. 38:3. (1-23). Online publication date: 2-May-2016.
Li D, Xu C, Wang Y, Song Z, Xiong M, Gao X and Deng X. (2016). Parallelizing and optimizing large-scale 3D multi-phase flow simulations on the Tianhe-2 supercomputer. Concurrency and Computation: Practice & Experience. 28:5. (1678-1692). Online publication date: 10-Apr-2016.