Del Sozzo E, Conficconi D and Sano K. (2024). Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs. ACM Transactions on Reconfigurable Technology and Systems. 17:2. (1-33). Online publication date: 30-Jun-2024.

Koraei M and Fatemi S. (2021). SASIAF, A Scalable Accelerator for Seismic Imaging on Amazon AWS FPGAs. 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE). 10.1109/ICCKE54056.2021.9721464. 978-1-6654-0208-8. (352-357).

https://ieeexplore.ieee.org/document/9721464/

Reggiani E, Del Sozzo E, Conficconi D, Natale G, Moroni C and Santambrogio M. (2021). Enhancing the Scalability of Multi-FPGA Stencil Computations via Highly Optimized HDL Components. ACM Transactions on Reconfigurable Technology and Systems. 14:3. (1-33). Online publication date: 30-Sep-2021.

https://doi.org/10.1145/3461478

Wang J, Kang Y, Li Y, Wu W, Liu S and Wang L. (2021). Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology. 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101. 978-1-6654-3574-1. (697-705).

https://ieeexplore.ieee.org/document/9644864/

Du C and Yamaguchi Y. (2020). High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory. Electronics. 10.3390/electronics9081275. 9:8. (1275).

https://www.mdpi.com/2079-9292/9/8/1275

Yantır H, Eltawil A and Salama K. (2020). Efficient Acceleration of Stencil Applications through In-Memory Computing. Micromachines. 10.3390/mi11060622. 11:6. (622).

https://www.mdpi.com/2072-666X/11/6/622

Koraei M, Fatemi O and Jahre M. (2019). DCMI. ACM Transactions on Architecture and Code Optimization. 16:4. (1-24). Online publication date: 31-Dec-2020.

https://doi.org/10.1145/3352813

Nabi S and Vanderbauwhede W. (2019). Smart-Cache: Optimising Memory Accesses for Arbitrary Boundaries and Stencils on FPGAs. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 10.1109/IPDPSW.2019.00024. 978-1-7281-3510-6. (87-90).

https://ieeexplore.ieee.org/document/8778414/

Castano-Londono L, Alzate Anzola C, Marquez-Viloria D, Gallo G and Osorio G. (2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. Applied Computer Sciences in Engineering. 10.1007/978-3-030-31019-6_5. (52-63).

http://link.springer.com/10.1007/978-3-030-31019-6_5

Reggiani E, Natale G, Moroni C and Santambrogio M. (2018). An FPGA-Based Acceleration Methodology and Performance Model for Iterative Stencils. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 10.1109/IPDPSW.2018.00026. 978-1-5386-5555-9. (115-122).

https://ieeexplore.ieee.org/document/8425393/

Deest G, Yuki T, Rajopadhye S and Derrien S. (2017). One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs. 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 10.23919/FPL.2017.8056781. 978-9-0903-0428-1. (1-8).

http://ieeexplore.ieee.org/document/8056781/

Rabozzi M, Natale G, Festa B, Miele A and Santambrogio M. (2017). Optimizing streaming stencil time-step designs via FPGA floorplanning. 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 10.23919/FPL.2017.8056764. 978-9-0903-0428-1. (1-4).

http://ieeexplore.ieee.org/document/8056764/

Tsoutsouras V, Koliogeorgi K, Xydis S and Soudris D. (2017). An Exploration Framework for Efficient High-Level Synthesis of Support Vector Machines. Journal of Signal Processing Systems. 88:2. (127-147). Online publication date: 1-Aug-2017.

https://doi.org/10.1007/s11265-017-1230-1

Wang S and Liang Y. A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model. Proceedings of the 54th Annual Design Automation Conference 2017. (1-6).

https://doi.org/10.1145/3061639.3062185

Waidyasooriya H, Takei Y, Tatsumi S and Hariyama M. (2017). OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology. IEEE Transactions on Parallel and Distributed Systems. 28:5. (1390-1402). Online publication date: 1-May-2017.

https://doi.org/10.1109/TPDS.2016.2614981

Rana V, Beretta I, Bruschi F, Nacci A, Atienza D and Sciuto D. (2016). Efficient Hardware Design of Iterative Stencil Loops. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 35:12. (2018-2031). Online publication date: 1-Dec-2016.

https://doi.org/10.1109/TCAD.2016.2545408

Cong J, Li P, Xiao B and Zhang P. (2016). An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 35:3. (407-418). Online publication date: 1-Mar-2016.

https://doi.org/10.1109/TCAD.2015.2488491

Cattaneo R, Natale G, Sicignano C, Sciuto D and Santambrogio M. (2015). On How to Accelerate Iterative Stencil Loops. ACM Transactions on Architecture and Code Optimization. 12:4. (1-26). Online publication date: 7-Jan-2016.

https://doi.org/10.1145/2842615

Zhang J and Schirner G. (2015). Towards closing the specification gap by integrating algorithm-level and system-level design. Design Automation for Embedded Systems. 19:4. (389-419). Online publication date: 1-Dec-2015.

https://doi.org/10.1007/s10617-015-9161-1

Chen Y, Cong J, Gill M, Reinman G and Xiao B. (2015). Customizable Computing. Synthesis Lectures on Computer Architecture. 10.2200/S00650ED1V01Y201505CAC033. 10:3. (1-118). Online publication date: 6-Jul-2015.

http://www.morganclaypool.com/doi/10.2200/S00650ED1V01Y201505CAC033

Reichenbach M, Pfundt B and Fey D. (2015). Framework for parameter analysis of FPGA-based image processing architectures. 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). 10.1109/SAMOS.2015.7363664. 978-1-4673-7311-1. (96-102).

https://ieeexplore.ieee.org/document/7363664

Reagen B, Adolf R, Shao Y, Wei G and Brooks D. (2014). MachSuite: Benchmarks for accelerator design and customized architectures. 2014 IEEE International Symposium on Workload Characterization (IISWC). 10.1109/IISWC.2014.6983050. 978-1-4799-6454-3. (110-119).

http://ieeexplore.ieee.org/document/6983050/

Cong J, Li P, Xiao B and Zhang P. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers. Proceedings of the 51st Annual Design Automation Conference. (1-6).

https://doi.org/10.1145/2593069.2593090

Cong J, Li P, Xiao B and Zhang P. (2014). An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers. 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). 10.1109/DAC.2014.6881404. 978-1-4799-3017-3. (1-6).

http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6881404

Bildosola I, Martinez-Corral U and Basterretxea K. (2014). Adaptive scalable SVD unit for fast processing of large LSE problems. 2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP). 10.1109/ASAP.2014.6868625. 978-1-4799-3609-0. (17-24).

http://ieeexplore.ieee.org/document/6868625/