Del Sozzo E, Conficconi D and Sano K.
(2024). Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs. ACM Transactions on Reconfigurable Technology and Systems. 17:2. (1-33). Online publication date: 30-Jun-2024.
Koraei M and Fatemi S.
(2021). SASIAF, A Scalable Accelerator for Seismic Imaging on Amazon AWS FPGAs. 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE). 10.1109/ICCKE54056.2021.9721464. 978-1-6654-0208-8. (352-357).
Reggiani E, Del Sozzo E, Conficconi D, Natale G, Moroni C and Santambrogio M.
(2021). Enhancing the Scalability of Multi-FPGA Stencil Computations via Highly Optimized HDL Components. ACM Transactions on Reconfigurable Technology and Systems. 14:3. (1-33). Online publication date: 30-Sep-2021.
Wang J, Kang Y, Li Y, Wu W, Liu S and Wang L.
(2021). Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology. 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101. 978-1-6654-3574-1. (697-705).
Du C and Yamaguchi Y.
(2020). High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory. Electronics. 10.3390/electronics9081275. 9:8. (1275).
Yantır H, Eltawil A and Salama K.
(2020). Efficient Acceleration of Stencil Applications through In-Memory Computing. Micromachines. 10.3390/mi11060622. 11:6. (622).
Koraei M, Fatemi O and Jahre M.
(2019). DCMI. ACM Transactions on Architecture and Code Optimization. 16:4. (1-24). Online publication date: 31-Dec-2020.
Nabi S and Vanderbauwhede W.
(2019). Smart-Cache: Optimising Memory Accesses for Arbitrary Boundaries and Stencils on FPGAs. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 10.1109/IPDPSW.2019.00024. 978-1-7281-3510-6. (87-90).
Castano-Londono L, Alzate Anzola C, Marquez-Viloria D, Gallo G and Osorio G.
(2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. Applied Computer Sciences in Engineering. 10.1007/978-3-030-31019-6_5. (52-63).
Reggiani E, Natale G, Moroni C and Santambrogio M.
(2018). An FPGA-Based Acceleration Methodology and Performance Model for Iterative Stencils. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 10.1109/IPDPSW.2018.00026. 978-1-5386-5555-9. (115-122).
Deest G, Yuki T, Rajopadhye S and Derrien S.
(2017). One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs. 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 10.23919/FPL.2017.8056781. 978-9-0903-0428-1. (1-8).
Rabozzi M, Natale G, Festa B, Miele A and Santambrogio M.
(2017). Optimizing streaming stencil time-step designs via FPGA floorplanning. 2017 27th International Conference on Field Programmable Logic and Applications (FPL). 10.23919/FPL.2017.8056764. 978-9-0903-0428-1. (1-4).
Tsoutsouras V, Koliogeorgi K, Xydis S and Soudris D.
(2017). An Exploration Framework for Efficient High-Level Synthesis of Support Vector Machines. Journal of Signal Processing Systems. 88:2. (127-147). Online publication date: 1-Aug-2017.
Wang S and Liang Y. A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model. Proceedings of the 54th Annual Design Automation Conference 2017. (1-6).
Waidyasooriya H, Takei Y, Tatsumi S and Hariyama M.
(2017). OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology. IEEE Transactions on Parallel and Distributed Systems. 28:5. (1390-1402). Online publication date: 1-May-2017.
Rana V, Beretta I, Bruschi F, Nacci A, Atienza D and Sciuto D.
(2016). Efficient Hardware Design of Iterative Stencil Loops. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 35:12. (2018-2031). Online publication date: 1-Dec-2016.
Cong J, Li P, Xiao B and Zhang P.
(2016). An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 35:3. (407-418). Online publication date: 1-Mar-2016.
Cattaneo R, Natale G, Sicignano C, Sciuto D and Santambrogio M.
(2015). On How to Accelerate Iterative Stencil Loops. ACM Transactions on Architecture and Code Optimization. 12:4. (1-26). Online publication date: 7-Jan-2016.
Zhang J and Schirner G.
(2015). Towards closing the specification gap by integrating algorithm-level and system-level design. Design Automation for Embedded Systems. 19:4. (389-419). Online publication date: 1-Dec-2015.
Chen Y, Cong J, Gill M, Reinman G and Xiao B.
(2015). Customizable Computing. Synthesis Lectures on Computer Architecture. 10.2200/S00650ED1V01Y201505CAC033. 10:3. (1-118). Online publication date: 6-Jul-2015.
Reichenbach M, Pfundt B and Fey D.
(2015). Framework for parameter analysis of FPGA-based image processing architectures. 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). 10.1109/SAMOS.2015.7363664. 978-1-4673-7311-1. (96-102).
Reagen B, Adolf R, Shao Y, Wei G and Brooks D.
(2014). MachSuite: Benchmarks for accelerator design and customized architectures. 2014 IEEE International Symposium on Workload Characterization (IISWC). 10.1109/IISWC.2014.6983050. 978-1-4799-6454-3. (110-119).
Cong J, Li P, Xiao B and Zhang P. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers. Proceedings of the 51st Annual Design Automation Conference. (1-6).
Cong J, Li P, Xiao B and Zhang P.
(2014). An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers. 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). 10.1109/DAC.2014.6881404. 978-1-4799-3017-3. (1-6).
Bildosola I, Martinez-Corral U and Basterretxea K.
(2014). Adaptive scalable SVD unit for fast processing of large LSE problems. 2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP). 10.1109/ASAP.2014.6868625. 978-1-4799-3609-0. (17-24).