Biagio Cosenza

I am an Associate Professor (program AIM: Attraction and International Mobility) at the Department of Computer Science, University of Salerno, Italy.
From 2015 to 2019, I was a Senior Researcher at TU Berlin, working with Prof. Ben Juurlink and leading, as PI, the DFG-funded international project CELERITY. From 2011 to 2015, I was a Post-Doctoral Researcher at the University of Innsbruck, Austria, working with Prof. Thomas Fahringer and contributing to the Insieme Compiler project and the DK-CIM program at the Scientific Computing multidisciplinary platform.
I received my Ph.D. from the University of Salerno in March 2011, supervised by Prof. Vittorio Scarano. There, I was the recipient of several grants and scholarships (HPC-Europa2, HPC-Europa++, DAAD, Cineca ISCRA), and I visited both the HLRS Supercomputing Center and the University of Stuttgart under the supervision of Prof. Carsten Dachsbacher and Prof. Thomas Ertl.

My research interests include high-performance computing, compiler technology, and software optimization.

[ long bio | bio lunga | 简历 ]

News

Applications for Chinese government-sponsored study abroad (中国公派留学的申请)

2-4 Dec, 2024: SYCL Hackathon at CINECA
Oct 10, 2024: Webinar at CINECA on Heterogeneous computing with SYCL & oneAPI on CINECA's Leonardo
25-27 Sept, 2024: Invited talk at XX ICAR CNR Workshop in Sicily
18 Sep 2024: Presenting at the workshop Scientific HPC in the pre-Exascale era at the 3rd Italian Conference on Big Data and Data Science (ITADATA) in Pisa, Italy
June 3, 2024: Invited keynote presentation at the Workshop on Performance and Energy Efficiency in Concurrent and Distributed Systems (PECS) in Pisa, co-located with HPDC 2024
My talk on Energy-Efficient Heterogeneous Computing with SYNERGY, presented at the oneAPI DevSummit for AI and HPC 2023, is now online
Nov 28, 2023: I will give a talk at La Maison de la Simulation in Paris Saclay on Modern C++ for Distributed, Approximate and Energy-Efficient Computing
Nov 11-17, 2023: I will be at SC in Denver presenting a paper at the SHiPS Workshop and attending our SYNERGY paper presentation by Kaijie Fan
Oct 24-25, 2023: I will give two presentations at the Intel oneAPI Workshop at Cyfronet in Krakow
August 29, 2023: I will give a tutorial about SYCL, Celerity and Synergy at Euro-Par 23 in Limassol, Cyprus
July 2023: New PRIN project (Projects of national interest) funded by the Italian Ministry of Education, University and Research
July 2023: New paper accepted to SuperComputing 2023
June 29th, 2023: Invited talk at CINECA on Modern C++ for High-Performance Computing
March 2023: Two new papers accepted for CCGrid 2023 on EMPI and on the new Celerity runtime
November 2022: Best Paper Award at the 14th BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing for our paper on MPI performance variability on DragonFly+ networks
October 21, 2022: Honoured to give an invited talk at TU Hamburg titled Portable Heterogeneous Programming for the Computing Continuum.
September 27, 2022: I have been elevated to the grade of Senior Member of the IEEE
July 2022: HiPEAC Info Magazine 66 features an interview with me and Peter Thoman about Open Source, SYCL, and our experience in developing Celerity
June 3, 2022: I have been recognized by the Association for Computing Machinery as an ACM Senior Member
New publications at IWOCL 22 and Computing Frontiers 22
In the media: podcast interview by Gabriella Bernardi about how we use SYCL/Celerity in LIGATE and our article on why HPC is a key strategic area for Italy in Agenda Digitale (in Italian)
I have been invited to present Celerity at the Intel Developer Summit on Apr 29. Here you can see the video recording
Feb 2021: The SYCL 2020 Specification has been released!
Happy to announce that the new European project LIGATE (EuroHPC) will start on Jan 1, 2021
Nov 2020: I received the (German) Habilitation from TU Berlin

Projects

[2023-25] Principal Investigator of the PRIN 2022 project LibreRT: Portable Heterogeneous Real-time Programming for the Embedded Computing Continuum, with the Politecnico di Milano
[2021-24] Unit leader in the EuroHPC project LIGATE: LIgand Generator and portable drug discovery platform AT Exascale. Leader Dompe srl, partners POLIMI, CINECA, KTH, University of Salerno, University of Innsbruck, University of Basel, TU Ostrava, E4, Chelonia, tofmotion. Funding UNISA (EU+MISE) 463 750 euro, project overall 5 938 656 euro.
[2017-20] Principal Investigator of the DFG project CELERITY: advanCed modELing for scalablE distRIbuted runTime sYstems, an international DACH project between TU Berlin and the University of Innsbruck. Funding TU Berlin (DFG) 413 230 euro, overall (DFG+FWF) 698 000 euro.
[2011-13] Work Package coordinator of the project Automatic Portable Performance for Heterogeneous Multi-cores, Austrian Science Fund (FWF) TRP 220-N23, P.I. Prof. Thomas Fahringer.

Recent Academic Service

Track Chair: Euro-Par 2023 (Track 1. Programming, Compilers and Performance)
Program Committee: ICS 2024, ISC 2024, HiPC 2024, 2023, ICPP 2024, CF 2024, HLPP 2024, IWOCL 2024, SC 2023 (Programming Frameworks and System Software Track), Hetero-Par 2023, Euro-Par 2023.
Editorial Board: Journal FGCS Special Issue: On the Road to Exascale II
Artifact Evaluation Chair/Co-chair: IA3 2021-23, CF 2021-23
Senior Member of ACM and IEEE, Member of SIGHPC, ACM-W, and HiPEAC
Member of the CINI HPC Lab HPC: Key Technologies and Tools
Member of Khronos Group and SYCL Working Group

Research Highlights by Topic

Programming models for HPC. Modern HPC systems are difficult to program. Our research focused on high-level programming models capable of transparently handling data and task parallelism as well as heterogeneity, and of scaling to large compute clusters equipped with GPUs and other accelerators. We proposed CELERITY [Euro-Par19], a C++ SYCL-based programming model supported by a distributed runtime system and integrated with a compiler. Under the hood, CELERITY uses two representations for scheduling and optimization: a task graph and a command graph [ICS13].

[CCGrid23a] Salzmann, Knorr, Thoman, Gschwandtner, Cosenza, Fahringer An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing CCGrid 2023
[Euro-Par19] Thoman, Salzmann, Cosenza, Fahringer Celerity: High-Level C++ for Accelerator Clusters Euro-Par 2019: 291-303 (acc.rate: 25.3%)
[ICS13] Grasso, Pellegrini, Cosenza, Fahringer libwater: Heterogeneous distributed computing made easy ICS 2013: 161-172 (acc.rate: 21%)

Our work on programming models also focuses on the Message Passing Interface (MPI), in particular on integrating modern C++ features into existing message passing implementations. We have proposed EMPI, a high-level modern C++ interface to MPI that exploits modern C++ features to reduce programming errors (type mismatch, invalid argument type, unmatched/mismatched wait) and is competitive with MPI due to its ability to skip some of the runtime checks.

[CCGrid23b] Salimi Beni, Crisci, Cosenza EMPI: Enhanced Message Passing Interface in Modern C++ CCGrid 2023

Automatic tuning. Software often exposes parameters that affect performance and other metrics. Parallel programs for modern computer architectures require the tuning of a large number of code variants. My research focuses on autotuners for parallel optimization, in particular on machine learning approaches integrated into the compiler. Examples are classification for automatic task partitioning [ICS13], regression for GPU frequency scaling [ICPP19], and ordinal regression for stencil computation [IPDPS17].

[ICPP19] Fan, Cosenza, Juurlink. Predictable GPUs Frequency Scaling for Energy and Performance ICPP 2019: 52:1-52:10 (acc.rate: 26.2%)
[IPDPS17] Cosenza, Durillo, Ermon, Juurlink. Autotuning Stencil Computations with Structural Ordinal Regression Learning IPDPS 2017: 287-296 (acc.rate: 22%)
[ICS13] Kofler, Grasso, Cosenza, Fahringer. An automatic input-sensitive approach for heterogeneous task partitioning ICS 2013: 149-160 (acc.rate: 21%)

Approximate computing. Many applications provide inherent resilience to some amount of error and can potentially trade accuracy for performance. Our research focused on software approaches for approximate computing, such as kernel perforation, and their optimization for GPU architectures [CGO18].

[CGO18] Maier, Cosenza, Juurlink. Local Memory-Aware Kernel Perforation CGO 2018: 278-287 (acc.rate: 28.6%)

Vectorization. Modern processors come equipped with Single Instruction Multiple Data (SIMD) instructions. Examples are Intel AVX, ARM NEON, and recent Vector Length Agnostic ISAs such as ARM’s Scalable Vector Extension (SVE). Our research investigated different aspects of efficient code generation for vectorization, such as cost modeling [MASCOTS19] and control flow [SCOPES18].

[MASCOTS19] Pohl, Cosenza, Juurlink. Portable Cost Modeling for Auto-Vectorizers MASCOTS 2019: 359-369 (acc.rate: 23.8%)
[SCOPES18] Pohl, Cosenza, Juurlink. Control Flow Vectorization for ARM NEON SCOPES 2018: 66-75

Visit the publication page for a complete list of publications.

Teaching

University of Salerno [ time table ]

Programmazione Distribuita (AA 2022-23, 2021-22)
High Performance Computing (AA 2021-22, 2020-21)
Programmazione & Strutture Dati (AA 2020-21)
Programming Models for Parallel Heterogeneous Architectures (CS PhD course, 2020-21)
Ad Hoc Networks (AA 2020-21)

TU Berlin

Compiler Design (WS 2019-20, WS 2017-18, WS 2016-17, WS 2015-16)
Advanced Computer Architecture, lab (SoSe 2018, SoSe 2017, SoSe 2016)
AES Seminars (WS 2015, WS 2016, WS 2017, WS 2018)
Recent Advances in Computer Architectures (WS 2017-18, WS 2018-19)

University of Innsbruck (courses listed in LFU Online)