Analysis Facilities White Paper

Authors: D. Ciangottini, A. Forti, L. Heinrich, N. Skidmore, C. Alpigiani, M. Aly, D. Benjamin, B. Bockelman, L. Bryant, J. Catmore, M. D'Alfonso, A. Delgado Peris, C. Doglioni, G. Duckeck, P. Elmer, J. Eschle, M. Feickert, J. Frost, R. Gardner, V. Garonne, M. Giffels, J. Gooding, E. Gramstad, L. Gray, B. Hegner, et al. (41 additional authors not shown)

Abstract: This white paper presents the current status of the R&D for Analysis Facilities (AFs) and attempts to summarize the views on the future direction of these facilities. These views have been collected through the High Energy Physics (HEP) Software Foundation's (HSF) Analysis Facilities forum, established in March 2022, the Analysis Ecosystems II workshop, which took place in May 2022, and the WLCG/HSF pre-CHEP workshop, which took place in May 2023. The paper attempts to cover all the aspects of an analysis facility.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.03494 [pdf, other]

Scalable ATLAS pMSSM computational workflows using containerised REANA reusable analysis platform

Authors: Marco Donadoni, Matthew Feickert, Lukas Heinrich, Yang Liu, Audrius Mečionis, Vladyslav Moisieienkov, Tibor Šimko, Giordon Stark, Marco Vidal García

Abstract: In this paper we describe the development of a streamlined framework for large-scale ATLAS pMSSM reinterpretations of LHC Run-2 analyses using containerised computational workflows. The project is looking to assess the global coverage of BSM physics and requires running O(5k) computational workflows representing pMSSM model points. Following ATLAS Analysis Preservation policies, many analyses have been preserved as containerised Yadage workflows, and after validation were added to a curated selection for the pMSSM study. To run the workflows at scale, we utilised the REANA reusable analysis platform. We describe how the REANA platform was enhanced to ensure the best concurrent throughput by internal service scheduling changes. We discuss the scalability of the approach on Kubernetes clusters from 500 to 5000 cores. Finally, we demonstrate a possibility of using additional ad-hoc public cloud infrastructure resources by running the same workflows on the Google Cloud Platform.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 8 pages, 9 figures. Contribution to the Proceedings of the 26th International Conference on Computing In High Energy and Nuclear Physics (CHEP 2023)

arXiv:2402.08417 [pdf, other]

doi 10.1140/epjc/s10052-024-13038-4

Constructing model-agnostic likelihoods, a method for the reinterpretation of particle physics results

Authors: Lorenz Gärtner, Nikolai Hartmann, Lukas Heinrich, Malin Horstmann, Thomas Kuhr, Méril Reboud, Slavomira Stefkova, Danny van Dyk

Abstract: Experimental High Energy Physics has entered an era of precision measurements. However, measurements of many of the accessible processes assume that the final states' underlying kinematic distribution is the same as the Standard Model prediction. This assumption introduces an implicit model-dependency into the measurement, rendering the reinterpretation of the experimental analysis complicated without reanalysing the underlying data. We present a novel reweighting method in order to perform reinterpretation of particle physics measurements. It makes use of reweighting the Standard Model templates according to kinematic signal distributions of alternative theoretical models, prior to performing the statistical analysis. The generality of this method allows us to perform statistical inference in the space of theoretical parameters, assuming different kinematic distributions, according to a beyond Standard Model prediction. We implement our method as an extension to the pyhf software and interface it with the EOS software, which allows us to perform flavor physics phenomenology studies. Furthermore, we argue that, beyond the pyhf or HistFactory likelihood specification, only minimal information is necessary to make a likelihood model-agnostic and hence easily reinterpretable. We showcase that publishing such likelihoods is crucial for a full exploitation of experimental results.

Submitted 15 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Report number: IPPP/24/06
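
The reweighting idea in the abstract above can be illustrated with a minimal sketch. It assumes the Standard Model simulation is stored as a joint histogram over reconstruction bins and kinematic bins, and that each model predicts a fraction of events per kinematic bin. The function name `reweight_template` and all numbers are hypothetical, chosen for illustration only; this is not the API of the pyhf extension or of EOS.

```python
def reweight_template(joint_counts, sm_density, alt_density):
    """Reweight an SM template to an alternative kinematic distribution.

    joint_counts[r][k]: SM-simulated yield in reconstruction bin r and
                        kinematic bin k (hypothetical layout).
    sm_density[k]:      fraction of events in kinematic bin k under the SM.
    alt_density[k]:     same fraction under the alternative model.
    """
    # One weight per kinematic bin: ratio of alternative to SM prediction.
    weights = [alt / sm for alt, sm in zip(alt_density, sm_density)]
    # Sum the weighted yields over kinematic bins for each reconstruction bin.
    return [sum(n * w for n, w in zip(row, weights)) for row in joint_counts]


# Toy inputs: two reconstruction bins, two kinematic bins.
joint_counts = [
    [8.0, 2.0],
    [3.0, 7.0],
]
sm_density = [0.5, 0.5]
alt_density = [0.25, 0.75]

sm_template = [sum(row) for row in joint_counts]
alt_template = reweight_template(joint_counts, sm_density, alt_density)
print(sm_template, alt_template)
```

In this toy case the alternative model shifts events toward the second kinematic bin, so the second reconstruction bin gains yield. The reweighted template would then enter the statistical model in place of the SM one.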

arXiv:2401.16016 [pdf, other]

Combined track finding with GNN & CKF

Authors: Lukas Heinrich, Benjamin Huth, Andreas Salzburger, Tilo Wettig

Abstract: The application of Graph Neural Networks (GNN) in track reconstruction is a promising approach to cope with the challenges arising at the High-Luminosity upgrade of the Large Hadron Collider (HL-LHC). GNNs show good track-finding performance in high-multiplicity scenarios and are naturally parallelizable on heterogeneous compute architectures. Typical high-energy-physics detectors have high resolution in the innermost layers to support vertex reconstruction but lower resolution in the outer parts. GNNs mainly rely on 3D space-point information, which can cause reduced track-finding performance in the outer regions. In this contribution, we present a novel combination of GNN-based track finding with the classical Combinatorial Kalman Filter (CKF) algorithm to circumvent this issue: The GNN resolves the track candidates in the inner pixel region, where 3D space points can represent measurements very well. These candidates are then picked up by the CKF in the outer regions, where the CKF performs well even for 1D measurements. Using the ACTS infrastructure, we present a proof of concept based on truth tracking in the pixels as well as a dedicated GNN pipeline trained on $t\bar{t}$ events with pile-up 200 in the OpenDataDetector.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 6 pages, 6 figures, to be published in the Connecting The Dots 2023 conference proceedings

arXiv:2401.13537 [pdf, other]

Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models

Authors: Tobias Golling, Lukas Heinrich, Michael Kagan, Samuel Klein, Matthew Leigh, Margarita Osadchy, John Andrew Raine

Abstract: We propose masked particle modeling (MPM) as a self-supervised method for learning generic, transferable, and reusable representations on unordered sets of inputs for use in high energy physics (HEP) scientific data. This work provides a novel scheme to perform masked modeling based pre-training to learn permutation invariant functions on sets. More generally, this work provides a step towards building large foundation models for HEP that can be generically pre-trained with self-supervised learning and later fine-tuned for a variety of down-stream tasks. In MPM, particles in a set are masked and the training objective is to recover their identity, as defined by a discretized token representation of a pre-trained vector quantized variational autoencoder. We study the efficacy of the method in samples of high energy jets at collider physics experiments, including studies on the impact of discretization, permutation invariance, and ordering. We also study the fine-tuning capability of the model, showing that it can be adapted to tasks such as supervised and weakly supervised jet classification, and that the model can transfer efficiently with small fine-tuning data sets to new classes and new data domains.

Submitted 11 July, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.13536 [pdf, other]

Finetuning Foundation Models for Joint Analysis Optimization

Authors: Matthias Vigl, Nicole Hartman, Lukas Heinrich

Abstract: In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization of reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces and quantify the gains in the example usecase of searches of heavy resonances decaying via an intermediate di-Higgs system to four $b$-jets.

Submitted 25 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 13 pages, 12 figures

arXiv:2309.17005 [pdf, other]

Bayesian Methodologies with pyhf

Authors: Matthew Feickert, Lukas Heinrich, Malin Horstmann

Abstract: bayesian_pyhf is a Python package that allows for the parallel Bayesian and frequentist evaluation of multi-channel binned statistical models. The Python library pyhf is used to build such models according to the HistFactory framework and already includes many frequentist inference methodologies. The pyhf-built models are then used as the data-generating model for Bayesian inference and evaluated with the Python library PyMC. Based on Markov Chain Monte Carlo methods, PyMC allows for Bayesian modelling and together with the arviz library offers a wide range of Bayesian analysis tools.

Submitted 11 December, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: 8 pages, 3 figures, 1 listing. Contribution to the Proceedings of the 26th International Conference on Computing In High Energy and Nuclear Physics (CHEP 2023)

arXiv:2308.16680 [pdf, other]

Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics

Authors: Michael Kagan, Lukas Heinrich

Abstract: We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.

Submitted 31 August, 2023; originally announced August 2023.

Comments: 8 pages
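
To make "differentiating a program with discrete randomness" concrete, here is a minimal sketch of one standard gradient estimator, the score-function (REINFORCE) estimator, applied to a single Bernoulli branch. The abstract names only Stochastic AD explicitly, so the choice of estimator here is an assumption. The function names and the toy reward are hypothetical and not taken from the paper.

```python
import random


def score_function_gradient(p, reward, n_samples, rng):
    """Estimate d/dp E[reward(b)] for b ~ Bernoulli(p).

    Uses the identity d/dp E[f(b)] = E[f(b) * d/dp log P(b; p)].
    """
    total = 0.0
    for _ in range(n_samples):
        b = 1 if rng.random() < p else 0
        # Score of the Bernoulli: 1/p for b == 1, -1/(1 - p) for b == 0.
        score = 1.0 / p if b == 1 else -1.0 / (1.0 - p)
        total += reward(b) * score
    return total / n_samples


def reward(b):
    # Toy program output that depends on which branch was taken.
    return 3.0 if b == 1 else 1.0


rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = score_function_gradient(0.3, reward, 200_000, rng)

# Exact value: E[reward] = 1 + 2p, so the derivative is reward(1) - reward(0).
exact = reward(1) - reward(0)
print(estimate, exact)
```

The estimator is unbiased but noisy, which is one reason comparing several estimation strategies, as the abstract describes, is worthwhile.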

arXiv:2306.12584 [pdf, other]

Hierarchical Neural Simulation-Based Inference Over Event Ensembles

Authors: Lukas Heinrich, Siddharth Mishra-Sharma, Chris Pollard, Philipp Windischhofer

Abstract: When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for frequentist and Bayesian dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via a hierarchical forward model. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to significantly tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics and cosmology.

Submitted 21 February, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: 15+2 pages, 7 figures; v2, version published in TMLR

Report number: MIT-CTP/5576

arXiv:2306.03675 [pdf, other]

doi 10.1007/s41781-023-00104-x

Potential of the Julia programming language for high energy physics computing

Authors: J. Eschle, T. Gal, M. Giordano, P. Gras, B. Hegner, L. Heinrich, U. Hernandez Acosta, S. Kluth, J. Ling, P. Mato, M. Mikhasenko, A. Moreno Briceño, J. Pivarski, K. Samaras-Tsakiris, O. Schulz, G. A. Stewart, J. Strube, V. Vassilev

Abstract: Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on the code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used; while for physicists, who focus on the application when developing the code, better research productivity pleads for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the computing intensive part of the code. A more convenient and efficient approach would be to use a language that provides both high-level programming and high-performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper the applicability of using the Julia language for HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interface with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified.

Submitted 6 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 32 pages, 5 figures, 4 tables

ACM Class: J.2

Journal ref: Computing. Comput Softw Big Sci 7, 10 (2023)

arXiv:2304.05814 [pdf, other]

Scaling MadMiner with a deployment on REANA

Authors: Irina Espejo, Sinclert Pérez, Kenyi Hurtado, Lukas Heinrich, Kyle Cranmer

Abstract: MadMiner is a Python package that implements a powerful family of multivariate inference techniques that leverage matrix element information and machine learning. This multivariate approach neither requires the reduction of high-dimensional data to summary statistics nor any simplifications to the underlying physics or detector response. In this paper, we address some of the challenges arising from deploying MadMiner in a real-scale HEP analysis with the goal of offering a new tool in HEP that is easily accessible. The proposed approach encapsulates a typical MadMiner pipeline into a parametrized yadage workflow described in YAML files. The general workflow is split into two yadage sub-workflows, one dealing with the physics simulations and the other with the ML inference. After that, the workflow is deployed using REANA, a reproducible research data analysis platform that takes care of flexibility, scalability, reusability, and reproducibility features. To test the performance of our method, we performed scaling experiments for a MadMiner workflow on the National Energy Research Scientific Computer (NERSC) cluster with an HT-Condor back-end. All the stages of the physics sub-workflow had a linear dependency between resources or wall time and the number of events generated. This trend has allowed us to run a typical MadMiner workflow, consisting of 11M events, in 5 hours compared to days in the original study.

Submitted 12 April, 2023; originally announced April 2023.

Comments: To be published in proceedings of 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research

arXiv:2303.02101 [pdf, other]

Configurable calorimeter simulation for AI applications

Authors: Francesco Armando Di Bello, Anton Charkin-Gorbulin, Kyle Cranmer, Etienne Dreyer, Sanmay Ganguly, Eilam Gross, Lukas Heinrich, Lorenzo Santi, Marumi Kado, Nilotpal Kakati, Patrick Rieck, Matteo Tusoni

Abstract: A configurable calorimeter simulation for AI (COCOA) applications is presented, based on the Geant4 toolkit and interfaced with the Pythia event generator. This open-source project is aimed to support the development of machine learning algorithms in high energy physics that rely on realistic particle shower descriptions, such as reconstruction, fast simulation, and low-level analysis. Specifications such as the granularity and material of its nearly hermetic geometry are user-configurable. The tool is supplemented with simple event processing including topological clustering, jet algorithms, and a nearest-neighbors graph construction. Formatting is also provided to visualise events using the Phoenix event display software.

Submitted 8 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 9 pages, 11 figures

arXiv:2302.03583 [pdf, other]

doi 10.1140/epjc/s10052-023-11885-1

Data Preservation in High Energy Physics

Authors: T. Basaglia, M. Bellis, J. Blomer, J. Boyd, C. Bozzi, D. Britzger, S. Campana, C. Cartaro, G. Chen, B. Couturier, G. David, C. Diaconu, A. Dobrin, D. Duellmann, M. Ebert, P. Elmer, J. Fernandes, L. Fields, P. Fokianos, G. Ganis, A. Geiser, M. Gheata, J. B. Gonzalez Lopez, T. Hara, L. Heinrich, et al. (29 additional authors not shown)

Abstract: Data preservation is a mandatory specification for any present and future experimental facility and it is a cost-effective way of doing fundamental research by exploiting unique data sets in the light of the continuously increasing theoretical understanding. This document summarizes the status of data preservation in high energy physics. The paradigms and the methodological advances are discussed from a perspective of more than ten years of experience with a structured effort at international level. The status and the scientific return related to the preservation of data accumulated at large collider experiments are presented, together with an account of ongoing efforts to ensure long-term analysis capabilities for ongoing and future experiments. Transverse projects aimed at generic solutions, most of which are specifically inspired by open science and FAIR principles, are presented as well. A prospective and an action plan are also indicated.

Submitted 9 September, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Report number: DPHEP-2023-01

Journal ref: Eur. Phys. J. C 83, 795 (2023)

arXiv:2212.04889 [pdf, ps, other]

doi 10.5281/zenodo.7418818

Second Analysis Ecosystem Workshop Report

Authors: Mohamed Aly, Jackson Burzynski, Bryan Cardwell, Daniel C. Craik, Tal van Daalen, Tomas Dado, Ayanabha Das, Antonio Delgado Peris, Caterina Doglioni, Peter Elmer, Engin Eren, Martin B. Eriksen, Jonas Eschle, Giulio Eulisse, Conor Fitzpatrick, José Flix Molina, Alessandra Forti, Ben Galewsky, Sean Gasiorowski, Aman Goel, Loukas Gouskos, Enrico Guiraud, Kanhaiya Gupta, Stephan Hageboeck, Allison Reinsvold Hall, et al. (44 additional authors not shown)

Abstract: The second workshop on the HEP Analysis Ecosystem took place 23-25 May 2022 at IJCLab in Orsay, to look at progress and continuing challenges in scaling up HEP analysis to meet the needs of HL-LHC and DUNE, as well as the very pressing needs of LHC Run 3 analysis. The workshop was themed around six particular topics, which were felt to capture key questions, opportunities and challenges. Each topic arranged a plenary session introduction, often with speakers summarising the state-of-the art and the next steps for analysis. This was then followed by parallel sessions, which were much more discussion focused, and where attendees could grapple with the challenges and propose solutions that could be tried. Where there was significant overlap between topics, a joint discussion between them was arranged. In the weeks following the workshop the session conveners wrote this document, which is a summary of the main discussions, the key points raised and the conclusions and outcomes. The document was circulated amongst the participants for comments before being finalised here.

Submitted 9 December, 2022; originally announced December 2022.

Report number: HSF-DOC-2022-02

arXiv:2212.01328 [pdf, other]

doi 10.1140/epjc/s10052-023-11677-7

Reconstructing particles in jets using set transformer and hypergraph prediction networks

Authors: Francesco Armando Di Bello, Etienne Dreyer, Sanmay Ganguly, Eilam Gross, Lukas Heinrich, Anna Ivina, Marumi Kado, Nilotpal Kakati, Lorenzo Santi, Jonathan Shlomi, Matteo Tusoni

Abstract: The task of reconstructing particles from low-level detector response data to predict the set of final state particles in collision events represents a set-to-set prediction task requiring the use of multiple features and their correlations in the input data. We deploy three separate set-to-set neural network architectures to reconstruct particles in events containing a single jet in a fully-simulated calorimeter. Performance is evaluated in terms of particle reconstruction quality, properties regression, and jet-level metrics. The results demonstrate that such a high dimensional end-to-end approach succeeds in surpassing basic parametric approaches in disentangling individual neutral particles inside of jets and optimizing the use of complementary detector information. In particular, the performance comparison favors a novel architecture based on learning hypergraph structure, HGPflow, which benefits from a physically-interpretable approach to particle reconstruction.

Submitted 2 August, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: 17 pages, 21 figures

Journal ref: Eur. Phys. J. C 83 (2023) 596

arXiv:2211.15838 [pdf, other]

pyhf: pure-Python implementation of HistFactory with tensors and automatic differentiation

Authors: Matthew Feickert, Lukas Heinrich, Giordon Stark

Abstract: The HistFactory p.d.f. template is per se independent of its implementation in ROOT and it is useful to be able to run statistical analysis outside of the ROOT, RooFit, RooStats framework. pyhf is a pure-Python implementation of that statistical model for multi-bin histogram-based analysis and its interval estimation is based on the asymptotic formulas of "Asymptotic formulae for likelihood-based tests of new physics". pyhf supports modern computational graph libraries such as TensorFlow, PyTorch, and JAX in order to make use of features such as auto-differentiation and GPU acceleration. In addition, pyhf's JSON serialization specification for HistFactory models has been used to publish 23 full probability models from published ATLAS collaboration analyses to HEPData.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 6 pages, 1 figure, 1 listing. Contribution to the Proceedings of the 41st International Conference on High Energy Physics (ICHEP 2022). If you are looking to cite pyhf as software, please follow the citation instructions at https://pyhf.readthedocs.io/en/stable/citations.html
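
As a small illustration of the asymptotic formulas the abstract refers to, the sketch below evaluates the discovery test statistic q0 and the significance Z = sqrt(q0) for the simplest possible case: a single-bin Poisson counting experiment with a known background and no nuisance parameters. This is a hand-written toy under those assumptions, not pyhf's API; pyhf itself handles multi-bin models with systematic uncertainties. The function names and the numbers are hypothetical.

```python
import math


def discovery_test_statistic(n_obs, background):
    """q0 for one Poisson bin with known background (no nuisance parameters).

    q0 = 2 * (n ln(n / b) - (n - b)) if n > b, and 0 otherwise.
    """
    if n_obs <= background:
        # A deficit is not evidence for a positive signal.
        return 0.0
    return 2.0 * (n_obs * math.log(n_obs / background) - (n_obs - background))


def asymptotic_significance(n_obs, background):
    """Asymptotic discovery significance Z = sqrt(q0)."""
    return math.sqrt(discovery_test_statistic(n_obs, background))


# Toy numbers: 50 expected background events, 60 observed.
z = asymptotic_significance(n_obs=60.0, background=50.0)
naive = (60.0 - 50.0) / math.sqrt(50.0)  # simple s / sqrt(b) for comparison
print(z, naive)
```

For these inputs the asymptotic significance comes out slightly below the naive s/sqrt(b) estimate.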

arXiv:2211.06406 [pdf, other]

Set-Conditional Set Generation for Particle Physics

Authors: Francesco Armando Di Bello, Etienne Dreyer, Sanmay Ganguly, Eilam Gross, Lukas Heinrich, Marumi Kado, Nilotpal Kakati, Jonathan Shlomi, Nathalie Soybelman

Abstract: The simulation of particle physics data is a fundamental but computationally intensive ingredient for physics analysis at the Large Hadron Collider, where observational set-valued data is generated conditional on a set of incoming particles. To accelerate this task, we present a novel generative model based on a graph neural network and slot-attention components, which exceeds the performance of pre-existing baselines.

Submitted 21 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 10 pages, 9 figures

arXiv:2210.08973 [pdf, ps, other]

doi 10.1038/s41597-023-02298-6

FAIR for AI: An interdisciplinary and international community building perspective

Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

ACM Class: I.2.0; E.0

Journal ref: Scientific Data 10, 487 (2023)

arXiv:2203.10057 [pdf, other]

Data and Analysis Preservation, Recasting, and Reinterpretation

Authors: Stephen Bailey, Christian Bierlich, Andy Buckley, Jon Butterworth, Kyle Cranmer, Matthew Feickert, Lukas Heinrich, Axel Huebl, Sabine Kraml, Anders Kvellestad, Clemens Lange, Andre Lessa, Kati Lassila-Perini, Christine Nattrass, Mark S. Neubauer, Sezen Sekmen, Giordon Stark, Graeme Watt

Abstract: We make the case for the systematic, reliable preservation of event-wise data, derived data products, and executable analysis code. This preservation enables the analyses' long-term future reuse, in order to maximise the scientific impact of publicly funded particle-physics experiments. We cover the needs of both the experimental and theoretical particle physics communities, and outline the goals and benefits that are uniquely enabled by analysis recasting and reinterpretation. We also discuss technical challenges and infrastructure needs, as well as sociological challenges and changes, and give summary recommendations to the particle-physics community.

Submitted 18 March, 2022; originally announced March 2022.

Comments: 25 pages, 4 sets of recommendations. Contribution to Snowmass 2021

arXiv:2203.07460 [pdf, other]

doi 10.21468/SciPostPhys.14.4.079

Machine Learning and LHC Event Generation

Authors: Anja Butter, Tilman Plehn, Steffen Schumann, Simon Badger, Sascha Caron, Kyle Cranmer, Francesco Armando Di Bello, Etienne Dreyer, Stefano Forte, Sanmay Ganguly, Dorival Gonçalves, Eilam Gross, Theo Heimel, Gudrun Heinrich, Lukas Heinrich, Alexander Held, Stefan Höche, Jessica N. Howard, Philip Ilten, Joshua Isaacson, Timo Janßen, Stephen Jones, Marumi Kado, Michael Kagan, Gregor Kasieczka , et al. (26 additional authors not shown)

Abstract: First-principle simulations are at the heart of the high-energy physics research program. They link the vast data output of multi-purpose detectors with fundamental theory predictions and interpretation. This review illustrates a wide range of applications of modern machine learning to event generation and simulation-based inference, including conceptional developments driven by the specific requirements of particle physics. New ideas and tools developed at the interface of particle physics and machine learning will improve the speed and precision of forward simulations, handle the complexity of collision data, and enhance inference as an inverse simulation problem.

Submitted 28 December, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Review article based on a Snowmass 2021 contribution

Journal ref: SciPost Phys. 14, 079 (2023)

arXiv:2203.05570 [pdf, other]

doi 10.1088/1742-6596/2438/1/012105

neos: End-to-End-Optimised Summary Statistics for High Energy Physics

Authors: Nathan Simpson, Lukas Heinrich

Abstract: The advent of deep learning has yielded powerful tools to automatically compute gradients of computations. This is because training a neural network equates to iteratively updating its parameters using gradient descent to find the minimum of a loss function. Deep learning is then a subset of a broader paradigm; a workflow with free parameters that is end-to-end optimisable, provided one can keep track of the gradients all the way through. This work introduces neos: an example implementation following this paradigm of a fully differentiable high-energy physics workflow, capable of optimising a learnable summary statistic with respect to the expected sensitivity of an analysis. Doing this results in an optimisation process that is aware of the modelling and treatment of systematic uncertainties.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 6 pages, 3 figures, Proceedings of the 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2021)

arXiv:2109.04981 [pdf, other]

doi 10.21468/SciPostPhys.12.1.037

Publishing statistical models: Getting the most out of particle physics experiments

Authors: Kyle Cranmer, Sabine Kraml, Harrison B. Prosper, Philip Bechtle, Florian U. Bernlochner, Itay M. Bloch, Enzo Canonero, Marcin Chrzaszcz, Andrea Coccaro, Jan Conrad, Glen Cowan, Matthew Feickert, Nahuel Ferreiro Iachellini, Andrew Fowlie, Lukas Heinrich, Alexander Held, Thomas Kuhr, Anders Kvellestad, Maeve Madigan, Farvah Mahmoudi, Knut Dundas Morå, Mark S. Neubauer, Maurizio Pierini, Juan Rojo, Sezen Sekmen , et al. (8 additional authors not shown)

Abstract: The statistical models used to derive the results of experimental analyses are of incredible scientific value and are essential information for analysis preservation and reuse. In this paper, we make the scientific case for systematically publishing the full statistical models and discuss the technical developments that make this practical. By means of a variety of physics cases -- including parton distribution functions, Higgs boson measurements, effective field theory interpretations, direct searches for new physics, heavy flavor physics, direct dark matter detection, world averages, and beyond the Standard Model global fits -- we illustrate how detailed information on the statistical modelling can enhance the short- and long-term impact of experimental results.

Submitted 10 September, 2021; originally announced September 2021.

Comments: 60 pages, 15 figures

Journal ref: SciPost Phys. 12, 037 (2022)

arXiv:2105.14027 [pdf, other]

doi 10.21468/SciPostPhys.12.1.043

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

Authors: T. Aarrestad, M. van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A. De Simone, C. Doglioni, J. M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek , et al. (14 additional authors not shown)

Abstract: We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to $10~\rm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.

Submitted 9 December, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: v1: 54 pages, 24 figures. v2: 56 pages, citations added, extend discussion of look-elsewhere-effect, results unchanged; v3. minor typos and updated references

Journal ref: SciPost Phys. 12, 043 (2022)

arXiv:2103.02182 [pdf, other]

doi 10.1051/epjconf/202125102070

Distributed statistical inference with pyhf enabled through funcX

Authors: Matthew Feickert, Lukas Heinrich, Giordon Stark, Ben Galewsky

Abstract: In High Energy Physics, facilities that provide High Performance Computing environments offer an opportunity to efficiently perform the statistical inference required for analysis of data from the Large Hadron Collider, but can pose problems with orchestration and efficient scheduling. The compute architectures at these facilities do not easily support the Python compute model, and the configuration scheduling of batch jobs for physics often requires expertise in multiple job scheduling services. The combination of the pure-Python libraries pyhf and funcX reduces the common problem in HEP analyses of performing statistical inference with binned models, which would traditionally take multiple hours and bespoke scheduling, to an on-demand (fitting) "function as a service" that can scalably execute across workers in just a few minutes, offering reduced time to insight and inference. We demonstrate execution of a scalable workflow using funcX to simultaneously fit 125 signal hypotheses from a published ATLAS search for new physics using pyhf with a wall time of under 3 minutes. We additionally show performance comparisons for other physics analyses with openly published probability models and argue for a blueprint of fitting as a service systems at HPC centers.

Submitted 31 August, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: 10 pages, 2 figures, 2 listings, 1 table, presented at the 25th International Conference on Computing in High Energy & Nuclear Physics

Journal ref: EPJ Web Conf. 251 (2021) 02070

arXiv:2103.00659 [pdf, other]

doi 10.1007/s41781-021-00069-9

Software Training in HEP

Authors: Sudhir Malik, Samuel Meehan, Kilian Lieret, Meirin Oan Evans, Michel H. Villanueva, Daniel S. Katz, Graeme A. Stewart, Peter Elmer, Sizar Aziz, Matthew Bellis, Riccardo Maria Bianchi, Gianluca Bianco, Johan Sebastian Bonilla, Angela Burger, Jackson Burzynski, David Chamont, Matthew Feickert, Philipp Gadow, Bernhard Manfred Gruber, Daniel Guest, Stephan Hageboeck, Lukas Heinrich, Maximilian M. Horzela, Marc Huwiler, Clemens Lange , et al. (22 additional authors not shown)

Abstract: Long term sustainability of the high energy physics (HEP) research software ecosystem is essential for the field. With upgrades and new facilities coming online throughout the 2020s this will only become increasingly relevant throughout this decade. Meeting this sustainability challenge requires a workforce with a combination of HEP domain knowledge and advanced software skills. The required software skills fall into three broad groups. The first is fundamental and generic software engineering (e.g. Unix, version control, C++, continuous integration). The second is knowledge of domain specific HEP packages and practices (e.g., the ROOT data format and analysis framework). The third is more advanced knowledge involving more specialized techniques. These include parallel programming, machine learning and data science tools, and techniques to preserve software projects at all scales. This paper discusses the collective software training program in HEP and its activities led by the HEP Software Foundation (HSF) and the Institute for Research and Innovation in Software in HEP (IRIS-HEP). The program equips participants with an array of software skills that serve as ingredients from which solutions to the computing challenges of HEP can be formed. Beyond serving the community by ensuring that members are able to pursue research goals, this program serves individuals by providing intellectual capital and transferable skills that are becoming increasingly important to careers in the realm of software and computing, whether inside or outside HEP.

Submitted 6 August, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: For CHEP 2021 conference, sent for publication to CSBS Springer

MSC Class: HEP; software; training

arXiv:2008.13636 [pdf, ps, other]

doi 10.5281/zenodo.4009114

HL-LHC Computing Review: Common Tools and Community Software

Authors: HEP Software Foundation: Thea Aarrestad, Simone Amoroso, Markus Julian Atkinson, Joshua Bendavid, Tommaso Boccali, Andrea Bocci, Andy Buckley, Matteo Cacciari, Paolo Calafiura, Philippe Canal, Federico Carminati, Taylor Childers, Vitaliano Ciulli, Gloria Corti, Davide Costanzo, Justin Gage Dezoort, Caterina Doglioni, Javier Mauricio Duarte, Agnieszka Dziurda, Peter Elmer, Markus Elsing, V. Daniel Elvira, Giulio Eulisse , et al. (85 additional authors not shown)

Abstract: Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.

Submitted 31 August, 2020; originally announced August 2020.

Comments: 40 pages contribution to Snowmass 2021

Report number: HSF-DOC-2020-01

arXiv:2003.07868 [pdf, ps, other]

doi 10.21468/SciPostPhys.9.2.022

Reinterpretation of LHC Results for New Physics: Status and Recommendations after Run 2

Authors: Waleed Abdallah, Shehu AbdusSalam, Azar Ahmadov, Amine Ahriche, Gaël Alguero, Benjamin C. Allanach, Jack Y. Araz, Alexandre Arbey, Chiara Arina, Peter Athron, Emanuele Bagnaschi, Yang Bai, Michael J. Baker, Csaba Balazs, Daniele Barducci, Philip Bechtle, Aoife Bharucha, Andy Buckley, Jonathan Butterworth, Haiying Cai, Claudio Campagnari, Cari Cesarotti, Marcin Chrzaszcz, Andrea Coccaro, Eric Conte , et al. (117 additional authors not shown)

Abstract: We report on the status of efforts to improve the reinterpretation of searches and measurements at the LHC in terms of models for new physics, in the context of the LHC Reinterpretation Forum. We detail current experimental offerings in direct searches for new particles, measurements, technical implementations and Open Data, and provide a set of recommendations for further improving the presentation of LHC results in order to better enable reinterpretation in the future. We also provide a brief description of existing software reinterpretation frameworks and recent global analyses of new physics that make use of the current data.

Submitted 21 July, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: 58 pages, minor revision following comments from SciPost referees

Report number: CERN-LPCC-2020-001, FERMILAB-FN-1098-CMS-T, Imperial/HEP/2020/RIF/01

Journal ref: SciPost Phys. 9, 022 (2020)

arXiv:1910.10289 [pdf, other]

Extending RECAST for Truth-Level Reinterpretations

Authors: Alex Schuy, Lukas Heinrich, Kyle Cranmer, Shih-Chieh Hsu

Abstract: RECAST is an analysis reinterpretation framework; since analyses are often sensitive to a range of models, RECAST can be used to constrain the plethora of theoretical models without the significant investment required for a new analysis. However, experiment-specific full simulation is still computationally expensive. Thus, to facilitate rapid exploration, RECAST has been extended to truth-level reinterpretations, interfacing with existing systems such as RIVET.

Submitted 22 October, 2019; originally announced October 2019.

Comments: Talk presented at the 2019 Meeting of the Division of Particles and Fields of the American Physical Society (DPF2019), July 29 - August 2, 2019, Northeastern University, Boston, C1907293

arXiv:1903.04497 [pdf]

doi 10.1088/1361-6471/ab4574

Searching for long-lived particles beyond the Standard Model at the Large Hadron Collider

Authors: Juliette Alimena, James Beacham, Martino Borsato, Yangyang Cheng, Xabier Cid Vidal, Giovanna Cottin, Albert De Roeck, Nishita Desai, David Curtin, Jared A. Evans, Simon Knapen, Sabine Kraml, Andre Lessa, Zhen Liu, Sascha Mehlhase, Michael J. Ramsey-Musolf, Heather Russell, Jessie Shelton, Brian Shuve, Monica Verducci, Jose Zurita, Todd Adams, Michael Adersberger, Cristiano Alpigiani, Artur Apresyan , et al. (176 additional authors not shown)

Abstract: Particles beyond the Standard Model (SM) can generically have lifetimes that are long compared to SM particles at the weak scale. When produced at experiments such as the Large Hadron Collider (LHC) at CERN, these long-lived particles (LLPs) can decay far from the interaction vertex of the primary proton-proton collision. Such LLP signatures are distinct from those of promptly decaying particles that are targeted by the majority of searches for new physics at the LHC, often requiring customized techniques to identify, for example, significantly displaced decay vertices, tracks with atypical properties, and short track segments. Given their non-standard nature, a comprehensive overview of LLP signatures at the LHC is beneficial to ensure that possible avenues of the discovery of new physics are not overlooked. Here we report on the joint work of a community of theorists and experimentalists with the ATLAS, CMS, and LHCb experiments --- as well as those working on dedicated experiments such as MoEDAL, milliQan, MATHUSLA, CODEX-b, and FASER --- to survey the current state of LLP searches at the LHC, and to chart a path for the development of LLP searches into the future, both in the upcoming Run 3 and at the High-Luminosity LHC. The work is organized around the current and future potential capabilities of LHC experiments to generally discover new LLPs, and takes a signature-based approach to surveying classes of models that give rise to LLPs rather than emphasizing any particular theory motivation.
We develop a set of simplified models; assess the coverage of current searches; document known, often unexpected backgrounds; explore the capabilities of proposed detector upgrades; provide recommendations for the presentation of search results; and look towards the newest frontiers, namely high-multiplicity "dark showers", highlighting opportunities for expanding the LHC reach for these signals.

Submitted 11 March, 2019; originally announced March 2019.

Journal ref: J. Phys. G: Nucl. Part. Phys. 47 090501 (2020)

arXiv:1807.02876 [pdf, other]

Machine Learning in High Energy Physics Community White Paper

Authors: Kim Albertsson, Piero Altoe, Dustin Anderson, John Anderson, Michael Andrews, Juan Pedro Araque Espinosa, Adam Aurisano, Laurent Basara, Adrian Bevan, Wahid Bhimji, Daniele Bonacorsi, Bjorn Burkle, Paolo Calafiura, Mario Campanelli, Louis Capps, Federico Carminati, Stefano Carrazza, Yi-fan Chen, Taylor Childers, Yann Coadou, Elias Coniavitis, Kyle Cranmer, Claire David, Douglas Davis, Andrea De Simone , et al. (103 additional authors not shown)

Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.

Submitted 16 May, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

Comments: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm

arXiv:1712.06982 [pdf, other]

doi 10.1007/s41781-018-0018-8

A Roadmap for HEP Software and Computing R&D for the 2020s

Authors: Johannes Albrecht, Antonio Augusto Alves Jr, Guilherme Amadio, Giuseppe Andronico, Nguyen Anh-Ky, Laurent Aphecetche, John Apostolakis, Makoto Asai, Luca Atzori, Marian Babik, Giuseppe Bagliesi, Marilena Bandieramonte, Sunanda Banerjee, Martin Barisits, Lothar A. T. Bauerdick, Stefano Belforte, Douglas Benjamin, Catrin Bernius, Wahid Bhimji, Riccardo Maria Bianchi, Ian Bird, Catherine Biscarat, Jakob Blomer, Kenneth Bloom, Tommaso Boccali , et al. (285 additional authors not shown)

Abstract: Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

Submitted 19 December, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

Report number: HSF-CWP-2017-01

Journal ref: Comput Softw Big Sci (2019) 3, 7

arXiv:1706.01878 [pdf, ps, other]

doi 10.1088/1742-6596/898/10/102019

Yadage and Packtivity - analysis preservation using parametrized workflows

Authors: Kyle Cranmer, Lukas Heinrich

Abstract: Preserving data analyses produced by the collaborations at LHC in a parametrized fashion is crucial in order to maintain reproducibility and re-usability. We argue for a declarative description in terms of individual processing steps - packtivities - linked through a dynamic directed acyclic graph (DAG) and present an initial set of JSON schemas for such a description and an implementation - yadage - capable of executing workflows of analysis preserved via Linux containers.

Submitted 6 June, 2017; originally announced June 2017.

Comments: 9 pages

arXiv:1704.05473 [pdf, other]

doi 10.1088/1742-6596/898/10/102006

HEPData: a repository for high energy physics data

Authors: Eamonn Maguire, Lukas Heinrich, Graeme Watt

Abstract: The Durham High Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics papers. It comprises data points underlying several thousand publications. Over the last two years, the HEPData software has been completely rewritten using modern computing technologies as an overlay on the Invenio v3 digital library framework. The software is open source with the new site available at https://hepdata.net now replacing the previous site at http://hepdata.cedar.ac.uk. In this write-up, we describe the development of the new site and explain some of the advantages it offers over the previous platform.

Submitted 18 April, 2017; originally announced April 2017.

Comments: 8 pages, 6 figures. Submitted to the proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco

Report number: IPPP/17/31

Journal ref: J. Phys.: Conf. Ser. 898 (2017) 102006

Showing 1–33 of 33 results for author: Heinrich, L