
Parton Labeling without Matching:
Unveiling Emergent Labelling Capabilities in Regression Models

Shikai Qiu calvin_qiu@berkeley.edu Department of Physics, University of California, Berkeley, Berkeley, CA 94720, USA Courant Institute of Mathematical Sciences, New York University, New York, NY 10012    Shuo Han shuohan@lbl.gov Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA    Xiangyang Ju xju@lbl.gov Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA    Benjamin Nachman bpnachman@lbl.gov Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA Berkeley Institute for Data Science, University of California, Berkeley, CA 94720, USA    Haichen Wang haichenwang@berkeley.edu Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA Department of Physics, University of California, Berkeley, Berkeley, CA 94720, USA
Abstract

Parton labeling methods are widely used when reconstructing collider events with top quarks or other massive particles. State-of-the-art techniques are based on machine learning and require training data with events that have been matched using simulations with truth information. In nature, there is no unique matching between partons and final state objects due to the properties of the strong force and due to acceptance effects. We propose a new approach to parton labeling that circumvents these challenges by recycling regression models. The final state objects that are most relevant for a regression model to predict the properties of a particular top quark are assigned to said parent particle without having any parton-matched training data. This approach is demonstrated using simulated events with top quarks and outperforms the widely-used $\chi^{2}$ method.

I Introduction

A common task in collider event reconstruction is assigning final state objects to a branch of the hypothesized reaction that generated the event. For example, hard-scatter events with outgoing quarks and gluons produce jets that can be associated with their initiating partons. When there are many outgoing particles from the hard-scatter reaction, this is a complex combinatorial challenge. Events with multiple top quarks naturally result in such final states, since nearly all top quarks decay to a $b$-quark and a $W$ boson, which subsequently decays to two quarks or leptons. A key challenge in many measurements and searches involving top quarks is the assignment of reconstructed objects to the top quark decay products. Classically, this assignment has used $\chi^{2}$ or related methods that enumerate all possibilities and pick the one that is most consistent with having two on-shell $W$ bosons and two on-shell top quarks as intermediaries. The difficulty with these methods is that they do not take into account all available information and are computationally expensive.

A number of modern machine learning (ML) methods have been proposed to address these challenges. These techniques range from Boosted Decision Trees [1, 2, 3] and existing neural networks [4, 5] to custom, permutation invariant deep learning methods [6, 7, 8, 9]. In all cases, object identification can make use of a variety of lepton-, jet- and event-level properties that were inaccessible with $\chi^{2}$ or likelihood methods [10]. This is possible because the ML approaches are trained on simulations, so whatever information is available and well-modeled (within uncertainty) can be used for object labeling.

Figure 1: Simulated all-hadronic $t\bar{t}$ events. In the $N_{\text{colors}}\rightarrow\infty$ limit, hadrons can be uniquely associated as $W$ boson descendants. Top: number of jets with at least 10% of their energy from the $W^{+}$. Bottom: of these jets, the fraction of their energy from the $W^{+}$. Jets are clustered using the anti-$k_{t}$ [11] algorithm with $R=0.4$.

Despite the success of these ML methods, they all share a common fundamental challenge with classical approaches. In particular, they all require matched objects for training. This may be problematic for two reasons (see e.g. Fig. 1). First, there is no unique match between a hard-scatter quark/gluon and a jet. A single quark/gluon can fragment into multiple jets, and a single jet can be composed of hadrons with energy flow originating from multiple quarks/gluons. This is particularly acute for top quarks, which carry color charge and thus must be color-connected to another quark/gluon in the event. The extent of the overlap also depends on the jet clustering algorithm - jets with a larger catchment area [12] are more likely to be due to the merger of multiple parton showers. Second, even if a parent object like a top quark could be uniquely associated with a set of decay products, acceptance effects will obscure the association. In particular, the finite geometric and energy acceptance of detectors results in missed final state objects.

Our philosophy is to circumvent the issues caused by object-parton matching by directly regressing onto the target particle properties. In Ref. [13], we designed the Covariant Particle Transformer (CPT), a partially Lorentz covariant point cloud transformer, to learn the four-vectors of top quarks given reconstructed jets, leptons, photons, and missing energy. In this paper, we show how one can reuse such a regression method to perform parton labeling. We explore two possibilities, one based on the attention mechanism within the CPT and one based on the gradient of predicted four-vectors with respect to the inputs. The latter approach is compatible with any regression-based top quark reconstruction method, even if it does not involve neural network attention. While we still advocate for regression in cases where the underlying top quark properties are needed, parton labeling is still widely used for determining these properties and no matter what approach is used, parton labels can be useful for diagnostic purposes.

This paper is organized as follows. Section II briefly reviews the CPT technique and then introduces our two approaches to extracting parton labels from the regression model. Numerical results are presented in Sec. IV using a dataset that is briefly introduced in Sec. III. The paper ends with conclusions and outlook in Sec. V.

II Methods

Our goal is to take final states with $n$ top quarks that decay hadronically and assign jets to one of these quarks. In principle, one could simultaneously predict $n$ and assign jets, but in practice, there is often a particular number of target quarks; if not, one could first run a multi-class classification procedure. We also restrict our approach to assigning three jets to each top quark. Both ML-based approaches described below could be modified to assign fewer or more jets by placing thresholds on the Jacobian values (Sec. II.2) or the attention weights (Sec. II.3), but we leave this to future work.

II.1 Covariant Particle Transformer

The Covariant Particle Transformer (CPT) is a Transformer-based [14] neural network tailored for collider physics applications and has demonstrated superior performance in predicting top quarks’ kinematics compared to classical approaches [13]. CPT takes as inputs the 4-vectors and particle identifications of all observed final state objects (jets, leptons, photons, etc.) and outputs predicted 4-vectors of a pre-specified number of top quarks. Compared to the standard transformer architecture, CPT is designed to respect important symmetries in collider physics: it is invariant under reordering of the inputs and partially Lorentz covariant, meaning that if we apply a longitudinal boost and/or a transverse rotation to all the inputs, CPT’s outputs will be boosted and/or rotated accordingly.

In each layer of the network, CPT additively updates the feature vector $f_{i}$ of every object (could be an input or output) with $\Delta f_{i}$ defined as a function of all the feature vectors $\{f_{k}\}$:

$$\Delta f_{i}=\sum_{k}\alpha_{ik}\,\varphi(f_{k}), \qquad (1)$$

where $\varphi$ is a learned linear transformation and $\{\alpha_{ik}\}$ are positive attention weights, which are themselves non-linear functions of $\{f_{k}\}$, such that $\sum_{k}\alpha_{ik}=1$ for each $i$. The output feature vectors are eventually transformed to the predicted 4-vectors of the top quarks. If $i$ is an output index and $k$ is an input index, then intuitively $\alpha_{ik}$ measures the importance of the information in $k$ for predicting the properties of $i$. The above procedure is named the covariant attention mechanism, which modifies the standard attention mechanism in a transformer to ensure partial Lorentz covariance. To capture complex correlations between the inputs and outputs, CPT uses $L=6$ covariant attention layers and $H=4$ attention heads per layer to decode the top quark 4-vectors, where each attention head performs separate learned updates according to Equation 1 for added flexibility. We refer readers to the original CPT paper for a more comprehensive review of the architecture and implementation.
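The additive update of Equation 1 can be illustrated with a minimal sketch. It assumes the weights are obtained from a softmax over raw scores, as in standard attention; the actual CPT computes its scores from covariant quantities, which is not reproduced here. The function names, the matrix `phi`, and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax: positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_update(scores_i, features, phi):
    """Return (alpha_i, delta_f_i) for a single object i, following Eq. 1.

    scores_i : raw attention scores between object i and every object k
    features : list of feature vectors f_k
    phi      : matrix (list of rows) standing in for the learned linear map
    """
    alpha = softmax(scores_i)
    delta = [0.0] * len(phi)
    for a, f in zip(alpha, features):
        phi_f = [sum(w * x for w, x in zip(row, f)) for row in phi]
        delta = [d + a * p for d, p in zip(delta, phi_f)]
    return alpha, delta

# Hypothetical inputs: three objects with two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phi = [[2.0, 0.0], [0.0, 3.0]]
alpha, delta_f = attention_update([0.0, 0.0, 0.0], features, phi)
```

With equal scores every object receives the same weight, so the update reduces to the plain average of the transformed feature vectors.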

II.2 Gradient-based Labeling

The idea of the gradient-based method is to assign a jet to a particular top quark if changes to the jet properties result in significant changes to the top quark properties. If the top quarks were produced independently of each other and of other radiation within the event, then only the jets they produce should be relevant for reconstructing their properties. In reality, this is not the case because top quarks and other objects are correlated through momentum conservation and other physics effects.

Strictly speaking, the term ‘gradient’ applies only when the predicted quantity is one-dimensional (e.g. the top quark $p_{T}$); for regression methods that predict multiple top quark properties, a more accurate name would be ‘Jacobian-based’. For simplicity, we will henceforth always call this method ‘gradient-based’.

The gradient-based labeling scheme is compatible with any regression model (not just the CPT from Sec. II.1) and is based on the following quantity:

$$\Delta_{ik}=\left\lVert\left(\frac{\partial f_{i,p_{T}}}{\partial j_{k,p_{T}}},\frac{\partial f_{i,y}}{\partial j_{k,y}},\frac{\partial f_{i,\phi}}{\partial j_{k,\phi}}\right)\right\rVert, \qquad (2)$$

where $f_{i,x}$ is the predicted $x\in\{p_{T},y,\phi\}$ of top quark $i$ and $j_{k,x}$ is the observed $x$ of jet $k$. Since $f_{i}$ is a neural network, we can compute the derivatives in Eq. 2 using the same automatic differentiation (e.g. backpropagation) that is used when training the network. We assign jet $k$ to top quark $i$ if $\Delta_{ik}$ is one of the three largest values across all $k$. The same jet could be assigned to multiple top quarks. Equation 2 is not the only possible combination of Jacobian elements, and other combinations may be more effective. We found that using the derivatives with respect to $p_{T}$, $y$, and $\phi$ was only slightly better than using $p_{T}$ alone. More complex schemes that weight the different entries separately are also possible.

When $f$ is a CPT, $\Delta_{ik}$ is a partial Lorentz scalar and so the labeling is invariant under longitudinal boosts and rotations in the transverse plane.
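The labeling rule built on Equation 2 can be sketched as follows. This is only an illustration under stated assumptions: central finite differences stand in for the automatic differentiation used with a real network, and `toy_model` with its `WEIGHTS` is a made-up linear regressor, not the CPT. Jets and predicted tops are represented as `[pT, y, phi]` lists.

```python
import math

def delta_matrix(model, jets, eps=1e-4):
    """Approximate Delta[i][k] of Eq. 2 with central finite differences.

    model : callable mapping a list of jets [pT, y, phi] to a list of
            predicted tops [pT, y, phi] (a stand-in for the regressor)
    """
    n_tops = len(model(jets))
    delta = [[0.0] * len(jets) for _ in range(n_tops)]
    for k in range(len(jets)):
        squares = [0.0] * n_tops
        for x in range(3):  # 0: pT, 1: y, 2: phi
            up = [list(j) for j in jets]
            down = [list(j) for j in jets]
            up[k][x] += eps
            down[k][x] -= eps
            f_up, f_down = model(up), model(down)
            for i in range(n_tops):
                d = (f_up[i][x] - f_down[i][x]) / (2 * eps)
                squares[i] += d * d
        for i in range(n_tops):
            delta[i][k] = math.sqrt(squares[i])
    return delta

def top_three(row):
    """Indices of the three largest entries of one row, sorted ascending."""
    ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
    return sorted(ranked[:3])

# Hypothetical toy regressor: each top is a fixed linear mix of the jets.
WEIGHTS = [[0.9, 0.8, 0.7, 0.1, 0.05, 0.0],
           [0.0, 0.1, 0.05, 0.6, 0.9, 0.8]]

def toy_model(jets):
    return [[sum(w * j[x] for w, j in zip(row, jets)) for x in range(3)]
            for row in WEIGHTS]

jets = [[120.0, 0.3, 0.1], [80.0, -0.2, 0.9], [45.0, 0.8, -0.4],
        [110.0, -1.1, 2.8], [70.0, -0.6, -2.9], [40.0, -1.5, 2.2]]
delta = delta_matrix(toy_model, jets)
labels = [top_three(row) for row in delta]  # one jet triplet per top quark
```

Because each row is ranked independently, nothing prevents the same jet from appearing in the triplets of two different top quarks, as noted in the text.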

II.3 Attention-based Labeling

In each covariant attention layer and attention head in CPT, the attention weight $\alpha_{ik}$ can be interpreted as a measure of the importance of input $k$ for predicting the properties of top $i$, locally in the network. By averaging $\alpha_{ik}$ over all layers and attention heads, we obtain a measure of the overall importance of input $k$ to top $i$:

$$\bar{\alpha}_{ik}=\frac{1}{LH}\sum_{\ell,h}\alpha^{\ell h}_{ik}, \qquad (3)$$

where $\alpha^{\ell h}_{ik}$ is the attention weight between top $i$ and input $k$ in the $h^{\text{th}}$ attention head in the $\ell^{\text{th}}$ layer. Similar to gradient-based labeling, we assign the jet with index $k$ to top quark $i$ if $\bar{\alpha}_{ik}$ is one of the top three values across all jets.

Due to the design of CPT, all attention weights are partial Lorentz scalars and $\bar{\alpha}_{ik}$ is again a partial Lorentz scalar, implying the labeling is invariant under longitudinal boosts and rotations in the transverse plane.
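A minimal sketch of Equation 3 and the subsequent assignment is given below. The nested-list layout `alpha[layer][head][top][input]`, the function names, and the numerical weights are assumptions made for illustration; they are not taken from the CPT code. The example uses two layers and two heads rather than $L=6$ and $H=4$ to stay short, and treats the last input as a non-jet object that is excluded from the ranking.

```python
def mean_attention(alpha):
    """Average alpha[l][h][i][k] over layers l and heads h (Eq. 3)."""
    n_layers, n_heads = len(alpha), len(alpha[0])
    n_tops, n_inputs = len(alpha[0][0]), len(alpha[0][0][0])
    norm = n_layers * n_heads
    return [[sum(alpha[l][h][i][k]
                 for l in range(n_layers) for h in range(n_heads)) / norm
             for k in range(n_inputs)]
            for i in range(n_tops)]

def assign_jets(alpha_bar_row, jet_indices, n_assign=3):
    """Pick the n_assign jets with the largest averaged attention for one top."""
    ranked = sorted(jet_indices, key=lambda k: alpha_bar_row[k], reverse=True)
    return sorted(ranked[:n_assign])

# Made-up weights: 2 layers x 2 heads, 1 top, 5 inputs (input 4 is not a jet).
alpha = [
    [[[0.30, 0.25, 0.15, 0.05, 0.25]], [[0.10, 0.20, 0.30, 0.10, 0.30]]],
    [[[0.20, 0.20, 0.20, 0.10, 0.30]], [[0.40, 0.25, 0.10, 0.05, 0.20]]],
]
alpha_bar = mean_attention(alpha)
labels = assign_jets(alpha_bar[0], jet_indices=[0, 1, 2, 3])
```

Since every individual attention row sums to one, the averaged row does as well, so $\bar{\alpha}_{ik}$ can be read as the share of attention that top $i$ gives to input $k$.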

II.4 $\chi^{2}$-based Labeling

The baseline parton labeling scheme that we use is a widely applied $\chi^{2}$ method. In particular, in events with at least two jets tagged as originating from bottom quarks ($b$-jets), the assignment of jets to top quarks is based on the combination that minimizes the following $\chi^{2}$:

$$\chi^{2}=\frac{(m_{b_{1}j_{1}j_{2}}-m_{t})^{2}}{\sigma_{m_{bjj}}^{2}}+\frac{(m_{b_{2}j_{3}j_{4}}-m_{t})^{2}}{\sigma_{m_{bjj}}^{2}}+\frac{(m_{j_{1}j_{2}}-m_{W})^{2}}{\sigma_{m_{jj}}^{2}}+\frac{(m_{j_{3}j_{4}}-m_{W})^{2}}{\sigma_{m_{jj}}^{2}}\,, \qquad (4)$$

where $m_{t}$ and $m_{W}$ are the top quark and $W$ boson masses, respectively, and $\sigma_{m_{bjj}}$ and $\sigma_{m_{jj}}$ are the mass resolutions for truth-matched top quarks and $W$ bosons, respectively. When we need to refer to classical truth labels, as in this case, we call a top quark ‘truth-matched’ when each of its three quark decay products is within $\Delta R<0.4$ of exactly one jet (about 20% efficient). Events without six jets, two of which are $b$-tagged, are not reconstructable with the $\chi^{2}$ method. It may be possible to recover some of the non-reconstructable cases using other approaches for the $b$-jets (e.g. taking the highest energy jet(s)), so we also check that our results hold when restricting to events that have two $b$-jets.
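The enumeration behind Equation 4 can be sketched as below. Several choices here are assumptions for illustration only: the resolutions `SIGMA_BJJ` and `SIGMA_JJ` are placeholders rather than values from this study, the masses are rounded, jets are `(E, px, py, pz)` tuples, and every jet not used as $b_1$ or $b_2$ is allowed as a $W$ candidate. The test event is artificial (two top quarks decaying at rest) and is built so that the correct assignment gives $\chi^{2}\approx 0$.

```python
import math
from itertools import combinations

M_TOP, M_W = 175.0, 80.4          # GeV; approximate masses
SIGMA_BJJ, SIGMA_JJ = 20.0, 10.0  # GeV; placeholder resolutions (assumed)

def inv_mass(*vecs):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = (sum(v[c] for v in vecs) for c in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def chi2_assign(jets, b_tagged):
    """Return (chi2, (b1, j1, j2), (b2, j3, j4)) minimizing Eq. 4, or None."""
    if len(jets) < 6 or len(b_tagged) < 2:
        return None  # not reconstructable with this method
    best = None
    for b1, b2 in combinations(b_tagged, 2):
        others = [k for k in range(len(jets)) if k not in (b1, b2)]
        for j1, j2 in combinations(others, 2):
            rest = [k for k in others if k not in (j1, j2)]
            for j3, j4 in combinations(rest, 2):
                chi2 = (
                    ((inv_mass(jets[b1], jets[j1], jets[j2]) - M_TOP) / SIGMA_BJJ) ** 2
                    + ((inv_mass(jets[b2], jets[j3], jets[j4]) - M_TOP) / SIGMA_BJJ) ** 2
                    + ((inv_mass(jets[j1], jets[j2]) - M_W) / SIGMA_JJ) ** 2
                    + ((inv_mass(jets[j3], jets[j4]) - M_W) / SIGMA_JJ) ** 2
                )
                if best is None or chi2 < best[0]:
                    best = (chi2, (b1, j1, j2), (b2, j3, j4))
    return best

# Artificial event: two tops decaying at rest into massless b, q, q'.
p = (M_TOP**2 - M_W**2) / (2 * M_TOP)  # b-quark momentum in the top rest frame
e_q = (M_TOP - p) / 2                  # energy of each W daughter
jets = [
    (p, 0.0, 0.0, -p), (e_q, M_W / 2, 0.0, p / 2), (e_q, -M_W / 2, 0.0, p / 2),
    (p, 0.0, -p, 0.0), (e_q, 0.0, p / 2, M_W / 2), (e_q, 0.0, p / 2, -M_W / 2),
]
best = chi2_assign(jets, b_tagged=[0, 3])
```

The triple loop makes the combinatorial cost explicit: the number of hypotheses grows quickly with the jet multiplicity, which is one of the drawbacks of the classical approach mentioned in the Introduction.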

III Dataset

For numerical studies, we use the same dataset as in Ref. [13], which is briefly summarized below. Top quark pair production in association with a Higgs boson (see footnote 1) in proton-proton collisions is generated with Madgraph@NLO 2.3.7 [15] at next-to-leading order (NLO) in Quantum Chromodynamics (QCD). The decays of the top quarks are simulated with MadSpin [16] and then the rest of the particle-level generation is created with Pythia 8.235 [17]. While this dataset does not emulate detector effects, the salient features of the problem are already present at particle level. Jets are clustered using the anti-$k_{t}$ [11] algorithm with $R=0.4$ as implemented in FastJet 3.3.2 [18, 19].

Footnote 1: The Higgs boson decays to photons and is largely ignored and irrelevant for jet labeling. We use this sample because it was the main one used in Ref. [13], although it was also shown that the performance is similar in other top quark final states.

Jets are required to have $|y|\leq 2.5$ and $p_{T}\geq 25$ GeV. Jets that are $\Delta R$ matched (see footnote 2) to $b$-quarks at the parton level are labeled as $b$-jets; this label is removed (see footnote 3) randomly for 30% of the $b$-jets, to mimic the inefficiency of realistic $b$-tagging [20, 21]. We further apply a preselection on the testing set of $N_{\mathrm{bjet}}>0$ and $N_{\mathrm{jet}}\geq 3$ to mimic realistic data analysis requirements.

Footnote 2: $\Delta R$ is defined as $\sqrt{\Delta y^{2}+\Delta\phi^{2}}$, where $\Delta y$ is the difference of two particles in pseudorapidity and $\Delta\phi$ is the difference in azimuthal angle.

Footnote 3: We do not add fake $b$-jets, since the fake rate (one in a few hundred) is sufficiently small that missing a real $b$-jet and falsely tagging a non-$b$-jet is rare enough to not impact the numerical results.
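The selection described above can be sketched in a few lines. This is an illustration under assumptions, not the code used for the study: jets are `(pT, y, phi)` tuples, $b$-quarks are `(y, phi)` pairs, the $\Delta R<0.4$ matching threshold for the $b$-label is assumed (the text only quotes 0.4 for truth matching), and all function names are hypothetical.

```python
import math
import random

def delta_r(y1, phi1, y2, phi2):
    """sqrt(dy^2 + dphi^2), with dphi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def select_jets(jets, min_pt=25.0, max_abs_y=2.5):
    """Keep jets (pT, y, phi) with pT >= 25 GeV and |y| <= 2.5."""
    return [j for j in jets if j[0] >= min_pt and abs(j[1]) <= max_abs_y]

def label_b_jets(jets, b_quarks, rng, efficiency=0.7, max_dr=0.4):
    """Flag jets matched to a b-quark, then drop 30% of the labels at random."""
    flags = []
    for _, y, phi in jets:
        matched = any(delta_r(y, phi, qy, qphi) < max_dr for qy, qphi in b_quarks)
        flags.append(matched and rng.random() < efficiency)
    return flags

def passes_preselection(jets, b_flags):
    """Require N_bjet > 0 and N_jet >= 3."""
    return sum(b_flags) > 0 and len(jets) >= 3

# Made-up event: the last two jets fail the pT and |y| requirements.
jets = [(60.0, 0.1, 0.2), (40.0, -1.0, 2.0), (30.0, 2.0, -2.5),
        (20.0, 0.0, 0.0), (50.0, 3.0, 1.0)]
b_quarks = [(0.15, 0.25)]
selected = select_jets(jets)
b_flags = label_b_jets(selected, b_quarks, random.Random(0))
keep_event = passes_preselection(selected, b_flags)
```

Wrapping $\Delta\phi$ matters for objects on either side of the $\phi=\pm\pi$ boundary, which would otherwise appear far apart.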

IV Results

First, we consider standard, non-unique metrics for evaluating performance. In particular, truth-matched top quarks are compared with each reconstruction method to see the fraction of the time that all three jets are the same. As noted earlier, the truth match labels are not unique, but this is a standard metric for quantifying performance. Figure 2 shows the frequency of an exact match for each method and for different jet multiplicities. The matching generally is harder the more jets there are in the event because there are more combinations and the truth label fidelity also degrades (see Fig. 1).
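The exact-match frequency can be written down compactly. In this sketch, each top quark is represented by a tuple of three jet indices, and a prediction of `None` stands for a top quark that a method could not reconstruct; counting those as failures is an assumption about the bookkeeping, and the function name is hypothetical.

```python
def exact_match_fraction(truth, predicted):
    """Fraction of truth-matched tops whose predicted jet triplet is identical.

    truth     : list of jet-index triplets from the truth matching
    predicted : list of triplets (or None if not reconstructable)
    """
    matches = sum(1 for t, p in zip(truth, predicted)
                  if p is not None and set(t) == set(p))
    return matches / len(truth)

# Made-up labels for four truth-matched tops.
truth = [(0, 1, 2), (3, 4, 5), (0, 2, 4), (1, 2, 3)]
predicted = [(2, 1, 0), (3, 4, 6), None, (1, 2, 3)]
fraction = exact_match_fraction(truth, predicted)
```

The comparison uses sets, so the order in which the three jets are listed does not affect the result.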

Overall, the attention-based approach outperforms the other two methods across all configurations, often by a large margin (10% or more). Inclusively, the gradient-based method outperforms the classical $\chi^{2}$ assignment, but the two approaches are comparable after requiring two $b$-jets. Across all events and inclusively across jet multiplicities, the $\chi^{2}$ approach has a poor matching frequency (about 10%) in part because it requires two $b$-jets and at least six distinct jets. In contrast, the attention- and gradient-based methods are still effective when there are fewer jets. The numbers for the attention-based and $\chi^{2}$-based approaches are similar to the ones found by Spa-Net [6], although there are a number of differences in the setup that prohibit a precise comparison.

Figure 2: The fraction of truth-matched tops that have exactly the same labels from the truth matching and from the indicated reconstruction method. Note that these truth labels are not unique, but this is a standard metric. Top: all events that pass the preselection. Bottom: only events with at least two $b$-jets. Random corresponds to events with at least six jets and from these, two sets of three are randomly selected.

The next step is to study events in which there is no truth match. Such events are not even part of the training for other ML-based labeling schemes, but our methods are still able to assign parton labels in these cases. One way to see if the assigned jets in such events are sensible is to examine their trijet invariant mass. Figure 3 presents histograms of this mass inclusively and for events without a truth match. There are roughly twice as many entries for the attention- and gradient-based histograms in the top plot of Fig. 3 because of events where there is no truth match. All five histograms in the figure look similar, with a peak near the top quark mass of about 175 GeV [22]. The peak is sharpest for the truth-matched events and is slightly sharper for the attention-based method than for the gradient-based method. This may be expected from Fig. 2, which indicates that the attention-based approach has a higher fidelity of picking the ‘correct’ jets.
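For reference, the trijet invariant mass can be computed from jet kinematics as in the following sketch. It assumes jets are given as `(pT, y, phi, mass)` with $y$ the rapidity; the function names and the test configuration are illustrative only.

```python
import math

def to_cartesian(pt, y, phi, mass=0.0):
    """Convert (pT, rapidity, phi, mass) to an (E, px, py, pz) four-vector."""
    m_t = math.sqrt(pt * pt + mass * mass)  # transverse mass
    return (m_t * math.cosh(y), pt * math.cos(phi),
            pt * math.sin(phi), m_t * math.sinh(y))

def trijet_mass(jets):
    """Invariant mass of the summed four-vectors of the given jets."""
    vecs = [to_cartesian(*j) for j in jets]
    e, px, py, pz = (sum(v[c] for v in vecs) for c in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

# Three massless 50 GeV jets at y = 0, spaced by 120 degrees in phi:
# the momenta cancel, so the invariant mass equals the total energy.
jets = [(50.0, 0.0, 0.0, 0.0),
        (50.0, 0.0, 2 * math.pi / 3, 0.0),
        (50.0, 0.0, 4 * math.pi / 3, 0.0)]
mass = trijet_mass(jets)
```

Applying this function to the three jets assigned to each top quark candidate gives one entry per candidate in a histogram such as those of Fig. 3.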

Figure 3: The invariant mass of jets labeled as originating from the same top quark in all events (top) and in events without a truth match (bottom). Random corresponds to events with at least six jets and from these, two sets of three are randomly selected.

Our last investigation is whether the trijet kinematic properties in unmatched events are close to those of the truth top quarks. One reasonable definition of a ‘good match’ would be that the reconstructed top properties are close to the truth properties, which does not require assigning quark identities to the jets. Since our methods are derived from a top quark property regressor, we would expect that the trijet properties align well with the truth top quark properties, but it is important to check. Figure 4 provides confirmation for the top quark $p_{T}$ and $y$.

Figure 4: Scatter plots between the true top quark properties ($y$-axis) and the trijet kinematic properties ($x$-axis) for the $p_{T}$ (top) and $y$ (bottom). None of these events have a truth match. Versions with random jets are presented in Fig. 5.

V Conclusions and Outlook

Parton labeling continues to be an important task in collider event reconstruction even though such labels are not unique. We have proposed a set of tools based on regression methods that are able to assign parton labels without needing unphysical parton matching for training. Our approaches are competitive even though they are not trained using trijet information and are much more flexible than other approaches, since we are able to accommodate events with fewer jets than expected from the lowest order decay Feynman diagrams. While our techniques are compatible with many regression approaches, the CPT model studied here is particularly useful because it is permutation invariant and partially Lorentz covariant. The corresponding labels inherit some of these properties.

There are a number of possible ways to further improve these approaches, including how to best combine the attention weights or Jacobian elements to assign parton labels. It may also be possible to combine approaches in the future, where a simpler model can be trained using the label information from a regression model.

Software

The code for this project is built on the one from Ref. [13]. Updated software that also produces the gradients and makes the figures in this paper can be found at https://github.com/hep-lbdl/Covariant-Particle-Transformer.

Acknowledgments

BN thanks Chase Shimmin for useful discussions. This work is supported by the U.S. Department of Energy, Office of Science under contract DE-AC02-05CH11231. H.W.’s work is partly supported by the U.S. National Science Foundation under the Award No. 2046280.

References

[1] M. Aaboud et al. (ATLAS), Search for the standard model Higgs boson produced in association with top quarks and decaying into a $b\bar{b}$ pair in $pp$ collisions at $\sqrt{s}$ = 13 TeV with the ATLAS detector, Phys. Rev. D 97, 072016 (2018), arXiv:1712.08895 [hep-ex].
[2] A. M. Sirunyan et al. (CMS), Measurement of the $\mathrm{t\bar{t}b\bar{b}}$ production cross section in the all-jet final state in pp collisions at $\sqrt{s}$ = 13 TeV, Phys. Lett. B 803, 135285 (2020), arXiv:1909.05306 [hep-ex].
[3] G. Aad et al. (ATLAS), $CP$ Properties of Higgs Boson Interactions with Top Quarks in the $t\bar{t}H$ and $tH$ Processes Using $H\rightarrow\gamma\gamma$ with the ATLAS Detector, Phys. Rev. Lett. 125, 061802 (2020), arXiv:2004.04545 [hep-ex].
[4] J. Erdmann, T. Kallage, K. Kröninger, and O. Nackenhorst, From the bottom to the top - reconstruction of $t\bar{t}$ events with deep learning, JINST 14 (11), P11015 (2019), arXiv:1907.11181 [hep-ex].
[5] A. Badea, W. J. Fawcett, J. Huth, T. J. Khoo, R. Poggi, and L. Lee, Solving Combinatorial Problems at Particle Colliders Using Machine Learning, (2022), arXiv:2201.02205 [hep-ph].
[6] M. J. Fenton, A. Shmakov, T.-W. Ho, S.-C. Hsu, D. Whiteson, and P. Baldi, Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks, (2020), arXiv:2010.09206 [hep-ex].
[7] J. S. H. Lee, I. Park, I. J. Watson, and S. Yang, Zero-Permutation Jet-Parton Assignment using a Self-Attention Network, (2020), arXiv:2012.03542 [hep-ex].
[8] A. Shmakov, M. J. Fenton, T.-W. Ho, S.-C. Hsu, D. Whiteson, and P. Baldi, SPANet: Generalized Permutationless Set Assignment for Particle Physics using Symmetry Preserving Attention, (2021), arXiv:2106.03898 [hep-ex].
[9] L. Ehrke, J. A. Raine, K. Zoch, M. Guth, and T. Golling, Topological Reconstruction of Particle Physics Processes using Graph Neural Networks, (2023), arXiv:2303.13937 [hep-ph].
[10] J. Erdmann, S. Guindon, K. Kroeninger, B. Lemmer, O. Nackenhorst, A. Quadt, and P. Stolte, A likelihood-based reconstruction algorithm for top-quark pairs and the KLFitter framework, Nucl. Instrum. Meth. A 748, 18 (2014), arXiv:1312.5595 [hep-ex].
[11] M. Cacciari, G. P. Salam, and G. Soyez, The anti-$k_{t}$ jet clustering algorithm, JHEP 04, 063 (2008), arXiv:0802.1189 [hep-ph].
[12] M. Cacciari, G. P. Salam, and G. Soyez, The Catchment Area of Jets, JHEP 04, 005 (2008), arXiv:0802.1188 [hep-ph].
[13] S. Qiu, S. Han, X. Ju, B. Nachman, and H. Wang, A Holistic Approach to Predicting Top Quark Kinematic Properties with the Covariant Particle Transformer, (2022), arXiv:2203.05687 [hep-ph].
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems (2017) p. 5998, arXiv:1706.03762.
[15] J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations, JHEP 07, 079 (2014), arXiv:1405.0301 [hep-ph].
[16] P. Artoisenet, R. Frederix, O. Mattelaer, and R. Rietkerk, Automatic spin-entangled decays of heavy resonances in Monte Carlo simulations, JHEP 03, 015 (2013), arXiv:1212.3460 [hep-ph].
[17] T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, An introduction to PYTHIA 8.2, Comput. Phys. Commun. 191, 159 (2015), arXiv:1410.3012 [hep-ph].
[18] M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C 72, 1896 (2012), arXiv:1111.6097 [hep-ph].
[19] M. Cacciari and G. P. Salam, Dispelling the $N^{3}$ myth for the $k_{t}$ jet-finder, Phys. Lett. B 641, 57 (2006), arXiv:hep-ph/0512210.
[20] G. Aad et al. (ATLAS), ATLAS b-jet identification performance and efficiency measurement with $t\bar{t}$ events in pp collisions at $\sqrt{s}=13$ TeV, Eur. Phys. J. C 79, 970 (2019), arXiv:1907.05120 [hep-ex].
[21] A. M. Sirunyan et al. (CMS), Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV, JINST 13 (05), P05011 (2018), arXiv:1712.07158 [physics.ins-det].
[22] Particle Data Group, Review of Particle Physics, Progress of Theoretical and Experimental Physics 2020, 083C01 (2020).
Figure 5: Scatter plots between the true top quark properties ($y$-axis) and the trijet kinematic properties ($x$-axis) for the $p_{T}$ (top) and $y$ (bottom) using three randomly selected jets for each top candidate. None of these events have a truth match.