(1) European Southern Observatory, Karl Schwarzschildstrasse 2, 85748, Garching bei München, Germany
e-mail: ilaria.marini@eso.org
(2) Dipartimento di Fisica - Sezione di Astronomia, Università di Trieste, Via Tiepolo 11, 34131 Trieste, Italy
(3) INAF-Osservatorio Astronomico di Trieste, Via G. B. Tiepolo 11, 34143 Trieste, Italy
(4) IFPU, Institute for Fundamental Physics of the Universe, Via Beirut 2, 34151 Trieste, Italy
(5) INFN, Sezione di Trieste, Via Valerio 2, 34127 Trieste, Italy
(6) ICSC - Italian Research Center on High-Performance Computing, Big Data and Quantum Computing
(7) Data Science and Scientific Computing, Università di Trieste, Via A. Valerio 12/1, 34127 Trieste, Italy

Inferring intrahalo light from stellar kinematics

A deep learning approach

I. Marini (1), A. Saro (2, 3, 4, 5, 6), S. Borgani (2, 4, 3, 5, 6), M. Boi (2)

(Received 16 February 2024 / Accepted 28 June 2024)

Abstract

Context. In the context of structure formation, disentangling the stellar population of the central galaxy from the stellar intrahalo light can help shed light on the formation history of the halo as a whole, since the properties of the stellar components are expected to retain traces of that history. Many approaches have been adopted to address this task, depending on different physical assumptions (e.g. the light profile, chemical composition, and kinematical differences) and on whether the full six-dimensional phase-space information is known (as in simulations) or only projected quantities are available (as in observations).

Aims. This paper paves the way for a new approach to bridge the gap between observational and simulation methods. We propose the use of projected kinematical information from stars in simulations in combination with deep learning to create a robust method for identifying intrahalo light in observational data to enhance understanding and consistency in studying the process of galaxy formation.

Methods. Using deep learning techniques, in particular a convolutional neural network called U-Net, we developed a methodology for predicting these contributions in simulated galaxy cluster images. We created a sample of mock images from hydrodynamical simulations (including masking of the interlopers) to train, validate, and test the network. Reinforced training (Attention U-Net) was used to improve the first results, as the initial predictions consistently overestimated the stellar intrahalo contribution in the innermost central regions of the mock images.

Results. Our work shows that adequate training over a representative sample of mock images can lead to good predictions of the intrahalo light distribution. The model is mildly dependent on the training size and its predictions are less accurate when applied to mock images from different simulations. However, the main features (spatial scales and gradients of the stellar fractions) are recovered for all tests. While the method presented here should be considered as a proof of concept, future work (e.g. generating more realistic mock observations) is required to enable the application of the proposed model to observational data.

Key Words.:

Methods: data analysis – Techniques: miscellaneous – Galaxies: stellar content

1 Introduction

According to the standard cosmological model, galaxy clusters and groups build up their stellar mass through star formation and subsequent mergers (Pillepich et al. 2018a; Ragone-Figueroa et al. 2018; Montenegro-Taborda et al. 2023). This two-phase scenario also naturally explains the relatively old stellar populations and passive star formation of central galaxies in clusters that are still accreting at $z=0$. As a result of this mechanism, most of the stars are locked up in the central (and satellite) galaxies (Kravtsov & Borgani 2012) orbiting in clusters and groups, while a considerable fraction consists of free-floating stars bound to the halo potential. Owing to its collisionless nature (Binney & Tremaine 2011), this diffuse component is an exemplary fossil record of the cluster formation history. Depending on the halo mass, this intrahalo light is referred to as intracluster light (ICL; for simplicity, we use ICL for both group- and cluster-size halos) or intragroup light (Montes & Trujillo 2019; Alonso Asensio et al. 2020; Montes 2022; Arnaboldi & Gerhard 2022). Typical examples of such fossil signatures are the morphological features imprinted in the outer stellar regions, such as shells, ripples, and tidal tails (Bílek et al. 2020; Montes & Trujillo 2022; Valenzuela & Remus 2022), which arise naturally from merger events and/or the radial infall of satellites (Pop et al. 2018; Karademir et al. 2019).

One of the most prominent unsolved issues in this field is how to effectively separate the light of the Brightest Cluster Galaxy (BCG; we name the central stellar component BCG regardless of whether the host halo is in the cluster or group regime) from the ICL, since they share similar spatial scales. In recent years, several studies (e.g. Contini et al. 2022; Montes 2022, and references therein) have discussed the role of the transition radius, the distance at which the ICL component starts to dominate the stellar component. Due to the variety of methods employed to estimate the ICL contribution, the value of this transition radius may depend on the adopted ICL identification method. From the observational side, typical values of the transition radius are around 60–80 kpc (Montes et al. 2021; Gonzalez et al. 2021), in line with results from earlier works (e.g. Zibetti et al. 2005; Gonzalez et al. 2007). These values are slightly larger in other analyses, such as that of Zhang et al. (2019), who concluded that the transition from the BCG to the ICL lies just outside 100 kpc, or that of Chen et al. (2022), who found values in the range 70–200 kpc. Results based on semi-analytic models (e.g. Contini & Gu 2021; Contini 2021; Contini et al. 2022) agree with these observational results and indicate that, if similarly derived from profile fitting, the transition radius is independent of both the BCG $+$ ICL and the halo masses, with typical values of 60 $\pm$ 40 kpc. This technique usually requires the assumption of a double or triple Sérsic profile (Sérsic 1963) or a combination of different profiles, such as a Jaffe profile (Jaffe 1983) describing the BCG distribution and an NFW profile (Navarro et al. 1997) for the ICL (Kluge et al. 2021).

In hydrodynamical simulations, this task is eased by access to the full six-dimensional phase space traced by the stellar particles. Proctor et al. (2023) used Gaussian mixture models to decompose the stellar halo into three components (i.e. a disc, a bulge, and the ICL) according to their kinematic properties. The authors find the observational equivalent of the transition radius at approximately 30 kpc for halos with $\log(M_{200}/M_{\odot})<12.8$, which quickly increases for higher masses. Likewise, Dolag et al. (2009) and Marini et al. (2022) have demonstrated how an unbinding procedure can be applied to the stellar halos of galaxy clusters to yield two separate kinematical subsets traceable to the BCG and the ICL. This method rests on the finding that the velocity distribution of star particles is bimodal, with the two modes associated with two dynamically distinct components. Combining this information with an unbinding procedure separates the stars into a central BCG (more compact and dynamically cold) and a hotter, diffuse ICL. As shown by Remus et al. (2017), this kinematical distinction is hardly transferable to observations, since it does not necessarily imply a clear separation in the radial surface density profile.

Thus, there is a substantial gap between the identification methods applied in simulations (fundamentally based on complete knowledge of the kinematical and positional information of the stellar component) and those applied in observations (Rudick et al. 2011; Kluge et al. 2021; Arnaboldi & Napolitano 2001), which further hampers efforts to homogenise studies on the topic (Montes 2022). In the local Universe, a viable path is to connect high-resolution integral field spectroscopy (IFS) observations (to extract the fine-grained kinematical structure) with the outcome of numerical simulations. This work aims to pave the way for this perspective using deep learning (DL) techniques (see LeCun et al. 2015, for a review). Since its breakthrough (Krizhevsky et al. 2012), DL has rapidly gained acclaim in many scientific applications, including astronomy (Carleo et al. 2019; Smith & Geach 2023), where the typical data volumes naturally favour these techniques. Applications include improved estimates of photometric redshifts (e.g. Collister & Lahav 2004; Feldmann et al. 2006; Salvato et al. 2011), galaxy morphology identification (Dieleman et al. 2015; Ball et al. 2004; Banerji et al. 2010), exoplanet detection (Gibson et al. 2012), gravitational wave physics (George & Huerta 2018), and analysis of the stellar galactic disc (Cantat-Gaudin et al. 2020), to name a few. In recent years, a breakthrough in cosmological studies has been reached with the advent of machine learning-based cosmological simulations (He et al. 2015; Kamdar et al. 2016; Villaescusa-Navarro et al. 2020), further proving the advancements in the field. In this landscape, convolutional neural networks (CNNs; LeCun et al. 1998; Krizhevsky et al. 2012) have played an instrumental role in astronomy thanks to their ability to automatically learn spatial features from raw pixel data, making them highly effective in tasks such as image classification (Domínguez Sánchez et al. 2018), object detection (Schanche et al. 2019), and image segmentation (Burke et al. 2019).

In this paper, we illustrate a new method devoted to recovering the projected ICL distribution from images of simulated galaxy clusters using CNNs. By leveraging DL techniques, we can accurately extract information about the ICL from images, facilitating our understanding of its properties and distribution within galaxy clusters. We point out that the main purpose of this work is to present a proof of concept to identify ICL by exploiting information on the stellar kinematics, rather than delivering a stand-alone method ready to be applied to observational data. For this reason, several limitations inherent in the approach presented here (e.g. the lack of a fully realistic observational image) will need to be addressed with further investigation before the proposed method can be effectively applied to observational data. On the other hand, it is important to note that observers will increasingly require improved methods for detecting and characterising the ICL in real observational data. As observational capabilities continue to evolve, the development of advanced CNN-based methods will be essential for unlocking the full potential of ICL studies in observational astronomy. Furthermore, the efficacy of this method will rely on its adaptation to more realistic simulations of the stellar structure within galaxy clusters and mock observations. Incorporating detailed models that account for various physical processes affecting stellar populations and the observational process itself will enhance the fidelity of the CNN-based approach, ensuring its applicability to a wide range of observational scenarios and contributing to a deeper understanding of the ICL in the context of galaxy cluster evolution.

The paper is structured as follows. In Sect. 2, we describe the main ingredients of our modelling: the simulation set, the mock images extracted, and the DL models. In Sect. 3, we present the main results of the study, caveats, and future perspectives; in Sect. 4 we summarise our findings and draw the main conclusions of our analysis.

2 Methods

2.1 Input simulations

In this section, we describe the suite of simulations (i.e. Dianoga) used to construct the mock catalogue. Dianoga is a set of zoom-in cosmological hydrodynamical simulations of galaxy clusters carried out with GADGET-3, a Tree Particle Mesh – smoothed particle hydrodynamics (SPH) code, which represents the evolution of the public GADGET-2 code (Springel 2005). The most important changes in our developer branch of GADGET-3 include the use of a higher-order kernel interpolating function, time-dependent artificial viscosity, and artificial conduction schemes, which in turn alleviate several limitations of standard SPH implementations (Dolag et al. 2005; Beck et al. 2016).

At the resolution of choice, the set comprises a total of eight Lagrangian regions, selected around some of the most massive halos in a lower-resolution N-body parent box with a comoving side of 1 $h^{-1}$ cGpc. Each region hosts several halos in both the group and cluster mass regimes: for this project, we select the ten most massive halos in each region. This yields a total of 80 halos in the group and cluster regime (i.e. $M_{200}\sim 10^{13}-10^{15}\,M_{\odot}$). In Fig. 1, we plot the halo mass (left panel) and stellar mass (right panel) distributions of these halos. Their physical properties are extensively discussed in Bassini et al. (2020) and in Marini et al. (2021).

Initial conditions have been generated following the prescription in Tormen & Bertschinger (1996) for a $\Lambda$ cold dark matter ($\Lambda$CDM) cosmological model with $\Omega_{\mathrm{M}}=0.24$, $\Omega_{\mathrm{b}}=0.04$, $n_{s}=0.96$, $\sigma_{8}=0.8$, and $H_{0}=72$ km s$^{-1}$ Mpc$^{-1}$. In the highest-resolution regions, the DM particle mass is $m_{\mathrm{DM}}=8.3\times 10^{7}\,h^{-1}\,M_{\odot}$ and the initial gas particle mass is $m_{\mathrm{gas}}=3.3\times 10^{7}\,h^{-1}\,M_{\odot}$. The Plummer-equivalent softening length for the DM particles is $\epsilon=3.75\,h^{-1}$ kpc, whereas gas, star, and black hole particles have $\epsilon=3.75\,h^{-1}$ kpc, $1\,h^{-1}$ kpc, and $1\,h^{-1}$ kpc at $z=0$, respectively.

Figure 1: Halo mass (left) and stellar mass (right) distribution of the 80 halos selected within the eight Dianoga Lagrangian regions at $z=0.3$.

Several subgrid models describe the unresolved baryonic physics of the simulations, including radiative cooling, star formation, and stellar feedback (Springel & Hernquist 2003), metal and chemical enrichment (Tornatore et al. 2007), and Active Galactic Nuclei (AGN) feedback (for more details, see Appendix A in Ragone-Figueroa et al. 2013). Further details on the simulation can be found in Bassini et al. (2020).

2.1.1 Identification of structures

The catalogue of particles associated with structures (e.g. halos of groups and clusters of galaxies) and substructures (i.e. subhalos or galaxies) in the simulations is compiled by a halo finder. First, a friends-of-friends (FoF) algorithm is run on the particles: this procedure provides an initial guess of the hierarchical structure of the simulation based on purely geometrical criteria. Particles separated by less than the linking length ($b=0.2$ in units of the mean inter-particle separation, in our case) are linked together, and each resulting group of linked particles is recorded as a halo, with the unique identification numbers of its member particles stored accordingly. A fundamental drawback of this method is that it occasionally links independent structures together across particle bridges. Furthermore, it only identifies large systems, leaving smaller structures (i.e. subhalos) in dense environments unrecorded. It is therefore instrumental to validate the candidate subhalo catalogue through an unbinding procedure, such as the one provided by SubFind (Springel et al. 2001; Dolag et al. 2009). The algorithm runs on each FoF halo separately, basing its decision on an excursion set procedure. By descending along the density gradient, SubFind creates a list of potential subhalo candidates whose binding energy is then investigated. This amounts to eliminating the particles whose energy makes them unbound to the substructure: if more than a minimum number of particles (50) survive the unbinding procedure, the substructure is identified as a genuine subhalo. The centre of each subhalo is identified with the position of the member particle having the minimum gravitational potential. The properties of the halos and subhalos are then determined from the properties of their constituent particles.
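The linking step of the FoF algorithm can be illustrated with a few lines of code. The following is a minimal, brute-force sketch (not the GADGET-3/SubFind implementation), assuming a cubic volume without periodic boundaries; the function name `friends_of_friends` and its arguments are hypothetical, and the subsequent SubFind unbinding step is not reproduced.

```python
import numpy as np

def friends_of_friends(pos, box_size, b=0.2):
    """Group particles with a friends-of-friends linking criterion.

    pos: (N, 3) array of positions; box_size: side of the cubic volume,
    used to compute the mean inter-particle separation. Distances are
    computed brute force (O(N^2)), so this only suits small samples.
    """
    n = len(pos)
    mean_sep = box_size / n ** (1.0 / 3.0)   # mean inter-particle separation
    link = b * mean_sep                       # linking length
    parent = np.arange(n)                     # union-find forest

    def find(i):
        # Root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise distances (no periodic wrapping in this sketch)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    ii, jj = np.nonzero(np.triu(d < link, k=1))
    for i, j in zip(ii, jj):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                   # merge the two groups
    roots = np.array([find(i) for i in range(n)])
    # Relabel the roots as consecutive group IDs 0..n_groups-1
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

Production halo finders replace the O(N^2) distance matrix with tree or grid searches, but the linking criterion is the same.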

2.1.2 Identification of the ICL

A second unbinding procedure can be applied to the stellar particles bound to the main halo, as they include contributions from both the BCG and the ICL. Theoretical predictions and observational results (see Dolag et al. 2010) have shown that the two components populate different regions of phase space. Each star particle is classified by a random forest (Marini et al. 2022) trained on particles classified by ICL-SubFind (Dolag et al. 2010). The original algorithm interprets the double Maxwellian shape of the three-dimensional velocity distribution of the stars as the superposition of two single Maxwellian distributions, associated with the ICL and the BCG, respectively (Murante et al. 2004). More specifically, the ICL is associated with the Maxwellian with the larger velocity dispersion, whereas the BCG, having colder dynamics, populates the distribution at lower dispersion. To assign each star particle to either of the two dynamical components, the algorithm follows an unbinding procedure that compares the particle's kinetic and potential energy with the assumed potential energy of the central subhalo. The reasons to prefer the random forest to the original algorithm for separating the ICL and BCG components range from its stability to its faster response. More details can be found in Marini et al. (2022).
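To illustrate the kinematic idea behind this decomposition, the sketch below fits a mixture of two Maxwellian speed distributions to the three-dimensional stellar speeds with an expectation-maximisation loop, and labels as ICL the stars that are more likely to belong to the hotter component. This is a simplified stand-in: it is neither the ICL-SubFind unbinding procedure nor the random forest of Marini et al. (2022), and the function names, initial guesses, and 0.5 probability threshold are assumptions.

```python
import numpy as np

def maxwellian_pdf(v, sigma):
    # Maxwell-Boltzmann speed distribution with one-dimensional dispersion sigma
    return np.sqrt(2 / np.pi) * v**2 / sigma**3 * np.exp(-v**2 / (2 * sigma**2))

def fit_double_maxwellian(v, n_iter=200):
    """EM fit of two Maxwellians to 3D stellar speeds v (e.g. in km/s)."""
    # Initial dispersions from the lower and upper quartiles of the speeds
    sigma = np.array([np.percentile(v, 25), np.percentile(v, 75)]) / np.sqrt(3)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each star
        p = weight[:, None] * maxwellian_pdf(v[None, :], sigma[:, None])
        resp = p / p.sum(axis=0, keepdims=True)
        # M-step: closed-form maximum-likelihood updates
        nk = resp.sum(axis=1)
        weight = nk / len(v)
        sigma = np.sqrt((resp * v[None, :] ** 2).sum(axis=1) / (3 * nk))
    return weight, sigma, resp

def label_icl(v):
    """Flag stars as ICL (True) when the hotter component is more probable."""
    weight, sigma, resp = fit_double_maxwellian(v)
    hot = np.argmax(sigma)
    return resp[hot] > 0.5, sigma
```

The M-step uses the fact that, for a Maxwellian, the maximum-likelihood dispersion satisfies $\sigma^2=\langle v^2\rangle/3$, here weighted by the responsibilities.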

2.2 Generation of the input images

The design of the images based on our set of simulated clusters mimics the geometrical conditions of a hypothetical observation with a state-of-the-art IFS. On the one hand, the difficulty of observing the low-surface-brightness light of the ICL naturally favours nearby targets, and thus zoomed images. On the other hand, we need to ensure that the field of view (FOV) of the hypothetical IFS is large enough to include the physical scales at which our simulations predict a significant radial gradient in the ICL presence. To this end, in Fig. 2 we show the distribution of the line-of-sight velocity dispersion profiles of the most massive halos in each of the eight Lagrangian regions, and we estimate the transition radius, defined as the physical radius at which the ICL starts to dominate the BCG+ICL stellar mass. We note that Marini et al. (2021) have shown that this set of groups and clusters has velocity dispersion profiles in agreement with highly resolved spectroscopic observations of nearby clusters. To measure these profiles, we use all the stars bound to the halo within a cylinder of length $2R_{200}$ (we define $R_{\Delta}$ as the radius encompassing a mean halo overdensity equal to $\Delta$ times the critical density of the universe at a given redshift, $\rho_{c}(z)$) around the centre of the halo. This includes stars bound to subhalos (i.e. satellite galaxies), which do not strictly belong to the BCG or the ICL, and thus results in conditions close to observational ones. Curves are colour-coded according to the ICL mass fraction (defined as the ICL stellar mass in a radial bin over the total stellar mass of the bin) as a function of the clustercentric distance in units of $R_{200}$. The vertical scatter in the velocity dispersion profiles is explained by the different halo masses.
The ICL mass fraction behaves consistently with radial distance: in most of the profiles it peaks at around $0.1\,R_{200}$, corresponding to the transition radius, and decreases at larger radii, where the stellar component associated with the subhalos starts to dominate. This suggests that observations should cover an area large enough to include this radius in order to detect the transition from the BCG to the ICL regime. Furthermore, this peak often coincides with the peak of the velocity dispersion profiles. Since the typical value of the transition radius is nearly constant once expressed in units of $R_{200}$, this suggests that it is mainly connected to the cosmological build-up of the halo. This result is consistent with the findings of Proctor et al. (2023) on a C-EAGLE sample of groups and clusters of galaxies with similar halo masses.
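The ICL mass fraction profile and the location of its peak can be computed along the lines of the following sketch, which bins stars by projected clustercentric distance in units of $R_{200}$ and takes the bin with the largest ICL fraction as a proxy for the transition radius. The function names and the binning are illustrative assumptions; the profiles in Fig. 2 may have been computed differently.

```python
import numpy as np

def icl_fraction_profile(r, mass, is_icl, r200, bins):
    """ICL mass fraction in radial bins of r / R200.

    r: projected clustercentric distances of the stars; mass: stellar masses;
    is_icl: boolean ICL labels; bins: bin edges in units of R200.
    """
    x = r / r200
    m_tot, _ = np.histogram(x, bins=bins, weights=mass)
    m_icl, _ = np.histogram(x, bins=bins, weights=mass * is_icl)
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = m_icl / m_tot              # NaN where a bin contains no stars
    centres = 0.5 * (bins[1:] + bins[:-1])
    return centres, frac

def peak_radius(centres, frac):
    """Bin centre (in R200 units) where the ICL fraction peaks."""
    return centres[np.nanargmax(frac)]
```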

To set the spatial resolution and FOV of our study, we need to adopt a specific instrument. We choose to simulate the geometrical conditions of MUSE (Bacon et al. 2010; official webpage: https://www.eso.org/sci/facilities/paranal/instruments/muse.html#par_title) at the VLT, which provides Integral Field Unit (IFU) capabilities over a wide FOV: the IFU has a FOV of 1 arcmin² and a resolution of $(0.2\times 0.2)$ arcsec². This choice is primarily driven by the aim of constructing a realistic geometrical setup and does not imply that other observational conditions (e.g. the spectral features) are modelled in this work. Therefore, although we refer to mock images throughout the paper, we do not claim that they reproduce realistic observational conditions. In physical units, this setup translates into a FOV of $(263.34\times 263.34)$ kpc² and a resolution of $(2\times 2)$ kpc² at $z=0.3$ for our reference cosmological model. To be conservative regarding resolution effects, we double the image size and the pixel size to $(526.68\times 526.68)$ kpc² and $(4\times 4)$ kpc², respectively. In this setting, each image has $130\times 130$ pixels. Solely to speed up the training of our DL algorithm, we crop the images to the nearest power of 2 in pixels (i.e. $128\times 128$), following the indications in Goodfellow et al. (2016). We adopt this frame size for all objects, thus covering different dynamical ranges for clusters and groups of different masses: we reach $R_{200}$ for the smallest group, with $\log(M_{200}/M_{\odot})=12.75$, whereas for the largest cluster ($\log(M_{200}/M_{\odot})=15.10$) we cover only the central $0.1\,R_{200}$.
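The conversion from angular to physical scale can be checked numerically. The sketch below assumes a flat $\Lambda$CDM model ($\Omega_{\Lambda}=1-\Omega_{\mathrm{M}}$) with the parameters listed above, and integrates the comoving distance with a simple trapezoidal rule; for a 1 arcmin side at $z=0.3$ it gives about 263 kpc, consistent with the value quoted in the text. The helper names are hypothetical, and `centre_crop` merely illustrates a symmetric crop to $128\times 128$ pixels.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]

def angular_diameter_distance(z, h0=72.0, omega_m=0.24, n=10001):
    """Flat LCDM angular-diameter distance in Mpc (trapezoidal integration)."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1 + zz) ** 3 + (1 - omega_m))
    comoving = C_KM_S / h0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return comoving / (1 + z)

def arcsec_to_kpc(theta_arcsec, z):
    """Physical size [kpc] subtended by theta_arcsec at redshift z."""
    theta_rad = np.deg2rad(theta_arcsec / 3600.0)
    return angular_diameter_distance(z) * 1e3 * theta_rad

def centre_crop(img, size=128):
    """Crop a square image symmetrically to size x size pixels."""
    start = (img.shape[0] - size) // 2
    return img[start:start + size, start:start + size]

fov_kpc = arcsec_to_kpc(60.0, 0.3)   # side of a 1 arcmin field at z = 0.3
```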

We create the images as post-processed line-of-sight velocity dispersion maps of the star particles in each system, analogous to what is extracted from the broadening of the spectral lines in IFU data. To guarantee consistency with the kinematical structure of each system, we reconstruct the velocity dispersion map from all the stars in the frame. We then mask the pixels containing contaminating galaxies, identified through the halo structure provided by SubFind. In other words, we simulate the standard procedure of masking foreground and background galaxies in data analysis. We follow the same procedure to create the ICL mass fraction image (i.e. the ground truth) on which we train the network.
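A minimal sketch of how such maps can be built from star particles is given below: particles are binned on the image grid, the mass-weighted line-of-sight velocity dispersion and ICL mass fraction are computed per pixel, and pixels flagged as interlopers are set to NaN. Mass weighting, the NaN convention, and the boolean `mask` argument (standing in for the SubFind-based masking) are assumptions made for illustration.

```python
import numpy as np

def sigma_los_map(x, y, v_los, mass, is_icl, extent, npix=128, mask=None):
    """Mass-weighted LOS velocity dispersion and ICL mass fraction maps.

    x, y: projected positions [kpc]; v_los: line-of-sight velocities [km/s];
    extent: half-size of the square frame [kpc]; mask: optional boolean
    (npix, npix) array flagging pixels occupied by interlopers.
    """
    edges = np.linspace(-extent, extent, npix + 1)

    def hist(w):
        # Sum of the weights w in each pixel
        h, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=w)
        return h

    m = hist(mass)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_v = hist(mass * v_los) / m
        mean_v2 = hist(mass * v_los ** 2) / m
        # sigma^2 = <v^2> - <v>^2, clipped to avoid tiny negative round-off
        sigma = np.sqrt(np.clip(mean_v2 - mean_v ** 2, 0.0, None))
        f_icl = hist(mass * is_icl) / m
    if mask is not None:
        sigma[mask] = np.nan
        f_icl[mask] = np.nan
    return sigma, f_icl
```

Empty pixels come out as NaN, and masked pixels are treated the same way, so that both maps share one convention for missing data.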

One of the caveats of DL techniques is the need for a large sample to effectively train and test a model. Choosing only one projection of the largest halo in each Lagrangian region (amounting to eight images) would not suffice; we therefore expand our dataset with data augmentation. We start from the ten largest halos of each Lagrangian region (i.e. 80 halos). Then, we take advantage of the halo triaxiality and select for each halo 26 different line-of-sight projections, which in three-dimensional space correspond to rotations in steps of 45 degrees from the initial reference frame. These choices already provide 2080 images; we further extend the sample by allowing the images not to be centred on the cluster centre but shifted by a random offset within $15\%$ of the image size. This operation is performed three times. We further discuss the role of the data size in Sect. 3.2.1.
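One way to obtain 26 lines of sight in 45-degree steps is to combine eight azimuths (0 to 315 degrees) with three elevations (-45, 0, and +45 degrees) and add the two poles. This interpretation, the projection helper, and the per-axis uniform offsets are assumptions, not necessarily the exact procedure used here. The sketch also reproduces the image count: 80 halos × 26 projections × 3 centre shifts = 6240.

```python
import numpy as np

def los_directions():
    """26 unit vectors: 8 azimuths x 3 elevations (45-degree steps) + 2 poles."""
    dirs = []
    for elev in np.deg2rad([-45.0, 0.0, 45.0]):
        for azim in np.deg2rad(np.arange(0.0, 360.0, 45.0)):
            dirs.append([np.cos(elev) * np.cos(azim),
                         np.cos(elev) * np.sin(azim),
                         np.sin(elev)])
    dirs += [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
    return np.array(dirs)

def project(pos, los):
    """Project 3D positions onto the plane perpendicular to the unit vector los."""
    # Orthonormal basis (e1, e2) spanning the image plane
    helper = np.array([1.0, 0.0, 0.0]) if abs(los[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(los, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(los, e1)
    # Image-plane coordinates (N, 2) and depth along the line of sight (N,)
    return pos @ np.stack([e1, e2], axis=1), pos @ los

def random_offsets(n, image_size_kpc, frac=0.15, rng=None):
    """Random shifts of the image centre within frac of the image size."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-frac, frac, size=(n, 2)) * image_size_kpc

n_images = 80 * len(los_directions()) * 3   # halos x projections x centre shifts
```

The same `los` vector gives the line-of-sight velocities as `vel @ los`.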

The final dataset comprises 6240 synthetic images, split in the proportions $70:20:10$ into training, validation, and test sets. Additionally, we create a smaller test set extracted from ten halos in a different Lagrangian region not previously used in the experiment. The advantage is that this test set is independent of the halos used to train and validate the DL algorithm. For these halos, we perform only two rotations around the main axes and two centre shifts, for a total of 40 images. We present here only the results for this smaller test set, since they prove consistent with those for the larger, less independent test set.
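The 70:20:10 split can be sketched as a random permutation of image indices, as below (the seed and helper name are arbitrary). A purely image-level split can place different projections or centre shifts of the same halo in different subsets, which is why a test set drawn from an unused Lagrangian region provides a more independent check; splitting by halo rather than by image would be an alternative.

```python
import numpy as np

def split_indices(n, fractions=(0.7, 0.2, 0.1), seed=0):
    """Randomly split n sample indices into training/validation/test subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    # The test subset takes whatever remains after training and validation
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

train, val, test = split_indices(6240)   # 4368, 1248, and 624 images
```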

An example of the velocity dispersion (left panel) and ICL mass fraction (right panel) images is presented in Fig. 3. There is a mild correlation between the ICL mass fraction and the velocity dispersion map, which we exploit in this experiment. This correlation is hard to appreciate by eye (some features are visible around the BCG position and moving outwards), but we have already discussed the role of the kinematical profiles of the BCG and ICL. We therefore anticipate that a CNN will be able to detect these signals in the simulated images.

2.3 The U-Net architecture

In our work, we aim to predict the ICL fraction in mock images of galaxy clusters by exploiting the information on the velocity dispersion of the stellar component, as provided by an IFS. In more general terms, the desired output is an image (or multidimensional object) rather than a single class label.

Many examples of this class of problems are found in biomedical applications, as image diagnosis is sensitive to a variety of scales in health-related problems. In many of these studies (see Siddique et al. 2021, for a review), the preferred DL architecture has been the U-Net (Ronneberger et al. 2015), which later gained importance in other fields as well (e.g. in astronomy; Vojtekova et al. 2021; Chadayammuri et al. 2023). The architecture consists of two main branches: a contracting path that captures the general context and a symmetric expanding path that enables precise localisation (see Fig. 1 in Oktay et al. 2018 for a schematic representation of the architecture). This double action increases the resolution of the output: high-resolution features from the contracting path are combined with the upsampled output, allowing a better localisation of the features. The two branches are connected through several skip connections (the first use of skip connections is attested in ResNet; He et al. 2015), which bypass some of the network layers and feed the output of one layer as input to later levels. This is a standard module that provides an alternative path for the gradient during backpropagation. The idea is to place skip connections at different points of the architecture to allow a fine-grained recollection of the original image throughout the learning process. The contracting path follows the typical architecture of a CNN: a sequence of $3\times 3$ convolutions, each followed by a rectified linear unit (ReLU) activation function, $\mathrm{ReLU}(z)=\max(0,z)$, which introduces non-linearity into the model (Glorot et al. 2011; Agarap 2019), and $2\times 2$ max pooling operations for downsampling. After each max pooling, the number of feature channels is doubled (as indicated at the top of the rectangular boxes in the schematic of Oktay et al. 2018).
Every step in the expansive path consists of an upsampling of the feature map followed by a $2\times 2$ convolution that halves the number of feature channels and two $3\times 3$ transposed convolutions, each followed by a ReLU. In the final layer, a $1\times 1$ convolution maps each $64$-component feature vector to the output, yielding a $128\times 128$ image. The output is passed through a sigmoid function, $\sigma(z)=e^{z}/(1+e^{z})$, whose characteristic S-shaped curve maps each pixel to a value between 0 and 1.
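For concreteness, a compact PyTorch version of this kind of architecture is sketched below. It is a generic U-Net with padded convolutions (so that the output matches the $128\times 128$ input without cropping), four pooling levels, $2\times 2$ transposed convolutions for upsampling, and plain $3\times 3$ convolutions in the decoder. The depth, channel widths, and class names are assumptions, not the configuration used in this work.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Two 3x3 convolutions, each followed by a ReLU (padding keeps the size)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, c_in=1, c_out=1, base=64, depth=4):
        super().__init__()
        chans = [base * 2 ** i for i in range(depth + 1)]   # e.g. 64, ..., 1024
        self.down = nn.ModuleList()
        prev = c_in
        for c in chans:                       # contracting path + bottleneck
            self.down.append(double_conv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        for c in reversed(chans[:-1]):        # expanding path
            # 2x2 transposed convolution: doubles the size, halves the channels
            self.up.append(nn.ConvTranspose2d(2 * c, c, 2, stride=2))
            self.dec.append(double_conv(2 * c, c))
        self.head = nn.Conv2d(base, c_out, 1)   # 1x1 conv to the output map

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.down):
            x = block(x)
            if i < len(self.down) - 1:
                skips.append(x)               # stored for the skip connection
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))   # concatenate skip features
        return torch.sigmoid(self.head(x))     # per-pixel values in (0, 1)
```

With the default `base=64`, the last decoder level outputs 64 channels, which the $1\times 1$ head maps to a single $128\times 128$ fraction map.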

The second network used in this work is the Attention U-Net (Oktay et al. 2018; Schlemper et al. 2019), an evolution of the U-Net architecture that integrates attention mechanisms to enhance its ability to capture relevant features and improve the quality of image segmentation. The attention gates (Jetley et al. 2018) induce the network to focus on the regions of interest without additional supervision or significant computational overhead. By incorporating attention mechanisms within the skip connections between the contracting and expansive paths of the U-Net, the relevant information is filtered, enabling the translation of intricate patterns and textures while suppressing background noise. The attention mechanism acts as a dynamic filter, adaptively modulating the importance of different spatial locations across feature maps. In this work, we use both the Attention U-Net and the original U-Net.
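An additive attention gate of the kind used in the Attention U-Net can be sketched as follows. Here the gating signal is assumed to be already upsampled to the resolution of the skip connection, a simplification with respect to Oktay et al. (2018), who resample inside the gate; the class name and channel sizes are illustrative. In a U-Net decoder, the gate would rescale each skip feature map before concatenation, e.g. `skip = gate(skip, x)` right after upsampling `x`.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate in the spirit of Oktay et al. (2018)."""

    def __init__(self, c_skip, c_gate, c_inter):
        super().__init__()
        self.w_x = nn.Conv2d(c_skip, c_inter, 1)   # projects the skip features
        self.w_g = nn.Conv2d(c_gate, c_inter, 1)   # projects the gating signal
        self.psi = nn.Conv2d(c_inter, 1, 1)        # one coefficient per pixel

    def forward(self, skip, gate):
        # gate is assumed to have the same spatial size as skip
        a = torch.relu(self.w_x(skip) + self.w_g(gate))
        alpha = torch.sigmoid(self.psi(a))          # attention map in (0, 1)
        return skip * alpha                          # down-weight irrelevant regions
```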

Our models are implemented in PyTorch (Paszke et al. 2019). Unless stated otherwise, the model parameters (i.e. weights and biases) are initialised to the default values of the library. The loss function is the mean squared error (MSE) between the true label $y_{i}$ and the predicted one $\hat{y}_{i}$, formally defined as

$$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2},\tag{1}$$

where $N$ is the number of samples over which the error is computed. Initially, the model was compiled with the Adam optimiser (Kingma & Ba 2017), using the pre-training learning rate range test described by Smith (2017) to find an optimal static learning rate. In short, this method prescribes an initial parameter-tuning phase in which the learning rate is increased at each epoch, exponentially (or linearly), within a given interval. Monitoring how the loss curve changes as the learning rate varies helps identify the optimal value to adopt during the actual training phase. The optimal value is defined as the point at which the loss decreases most steeply, since learning is most efficient there.
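The selection step of the learning rate range test can be sketched as below: learning rates are swept exponentially, the recorded losses are lightly smoothed, and the rate with the most negative slope of the loss with respect to $\log_{10}$ of the learning rate is returned. The sweep limits, the moving-average window, and the helper names are assumptions; in practice the losses come from short training runs at each learning rate.

```python
import numpy as np

def lr_range(lr_min=1e-7, lr_max=1.0, n_steps=100):
    """Exponentially increasing learning rates for the range test."""
    return lr_min * (lr_max / lr_min) ** (np.arange(n_steps) / (n_steps - 1))

def steepest_lr(lrs, losses, smooth=5):
    """Learning rate where the (smoothed) loss decreases most steeply."""
    kernel = np.ones(smooth) / smooth
    loss_s = np.convolve(losses, kernel, mode="valid")   # moving average
    lrs_s = lrs[smooth // 2: smooth // 2 + len(loss_s)]  # centres of the windows
    slope = np.gradient(loss_s, np.log10(lrs_s))         # d loss / d log10(lr)
    return lrs_s[np.argmin(slope)]                       # most negative slope
```

Using `mode="valid"` avoids the artificial drop at the edges that zero padding would introduce in the smoothed loss.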

Later, we found that the One-Cycle policy (Smith & Topin 2018) with stochastic gradient descent (SGD) gives even more satisfactory results, so we adopted this method. This approach anneals the learning rate from an initial value up to a maximum learning rate and then down to a minimum learning rate much lower than the initial one. The main difference with respect to the previous setup is therefore that the learning rate is dynamic rather than static.
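The shape of the One-Cycle schedule can be illustrated with the NumPy sketch below, which uses cosine warm-up and annealing. The parameter names and defaults (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`) mirror those of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, which by default also cycles the momentum; the values adopted in this work are not stated here, so these are assumptions.

```python
import numpy as np

def one_cycle(total_steps, max_lr, pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Cosine one-cycle learning-rate schedule (warm-up, then annealing)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor   # much lower than the initial value
    n_up = int(pct_start * total_steps)

    def cos_anneal(start, end, frac):
        # Cosine interpolation from start (frac = 0) to end (frac = 1)
        return end + (start - end) * 0.5 * (1 + np.cos(np.pi * frac))

    up = cos_anneal(initial_lr, max_lr, np.linspace(0, 1, n_up))
    down = cos_anneal(max_lr, min_lr, np.linspace(0, 1, total_steps - n_up))
    return np.concatenate([up, down])
```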

3 Results

3.1 Fiducial model

Fig. 4 shows the results for two randomly chosen images from the test set. The colour bar guides the reading: orange marks areas where the ICL fraction is 1 and black denotes 0. In each row, we present the expected output from ICL-SubFind (left panel), which we take as the ground truth, the prediction of the Attention U-Net (central panel), and that of the original U-Net (right panel). The ICL mass fraction within each image is reported in the corresponding panel.

Most of the stellar mass belongs to the central galaxy and to the substructures we mask. Visual inspection shows that both methods are generally able to recover the large-scale ICL structure of the halos. This is further confirmed by the ICL fraction reported in each panel, which does not vary significantly in these cases. Although the ICL distributions are not perfectly matched, cluster centres and masked interlopers (dark circles) are recovered. We observe that neither DL model fully predicts the gradient of the ICL fraction at smaller radii; nevertheless, the Attention U-Net performs significantly better in this respect, which is not surprising since the Attention U-Net is designed to reinforce learning in the regions of interest. We verified, however, that a longer training phase only leads to overfitting and does not improve the result. Spurious elements (e.g. substructures not properly masked in the velocity dispersion map) are also better classified by the Attention U-Net: an example is given on the lower right-hand side of the images in the lower row. Here, the Attention U-Net minimises the ICL contribution (effectively null in the true distribution), whereas the U-Net predicts fractions close to 0.8. Clearly, applying the method to real data will require the same masking choices as in the analysis of observational data. We notice that neither the Attention U-Net nor the U-Net predicts an ICL mass fraction equal to 1 (or 0) anywhere, whereas this is the case for many pixels in the true image. Although such extreme values are unlikely to be physically meaningful, the networks do not reproduce this feature in the final mapping.

For a more quantitative comparison, the median profiles of the ICL fraction in the true and recovered maps as a function of the distance from the centre, plotted in Fig. 5, help us assess how significant these effects are for the general population of simulated clusters. Shaded bands mark the 16th to 84th percentile range of the sample. The Attention U-Net performs better than the U-Net: it is mostly in agreement with the true profile at the centre and tends, on average, to lie below it only at larger radii. This is consistent with the systematic underprediction of the ICL fraction (by $\sim 20$ %) in the outskirts of the predicted maps discussed above. In any case, we find that the peak and the subsequent decrease of the ICL fraction are consistent with the true ICL distribution, especially in the Attention U-Net case. The same conclusions can be drawn when inspecting the statistical errors of the predictions over the complete test set. Rather than accounting for pixel-to-pixel misclassification, which would include potential noise, we smooth the predicted and the true ICL distributions with a Gaussian filter over a physical scale of 4 kpc, and we estimate a cumulative error as the standard deviation between the two smoothed images integrated along each direction (vertically and horizontally). This allows us to appreciate where most of the differences in our predictions lie in the larger cluster environment. The result is presented in Fig. 6 in terms of the root mean squared error (RMSE), which describes the average magnitude of errors in a regression model. Shaded regions report the internal scatter for both networks. The RMSE has an almost constant trend around $\sim 0.012$, although the Attention U-Net has a slightly larger scatter than the U-Net. Another way of visualising this error is by plotting the residual images between the true and predicted distributions, as in Fig. 7 for the halos in Fig. 4. The U-Net is more prone to errors at all scales, suggesting that the model responds better when the learning is reinforced with attention mechanisms.
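The two diagnostics described above (median radial profiles with percentile bands, and the RMSE between smoothed maps collapsed along each direction) can be sketched in NumPy as follows. Several choices are assumptions not fixed by the text: maps are square arrays centred on the cluster with a known pixel scale `pixel_kpc`; masked pixels have been filled with finite values before smoothing; the 4 kpc scale is treated as the Gaussian σ; per-halo profiles are means within radial bins; the RMSE is collapsed along columns and rows to give one curve per direction. All function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(img, sigma_pix):
    """Separable Gaussian smoothing (kernel truncated at 4 sigma, reflective edges)."""
    half = max(1, int(np.ceil(4.0 * sigma_pix)))
    x = np.arange(-half, half + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_pix) ** 2)
    k /= k.sum()  # normalised kernel: a constant map stays constant
    padded = np.pad(np.asarray(img, dtype=float), half, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")  # smooth along x
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")    # smooth along y

def directional_rmse(true_map, pred_map, pixel_kpc, smooth_kpc=4.0):
    """RMSE between smoothed maps, collapsed along columns (x curve) and rows (y curve)."""
    sigma_pix = smooth_kpc / pixel_kpc  # assumption: 4 kpc is the Gaussian sigma
    diff2 = (gaussian_smooth(pred_map, sigma_pix) - gaussian_smooth(true_map, sigma_pix)) ** 2
    return np.sqrt(diff2.mean(axis=0)), np.sqrt(diff2.mean(axis=1))

def radial_percentiles(maps, pixel_kpc, edges_kpc):
    """16th/50th/84th percentiles over a sample of per-halo mean ICL-fraction profiles."""
    profiles = []
    for m in maps:
        ny, nx = m.shape
        y, x = np.indices((ny, nx))
        r = np.hypot(x - (nx - 1) / 2, y - (ny - 1) / 2) * pixel_kpc
        idx = np.digitize(r, edges_kpc) - 1  # radial bin index of every pixel
        profiles.append([np.nanmean(m[idx == i]) if np.any(idx == i) else np.nan
                         for i in range(len(edges_kpc) - 1)])
    return np.nanpercentile(np.array(profiles), [16, 50, 84], axis=0)

# Usage on synthetic maps (not simulation data)
rng = np.random.default_rng(0)
true = rng.uniform(0.0, 1.0, (64, 64))
pred = np.clip(true + rng.normal(0.0, 0.05, true.shape), 0.0, 1.0)
rmse_x, rmse_y = directional_rmse(true, pred, pixel_kpc=1.0)
p16, p50, p84 = radial_percentiles([true, pred], pixel_kpc=1.0, edges_kpc=np.arange(0, 40, 5))
```

Smoothing before differencing suppresses isolated misclassified pixels, so the RMSE curves mainly reflect spatially coherent discrepancies.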

3.2 Limitations of the algorithms

In the previous sections, we argued that the trained Attention U-Net predicts ICL distributions in reasonable agreement with the expected labels. Here, we discuss the foreseeable limitations of the algorithm and attempt to quantify their impact.

3.2.1 Size of the training set

Generally, the training of DL models can suffer from a limited amount of training data, which can lead to overfitting or underfitting. This issue is worth discussing here, especially because our sample was created through data augmentation from a restricted number of halos.
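To make the point about augmentation concrete, the sketch below generates the eight right-angle rotations and reflections of an (input, label) pair, applying the same transformation to the velocity dispersion map and to the ICL fraction map so that pixels stay aligned. It covers only rotations by multiples of 90° and mirror flips; the resizing and zooming used in the paper are not reproduced, and the function name `d4_augment` is hypothetical.

```python
import numpy as np

def d4_augment(sigma_los_map, icl_frac_map):
    """Yield the 8 right-angle rotations/reflections of an (input, label) pair.

    The same transformation is applied to the velocity-dispersion input and to
    the ICL-fraction label, so the pixel-to-pixel correspondence is preserved.
    """
    for k in range(4):
        x = np.rot90(sigma_los_map, k)
        y = np.rot90(icl_frac_map, k)
        yield x, y                        # pure rotation by k * 90 degrees
        yield np.fliplr(x), np.fliplr(y)  # rotation followed by a mirror flip

rng = np.random.default_rng(42)
sigma = rng.uniform(100.0, 900.0, (8, 8))  # toy line-of-sight velocity dispersion map
label = rng.uniform(0.0, 1.0, (8, 8))      # toy ICL mass fraction map
pairs = list(d4_augment(sigma, label))
print(len(pairs))  # 8
```

The eight copies are not independent cluster realisations: they multiply the number of training examples without adding new physical information. A general precaution is to keep all augmented copies of a given halo within the same split (training, validation or test).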

In Fig. 8 we propose the following experiment: we evaluate the learning curve of our model during the training phase for different sample sizes (i.e. 500, 1000, and 2000 examples, and the full training set). In other words, we compute the training and validation errors at each epoch. We recall that the training (validation) error is the error committed in the prediction over the training (validation) set, as measured by the loss function (i.e. the MSE). Remarkably, in all cases we find a learning curve consistent with a well-trained network (i.e. a smooth decrease of both errors until they reach a plateau). The plateau in the final phases of the training marks the convergence of the gradient-descent evolution of the trainable parameters of the network towards a minimum; thus, we encounter neither overfitting nor underfitting in any case. However, the error levels vary with the sample size, reaching their minimum for the full sample. From this, we conclude that the limited data size can significantly impact the learning mechanism and probably represents one of the major limitations of our setup. On the other hand, pushing data augmentation further to enlarge the training set can also negatively impact our training, as it might lead to overfitting given the limited sample of objects. Having more independent cluster realisations would boost the learning.
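The learning curves in Fig. 8 are interpreted by eye. As a hedged sketch, the heuristic below reads a pair of per-epoch training and validation losses: it flags a plateau when the validation loss changes by less than a relative tolerance over the last few epochs, and flags overfitting when the validation loss has risen above its minimum while the training loss is still decreasing. The thresholds (`window`, `rel_tol`), the function name and the synthetic curves are assumptions, not values taken from the paper.

```python
import numpy as np

def diagnose_learning_curve(train_loss, val_loss, window=5, rel_tol=0.01):
    """Heuristic reading of a learning curve (hypothetical thresholds)."""
    train = np.asarray(train_loss, dtype=float)
    val = np.asarray(val_loss, dtype=float)
    tail = val[-window:]
    # plateau: validation loss changed by less than rel_tol over the last `window` epochs
    plateau = abs(tail[0] - tail[-1]) / tail[0] < rel_tol
    # overfitting: validation loss rose above its minimum while training loss still drops
    overfitting = val[-1] > val.min() * (1.0 + rel_tol) and train[-1] < train[-window]
    return {"plateau": bool(plateau), "overfitting": bool(overfitting),
            "final_gap": float(val[-1] - train[-1])}

epochs = np.arange(50)
train = 0.010 + 0.1 * np.exp(-epochs / 5.0)  # synthetic, smoothly decreasing
val = 0.012 + 0.1 * np.exp(-epochs / 5.0)    # synthetic, same shape, slightly higher
print(diagnose_learning_curve(train, val))
val_overfit = val + 0.001 * epochs           # synthetic validation loss that turns upward
print(diagnose_learning_curve(train, val_overfit)["overfitting"])  # True
```

Comparing `final_gap` and the plateau level across runs with 500, 1000, 2000 examples and the full set is one way to quantify the dependence of the error on sample size.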

3.2.2 IllustrisTNG

We run a second test on the data: we apply the best model to a different set of simulations to qualitatively assess the impact of the modelled physics on the predicted ICL mass fractions. When assuming a definition of ICL based on dynamical behaviour, many elements can affect the distribution of stellar particles and their kinematics, mostly related to the adopted sub-resolution models of star formation and energy feedback. A concrete example is provided by AGN feedback, which regulates star formation in massive galaxies, particularly impacting BCG masses (e.g. Ragone-Figueroa et al. 2013).

For this task, we select three of the most massive halos in IllustrisTNG-300 at our fiducial redshift (i.e. $z=0.3$). IllustrisTNG-300, the largest cosmological box available in the IllustrisTNG suite (Pillepich et al. 2018b), follows $2500^{3}$ dark matter particles and $2500^{3}$ initial gas particles in a $(302.6\,\mathrm{cMpc})^{3}$ volume. We chose this run because its mass and spatial resolutions are similar to those of the Dianoga set. The dark matter (initial gas) particles have mass $5.9\times 10^{7}\,M_{\odot}$ ($1.1\times 10^{7}\,M_{\odot}$) and Plummer-equivalent softening $1.48$ kpc ($0.37$ ckpc). Details on the simulation and the astrophysical subgrid models implemented can be found in Pillepich et al. (2018a). A dynamical classification of ICL and BCG-bound stars has never been attempted on IllustrisTNG; thus, we leave a complete analysis of this aspect to future work. Here, we blindly run the ICL-SubFind algorithm to classify stars (see Sect. 2.1.2) in the selected halos and visually inspect the properties of the two stellar components. We remark that in Marini et al. (2022) we already discussed the performance of the random forest on simulations including different physical models and resolutions, finding it generally consistent with the expected results.

In Fig. 9 we show the resulting distribution of star particle velocities in the two stellar components after running ICL-SubFind. The purple histogram highlights the double-Maxwellian profile of the BCG+ICL distribution, which can be traced back to the sum of the distributions of the two components. Thus, we can confirm that in IllustrisTNG we can also dynamically distinguish two stellar populations in the main halo, associated with a central galaxy and a diffuse component.
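The double-Maxwellian decomposition can be illustrated with a short sketch. ICL-SubFind's own procedure is the one referred to in Sect. 2.1.2 (see also Dolag et al. 2010); as a swapped-in technique, the code below fits a two-component Maxwellian mixture to stellar speeds by expectation-maximisation (EM), using the closed-form maximum-likelihood update $\sigma^2 = \langle v^2\rangle/3$ for each component. The initial guesses, the iteration count, the function names and the synthetic populations are assumptions. The component with the larger σ plays the role of the diffuse, ICL-like population.

```python
import numpy as np

def log_maxwell(v, sigma):
    """Log-pdf of a Maxwell-Boltzmann speed distribution with 1D dispersion sigma."""
    return (0.5 * np.log(2.0 / np.pi) + 2.0 * np.log(v)
            - 3.0 * np.log(sigma) - v**2 / (2.0 * sigma**2))

def fit_double_maxwellian(speeds, sigma_init=(150.0, 500.0), n_iter=500):
    """EM fit of w1*Maxwell(sigma1) + w2*Maxwell(sigma2) to a sample of speeds."""
    v = np.asarray(speeds, dtype=float)
    v = v[v > 0]                               # log(v) is undefined at v = 0
    sigma = np.array(sigma_init, dtype=float)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each star (in log space)
        logp = np.log(w)[:, None] + log_maxwell(v[None, :], sigma[:, None])
        logp -= logp.max(axis=0, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: component weights and closed-form Maxwellian MLE, sigma^2 = <v^2> / 3
        nk = resp.sum(axis=1)
        w = nk / v.size
        sigma = np.sqrt((resp * v**2).sum(axis=1) / (3.0 * nk))
    return w, sigma, resp

# Synthetic test: two isotropic populations with 1D dispersions of 200 and 600 km/s
rng = np.random.default_rng(1)
v_bound = np.linalg.norm(rng.normal(0.0, 200.0, (20000, 3)), axis=1)
v_diffuse = np.linalg.norm(rng.normal(0.0, 600.0, (20000, 3)), axis=1)
w, sigma, resp = fit_double_maxwellian(np.concatenate([v_bound, v_diffuse]))
print(np.round(sigma), np.round(w, 2))
```

The returned responsibilities `resp` give, for each star, a probability of belonging to each component; assigning stars by the larger responsibility is one simple way to split the two populations.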

Based on this result, we carry out the same analysis (including all the steps outlined in Sect. 2.2). The results are shown in Fig. 10. The most important result is that the algorithm can spot the position of the central galaxy thanks to its kinematics and estimate its extension, as traced by the colour gradient in the ICL mass fraction distribution. Most interlopers are detected, even though their size is often underestimated. Such an error can lead to a systematic overestimation of the ICL stellar mass and points to the necessity of including different numerical simulations in future training.

Therefore, we conclude that optimal results are obtained for images drawn from the same set of simulations used for training. Still, applying our Attention U-Net to mock images generated from a completely independent set of simulations allows us to estimate the extension of the BCG and ICL components, whose transition occurs at $\sim 50$ kpc from the centre.

4 Discussion and conclusions

The challenge of identifying the ICL in galaxy clusters and groups is a long-standing issue. It has become a matter of increasing interest as current and future surveys gather data of increasing statistics and sensitivity. One example is provided by recent JWST observations, which deliver deep, high-spatial-resolution images to study the ICL with a high signal-to-noise ratio out to a radial distance of $\sim 400$ kpc (Montes & Trujillo 2022), twice as far as previous HST studies (e.g. DeMaio et al. 2018). This opens up the possibility of exploring the rich mixture of processes that drive the formation of the ICL.

In this paper, we discuss a new technique to separate the contributions of the BCG and the ICL to the distribution of stars in the central regions of galaxy clusters. We argue that the stellar velocity dispersion is an indirect tracer of the underlying stellar populations bound to the galaxies and to the central halo. Thus, we can deduce the properties of the ICL through a kinematical decomposition. Observationally, the ICL distribution in a cluster is determined under several assumptions on the light profile: some studies define the ICL as the contribution to the faint light (i.e. below a given surface brightness limit) in clusters and groups of galaxies (e.g. Mihos et al. 2017; Montes & Trujillo 2019), whereas others (Kluge et al. 2020; Spavone et al. 2020) fit composite models to derive the stellar distribution from the light profile. The results of this latter approach are sensitive to the number of profiles fitted, each associated with a stellar component, and might not necessarily be linked to the halo’s assembly history (Remus et al. 2017). Several authors have proposed wavelet-like decomposition techniques to extract the ICL from photometric images (Da Rocha & Mendes de Oliveira 2005; Ellien et al. 2021). However, not only do such methods often provide different results according to the assumptions chosen, but they also do not necessarily detect the dynamical differences expected to distinguish the ICL from the BCG stellar component. Here, we underline the importance of dynamically identifying the ICL in both observational analyses and simulations. In principle, the scatter in the observationally inferred values of the ICL mass fraction presented so far in the literature (e.g. Rudick et al. 2011; Kluge et al. 2021) could be alleviated by standardising the methods used to define this component, although we are still far from having settled the matter.

Cosmological hydrodynamical simulations suggest that the analysis of stellar kinematics offers a physically grounded approach to define the separation between the BCG and ICL components (e.g. Dolag et al. 2010; Marini et al. 2022). However, detailed spectroscopic observations characterising the kinematics of the ICL are limited by its low surface brightness. Most spectroscopic data are restricted to $3-4$ BCG effective radii (e.g. recently Boardman et al. 2017; Loubser et al. 2022). At larger radii, discrete tracers are extensively used (e.g. planetary nebulae and globular clusters; see Arnaboldi & Gerhard 2022, for a review). Ideally, recovering the kinematics of the stellar component would help us decipher the variety of results coming from different methods and homogenise the results in the literature.

Our work has shown that a network can be trained to infer the ICL distribution from stellar kinematics. This method can therefore be adapted and refined (for example, by using different cosmological simulations to reduce the dependence on the numerical schemes) to suit the needs of future observations, providing a physically motivated distinction between the ICL and the central galaxy.

In this study, we selected a sample of clusters (among the most massive) from the Dianoga zoom-in cosmological hydrodynamical simulations, which we used to create mock images of galaxy clusters. From the original dataset, we performed data augmentation by modifying (e.g. resizing, zooming, rotating) the images, and we included a simplified treatment of contaminants (i.e. interlopers) by masking substructures in the image. The final dataset is composed of line-of-sight velocity dispersion and ICL mass fraction maps divided into training, validation, and test sets. Finally, we trained two models (U-Net, Ronneberger et al. 2015; Attention U-Net, Oktay et al. 2018) with the One-Cycle policy (Smith & Topin 2018) and discussed the accuracy of the networks. Our results can be summarised as follows.

•

By modelling the dynamical separation of the BCG and ICL stellar components in our simulations (e.g. Dolag et al. 2010; Marini et al. 2022), we find that cluster-size and group-size halos have a transition radius of roughly $0.1\,R_{200}$, corresponding to the region where the ICL dominates the stellar component. We expect that observations covering the scales around this transition radius should be effective in tracing the dynamical features of the ICL in the stellar kinematics (see Fig. 2).
•

The application of both U-Net and Attention U-Net models to mock velocity dispersion maps allows us to recover the large-scale ICL structure, though some gradients and small-scale features are not fully captured. The Attention U-Net generally outperforms the U-Net, particularly in capturing smaller-scale features and reducing spurious elements, as highlighted in Fig. 5.
•

The most significant limitations of our model are connected to the limited data size (see Fig. 8) and to the single numerical scheme included in the training of the network. Such conditions impact the accuracy of the model predictions, as shown when the model is applied to halos from IllustrisTNG-300 (see Fig. 10). A viable solution would be to expand the training phase to include a more extensive set of simulations in terms of the number of clusters simulated, the implementations of the relevant physical processes driving galaxy formation, and numerical resolution.
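The transition radius in the first point can be located on a radial ICL-fraction profile. As a sketch under stated assumptions, the function below takes the transition to be the first radius at which the ICL fraction reaches 0.5 (i.e. the ICL starts to dominate), interpolating linearly between radial bins. The 0.5 threshold, the function name and the example profile are illustrative assumptions, not values taken from the paper's figures.

```python
import numpy as np

def transition_radius(r, icl_frac, threshold=0.5):
    """First radius at which the ICL-fraction profile reaches `threshold`.

    r        : increasing bin radii (e.g. in units of R200)
    icl_frac : ICL mass fraction in each radial bin
    Returns NaN if the threshold is never reached.
    """
    r = np.asarray(r, dtype=float)
    f = np.asarray(icl_frac, dtype=float)
    above = np.flatnonzero(f >= threshold)
    if above.size == 0:
        return float("nan")     # ICL never dominates in the sampled range
    i = above[0]
    if i == 0:
        return float(r[0])      # ICL already dominates in the innermost bin
    # linear interpolation between the last bin below and the first bin above threshold
    return float(r[i - 1] + (threshold - f[i - 1]) * (r[i] - r[i - 1]) / (f[i] - f[i - 1]))

# Illustrative profile (made-up numbers, not taken from the paper's figures)
r_over_r200 = [0.02, 0.05, 0.08, 0.12, 0.20]
f_icl = [0.10, 0.30, 0.45, 0.55, 0.70]
print(round(transition_radius(r_over_r200, f_icl), 3))  # 0.1
```

Applied to the median profile of a sample, this gives a single characteristic scale that observations would need to cover.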

In conclusion, the method presented here proved sufficiently reliable in characterising the ICL distribution in our simulation set using only the projected phase-space information. As a final remark, we refrain from claiming that the network will perform at this level of accuracy on real spectroscopic observations or other simulations, as its accuracy has been shown to depend on the details of the numerical implementation of the physical processes included in the training data. In this sense, a natural follow-up would be the design of a training set that effectively mocks the observational conditions of an IFS (e.g. spectral features and signal-to-noise ratio) to guide observers in their task. Furthermore, the analysis should be extended to investigate the role of the halo’s dynamical state (i.e. relaxed or disturbed) in affecting the recovery rate of the maps, since disruptive events can impact both the stellar kinematics (Longobardi et al. 2015) and the ICL fraction (Contini et al. 2023). Nonetheless, this method paves the way for CNNs as powerful tools for constructing a robust pipeline of ICL detection, taking advantage of high-sensitivity spectroscopic studies of stellar kinematics in the central regions of clusters and groups.

Acknowledgements.

We thank Amelia Fraser-McKelvie for her support on the technical aspects of the observation. We are grateful to Magda Arnaboldi, Claudia Pulsoni, and Paola Popesso for the useful discussions that led to significant improvements to the paper’s draft. We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support. IM acknowledges support from the European Research Council (ERC) under the European Union’s Horizon Europe research and innovation programme ERC CoG (Grant agreement No. 101045437, PI P. Popesso). Simulations have been carried out using MARCONI at CINECA (Italy), with CPU time assigned through grants ISCRA B, and through INAF-CINECA and University of Trieste – CINECA agreements, and at the Tianhe-2 platform of the Guangzhou Supercomputer Center with the support of the National Key Program for Science and Technology Research and Development (2017YFB0203300). This paper is supported by the Fondazione ICSC National Recovery and Resilience Plan (PNRR) Project ID CN-00000013 “Italian Research Center on High-Performance Computing, Big Data and Quantum Computing” funded by MUR Missione 4 Componente 2 Investimento 1.4: “Potenziamento strutture di ricerca e creazione di ‘campioni nazionali di R&S’ (M4C2-19)” - Next Generation EU (NGEU). SB acknowledges partial financial support from the INFN Indark Grant.

References

Agarap (2019) Agarap, A. F. 2019
Alonso Asensio et al. (2020) Alonso Asensio, I., Dalla Vecchia, C., Bahé, Y. M., Barnes, D. J., & Kay, S. T. 2020, MNRAS, 494, 1859
Arnaboldi & Gerhard (2022) Arnaboldi, M. & Gerhard, O. 2022, Frontiers in Astronomy and Space Sciences, 9
Arnaboldi & Napolitano (2001) Arnaboldi, M. & Napolitano, N. R. 2001, 230, 409
Bacon et al. (2010) Bacon, R., Accardo, M., Adjali, L., et al. 2010, 7735, 773508
Ball et al. (2004) Ball, N. M., Loveday, J., Fukugita, M., et al. 2004, MNRAS, 348, 1038
Banerji et al. (2010) Banerji, M., Lahav, O., Lintott, C. J., et al. 2010, MNRAS, 406, 342
Bassini et al. (2020) Bassini, L., Rasia, E., Borgani, S., et al. 2020, A&A, 642, A37
Beck et al. (2016) Beck, A. M., Murante, G., Arth, A., et al. 2016, MNRAS, 455, 2110
Binney & Tremaine (2011) Binney, J. & Tremaine, S. 2011, Galactic Dynamics: Second Edition (Princeton University Press)
Boardman et al. (2017) Boardman, N. F., Weijmans, A.-M., van den Bosch, R., et al. 2017, MNRAS, 471, 4005
Burke et al. (2019) Burke, C. J., Aleo, P. D., Chen, Y.-C., et al. 2019, MNRAS, 490, 3952
Bílek et al. (2020) Bílek, M., Duc, P.-A., Cuillandre, J.-C., et al. 2020, MNRAS, 498, 2138
Cantat-Gaudin et al. (2020) Cantat-Gaudin, T., Anders, F., Castro-Ginard, A., et al. 2020, A&A, 640, A1
Carleo et al. (2019) Carleo, G., Cirac, I., Cranmer, K., et al. 2019, Reviews of Modern Physics, 91, 045002
Chadayammuri et al. (2023) Chadayammuri, U., Ntampaka, M., ZuHone, J., Bogdán, Á., & Kraft, R. P. 2023, MNRAS, 526, 2812
Chen et al. (2022) Chen, X., Zu, Y., Shao, Z., & Shan, H. 2022, MNRAS, 514, 2692
Collister & Lahav (2004) Collister, A. A. & Lahav, O. 2004, Publications of the Astronomical Society of the Pacific, 116, 345
Contini (2021) Contini, E. 2021, Galaxies, 9, 60
Contini et al. (2022) Contini, E., Chen, H. Z., & Gu, Q. 2022, ApJ, 928, 99
Contini & Gu (2021) Contini, E. & Gu, Q. 2021, ApJ, 915, 106
Contini et al. (2023) Contini, E., Jeon, S., Rhee, J., Han, S., & Yi, S. K. 2023, ApJ, 958, 72
Da Rocha & Mendes de Oliveira (2005) Da Rocha, C. & Mendes de Oliveira, C. 2005, MNRAS, 364, 1069
DeMaio et al. (2018) DeMaio, T., Gonzalez, A. H., Zabludoff, A., et al. 2018, MNRAS, 474, 3009
Dieleman et al. (2015) Dieleman, S., Willett, K. W., & Dambre, J. 2015, MNRAS, 450, 1441
Dolag et al. (2009) Dolag, K., Borgani, S., Murante, G., & Springel, V. 2009, MNRAS, 399, 497
Dolag et al. (2010) Dolag, K., Murante, G., & Borgani, S. 2010, MNRAS, 405, 1544
Dolag et al. (2005) Dolag, K., Vazza, F., Brunetti, G., & Tormen, G. 2005, MNRAS, 364, 753
Domínguez Sánchez et al. (2018) Domínguez Sánchez, H., Huertas-Company, M., Bernardi, M., Tuccillo, D., & Fischer, J. L. 2018, MNRAS, 476, 3661
Ellien et al. (2021) Ellien, A., Slezak, E., Martinet, N., et al. 2021, A&A, 649, A38
Feldmann et al. (2006) Feldmann, R., Carollo, C. M., Porciani, C., et al. 2006, MNRAS, 372, 565
George & Huerta (2018) George, D. & Huerta, E. A. 2018, Physics Letters B, 778, 64
Gibson et al. (2012) Gibson, N. P., Aigrain, S., Roberts, S., et al. 2012, MNRAS, 419, 2683
Glorot et al. (2011) Glorot, X., Bordes, A., & Bengio, Y. 2011, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (JMLR Workshop and Conference Proceedings), 315–323
Gonzalez et al. (2021) Gonzalez, A. H., George, T., Connor, T., et al. 2021, MNRAS, 507, 963
Gonzalez et al. (2007) Gonzalez, A. H., Zaritsky, D., & Zabludoff, A. I. 2007, ApJ, 666, 147
Goodfellow et al. (2016) Goodfellow, I., Bengio, Y., & Courville, A. 2016, Deep Learning (MIT Press)
He et al. (2015) He, K., Zhang, X., Ren, S., & Sun, J. 2015, Deep Residual Learning for Image Recognition
Jaffe (1983) Jaffe, W. 1983, MNRAS, 202, 995
Jetley et al. (2018) Jetley, S., Lord, N. A., Lee, N., & Torr, P. H. S. 2018, Learn To Pay Attention
Kamdar et al. (2016) Kamdar, H. M., Turk, M. J., & Brunner, R. J. 2016, MNRAS, 457, 1162
Karademir et al. (2019) Karademir, G. S., Remus, R.-S., Burkert, A., et al. 2019, MNRAS, 487, 318
Kingma & Ba (2017) Kingma, D. P. & Ba, J. 2017, Adam: A Method for Stochastic Optimization
Kluge et al. (2021) Kluge, M., Bender, R., Riffeser, A., et al. 2021, ApJS, 252, 27
Kluge et al. (2020) Kluge, M., Neureiter, B., Riffeser, A., et al. 2020, ApJS, 247, 43
Kravtsov & Borgani (2012) Kravtsov, A. V. & Borgani, S. 2012, Annu. Rev. Astron. Astrophys., 50, 353
Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., & Hinton, G. E. 2012, in Advances in Neural Information Processing Systems, Vol. 25 (Curran Associates, Inc.)
LeCun et al. (2015) LeCun, Y., Bengio, Y., & Hinton, G. 2015, Nature, 521, 436
LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. 1998, Proceedings of the IEEE, 86, 2278
Longobardi et al. (2015) Longobardi, A., Arnaboldi, M., Gerhard, O., & Mihos, J. C. 2015, A&A, 579, L3
Loubser et al. (2022) Loubser, S. I., Lagos, P., Babul, A., et al. 2022, MNRAS, 515, 1104
Marini et al. (2021) Marini, I., Borgani, S., Saro, A., et al. 2021, MNRAS, 507, 5780
Marini et al. (2022) Marini, I., Borgani, S., Saro, A., et al. 2022, MNRAS, 514, 3082
Mihos et al. (2017) Mihos, J. C., Harding, P., Feldmeier, J. J., et al. 2017, ApJ, 834, 16
Montenegro-Taborda et al. (2023) Montenegro-Taborda, D., Rodriguez-Gomez, V., Pillepich, A., et al. 2023, MNRAS, 521, 800
Montes (2022) Montes, M. 2022, Nature Astronomy, 6, 308
Montes et al. (2021) Montes, M., Brough, S., Owers, M. S., & Santucci, G. 2021, ApJ, 910, 45
Montes & Trujillo (2019) Montes, M. & Trujillo, I. 2019, MNRAS, 482, 2838
Montes & Trujillo (2022) Montes, M. & Trujillo, I. 2022, ApJ, 940, L51
Murante et al. (2004) Murante, G., Arnaboldi, M., Gerhard, O., et al. 2004, ApJ, 607, L83
Navarro et al. (1997) Navarro, J. F., Frenk, C. S., & White, S. D. M. 1997, ApJ, 490, 493
Oktay et al. (2018) Oktay, O., Schlemper, J., Folgoc, L. L., et al. 2018
Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., et al. 2019
Pillepich et al. (2018a) Pillepich, A., Nelson, D., Hernquist, L., et al. 2018a, MNRAS, 475, 648
Pillepich et al. (2018b) Pillepich, A., Springel, V., Nelson, D., et al. 2018b, MNRAS, 473, 4077
Pop et al. (2018) Pop, A.-R., Pillepich, A., Amorisco, N. C., & Hernquist, L. 2018, MNRAS, 480, 1715
Proctor et al. (2023) Proctor, K. L., Lagos, C. d. P., Ludlow, A. D., & Robotham, A. S. G. 2023
Ragone-Figueroa et al. (2018) Ragone-Figueroa, C., Granato, G. L., Ferraro, M. E., et al. 2018, MNRAS, 479, 1125
Ragone-Figueroa et al. (2013) Ragone-Figueroa, C., Granato, G. L., Murante, G., Borgani, S., & Cui, W. 2013, MNRAS, 436, 1750
Remus et al. (2017) Remus, R.-S., Dolag, K., & Hoffmann, T. 2017, Galaxies, 5, 49
Ronneberger et al. (2015) Ronneberger, O., Fischer, P., & Brox, T. 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation
Rudick et al. (2011) Rudick, C. S., Mihos, J. C., & McBride, C. K. 2011, ApJ, 732, 48
Salvato et al. (2011) Salvato, M., Ilbert, O., Hasinger, G., et al. 2011, ApJ, 742, 61
Schanche et al. (2019) Schanche, N., Collier Cameron, A., Hébrard, G., et al. 2019, MNRAS, 483, 5534
Schlemper et al. (2019) Schlemper, J., Oktay, O., Schaap, M., et al. 2019, Medical Image Analysis, 53, 197
Siddique et al. (2021) Siddique, N., Paheding, S., Alom, M. Z., & Devabhaktuni, V. 2021, in Pattern Recognition and Tracking XXXII, Proc. SPIE, Vol. 11735, 117350L
Smith (2017) Smith, L. N. 2017, Cyclical Learning Rates for Training Neural Networks
Smith & Topin (2018) Smith, L. N. & Topin, N. 2018, Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
Smith & Geach (2023) Smith, M. J. & Geach, J. E. 2023, Royal Society Open Science, 10, 221454
Spavone et al. (2020) Spavone, M., Iodice, E., van de Ven, G., et al. 2020, A&A, 639, A14
Springel (2005) Springel, V. 2005, MNRAS, 364, 1105
Springel & Hernquist (2003) Springel, V. & Hernquist, L. 2003, MNRAS, 339, 289
Springel et al. (2001) Springel, V., White, S. D. M., Tormen, G., & Kauffmann, G. 2001, MNRAS, 328, 726
Sérsic (1963) Sérsic, J. L. 1963, Boletin de la Asociacion Argentina de Astronomia La Plata Argentina, 6, 41
Tormen & Bertschinger (1996) Tormen, G. & Bertschinger, E. 1996, ApJ, 472, 14
Tornatore et al. (2007) Tornatore, L., Borgani, S., Dolag, K., & Matteucci, F. 2007, MNRAS, 382, 1050
Valenzuela & Remus (2022) Valenzuela, L. M. & Remus, R.-S. 2022
Villaescusa-Navarro et al. (2020) Villaescusa-Navarro, F., Hahn, C., Massara, E., et al. 2020, ApJS, 250, 2
Vojtekova et al. (2021) Vojtekova, A., Lieu, M., Valtchanov, I., et al. 2021, MNRAS, 503, 3204
Zhang et al. (2019) Zhang, Y., Yanny, B., Palmese, A., et al. 2019, ApJ, 874, 165
Zibetti et al. (2005) Zibetti, S., White, S. D. M., Schneider, D. P., & Brinkmann, J. 2005, MNRAS, 358, 949