License: arXiv.org perpetual non-exclusive license
arXiv:2403.06178v1 [astro-ph.GA] 10 Mar 2024

Estimating the mass of galactic components using machine learning algorithms

J. N. López$^{1,2}$, E. Munive$^{1,2}$, A. A. Avilez$^{2}$, O. M. Martínez$^{2}$
$^{1}$CEICO - FZU, Institute of Physics of the Czech Academy of Sciences,
Na Slovance 1999/2, 182 00, Prague, Czechia.
$^{2}$Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad
Autónoma de Puebla, Av. San Claudio SN, Col. San Manuel, Puebla, México
Abstract

The estimation of the bulge and disk masses, the main baryonic components of a galaxy, can be performed using various approaches, but their implementation tends to be challenging, as they often rely on strong assumptions about either the baryon dynamics or the dark matter model. In this work, we present an alternative method for predicting the masses of galactic components, including the disk, bulge, stellar and total mass, using a set of machine learning algorithms: K-Nearest Neighbours (KNN), Linear Regression (LR), Random Forest (RF) and Neural Network (NN). The rest-frame absolute magnitudes in the ugriz photometric system were selected as input features, and the training was performed using a sample of spiral galaxies hosting a bulge from Guo’s mock catalogue (Guo et al., 2011) derived from the Millennium simulation. In general, all the algorithms provide good predictions for the galaxy’s mass components ranging from $10^{9}\,M_{\odot}$ to $10^{11}\,M_{\odot}$, corresponding to the central region of the training mass domain; however, the NN gives the most precise predictions of all the methods. Additionally, to test the performance of the NN architecture, we used a sample of observed galaxies from the SDSS survey whose mass components are known. We found that the NN can predict the luminous masses of disk-dominant galaxies within the same range of magnitudes as for the synthetic sample up to a $99\%$ level of confidence, while mass components of galaxies hosting larger bulges are well predicted up to a $95\%$ level of confidence. The NN algorithm can also bring up scaling relations between masses of different components and magnitudes.

Keywords— Galactic Systems; Neural Network; Scaling relations

1 Introduction

The bulge-disk decomposition of galactic systems is useful for understanding the evolutionary processes of galaxies and their dynamics. Specifically, the disk and bulge masses can be inferred, given that their stellar population has different dynamic or even chemical features. There are plenty of schemes for classifying galaxies; one of the most popular corresponds to the morphological classification proposed by Edwin Hubble (Hubble, 1929) consisting of four types of galaxies: elliptical, spiral, barred spiral, and irregular. Another method involves the isophotal radius measurement (Holmberg, 1958), determining the size attributed to a galaxy component according to a particular surface brightness level. A way to characterize the light distribution independent of the light profile is through the concentration measure, defined by the ratio of two geometrical regions, each containing a fixed fraction of the total galaxy luminosity (Kent, 1985).

Another approach for reconstructing the visible mass of galactic components involves using standardized fitting functions. Ideally, these functions should be derived from the fundamental principles governing galactic evolution. However, due to the intricate nature of the physics involved, models based on these principles often become complex, with a substantial number of parameters. As a result, the commonly used functions are empirically derived. For instance, the disk components are well fitted by an exponential law, while for elliptical galaxies and the bulges of spiral galaxies, the relations typically considered are King’s model (King, 1966) and de Vaucouleurs’ law (de Vaucouleurs, 1948). Sometimes, the bulges associated with late-type galaxies are best fitted by exponential laws (Andredakis et al., 1995; Freeman, 1970). However, implementing these methods demands high-quality observational data to obtain reliable results.

Furthermore, it is possible to single out the galactic components through the light distribution of a galaxy. This decomposition is derived by fitting the light profile to a power law, adhering to a specific empirical or analytical model. In the conventional photometric technique, the one-dimensional case is considered. On the other hand, when multiple wavelengths in the spectrum are taken, spectroscopic methods come into play (Johnston et al., 2012).

On the other hand, numerical simulations play an important role in exploring predictions of galaxy evolution within the standard $\Lambda$CDM prescription (Croton et al., 2006; Guo et al., 2010; De Lucia et al., 2004). Semi-analytical models have gained popularity for identifying the structural components of galactic systems. These models employ a simplified representation of baryonic physics, coupled with Markov-Chain-Monte-Carlo methods for reconstructing merger trees (Lucia, 2019).

For dark matter-only simulations, a common technique to infer information about the baryonic components is to assume halo abundance matching (HAM), which relates the halo potential well to the star formation rate in such a way that more luminous galaxies are associated with more massive halos. During the evolution of both components, material exchange occurs between the baryonic elements through various processes. For example, a galactic bulge may form as a result of major or minor mergers (Hopkins et al., 2010). In these processes, pre-existing and newly formed stars play a crucial role; after a merger, all stars from the progenitors contribute to the bulge component of the resulting galaxy. The gas within the progenitors becomes part of the resulting galaxy disk, and the specific angular momentum of this component equals that of the halo in which it is embedded (Guo et al., 2011; Bower et al., 2006; De Lucia & Blaizot, 2007).

As can be seen, various approaches exist for describing galactic components, whether through pure morphological observations, photometric and/or spectroscopic techniques, or synthetically through mock catalogues. Conversely, obtaining information about the total mass often involves making strong assumptions, whether about a specific dark matter model or the overall kinematics of the system. In this work, we propose an artificial intelligence (AI)-based method to isolate the bulge and disk components of both baryonic and total galaxy mass. This is accomplished using the information on luminosity and the features inferred from stellar dynamics that are encoded in Guo’s synthetic catalogue (Guo et al., 2011). Our goal is to perform the decomposition of the galactic components without including additional information about baryons in the training stage beyond the patterns the AI methods can infer from the catalogue. This method can be useful for obtaining the components of observed galaxies whose baryonic dynamics cannot be obtained easily.

Since the mass values span several orders of magnitude, it is convenient to predict the base-10 logarithm of the stellar mass $M_{\star}$, the disk mass $M_{\text{disk}}$ and the total mass $M_{\text{tot}}$. Here, the bulge mass $M_{\text{bulge}}$ is not within the prediction set since we compute it in terms of the disk mass $M_{\text{disk}}$ and the stellar mass $M_{\star}$ from the following expression (Guo et al., 2011; Lucia, 2019)

$$M_{\star}=M_{\text{bulge}}+M_{\text{disk}}. \tag{1}$$
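As a minimal sketch of how Eq. (1) could be applied to the predicted quantities, assume the models return base-10 logarithms of $M_{\star}$ and $M_{\text{disk}}$ in the same mass units. The function name `bulge_mass` and the numerical values are hypothetical and are not taken from the catalogue.

```python
def bulge_mass(log10_mstar, log10_mdisk):
    """Return M_bulge = M_star - M_disk (Eq. 1) from base-10 log masses."""
    m_star = 10.0 ** log10_mstar
    m_disk = 10.0 ** log10_mdisk
    m_bulge = m_star - m_disk
    if m_bulge < 0.0:
        # Independently predicted masses may violate M_disk <= M_star.
        raise ValueError("predicted disk mass exceeds predicted stellar mass")
    return m_bulge


# Made-up predictions: log10(M_star) = 10.5 and log10(M_disk) = 10.3
m_bulge = bulge_mass(10.5, 10.3)
bulge_to_stellar = m_bulge / 10.0 ** 10.5  # bulge fraction of the stellar mass
```

Because the two logarithms are predicted separately, a small error in either one can translate into a large relative error in the bulge mass when the galaxy is disk dominated, which is why the sketch checks the sign of the difference explicitly.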

It is worth stressing that with our method, we aim to predict mass components in spirals following some theoretical assumptions about the features of dark matter. Specifically, by using a mock catalogue as the training set of our AI algorithms, we set dark matter in a CDM prescription and make inferences about mass components under that hypothesis. Our algorithms were tested using observational data reported in the SDSS database (Abdurro’uf et al., 2022) to assess how well the predictions match observations, keeping the feature set as simple as possible. We are interested in knowing the advantages and disadvantages of the algorithms in predicting the masses of galaxies with different features.

This paper is structured as follows: Section 2 presents and analyses the content of Guo’s galaxy catalogue to determine the correlation between input features and output predictions, emphasizing their importance during the training stage. Following this, Section 3 introduces the machine learning algorithms considered in this work and explores their dependency on variations in different parameters. Subsequently, in Section 4.1, we analyse the performance of each algorithm. Then, in Section 4, we apply the trained methods to predict masses of components in observed galaxies from the Sloan Digital Sky Survey (Abdurro’uf et al., 2022) database. This allows us to derive various scaling relations commonly studied in the literature. Finally, we draw some conclusions in Section 6.

2 The Data

To train our machine learning algorithms, we have used Guo’s galaxy catalogue (Guo et al., 2011) derived from the Millennium simulation, selecting only galaxies with nonzero bulge or disk components, leading to a set of 833,491 galaxies. This dataset was split randomly, following a common choice in which $75\%$ of the data is used for training and $25\%$ to evaluate the performance of the algorithms. The Millennium simulation is a dark matter-only simulation carried out under the $\Lambda$CDM prescription (Springel et al., 2006) using a customized version of the Gadget 2 code (Springel et al., 2005) with $2160^{3}$ particles within a box of side $L=500\text{ Mpc}/h$. This catalogue provides information about the merger history of each halo and the baryon content, which has been split into five components: the stellar bulge, the stellar disk, a gas disk, a halo and an ejecta reservoir (Guo et al., 2010).
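The random 75/25 split described above can be sketched as follows. This is an illustration in plain Python rather than the procedure actually used for the paper; the function name `split_indices` and the seed are assumptions introduced here.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible random permutation
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


# Sample size quoted in the text; the seed value is arbitrary.
train_idx, test_idx = split_indices(833_491, train_fraction=0.75, seed=42)
```

Splitting on indices rather than on the rows themselves keeps the features and the labels aligned, since the same index lists can be applied to both.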

The analytical model implemented in Guo’s catalogue considers that galaxies form within the central region of dark matter halos. The fitting function, which describes the average baryon fraction of a halo, can be written in terms of the halo's total mass and redshift (Gnedin, 2000)

$$f_{b}(z,M_{\text{tot}})=f_{b}^{\text{cos}}\left(1+(2^{\alpha/3}-1)\left[\frac{M_{\text{tot}}}{M_{c}(z)}\right]^{-\alpha}\right)^{-3/\alpha}, \tag{2}$$

where the universal baryon fraction is usually taken as $f_{b}^{\text{cos}}=\Omega_{b}/\Omega_{0}\sim 17\%$. Here, $M_{c}$ represents the characteristic mass of objects that can retain $50\%$ of the gas component to form stars. The reionization and cooling depend on the baryon fraction in a given halo and on its mass and redshift. The disk and bulge formation are correlated with star formation and supernova feedback processes, as well as with black hole growth and AGN feedback. Additionally, mergers between the central and satellite galaxies are described through simulations and play an important role in the disk and bulge evolution. This catalogue accurately reproduces the population and clustering mechanisms observed at $z\sim 0$. However, it exhibits inconsistencies for high-redshift populations.
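Returning to Eq. (2), a short numerical sketch makes the role of $M_{c}$ explicit: at $M_{\text{tot}}=M_{c}$ the bracket reduces to $2^{\alpha/3}$, so the fraction is exactly half the universal value. The function name and the default $\alpha=2$ are assumptions made for illustration; the value $0.17$ follows the text.

```python
def baryon_fraction(m_tot, m_c, alpha=2.0, f_b_cos=0.17):
    """Eq. (2): mean baryon fraction of a halo of total mass m_tot.

    m_c is the characteristic mass M_c(z) at the redshift of interest.
    alpha=2.0 is an illustrative default (an assumption, not from the text).
    """
    bracket = 1.0 + (2.0 ** (alpha / 3.0) - 1.0) * (m_tot / m_c) ** (-alpha)
    return f_b_cos * bracket ** (-3.0 / alpha)


# Made-up characteristic mass, used only to show the limiting behaviour
m_c = 1.0e10
at_mc = baryon_fraction(m_c, m_c)            # half the universal fraction
massive = baryon_fraction(1.0e14, m_c)       # approaches f_b_cos
low_mass = baryon_fraction(1.0e9, m_c)       # strongly suppressed
```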

In this work, we consider galaxies at $z=0$. Our goal was to investigate spiral galaxies hosting both bulges and disks, so we imposed this strong filter when selecting our sample from the mock catalogue. It is crucial to note that our selection encompasses diverse galaxy types without accounting for age or metallicity. The purpose is to explore the capabilities of the algorithms to extract information about the systems using photometric information exclusively.

The resolution of the simulation delimits the range of masses for each component. Once the selection of bulge-disk galaxies has been performed, the range of the total mass is between $10^{10}M_{\odot}/h$ and $10^{13}M_{\odot}/h$. Notably, the selected total mass range excludes galaxies of both low and high masses. Indeed, massive galaxies tend to exhibit an elliptical morphology rather than spiral (De Lucia et al., 2006).

2.1 Feature importance

It is well known that the physical and photometric properties of the stellar population of a galaxy are closely related to its dynamics and the spatial mass distribution of different components within the system. Specifically, this behaviour is reflected in the colour-magnitude relation. For instance, it has been shown that bulge-dominant galaxies have a colour-magnitude diagram mainly described by red galaxies (Hogg et al., 2004). In addition, Barsanti et al. (2021) showed that the bulge is redder than the disk in galaxies within a cluster. A similar conclusion was reported in Dimauro et al. (2018).

In machine learning, the training data is defined as the set of feature vectors $\textbf{X}$ and their corresponding labels or associated outputs $y$, with unknown distribution $\mathcal{P}(\textbf{X},y)$, as follows

$$D=\Big\{(\textbf{X}_{1},y_{1}),\ldots,(\textbf{X}_{n},y_{n})\Big\}\subseteq\mathcal{R}^{d}\times\mathcal{I}, \tag{3}$$

where $\mathcal{R}^{d}$ denotes the $d$-dimensional feature space, $\mathcal{I}$ the label space and $n$ is the sample size. In this work, we consider two sets of features. The first, which we dub hereafter Set I, corresponds to the $u$, $g$, $r$, $i$, and $z$ absolute magnitudes. Such magnitudes are also available in the SDSS dataset; therefore, there is an observational counterpart. The second set (Set II) contains the same features as Set I plus the maximum rotational velocity of the halo, $V_{\text{max}}$, to include information about the kinematics. In both cases, the predictions (labels) are $M_{\text{disk}}$, $M_{\star}$, and $M_{\text{tot}}$, as displayed in Table 1.

An exploration of the data was conducted using Pearson’s correlation coefficient,

$$r_{\textbf{X},y}=\frac{\sum_{i=1}^{n}(\textbf{X}_{i}-\bar{\textbf{X}})(y_{i}-\bar{y})}{(n-1)S_{X}S_{Y}}, \tag{4}$$

where barred symbols represent mean values and $S_{X}$ and $S_{Y}$ are the standard deviations. When $r_{X,Y}=\pm 1$, we have a perfect positive (negative) correlation, whereas for $r_{X,Y}=0$ the parameters are not linearly correlated.
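A direct transcription of Eq. (4) for a single feature is sketched below, using sample standard deviations with $n-1$ in the denominator. The helper name `pearson_r` and the toy values are illustrative only and do not come from the catalogue.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Eq. 4)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cross = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Sample standard deviations S_X and S_Y
    s_x = math.sqrt(sum((xi - x_mean) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / (n - 1))
    return cross / ((n - 1) * s_x * s_y)


# Toy data (made up): brighter, i.e. more negative, magnitudes and larger log-masses
magnitudes = [-18.0, -19.0, -20.0, -21.0, -22.0]
log_masses = [9.1, 9.6, 10.0, 10.5, 10.9]
r = pearson_r(magnitudes, log_masses)
```

The sign of `r` is negative here because brighter objects have smaller magnitudes; taking `abs(r)` gives the strength of the correlation regardless of sign.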

| Input | $u$ | $r$ | $g$ | $i$ | $z$ | $V_{\text{max}}$ |
|---|---|---|---|---|---|---|
| Set I | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Set II | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

| Output | $\log_{10}(M_{\text{disk}})$ | $\log_{10}(M_{\star})$ | $\log_{10}(M_{\text{tot}})$ |
|---|---|---|---|

Table 1: Input and output features considered for the ML algorithms in this work. Set I corresponds to the photometric information derived from Guo’s catalogue using semi-analytical models. Set II additionally includes information about the dynamics of all components.

In Figure 1, the correlation matrix illustrates how the features contribute to the algorithm’s predictions. The matrix displays the absolute values of Pearson’s correlation coefficient, focusing solely on the strength of the correlation. As anticipated, $M_{\star}$ exhibits a high correlation with the magnitudes, particularly with the z and i bands, corresponding to the infrared and near-infrared regions of the spectrum, respectively. Observationally, the determination of luminous mass is significantly influenced by dust, with emissions in the optical band experiencing reddening. Conversely, this effect is negligible in the near-infrared (Tully et al., 1998). On the other hand, $M_{\text{disk}}$ shows less correlation with the magnitudes compared to $M_{\star}$ and exhibits a weak relation with the remaining quantities. Furthermore, $M_{\text{tot}}$ exhibits a strong correlation with $M_{\star}$ due to the HAM relation implemented in the mock catalogues. The correlation between $M_{\text{tot}}$ and $V_{\text{max}}$, which encapsulates information about the dynamics of all components, surpasses that of the other masses. Notably, the correlations with $M_{\text{disk}}$ are weak, suggesting that the dark matter component influences the stellar component as a whole rather than each component individually.

Figure 1: Heat map of the absolute value of the Pearson correlation coefficient between the galaxy parameters considered in this work. The redder the cell, the higher the correlation. As expected, the strongest correlations occur between the stellar mass and the magnitudes in the different bands. However, a relation also exists between the total mass and the magnitudes, although to a lesser extent.

3 Implementation of the machine learning algorithms

We employed a set of widely used supervised algorithms known for their effective predictions, listed as follows. These methods were implemented using the scikit-learn library (Pedregosa et al., 2011; Buitinck et al., 2013) and the Keras API (Chollet et al., 2015).

  • K-Nearest Neighbours (KNN). This algorithm relies on the idea that the set of $k$ nearest data points $C_{x}\subset D$, where $|C_{x}|=k$, have similar values. The neighbours are defined such that

    $$\text{dist}(\textbf{x},\textbf{x}^{\prime})\geq\text{dist}_{\text{max}}(\textbf{x},\textbf{x}^{\prime\prime}),\quad\text{with}\quad(\textbf{x}^{\prime\prime},y^{\prime\prime})\in C_{x},\quad(\textbf{x}^{\prime},y^{\prime})\in D \tag{5}$$

    This distance is defined in the feature space using the Euclidean metric, and the predicted value is the average of the neighbours' outputs. In this case, the number of neighbours is a free parameter, and we found that the highest accuracy is achieved when the number of neighbours is close to 18; the error starts to increase beyond that value.

  • Linear Regression (LR). The traditional linear regression minimizes the sum of the squared differences between the predicted and actual values. We are considering this method to compare it with more sophisticated techniques.

  • Random Forest (RF). This algorithm depends on the number of trees and their depth. Each tree contains decision nodes $\mathcal{N}_{m}$ that split the data $(X_{\text{node}},y_{\text{node}})$ (in the parent node) into smaller (left and right) subsets in new child nodes $C_{m}^{L}$ and $C_{m}^{R}$ until the branch finds a homogeneous group according to the set of hyperparameters. In regression, each node is split following the minimization of the residual $\mathcal{R}_{m}$ defined as

    $$\text{argmin}(\mathcal{R}_{m})=\sum_{m\in\mathcal{N}_{m}}(y_{m}-\bar{y}_{m})^{2}-\left(\sum_{m\in C_{m}^{L}}(y_{m}-\bar{y}_{m}^{L})^{2}+\sum_{m\in C_{m}^{R}}(y_{m}-\bar{y}_{m}^{R})^{2}\right), \tag{6}$$

    where $\bar{y}_{m}^{L}$ and $\bar{y}_{m}^{R}$ are the means of the target values in the child nodes. The split is performed if the minimum $\mathcal{R}_{m}$ is below a defined threshold. Because of how the trees are built, a single tree overfits easily; it is therefore strongly recommended to use an ensemble of trees instead. We used nearly 150 trees for the training.

  • Neural Network (NN). A NN is an interconnected group of nodes arranged in layers, each node being connected to other nodes in the network by unidirectional connections of different weights. Patterns learned in a layer are transferred to the next activated nodes. We implement early stopping as a regularization technique to avoid overfitting, stopping the training once the performance no longer improves. This is measured by the loss function, which quantifies the discrepancy between predicted and true values. For a regression, it can be taken as the squared loss function

    $$\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\Big(h(\textbf{X}_{i})-y_{i}\Big)^{2}, \tag{7}$$

    where $h(\textbf{X})$ is the function that minimizes the loss, $h=\text{argmin}\,\mathcal{L}(h)$, and $y_{i}$ is the target value of the $i$-th sample. A common assumption is to take $h(\textbf{X}_{i})=\textbf{B}^{T}\textbf{X}_{i}+b$, where $\textbf{B}$ contains the weight coefficients and $b$ is a constant. In this case, we also considered the Lasso regularization method. This technique penalises the model’s coefficients, shrinking them or setting them directly to zero, giving rise to a sparse model. Then, Eq. (7) is transformed into

    $$\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\Big(\textbf{B}^{T}\textbf{X}_{i}+b-y_{i}\Big)^{2}+\lambda\sum_{j=1}^{p}|\textbf{B}_{j}|, \tag{8}$$

    where the last term is subject to $\sum_{j=1}^{p}|\textbf{B}_{j}|<c$. The best NN architecture was obtained by varying the model’s hyperparameters: the number of hidden layers between 1 and 3, the number of neurons per layer between 32 and 512, and the learning rate across the values $10^{-2}$, $10^{-3}$, and $10^{-4}$. The best configuration has three hidden layers with 256, 224, and 352 neurons, respectively, and a learning rate of $10^{-4}$.
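To make three of the ingredients listed above concrete, the sketch below implements KNN averaging with the Euclidean metric, the squared-error expression on the right-hand side of Eq. (6) for one candidate split, and the L1-penalised loss of Eq. (8). It is written in plain Python for illustration and is not the scikit-learn or Keras code used in this work; all function names and toy values are assumptions, and the default `k=18` simply follows the value quoted in the text.

```python
import math


def knn_predict(train_x, train_y, query, k=18):
    """KNN regression: average the targets of the k training points
    closest to `query` under the Euclidean metric."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    neighbours = by_distance[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(feature, targets, threshold):
    """Right-hand side of Eq. (6): parent SSE minus the SSE of the two
    child nodes obtained by splitting at `feature <= threshold`."""
    left = [t for f, t in zip(feature, targets) if f <= threshold]
    right = [t for f, t in zip(feature, targets) if f > threshold]
    return sse(targets) - (sse(left) + sse(right))


def lasso_loss(weights, bias, xs, ys, lam):
    """Eq. (8): mean squared error of a linear model plus an L1 penalty."""
    mse = sum(
        (sum(w * xi for w, xi in zip(weights, x)) + bias - y) ** 2
        for x, y in zip(xs, ys)
    ) / len(xs)
    return mse + lam * sum(abs(w) for w in weights)


# Toy one-feature data (made up): absolute magnitude -> log10 mass
xs = [(-18.0,), (-19.0,), (-20.0,), (-21.0,)]
ys = [9.0, 9.5, 10.0, 10.5]
pred = knn_predict(xs, ys, (-20.4,), k=2)             # neighbours at -20 and -21
gain = split_gain([x[0] for x in xs], ys, -19.5)      # splits the sample in half
loss = lasso_loss([-0.5], 0.0, xs, ys, lam=0.1)       # exact fit, penalty only
```

In the toy example the linear model reproduces the targets exactly, so the loss reduces to the penalty term $\lambda|B_{1}|$; this shows how the L1 term keeps pushing the weights towards zero even when the data term vanishes.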

Figure 2 (panels (a) to (f)): Relative percentage difference between the predictions of the different machine learning algorithms and the actual values in the mock catalogues. Set I is displayed on the left, and Set II on the right. The histograms in the figures represent the distribution of the data. As expected, the predictions are better where the data density is higher. The lines represent the mean value $\mu$, and the bands are one standard deviation from the mean value, $\mu\pm\sigma$.

4 Testing the algorithms’ performance

4.1 Relative percentage difference

In Figure 2, we present the relative percentage difference between the logarithm of the actual mass in the mock catalogue, $M_{\text{actual}}$, and the logarithm of the mass predicted by each algorithm, $M_{\text{pred}}$. The algorithm dispersion is estimated using the parameter $\Delta$ (Calderon & Berlind, 2019), which can be computed as follows

$$\Delta=100\times\left(\frac{\log M_{\text{pred}}}{\log M_{\text{actual}}}-1\right). \tag{9}$$

The results were grouped into bins; the mean value in each bin is shown as a dashed line, and the shaded region around it spans one standard deviation, $\mu\pm\sigma$. The left panels show the results when the training was carried out using Set I, while the right panels correspond to Set II.
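The computation of $\Delta$ and of the binned mean and standard deviation can be sketched as follows. This is only an illustration under stated assumptions: the logarithm is taken to be base 10, masses are passed in linear units, the bins are defined in $\log_{10} M_{\text{actual}}$, and the population standard deviation is used. The function names `relative_difference` and `binned_stats` are hypothetical.

```python
import math
from statistics import mean, pstdev


def relative_difference(m_pred, m_actual):
    """Delta of Eq. (9), in percent, from masses given in linear units."""
    return 100.0 * (math.log10(m_pred) / math.log10(m_actual) - 1.0)


def binned_stats(m_pred, m_actual, edges):
    """Mean and standard deviation of Delta in bins of log10(M_actual).

    edges : increasing list of bin edges in log10(mass); bins are [lo, hi).
    Returns a list of (lo, hi, mu, sigma); empty bins are skipped.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        deltas = [
            relative_difference(p, a)
            for p, a in zip(m_pred, m_actual)
            if lo <= math.log10(a) < hi
        ]
        if deltas:
            # pstdev is the population standard deviation (assumption)
            out.append((lo, hi, mean(deltas), pstdev(deltas)))
    return out
```

A negative mean in a bin indicates that the algorithm underestimates the mass there, while the width of the band reflects the scatter of the predictions in that bin.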

The uncertainty bands noticeably narrow as the data counts in the histograms increase, indicating more accurate predictions. The highest errors for the $M_{\text{disk}}$ and $M_{\star}$ predictions (Figs. 2 (a) and 2 (c)) lie below $10^{9}\,M_{\odot}/h$ and arise from the small amount of data in that range. In contrast, for $M_{\text{tot}}$ in Figs. 2 (e) and 2 (f), the error increases towards larger masses, so the predictions are reliable in the central region around $10^{11}-10^{12}\,M_{\odot}$. Notably, the distribution of $M_{\text{tot}}$ is narrower than that of $M_{\text{disk}}$, as depicted in Figs. 2 (a) and 2 (e). This may be because the galaxies selected from the mock catalogue are required to host a bulge, a criterion fulfilled only by sufficiently massive galaxies.

Most of the predictions exhibit statistical errors centred around zero. In Figs. 2 (c) and 2 (d), $M_{\star}$ displays the smallest percentage difference for both sets, owing to a linear correlation between magnitudes and luminous mass (Reiprich & Boehringer, 2002; Kuiper, 1938; Liebert & Probst, 1987). The LR model shows the best score since it was trained by directly fitting a scaling relation. For some algorithms, such as NN and RF, the error increases around $1\%$ for masses of $10^{10}\,M_{\odot}/h$ in Set II.

For the disk mass, shown in Figs. 2 (a) and 2 (b), the percentage difference is higher than in the $M_{\star}$ case. Nevertheless, it remains within an acceptable prediction range for medium and high masses. Interestingly, the predictions for both sets of features exhibit similar trends. In contrast to the $M_{\star}$ case, the LR method is no longer the optimal choice because the relationship is nonlinear. Instead, the NN and RF algorithms demonstrate superior training performance for Set I.

Figure 3 (panels a-e): Histograms of the ugriz-photometric system magnitudes for both simulated and observational data. Vertical dashed lines show the mean value of each distribution. These histograms clearly show that the distribution of magnitudes for galaxies in the mock sample differs significantly from that of the SDSS sample. This suggests that, on the SDSS sample, the algorithms will explore a combination of the features different from that of the training set.

Finally, for $M_{\text{tot}}$, Fig. 2 (e) shows that Set I only gives unbiased predictions within the range $10.7<\log M_{\text{tot}}<12$, while for Set II, Fig. 2 (f), this is true in the range $10.7<\log M_{\text{tot}}<12.7$. This makes sense physically, as $V_{\text{max}}$ should be more sensitive for probing higher-mass halos above $M_{\text{tot}}=10^{12}$. The correlation between the magnitudes in Set I and $M_{\text{tot}}$ is not straightforward. However, since mock catalogues follow the HAM relation, a correlation exists between $M_{\text{tot}}$ and $M_{\star}$, which in turn influences the magnitudes. This correlation contributes to achieving favourable results in predicting the total mass. In this context, NN yields the best performance for Set I, given the absence of an explicit scaling relation, while for Set II, all predictions are similar.

After analyzing the performance of predictions for Set I and Set II, we concluded that the latter does not significantly improve the results. As mentioned, the main enhancement is observed for $M_{\text{tot}}$. Additionally, obtaining $V_{\text{max}}$ for galaxies can be challenging due to the system’s dynamics. Therefore, in the interest of simplicity, we use Set I exclusively from here on.

5 Predictions for observational data

Up to this point, we have assessed the training performance using synthetic data. In this section, we will apply the trained NN to predict masses of different components in real galaxies from the SDSS survey (Abdurro’uf et al., 2022). It is crucial to note that galaxies from the mock catalogue have specific limits for the ugriz-magnitudes, which are directly tied to the resolution of the simulations. This dependence arises from the halo masses and, consequently, stellar masses influencing the ability of the semi-analytical models to assign magnitudes in certain regions. In contrast, observed galaxies from SDSS exhibit limitations in the low surface brightness regime due to challenging observational features (Willman et al., 2002; Williams et al., 2016).

Fig. 3 shows the distribution of each magnitude for both SDSS and Guo’s galaxy catalogue. As previously mentioned, observed galaxies exhibit high luminosity, causing a shift in the mean value of each magnitude compared to synthetic galaxies. Since the two samples do not cover the same ranges, we will focus on the regions where both observations and simulations provide information. Indeed, the literature has reported that NNs behave as interpolators (Saxe et al., 2019; Spigler et al., 2018). Therefore, the sample of observed galaxies to be assessed by the algorithm should have input features within the same domain as the training and test mock datasets.
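One simple way to enforce this restriction is a per-feature range cut, sketched below. This is an assumption made for illustration: the text does not specify how the common domain was selected, and the helper names (`feature_ranges`, `within_training_domain`, `select_in_domain`) are hypothetical.

```python
def feature_ranges(training_rows):
    """Per-feature (min, max) over the training set.

    training_rows : list of equal-length feature vectors,
                    e.g. [u, g, r, i, z] absolute magnitudes.
    """
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def within_training_domain(row, ranges):
    """True if every feature of `row` lies inside the training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(row, ranges))


def select_in_domain(observed_rows, training_rows):
    """Keep only observed galaxies whose features fall in the training box."""
    ranges = feature_ranges(training_rows)
    return [row for row in observed_rows if within_training_domain(row, ranges)]
```

A box cut of this kind only guarantees that each feature is individually within range; it does not guarantee that the combination of features is well sampled by the training set.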

We selected a galaxy catalogue from the SDSS database, with information about 660,000 galaxies and their morphological components (Mendel et al., 2014). The masses listed there were determined by fitting a broadband spectral energy distribution. This process involved making assumptions about the initial mass function, extinction law and stellar evolution. In that catalogue, the bulge-disk brightness profiles were reconstructed using the photometric decomposition method with the Sérsic profile

$$I(R)=I_{e}\exp\left\{-b_{n}\left[\left(\frac{R}{R_{e}}\right)^{1/n}-1\right]\right\}, \tag{10}$$

where $R_{e}$ is the half-light radius and $I_{e}$ the intensity at that radius. Here $n$ is known as the Sérsic index and controls the curvature of the profile.
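For illustration, Eq. (10) can be evaluated as in the sketch below. The coefficient $b_{n}$ is not specified in this section; the sketch assumes the commonly used approximation $b_{n}\approx 2n-1/3$, which is not necessarily the value adopted in the catalogue. The function names are hypothetical.

```python
import math


def sersic_b_n(n):
    """Approximate b_n (assumption: b_n ~ 2n - 1/3, reasonable for n >~ 0.5)."""
    return 2.0 * n - 1.0 / 3.0


def sersic_intensity(R, I_e, R_e, n):
    """Sersic profile I(R) of Eq. (10).

    R   : projected radius (same units as R_e)
    I_e : intensity at the half-light radius
    R_e : half-light radius
    n   : Sersic index
    """
    b_n = sersic_b_n(n)
    return I_e * math.exp(-b_n * ((R / R_e) ** (1.0 / n) - 1.0))
```

By construction, the profile returns $I_{e}$ at $R=R_{e}$ for any $n$; an index $n=1$ gives an exponential profile, typically used for disks, while $n=4$ gives the de Vaucouleurs profile often used for bulges.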

The magnitudes used in the prediction stage were obtained from the SDSS DR7 (Abdurro’uf et al., 2022). We converted the apparent magnitude ($m$) to absolute magnitude ($M$) using (Schneider, 2006)

$$M=m-5\Big(\log_{10}d-1\Big), \tag{11}$$
Figure 4 (panels a-c): Kernel density estimation (KDE) plots of the stellar (a), disk (b), and bulge (c) mass components versus the r-magnitude, for the simulated data in blue and the observational data in green. The red lines are the {95, 99}% confidence level (C.L.) contours of the NN predictions. The NN prediction is more accurate for the stellar mass and for disk-dominant galaxies, since agreement is achieved up to $99\%$ C.L. Even though the prediction for the bulge mass is less precise than for the other components, the NN achieves good agreement up to $95\%$ C.L.

where $d$ is the distance to the source in parsecs. Distances were computed using the Python library Astropy (The Astropy Collaboration et al., 2013) with the redshift reported in NED (the NASA/IPAC Extragalactic Database, operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration) and assuming the cosmological parameters from Planck 2018 (Planck Collaboration et al., 2020), $H_{0}=67.66$ km/s/Mpc and $\Omega_{m0}=0.26$. Our analysis focused on about $70\%$ of the total dataset, concentrating on galaxies with the u, g, r, i, and z magnitudes.
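The conversion of Eq. (11) can be sketched as below, taking $d$ in parsecs so that the $-1$ term corresponds to the 10 pc reference distance. The sketch assumes that the distance has already been computed from the redshift and cosmology, and it applies no K-correction or extinction correction. The function name `absolute_magnitude` is hypothetical.

```python
import math


def absolute_magnitude(m, d_pc):
    """Eq. (11): M = m - 5 (log10 d - 1), with d in parsecs.

    m    : apparent magnitude
    d_pc : distance to the source in parsecs (must be positive)
    """
    if d_pc <= 0:
        raise ValueError("distance must be positive")
    return m - 5.0 * (math.log10(d_pc) - 1.0)
```

For example, a source at 10 pc has $M=m$, and a source at $10^{8}$ pc (100 Mpc) with $m=15$ has $M=-20$.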

The scaling relations between physical quantities of a galaxy sample are a valuable piece of information for describing the evolution and structure of galaxies. We analyse the scaling relation between the mass components and the r-magnitude, as reported in other works (Mahajan et al., 2017a; Venhola, Aku et al., 2019; Mahajan et al., 2017b; Côté et al., 2015). Among the SDSS filters, this is the one best correlated with stellar mass. The relations for magnitudes in other bands are similar. We also study the $M_{\text{bulge}}-M_{\text{disk}}$ relation as well as the $M_{\star}-M_{\text{tot}}$ relation. We only employ the NN algorithm to obtain the results presented in this section, given that it performs better, with fewer errors, and its construction involves a more complete architecture than the other AI algorithms.

5.1 Mass-magnitude relation

In Fig. 4, we present distributions projected onto the $M_{\star}-r_{\text{mag}}$ and $M_{\text{disk}}-r_{\text{mag}}$ planes, together with the $M_{\text{bulge}}-r_{\text{mag}}$ relation for completeness. In each case, distributions up to $2\sigma$ are shown for three datasets: the original mock catalogue in blue, the original SDSS catalogue in green, and the NN predictions for the SDSS galaxies in red. The contours represent the $99\%$ and $95\%$ confidence levels. For these figures we use the whole sample of spiral galaxies in the mock catalogue; nevertheless, the masses reported in the observed catalogue fall within the regions depicted in Fig. 2.

First, Fig. 4 (a) illustrates the $M_{\star}-r_{\text{mag}}$ scaling relation. The NN predictions agree with the real values up to $95\%$ C.L. However, the error increases as we approach more massive galaxies and, consequently, the resolution limit of the simulations. Overall, the NN exhibits accurate predictions for $M_{\star}$, consistent with Fig. 2 (c). Indeed, the best-fit slopes for each dataset only show slight differences. The best fits for the mock catalogue, SDSS, and NN predictions, respectively, are

$$M_{\star}=-0.427\,r_{\text{mag}}+1.370, \tag{12}$$
$$M_{\star}=-0.461\,r_{\text{mag}}+0.916, \tag{13}$$
$$M_{\star}=-0.457\,r_{\text{mag}}+0.954. \tag{14}$$
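To compare the three relations at a given magnitude, the fits of Eqs. (12)-(14) can be evaluated as in the sketch below. The coefficients are copied from the equations; the left-hand side is presumably the logarithm of the stellar mass, although the units are not stated explicitly here. The names `FITS`, `stellar_mass_from_rmag` and `fit_differences` are hypothetical.

```python
# (slope, intercept) pairs copied from Eqs. (12)-(14)
FITS = {
    "mock": (-0.427, 1.370),
    "sdss": (-0.461, 0.916),
    "nn": (-0.457, 0.954),
}


def stellar_mass_from_rmag(r_mag, dataset):
    """Evaluate the linear relation M_star = slope * r_mag + intercept."""
    slope, intercept = FITS[dataset]
    return slope * r_mag + intercept


def fit_differences(r_mag):
    """Difference of the mock and NN relations relative to the SDSS fit."""
    ref = stellar_mass_from_rmag(r_mag, "sdss")
    return {
        name: stellar_mass_from_rmag(r_mag, name) - ref
        for name in ("mock", "nn")
    }
```

At $r_{\text{mag}}=-20$, for instance, the NN relation differs from the SDSS fit by about $0.04$, whereas the mock relation differs by about $0.23$, which illustrates how closely the NN predictions follow the observed relation.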

For $M_{\text{disk}}$ in Fig. 4 (b), we distinguish a possibly bimodal distribution with two regions for the simulations lying inside the mass range $7<\log M_{\text{disk}}<9$. The two blobs are separated due to the lack of information at $r_{\text{mag}}\sim-19$. There is an acceptable agreement within the $95\%$ C.L. for galaxies in the low-surface-brightness region.

Here, it is worth mentioning that in all cases the masses predicted by the NN fall within the region of the simulated masses, as expected. However, for $M_{\text{disk}}$, we observe that the red 2-sigma curve goes outside the blue and green regions for masses below $10^{9}\,M_{\odot}/h$ and above $5\times 10^{10}\,M_{\odot}/h$. This is related to the fact that the output masses are distributed in a three-dimensional space (disk-bulge-stellar) and we are showing the projections over a single input parameter.

Bulge masses for the brightest galaxies within the same mass range are not well predicted and are excluded by the NN architecture. This region corresponds to quasi-elliptical systems with large mass but small disks (see Fig. 4 (b)). In this case, the NN predicts that this type of system is unlikely, and in fact, it would be challenging to distinguish the disk from the bulge without an accurate numerical method. This conclusion is supported by panels (a) and (c), where the prediction aligns with the expected result for more than $95\%$ of the data. However, the missing points in panel (b) are compensated by the excess points in panel (c). This suggests that purely elliptical systems provide a better description of these cases. This behaviour is also reflected in Fig. 2 (a), where the error increases for masses below $10^{9}\,M_{\odot}/h$.

Additionally, the fact that the NN predicts the stellar mass of SDSS galaxies well (Fig. 4 (a)) serves as a consistency test between the mock catalogue and the NN. However, the predictions for small disk components deviate from the SDSS catalogue, which may suggest that the features are insufficient for training the NN or that the catalogue needs more precise information about the components. The values for $M_{\text{bulge}}$ in Fig. 4 (c) are derived from eq. (1) and from the values of $M_{\star}$ and $M_{\text{disk}}$ directly inferred by the NN. We observe an acceptable agreement between observations and simulations up to $95\%$ C.L.

Figure 5: KDE plots of the bulge-disk decomposition. The distribution for simulated (blue) and observed (green) data and the solid contour levels are shown. We observe a possible trimodal distribution for the mock catalogue. In contrast, the observations show a unimodal distribution similar to the predicted NN distribution for observational data (red contours).

5.2 Bulge-disk components

The relation between the luminous mass and the bulge-disk masses is described by eq. (1). $M_{\star}$ can be determined by a scaling relation (see Fig. 4). Thus, for a specific value of $M_{\text{bulge}}$, $M_{\text{disk}}$ will only take values within certain intervals, and vice versa.
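The constraint described above can be made explicit with a short sketch. Eq. (1) is not reproduced in this section, so the code assumes it has the additive form $M_{\star}=M_{\text{disk}}+M_{\text{bulge}}$; if the actual relation differs, the sketch does not apply. Masses are passed as base-10 logarithms, and the function name `log_bulge_mass` is hypothetical.

```python
import math


def log_bulge_mass(log_m_star, log_m_disk):
    """log10 of M_bulge = M_star - M_disk (assumed form of eq. (1)).

    Returns None when the disk mass is not smaller than the stellar mass,
    since the bulge mass would then be non-positive.
    """
    m_star = 10.0 ** log_m_star
    m_disk = 10.0 ** log_m_disk
    m_bulge = m_star - m_disk
    if m_bulge <= 0.0:
        return None
    return math.log10(m_bulge)
```

Under this assumption, a fixed stellar mass bounds the disk mass from above, which is one way to read the statement that each component can only take values within certain intervals once the other is fixed.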

Figure 5 shows the bulge and disk masses of galaxies within both datasets. The mock catalogue shows a trimodal distribution. The most prominent region, for $M_{\text{disk}}>10^{8}\,M_{\odot}/h$, corresponds to low values of $M_{\text{bulge}}$ and is associated with disk-dominated galaxies. The second region, for $M_{\text{bulge}}>10^{10}\,M_{\odot}/h$, is the bulge-dominated region (Conselice, 2006). Galaxies of this sort are usually dubbed cD-like (central dominant) galaxies (Guo et al., 2011; Oemler Jr, 1976). This behaviour arises in both observed and simulated galaxies, although disk-dominant galaxies are more abundant in both cases.

The third region in the $M_{\text{bulge}}-M_{\text{disk}}$ plane, which corresponds to galaxies with both small disks and small bulges, appears only in the mock data. This discrepancy suggests that there may be an observational bias, because current telescopes might not be able to detect the low-luminosity galaxies that appear in the numerical simulations.

The NN prediction is also shown in Fig. 5. For this last sample, the relationship between disk and bulge is nonlinear and not readily fitted with an analytic function, unlike the scaling relations derived in Section 5.1. This may be due to the multimodal distribution, which suggests that different scaling relations between the bulge and disk mass components might arise for different galaxies within the sample. Nevertheless, the machine learning algorithm can make good predictions for disk-dominant galaxies. Furthermore, it is interesting that the NN algorithm gives rise to mass predictions consistent with the SDSS distribution and, as expected, does not predict bulge-dominant galaxies. Providing more accurate information about larger bulges may involve more complicated dynamics.

6 Conclusions

It is well known that the bulge-disk decomposition and the estimation of the total mass of galactic systems are complicated tasks that have been tackled under several assumptions. This paper presents an alternative method to make such estimations, based on AI algorithms designed to predict the masses of different components in spiral galaxies. Two sets of input features were considered in the first stage of our analysis. The first set contains the magnitudes in different bands, while in the second one, $V_{\text{max}}$ was added to the first set to include information about the system’s kinematics. After analyzing the performance of the trained algorithms and testing the importance of the parameters, we found that $V_{\text{max}}$ is only relevant for computing the galaxy’s total mass. This suggests that the absolute magnitudes of the galaxies provide sufficient information to predict the masses of galactic components. Therefore, these methods can be readily applied to estimate the masses of observed galaxies from different datasets.

These methods were used to predict the masses of different components of real spiral galaxies within the SDSS catalogue. Across the whole SDSS sample of galaxies, magnitudes and masses reach higher values than those considered in our analysis. We therefore chose a subsample with parameters within the same ranges as our training mock catalogue. During the training and test stages, the NN not only provided estimates of the masses of different galaxy components but could also predict scaling relations (mass-magnitude), achieving at least a $95\%$ C.L. agreement with observational data.

For both observational and mock samples of galaxies, our analysis and algorithm are applicable only within a small range of total mass, given that only galaxies with non-zero bulges and disks were selected. We found that the NN algorithm gives good predictions for disk-dominant galaxies using only the photometric information. Additionally, at the $95\%$ confidence level, the NN predicts that bulge-dominant galaxies are unlikely, consistent with the lack of information from observations. This training can be improved using additional information about the galaxy’s dynamics, age or chemical composition. However, photometric information alone can be useful to obtain a sufficiently good estimate of the mass components in spiral galaxies with bulges.

We will continue this project by constructing machine learning algorithms trained with features inferred directly from observational data to have more accurate results and explore the HAM relation in real galaxies and the possible dependency on the morphology or age of the systems.

Acknowledgement

For this research work, we use the NASA/IPAC Extragalactic Database (NED), which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

Research leading to these results has received support from the European Structural and Investment Funds and the Czech Ministry of Education, Youth and Sports (project No. FORTE -- CZ.02.01.01/00/22_008/0004632).

References