Estimating the mass of galactic components using machine learning algorithms

J. N. López

{}^{1,2}

, E. Munive

{}^{1,2}

, A. A. Avilez

{}^{2}

, O.M. Martínez

{}^{2}

{}^{1}

CEICO - FZU, Institute of Physics of the Czech Academy of Sciences,
Na Slovance 1999/2, 182 00, Prague Czechia.

{}^{2}

Facultad de Ciencias Físico-Matemáticas. Benemérita Universidad
Autónoma de Puebla. Av. San Claudio SN, Col. San Manuel, Puebla, México

Abstract

The estimation of the bulge and disk massses, the main baryonic components of a galaxy, can be performed using various approaches, but their implementation tend to be challenging as they often rely on strong assumptions about either the baryon dynamics or the dark matter model. In this work, we present an alternative method for predicting the masses of galactic components, including the disk, bulge, stellar and total mass, using a set of machine learning algorithms: KNN-neighbours (KNN), Linear Regression (LR), Random Forest (RF) and Neural Network (NN). The rest-frame absolute magnitudes in the ugriz-photometric system were selected as input features, and the training was performed using a sample of spiral galaxies hosting a bulge from Guo’s mock catalogue (Guo et al., 2011) derived from the Millennium simulation. In general, all the algorithms provide good predictions for the galaxy’s mass components ranging from $10^{9}\,M_{\odot}$ to $10^{11}\,M_{\odot}$ , corresponding to the central region of the training mass domain; however, the NN give rise to the most precise predictions in comparison to other methods. Additionally, to test the performance of the NN architecture, we used a sample of observed galaxies from the SDSS survey whose mass components are known. We found that the NN can predict the luminous masses of disk-dominant galaxies within the same range of magnitudes that for the synthetic sample up to a $99\%$ level of confidence, while mass components of galaxies hosting larger bulges are well predicted up to $95\%$ level of confidence. The NN algorithm can also bring up scaling relations between masses of different components and magnitudes.

Keywords— Galactic Systems; Neural Network; Scaling relations

1 Introduction

The bulge-disk decomposition of galactic systems is useful for understanding the evolutionary processes of galaxies and their dynamics. Specifically, the disk and bulge masses can be inferred, given that their stellar population has different dynamic or even chemical features. There are plenty of schemes for classifying galaxies; one of the most popular corresponds to the morphological classification proposed by Edwin Hubble (Hubble, 1929) consisting of four types of galaxies: elliptical, spiral, barred spiral, and irregular. Another method involves the isophotal radius measurement (Holmberg, 1958), determining the size attributed to a galaxy component according to a particular surface brightness level. A way to characterize the light distribution independent of the light profile is through the concentration measure, defined by the ratio of two geometrical regions, each containing a fixed fraction of the total galaxy luminosity (Kent, 1985).

Another approach for reconstructing the visible mass of galactic components involves using standardized fitting functions. Ideally, these functions should be derived from the fundamental principles governing galactic evolution. However, due to the intricate nature of the physics involved, models based on these principles often become complex, with a substantial number of parameters. Then, commonly used functions are empirically derived. For instance, the disk components are well-fitted by an exponential law, while for the elliptical galaxies and the bulges in the spiral ones, the relations typically considered are the King’s model (King, 1966) and the de Vaucouleurs’s law (de Vaucouleurs, 1948). Sometimes, the bulges associated with late-type galaxies are best fitted by exponential laws (Andredakis et al., 1995; Freeman, 1970). However, implementing these methods demands high-quality observational data to obtain reliable results.

Furthermore, it is possible to single out the galactic components through the light distribution of a galaxy. This decomposition is derived by fitting the light profile to a power law, adhering to a specific empirical or analytical model. In the conventional photometric technique, the one-dimensional case is considered. On the other hand, when multiple wavelengths in the spectrum are taken, spectroscopic methods come into play (Johnston et al., 2012).

On the other side, numerical simulations play an important role in exploring predictions of galaxy evolution within the standard $\Lambda$ CDM prescription (Croton et al., 2006; Guo et al., 2010; De Lucia et al., 2004). Semi-analytical models have gained popularity when identifying the structural components of galactic systems. These models employ a simplified representation of baryonic physics, coupled with Markov-Chain-Monte-Carlo methods for reconstructing merger trees (Lucia, 2019).

For dark matter-only simulations a common technique to infer information about the baryonic components is to assume the halo-abundance matching (HAM), which relates the halo potential well to the star formation rate in such a way that more luminous galaxies are associated to more massive halos. During the evolution of both components, material exchange occurs between the baryonic elements through various processes. For example, forming a galactic bulge may result from major or minor mergers (Hopkins et al., 2010). In these processes, pre-existing and newly formed stars play a crucial role; after a merger, all stars from the progenitors contribute to the bulge component of the resulting galaxy. The gas within the progenitors becomes part of the resulting galaxy disk, and the specific angular momentum of this component equals that of the halo in which it is embedded (Guo et al., 2011; Bower et al., 2006; De Lucia & Blaizot, 2007).

As can be seen, various approaches exist for describing galactic components, including pure morphological observations or photometric and/or spectroscopic techniques, either synthetically through mock catalogues. Conversely, obtaining information about the total mass often involves making strong assumptions, whether about a specific dark matter model or the overall kinematics of the system. In this work, we propose an artificial intelligence (AI)-based method to isolate the bulge and disk components of both baryonic and total galaxy mass. This is accomplished using the information on luminosity and features inferred from stellar dynamics encrypted in Guo’s synthetic catalogue (Guo et al., 2011). Our goal is to perform the decomposition of the galactic components without including additional information about baryons in the training stage beyond the patterns the AI methods can infer from the catalogue. This method can be useful to obtain the components of observed galaxies whose baryonic dynamics cannot be obtained easily.

Since the mass values range between several orders of magnitude, it is well suited to predict the logarithm base 10 of the stellar mass $M_{\star}$ , the disk mass $M_{\text{disk}}$ and the total mass $M_{\text{tot}}$ . Here, the bulge mass $M_{\text{bulge}}$ is not within the prediction set since we compute it in terms of disk mass $M_{\text{disk}}$ and the total mass $M_{\star}$ from the following expression (Guo et al., 2011; Lucia, 2019)

M_{\star}=M_{\text{bulge}}+M_{\text{disk}}.

(1)

It is worth stressing that with our method, we aim to predict mass components in spirals following some theoretical assumptions about the features of dark matter. Specifically, using a mock catalogue as a training set of our AI algorithms, we set dark matter in a CDM prescription and make inferences about mass components holding such a hypothesis. Our algorithms were tested using observational data reported in the SDSS database (Abdurro’uf et al., 2022) to assess how well the predictions match observations, keeping the features set as simple as possible. We are interested in knowing the advantages and disadvantages of the algorithms to predict galaxies with different features.

This paper is structured as follows: Section 2 presents and analyses the content of Guo’s galaxy catalogue to determine the correlation between input features and output predictions, emphasizing their importance during the training stage. Following this, Section 3 introduces the machine learning algorithms considered in this work and explores their dependency on variations in different parameters. Subsequently, in Section 4.1, we analyze the performance of each algorithm. Then, in Section 4, we apply the trained methods to predict masses of components in observed galaxies from the Sloan Digital Sky Survey (Abdurro’uf et al., 2022) database. This allows us to derive various scaling relations commonly studied in the literature. Finally, we draw some conclusions in section 6.

2 The Data

To train our machine learning algorithms, we have used Guo’s galaxy catalogue (Guo et al., 2011) derived from the Millennium simulation, selecting only galaxies with nonzero bulges or disc components, leading to a set of 833,491 galaxies. This dataset was split randomly, assigning a common selection where 75 $\%$ of total data is defined for training and 25 $\%$ to evaluate the performance of the algorithms. The Millenium simulation is a dark matter-only, carried out under the $\Lambda$ CDM prescription (Springel et al., 2006) using a customized version of the Gadget 2 code (Springel et al., 2005) with $2160^{3}$ particles within a box of $L=500\text{ Mpc}/h$ . This catalogue provides information about the merger history of each halo and the baryon content, which has been split into five components: the stellar bulge, the stellar disk, a gas disk, a halo and an ejecta reservoir (Guo et al., 2010).

The analytical model implemented in Guo’s catalogue considers that galaxies form within the central region of dark matter halos. The fitting function, which describes the average baryon fraction of a halo given the total mass, can be written in terms of its mass and redshifts (Gnedin, 2000)

f_{b}(z,M_{\text{tot}})=f_{b}^{\text{cos}}\left(1+(2^{\alpha/3}-1)\left[\frac{% M_{\text{tot}}}{M_{c}(z)}\right]^{-\alpha}\right)^{-3/\alpha},

(2)

where the universal baryon fraction is usually taken as $f_{b}^{\text{cos}}=\displaystyle\frac{\Omega_{b}}{\Omega_{0}}\sim 17\%$ . Here, $M_{c}$ represents the characteristic mass objects which can retain $50\%$ of the gas components to form stars. The reionization and cooling depend on the baryon fraction in a given halo and its mass and redshift. The disk and bulge formation are correlated with star formation and supernova feedback processes, as well as with the black hole growth and AGN feedback. Additionally, mergers between the central and satellite galaxies are described through simulations and play an important role in the disk and bulge evolution. This catalogue accurately reproduces the population and clustering mechanisms observed at $z\sim 0$ . However, it exhibits inconsistencies for high-redshift populations.

In this work, we consider galaxies at $z=0$ . Our goal was to investigate spiral galaxies hosting both bulges and disks. Then, we imposed this strong filter when selecting our sample from the mock catalogue. It is crucial to note that our selection encompasses diverse galaxy types without accounting for age or metallicity. The purpose is to explore the capabilities of the algorithms to get information about the systems by exclusively using photometric information.

The resolution of the simulation delimits the range of masses for each component. Once the selection of bulge-disk galaxies has been performed, the range of the total mass is between $10^{10}M_{\odot}/h$ and $10^{13}M_{\odot}/h$ . Notably, the selected total mass range excludes galaxies of both low and high masses. Indeed, massive galaxies tend to exhibit an elliptical morphology rather than spiral (De Lucia et al., 2006).

2.1 Features importance

It is well known that the physical and photometric properties of the stellar population of a galaxy are closely related to its dynamics and the spatial mass distribution of different components within the system. Specifically, this behaviour is reflected in the colour-magnitude relation. For instance, it has been shown that bulge-dominant galaxies have a color-magnitude diagram mainly described by red galaxies (Hogg et al., 2004). Besides, in Barsanti et al. (2021), it has been shown that the bulge is redder than the disk in galaxies within a cluster. A similar conclusion was reported in Dimauro et al. (2018).

In machine learning, the training data is defined as the feature vector X and their corresponding label or associated output $y$ , with unknown distribution $\mathcal{P}(\textbf{X},y)$ , as follows

D=\Big{\{}(\textbf{X}_{1},y_{1}),\ldots,(\textbf{X}_{n},y_{n})\Big{\}}% \subseteq\mathcal{R}^{d}\times\mathcal{I},

(3)

where $\mathcal{R}^{d}$ denotes the $d-$ dimensional feature space, $\mathcal{I}$ the label space and $n$ is the sample size. In this work, we consider two sets of features, the first corresponding to the u, g, r, i, and z absolute magnitudes that we dub hereafter as Set I. Such magnitudes are also available in the SDSS dataset; therefore, there is an observational counterpart. Within a second set (Set II), the same features as Set I are considered in addition to the maximum rotational velocity of the halo, $V_{\text{max}}$ , to include information about the kinematics. In both cases, the predictions (labels) are $M_{\text{disk}}$ , $M_{\star}$ , and $M_{\text{tot}}$ as it is displayed in Table 1.

An exploration of the data was conducted using Pearson’s correlation ratio,

r_{\textbf{X},y}=\displaystyle\frac{\sum_{i=1}^{n}(\textbf{X}_{i}-\bar{\textbf% {X}})(y_{i}-\bar{y})}{(n-1)S_{X}S_{Y}},

(4)

where barred symbols represent the mean values and $S_{X,Y}$ is the standard deviation. When this quotient is $r_{X,Y}=\pm 1$ , we have a perfect positive (negative) correlation, whereas for $r_{X,Y}=0$ , parameters are not correlated at all.

Input	$u$	$r$	$g$	$i$	$z$	$V_{\text{max}}$
Set I	✓	✓	✓	✓	✓
Set II	✓	✓	✓	✓	✓	✓
Output		$\log_{10}(M_{\text{disk}})$	$\log_{10}(M_{\star}$ )	$\log_{10}(M_{\text{tot}})$

Table 1: Input and outputs features considered for the ML algorithms in this work. The Set I corresponds to the photometric information derived from Guo’s catalogue using semi-analytical models. The Set II includes information about the dynamics of all compoments.

In Figure 1, the correlation matrix illustrates how the features contribute to the algorithm’s predictions. The matrix displays the absolute values of Pearson’s correlation ratio, focusing solely on the strength of the correlation parameter. As anticipated, $M_{\star}$ exhibits a high correlation with the magnitudes, particularly with the z and i bands, corresponding to the infrared and near-infrared regions of the spectrum, respectively. Observationally, the determination of luminous mass is significantly influenced by dust, with emissions in the optical band experiencing reddening. Conversely, this effect is negligible in the near-infrared (Tully et al., 1998). On the other hand, $M_{\text{disk}}$ shows less correlation with the magnitudes compared to $M_{\star}$ and exhibits a weak relation with the remaining quantities. Furthermore, $M_{\text{tot}}$ exhibits a strong correlation with $M_{\star}$ due to the HAM relation implemented in the mock catalogues. The correlation between $M_{\text{tot}}$ and $V_{\text{max}}$ , which encapsulates information about the dynamics of all components, surpasses that of other masses. Notably, the correlations with $M_{\text{disk}}$ are weak, suggesting that the dark matter component influences the stellar component as a whole rather than each component individually.

Refer to caption — Figure 1: Heat map of the absolute value of the Pearson correlation coefficient between the galaxy parameters considered in this work. The redder it is, the higher the correlation. As expected, stronger correlations occur between different bands’ stellar mass and magnitudes. However, a relation exists between the total mass and the magnitudes, although to a lesser extent.

3 Implementation of the machine learning algorithms

We employed a set of widely used supervised algorithms known for their effective predictions, listed as follows. These methods were implemented using the scikit-learn library (Pedregosa et al., 2011; Buitinck et al., 2013) and the Keras API (Chollet et al., 2015).

•

KN-Neighbours (KNN). This algorithm relies on the idea that the set of $k$ nearest data points $C_{x}\subset D$ , where $\mathinner{\!\left\lvert C_{x}\right\rvert}=k$ , have similar values. The neighbours are defined such that

\text{dist}(\textbf{x},\textbf{x}^{\prime})\geq\text{dist}_{\text{max}}(% \textbf{x},\textbf{x}^{\prime\prime}),\quad\text{with}\quad(\textbf{x}^{\prime% \prime},y^{\prime\prime})\in C_{x},\quad(\textbf{x}^{\prime},y^{\prime})\in D

(5)

This distance is defined in the hyperspace of features using the Euclidian metric, and the final value is the average of their outputs. In this case, the number of neighbours is a free parameter, and we found that the highest accuracy is achieved when the number of neighbours is close to 18; the error starts to increase beyond that value.

•

Linear Regression (LR). The traditional linear regression minimizes the sum of the squared differences between the predicted and actual values. We are considering this method to compare it with more sophisticated techniques.

•

Random Forest (RF). This algorithm is subject to the number of trees and their depth. Each tree contains decision nodes $\mathcal{N}_{m}$ that split the data $(X_{\text{node}},y_{\text{node}})$ (in the parent node) into smaller (left and right) subsets in new child nodes $C_{m}^{L}$ and $C_{m}^{R}$ until the branch finds a homogeneous group according to the set of hyperparameters. Splitting each node in regression is made following the minimization of the residual $\mathcal{R}_{m}$ defined as

\text{argmin}(\mathcal{R}_{m})=\sum_{m\in\mathcal{N}_{m}}(y_{m}-\bar{y}_{m})^{% 2}-\left(\sum_{m\in C_{m}^{L}}(y_{-}\bar{y}_{m}^{L})^{2}+\sum_{m\in C_{m}^{R}}% (y_{-}\bar{y}_{m}^{R})^{2}\right),

(6)

where $\bar{y}_{m}^{L}$ and $\bar{y}_{m}^{R}$ are the mean of the target values in the child nodes. The split is performed if the minimum $\mathcal{R}_{m}$ is below a defined threshold. Because of how the trees are built, it is easy to overfit. Therefore, it is strongly recommended to use a set of trees instead. We used nearly 150 trees for the training.

•

Neural Network (NN). NN is an interconnected group of nodes stored in a layer and connected to other nodes in the network by unidirectional connections of different weights. Patterns learned in a layer are transferred to the next activated nodes. We implement the early stopping-based method as a regularization technique to avoid overfitting, stopping the training once the performance no longer improves. This is measured by the loss function, which quantifies the discrepancy between predicted error and true values. For a regression, it can be taken as the squared loss function

\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\Big{(}h(\textbf{X}_{i})-y_{i}\Big{)}^{2},

(7)

here $h(\textbf{X})$ is the function that minimizes the loss associated with the target value of the $i-$ th class, $h=\text{argmin}\mathcal{L}(h)$ . A common assumption is to take $h(\textbf{x})=\textbf{B}^{T}\textbf{X}_{i}+b$ , where $B$ are considered the weights coefficients and $b$ a constant. In this case, we also considered the Lasso regularization method. This technique penalises the model’s coefficients, shrinking or setting them directly to zero, giving rise to a sparse model. Then, the eq. (7) is transformed into

\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\Big{(}\textbf{B}^{T}\textbf{X}_{i}+b-y_{% i}\Big{)}^{2}+\lambda\sum_{j=1}^{p}\mathinner{\!\left\lvert\textbf{B}_{j}% \right\rvert},

(8)

where the last term is subject to $\sum_{j=1}^{p}\mathinner{\!\left\lvert\textbf{B}_{j}\right\rvert}<c$ . The best NN architecture was also obtained by varying the model’s hyperparameters, such as hidden layers between 1 and 3, the number of neurons between 32 and 512 per layer and adjusting the learning rate across values of $10^{-2}$ , $10^{-3}$ , and $10^{-4}$ . The best configuration has three hidden layers with 256, 224, and 352 neurons, respectively and a learning rate of $10^{-4}$ .

4 Testing the algorithms performance

4.1 Relative percentage difference

In Figure 2, we present the relative percentage difference between the logarithm of the actual mass, $M_{\text{actual}}$ in the mock catalogue and the logarithm of the mass predicted by each algorithm, $M_{\text{pred}}$ . The algorithm dispersion is estimated by using the parameter $\Delta$ (Calderon & Berlind, 2019), which can be computed as follows

\Delta=100\times\left(\frac{\log M_{\text{pred}}}{\log M_{\text{actual}}}-1% \right).

(9)

The results were plotted into bins for which the mean value is shown in dashed lines, whereas the standard deviation corresponds to the width of the shaded regions around the mean value $\mu\pm\sigma$ . The left panel shows the result when the training was carried out using Set I, while the right side corresponds to Set II.

The uncertainty bands in the histogram noticeably narrow as data counts increase, indicating a more accurate prediction. The highest errors for $M_{\text{disk}}$ and $M_{\star}$ predictions (Figs. 2 (a) and 2 (c)) lie below $10^{9}M_{\odot}/h$ and arise due to the low amount of data. In contrast, for $M_{\text{tot}}$ in Figs. 2 (e) and 2 (f), the error increases for larger mass values, signifying reliable predictions in the central region around $10^{11}-10^{12},M_{\odot}$ . Notably, the distribution of $M_{\text{tot}}$ is narrower compared to $M_{\text{disk}}$ , as depicted in Figs. 2 (a) and 2 (e). This can be because the sample of galaxies chosen from the mock catalogue satisfies the condition of having a bulge, a criterion fulfilled only by sufficiently massive galaxies.

Most of the predictions exhibit statistical errors centred around zero. Fig. 2 (c) and Panel (d), $M_{\star}$ displays the smallest percentage difference in both cases, owing to a linear correlation between magnitudes and luminous mass(Reiprich & Boehringer, 2002; Kuiper, 1938; Liebert & Probst, 1987). The LR model shows the best score since it was trained by directly fitting a scaling relation. In some algorithms like NN and RF, the error increases around $1\%$ for masses $10^{10}M_{\odot}/h$ in Set II.

In the context of disk mass, as shown in Figs. 2 (a) and 2 (b), the percentage difference is higher compared to the $M_{\star}$ case. Nevertheless, it remains within an acceptable prediction range for medium and high masses. Interestingly, predictions for both sets of features exhibit similar trends. In contrast to the linear fit used for $M_{\star}$ , the LR method is no longer the optimal choice due to the nonlinear nature of this relationship. Instead, the NN and RF algorithms demonstrate superior training performance for Set I.

Finally, for $M_{\text{tot}}$ , fig. 2(e) shows that Set I only gives unbiased predictions within the range $10.7<\log M_{\text{tot}}<12$ , while for Set II, fig. 2(f), this is true in the range $10.7<\log M_{\text{tot}}<12.7$ . This makes sense physically as $V_{\text{max}}$ should be more sensitive for probing higher mass halos above $M_{\text{tot}}=10^{12}$ . The correlation between the magnitudes in Set I and $M_{\text{tot}}$ is not straightforward. However, since mock catalogues follow the HAM relation, a correlation exists between $M_{\text{tot}}$ and $M_{\star}$ , consequently influencing the magnitudes. This correlation contributes to achieving favourable results in predicting the total mass. In this context, NN yields the best performance for Set I, given the absence of an explicit scale relation, while for Set II, all predictions are similar.

After analyzing the performance of predictions for Set I and Set II, we concluded that the latter does not significantly improve the results. As mentioned, the main enhancement is observed for $M_{\text{tot}}$ . Additionally, having information about the $V_{\text{max}}$ for galaxies can be challenging due to the system’s dynamics. Therefore, in the interest of simplicity, we have opted Set I moving forward exclusively.

5 Predictions for observational data

Up to this point, we have assessed the training performance using synthetic data. In this section, we will apply the trained NN to predict masses of different components in real galaxies from the SDSS survey (Abdurro’uf et al., 2022). It is crucial to note that galaxies from the mock catalogue have specific limits for the ugriz-magnitudes, which are directly tied to the resolution of the simulations. This dependence arises from the halo masses and, consequently, stellar masses influencing the ability of the semi-analytical models to assign magnitudes in certain regions. In contrast, observed galaxies from SDSS exhibit limitations in the low surface brightness regime due to challenging observational features (Willman et al., 2002; Williams et al., 2016).

Fig. 3 shows the distribution for each magnitude for both SDSS and Guo’s galaxy catalogue. As previously mentioned, observed galaxies exhibit high luminosity, causing a shift in the mean value of each magnitude compared to synthetic galaxies. Since both samples do not fall within the same ranges, we will focus on regions where we have information about observations and simulations. Indeed, the literature has reported that NN behaves as interpolators (Saxe et al., 2019; Spigler et al., 2018). Therefore, the sample of observed galaxies to be assessed by the algorithm should have input features within the same domain of the training and test mock datasets.

We selected a galaxy catalogue from the SDSS database, with information about 660,000 galaxies and their morphological components (Mendel et al., 2014). The masses listed there were determined by fitting a broadband spectral energy distribution. This process involved making assumptions about the initial mass function, extinction law and stellar evolution. In that catalogue, the bulge-disk brightness profiles were reconstructed using the photometric decomposition method with the Sersic profile

I(R)=I_{e}\exp\left\{-b_{n}\left[\left(\frac{R}{R_{e}}\right)^{1/n}-1\right]% \right\},

(10)

where $R_{e}$ is the half-light radius and $I_{e}$ the intensity at that radius. Here $n$ is known as the Sérsic index and control the curvature of the profile.

The magnitudes used in the predictions stage were obtained from the SDSS DR7 (Abdurro’uf et al., 2022). We converted the apparent magnitude ( $m$ ) to absolute magnitude ( $M$ ) using (Schneider, 2006)

M=m-5\Big{(}\log_{10}d-1\Big{)},

(11)

where $d$ is the distance to the source, in units of 10 parsecs. Distances were computed using the python library Astropy (The Astropy Collaboration et al., 2013) with the redshift reported in NED¹¹1The NASA/IPAC Extragalactic Database (NED) is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. and assuming the cosmological parameters from Planck 2018 (Planck Collaboration et al., 2020) $H_{0}=67.66$ km/Mpc/s, and $\Omega_{m0}=0.26$ . Our analysis focused on about $70\%$ of the total dataset, concentrating on galaxies with the u, g, r, i, and z magnitudes.

A valuable piece of information for describing the evolution and structure of galaxies are the scaling relations between physical quantities of a galaxy sample. We analyse the scaling relation between mass components and the r-magnitude, as it is reported in other works,(Mahajan et al., 2017a; Venhola, Aku et al., 2019; Mahajan et al., 2017b; Côté et al., 2015). This is best correlated with stellar mass among SDSS filters. The relations for magnitudes in other colours are similar. We also study the $M_{\text{bulge}}-M_{\text{disk}}$ relation as well as the $M_{\star}-M_{\text{tot}}$ . We only employ the NN algorithm to obtain the results presented in this section, given that it performs better with fewer errors and its construction involves a more complete architecture than the other AI algorithms.

5.1 Mass-magnitude relation

In Fig. 4, we present distributions projected onto the $M_{\star}-$ rmag, $M_{\text{disk}}$ - r ${}_{\text{mag}}$ planes, and the $M_{\text{bulge}}$ - r ${}_{\text{mag}}$ relation for completeness. In each case, distributions up to $2\sigma$ for three datasets are shown: firstly, from the original mock catalogue in blue; from the original SDSS catalogue in green; and the third corresponds to NN predictions for the SDSS galaxies in red. The contours represent the $99\%$ and $95\%$ confidence levels. For plotting these figures we are using the whole data of spiral galaxies in the mock catalogue nevertheless, the masses reported in the observed catalogue fall within the regions depicted in Fig. 2.

First, fig. 4, Panel (a) illustrates the scaling relation between $M_{\star}-r_{\text{mag}}$ . The NN predictions agree with the real values up to $95\%$ C.L. However, as we approach more massive galaxies, and consequently, the resolution limit for simulations increases the error. Overall, the NN exhibits accurate predictions for $M_{\star}$ , consistent with Fig. 2 (c). Indeed, the best-fit slopes for each dataset only show slight differences. The best fits for the mock catalogue, SDSS, and NN predictions, respectively, are

$\displaystyle M_{\star}$	$\displaystyle=$	$\displaystyle-0.427r_{\text{mag}}+1.370,$	(12)
$\displaystyle M_{\star}$	$\displaystyle=$	$\displaystyle-0.461r_{\text{mag}}+0.916,$	(13)
$\displaystyle M_{\star}$	$\displaystyle=$	$\displaystyle-0.457r_{\text{mag}}+0.954.$	(14)

For $M_{\text{disk}}$ in fig. 4 panel (b), we distinguish a possibly bimodal distribution with two regions for simulations laying inside the range of mass between $7<\log M_{\text{disk}}<9$ . There is a separation between both blobs due to the lack of information at $r_{\text{mag}}\sim-19$ . There is an acceptable agreement within the $95\%$ C.L for galaxies in the low-surface brightness region.

Here, it is worth to mention that in all cases the masses predicted by the NN fall within the region of the simulated masses, as expected. However, for $M_{\text{disk}}$ , we observe that the red 2-sigma curve go outside the the blue and green regions for masses below $10^{9}M_{\odot}/h$ and above $5\times 10^{1}0M_{\odot}/h$ . This is related to the fact that the output masses are distributed in a three dimensional space (disk-bulge-stellar) and we are showing the projections over a single input parameter.

Bulge masses for most brilliant galaxies within the same mass range are not well predicted and are excluded by the NN architecture. This region corresponds to quasi-elliptical systems with large mass but small disks (see Fig. 4 (b)). In this case, the NN predicts that this type of system is unlikely, and in fact, it would be challenging to distinguish the disk from the bulge without an accurate numerical method. This conclusion is supported by panels (a) and (c), where the prediction aligns with the expected result for more than $95\%$ of the data. However, the missing points in panel (b) are compensated by the excess points in panel (c). This suggests that purely elliptical systems provide a better description of these cases. This behaviour is also reflected in Fig. 2 (a), where the error increases for masses below $10^{9}$ $M_{\odot}/h$ .

Additionally, the fact the neural network (NN) predicts well the stellar mass of SDSS galaxies (Figure 4 a)) serves as a consistency test between the mock catalogue and the NN. However, the predictions for small disk components deviate from the SDSS catalogue, which can suggest that the features are insufficient for training the NN or that the catalogue needs further precise information about the components. The values for $M_{\text{bulge}}$ in 4 panel (c) are derived from eq. (1) and from values of $M_{\star}$ and $M_{\text{disk}}$ directly inferred by the NN. We can observe an acceptable agreement between observations and simulations up to $95\%$ C.L.

5.2 Bulge-disk components

The relation between the luminous mass and the bulge-disk masses is described by eq. (1). $M_{\star}$ can be determined by a scaling relation (see Fig. 4). Thus, for a specific value for $M_{\text{bulge}}$ , the $M_{\text{disk}}$ will only take values within certain intervals, and vice versa.

Figure 5 shows bulge and disk masses of galaxies within both datasets. The mock catalogue shows a trimodal distribution. The most prominent region, for $M_{\text{disk}}>10^{8}M_{\odot/h}$ , corresponds to low values for $M_{\text{bulge}}$ , and it is associated to disk-dominated galaxies. The second region, for $M_{\text{bulge}}>10^{10}M_{\odot/h}$ is the bulge-dominated region (Conselice, 2006). This sort of galaxies are usually dubbed as cD $-$ like galaxies (central dominant) (Guo et al., 2011; Oemler Jr, 1976) This behaviour arises in both observed and simulated galaxies, although disk-dominant galaxies are more abundant in both cases.

The third region in the $M_{\text{bulge}}-M_{\text{disk}}$ , which corresponds to galaxies with both small disks and bulges, are only shown for the mock data. This discrepancy suggests that there may be an observational bias because current telescopes might not be able to detect the low-luminosity galaxies that appear in the numerical simulations.

The NN prediction is also shown in Fig. 5. For this last sample, the relationship between disk and bulge is nonlinear and not readily fitted with an analytic function as it happens with scaling relations derived in section 5.1. This can be due to the multimodal distribution suggesting that different scaling relations between bulge-disk mass components might arise for different galaxies within the sample. Nevertheless, the machine learning algorithm can make good predictions for disk-dominant galaxies. Furthermore, it is interesting that the NN algorithm gives rise to mass predictions consistent with the SDSS distribution and does not predict bulge-dominant galaxies as expected. Giving more accurate information about larger bulges can involve more complicated dynamics.

6 Conclusions

It is well known that the bulge-disk decomposition and the estimation of the total mass of galactic systems are complicated tasks that have been tackled by considering several assumptions. This paper presents an alternative method to make such estimations based on AI algorithms designed to predict the masses of different components in spiral galaxies. Two sets of input features were considered in the first stage of our analysis. In the first set, the magnitudes in different bands were considered, while in the second one, $V_{\text{max}}$ was added to the first set to include information about the system’s kinematics. After analyzing the performance of trained algorithms and testing the importance of the parameters, we figured out that the $V_{\text{max}}$ is only relevant for computing the galaxy’s total mass. The previous suggests that absolute magnitudes of the galaxies provide sufficient information to predict the masses of galactic components. Therefore, these methods can be readily applied to estimate masses of observational galaxies from different datasets.

These methods were used to predict the masses of different components of real spiral galaxies within the SDSS catalogue. Throughout the whole SDSS sample of galaxies, values of magnitudes and masses hold higher values than those taken for our analysis. We chose a subsample holding parameters within the same ranges as our training mock catalogue. During the training and test stages, the NN not only provided the estimations for masses of different galaxy components but could predict scaling relations (mass-magnitude) achieving at least a $95\%$ C.L. agreement with observational data.

From either observational or mock samples of galaxies, our analysis and algorithm are restricted to only be applicable within a small range for the total mass, given that only galaxies with non-zero bulges and disks were selected within the sample. We found that the NN algorithm shows good predictions for disk-dominant galaxies only considering the photometric information. Additionally, at $95\%$ confidence level, the NN predicts that it is unlikely to have bulge-dominant galaxies, consistent with the lack of information from observations. This training can be improved using additional information about the galaxy’s dynamics, age or chemical composition. However, using only the photometric information can be useful to obtain a sufficiently good estimation of the mass components in spiral galaxies with bulges.

We will continue this project by constructing machine learning algorithms trained with features inferred directly from observational data to have more accurate results and explore the HAM relation in real galaxies and the possible dependency on the morphology or age of the systems.

Acknowledgement

For this research work, we use the NASA/IPAC Extragalactic Database (NED), which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

Research leading to these results has received support from the European Structural and Investment Funds and the Czech Ministry of Education, Youth and Sports (project No. FORTE $-$ CZ $.02.01.01/00/22_{0}08/0004632$ ).

References

Abdurro’uf et al. (2022) Abdurro’uf et al., 2022, The Astrophysical Journals, 259, 35
Andredakis et al. (1995) Andredakis Y. C., Peletier R. F., Balcells M., 1995, Monthly Notices of the Royal Astronomical Society, 275, 874
Barsanti et al. (2021) Barsanti S., et al., 2021, The Astrophysical Journal, 911, 21
Bower et al. (2006) Bower R. G., Benson A. J., Malbon R., Helly J. C., Frenk C. S., Baugh C. M., Cole S., Lacey C. G., 2006, Monthly Notices of the Royal Astronomical Society, 370, 645
Buitinck et al. (2013) Buitinck L., et al., 2013, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning. pp 108–122
Calderon & Berlind (2019) Calderon V. F., Berlind A. A., 2019, Monthly Notices of the Royal Astronomical Society, 490, 2367
Chollet et al. (2015) Chollet F., et al., 2015, Keras, https://keras.io
Conselice (2006) Conselice C. J., 2006, Monthly Notices of the Royal Astronomical Society, 373, 1389
Croton et al. (2006) Croton D. J., et al., 2006, Monthly Notices of the Royal Astronomical Society, 367, 864
Côté et al. (2015) Côté B., Martel H., Drissen L., 2015, The Astrophysical Journal, 802, 123
De Lucia & Blaizot (2007) De Lucia G., Blaizot J., 2007, Monthly Notices of the Royal Astronomical Society, 375, 2
De Lucia et al. (2004) De Lucia G., Kauffmann G., Springel V., White S. D. M., Lanzoni B., Stoehr F., Tormen G., Yoshida N., 2004, Monthly Notices of the Royal Astronomical Society, 348, 333
De Lucia et al. (2006) De Lucia G., Springel V., White S. D. M., Croton D., Kauffmann G., 2006, Monthly Notices of the Royal Astronomical Society, 366, 499
Dimauro et al. (2018) Dimauro P., et al., 2018, Monthly Notices of the Royal Astronomical Society, 478, 5410
Freeman (1970) Freeman K. C., 1970, Astrophysical Journal, vol. 160, p. 811, 160, 811
Gnedin (2000) Gnedin N. Y., 2000, The Astrophysical Journal, 542, 535
Guo et al. (2010) Guo Q., White S., Li C., Boylan-Kolchin M., 2010, Monthly Notices of the Royal Astronomical Society, 404, 1111
Guo et al. (2011) Guo Q., et al., 2011, Monthly Notices of the Royal Astronomical Society, 413, 101
Hogg et al. (2004) Hogg D. W., et al., 2004, The Astrophysical Journal, 601, L29
Holmberg (1958) Holmberg E., 1958, Meddelanden fran Lunds Astronomiska Observatorium Serie II, 136, 1
Hopkins et al. (2010) Hopkins P. F., et al., 2010, The Astrophysical Journal, 715, 202
Hubble (1929) Hubble E., 1929, Proceedings of the National Academy of Science, 15, 168
Johnston et al. (2012) Johnston E. J., Aragón-Salamanca A., Merrifield M. R., Bedregal A. G., 2012, Monthly Notices of the Royal Astronomical Society, 422, 2590
Kent (1985) Kent S. M., 1985, The Astrophysical Journals, 59, 115
King (1966) King I. R., 1966, Astronomical Journal, 71, 64
Kuiper (1938) Kuiper G. P., 1938, The Astrophysical Journal, 88, 472
Liebert & Probst (1987) Liebert J., Probst R. G., 1987, Annual review of astronomy and astrophysics, 25, 473
Lucia (2019) Lucia G., 2019, Galaxies, 7, 56
Mahajan et al. (2017a) Mahajan S., et al., 2017a, Monthly Notices of the Royal Astronomical Society, 475, 788
Mahajan et al. (2017b) Mahajan S., et al., 2017b, Monthly Notices of the Royal Astronomical Society, 475, 788
Mendel et al. (2014) Mendel J. T., Simard L., Palmer M., Ellison S. L., Patton D. R., 2014, VizieR Online Data Catalog, p. J/ApJS/210/3
Oemler Jr (1976) Oemler Jr A., 1976, Astrophysical Journal, Vol. 209, p. 693-709, 209, 693
Pedregosa et al. (2011) Pedregosa F., et al., 2011, Journal of Machine Learning Research, 12, 2825
Planck Collaboration et al. (2020) Planck Collaboration et al., 2020, A&A, 641, A6
Reiprich & Boehringer (2002) Reiprich T. H., Boehringer H., 2002, The Astrophysical Journal, 567, 716
Saxe et al. (2019) Saxe A. M., Bansal Y., Dapello J., Advani M., Kolchinsky A., Tracey B. D., Cox D. D., 2019, Journal of Statistical Mechanics: Theory and Experiment, 2019, 124020
Schneider (2006) Schneider P., 2006, Extragalactic Astronomy and Cosmology
Spigler et al. (2018) Spigler S., Geiger M., d’Ascoli S., Sagun L., Biroli G., Wyart M., 2018, arXiv preprint arXiv:1810.09665
Springel et al. (2005) Springel V., et al., 2005, Nature, 435, 629
Springel et al. (2006) Springel V., Frenk C. S., White S. D. M., 2006, Nature, 440, 1137
The Astropy Collaboration et al. (2013) The Astropy Collaboration et al., 2013, A&A, 558, A33
Tully et al. (1998) Tully R. B., Pierce M. J., Huang J.-S., Saunders W., Verheijen M. A. W., Witchalls P. L., 1998, The Astronomical Journal, 115, 2264
Venhola, Aku et al. (2019) Venhola, Aku et al., 2019, A&A, 625, A143
Williams et al. (2016) Williams R. P., et al., 2016, Monthly Notices of the Royal Astronomical Society, 463, 2746
Willman et al. (2002) Willman B., Dalcanton J., Ivezić Ž., Jackson T., Lupton R., Brinkmann J., Henessy G., Hindsley R., 2002, The Astronomical Journal, 123, 848
de Vaucouleurs (1948) de Vaucouleurs G., 1948, Annales d’Astrophysique, 11, 247