Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Lasso regularization for mixture experiments with noise variables.

Manuel González-Navarrete,
Fabián Manríquez-Méndez  and Manuel Pereira-Barahona
Departamento de Matemática y Estadística, Universidad de La Frontera. Avda. Francisco Salazar 01145, Temuco, Chile. E-mail address: manuel.gonzaleznavarrete@ufrontera.clInstituto de Estadística, Universidad de Valparaíso. Av. Gran Bretaña 1111, Valparaíso, Chile. E-mail address: fabian.manriquez@postgrado.uv.clDepartamento de Estadística, Universidad del Bío-Bío. Avda. Collao 1202, Concepción, Chile. E-mail address: mpereira@ubiobio.cl
Abstract

We apply classical and Bayesian lasso regularizations to a family of models with the presence of mixture and process variables. We analyse the performance of these estimates with respect to ordinary least squares estimators by a simulation study and a real data application. Our results demonstrate the superior performance of Bayesian lasso, particularly via coordinate ascent variational inference, in terms of variable selection accuracy and response optimization.


1 Introduction

The exploration of complex systems through mixture experiments and the influence of external process variables represents a significant challenge in various fields of science and engineering. The ability to predict and optimize responses in such systems is crucial for technological advancement and innovation [7, 8].

In a mathematical framework, the study of mixture experiments involves building a model describing the relationship among the response and the mixture and process variables. This task requires the choose of an experimental design, and the fit of the statistical model by employing the data collected after experimentation. The usual tools to estimate model parameters are the ordinary least squares [3, 7] and, to a lesser extent, the partial least squares [18, 22].

In this context, regularization techniques such as lasso and its Bayesian extension stand out as fundamental tools for model analysis and selection in statistical literature [17], being good candidates for challenges such as high-dimensional mixture experiments. Lasso, introduced by [25], marked an advancement in regression by proposing a technique that minimizes the sum of squared residuals with a constraint on the L1superscript𝐿1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT norm of the coefficients, facilitating variable selection and reducing model complexity. Bayesian lasso, proposed by [23], extends this approach by incorporating a Bayesian perspective that assigns Laplace prior distributions to the regression parameters. This innovation maintains the effectiveness of lasso in variable selection while introducing Bayesian flexibility in estimation. Subsequent developments, such as those by [15], [21], and [1], have delved into the hierarchical structure of the model and improved inference algorithms, highlighting the robustness of Bayesian lasso in variable selection and predictive accuracy.

This study focuses on the integration of mixture experiments with process variables and the application of lasso regularization techniques to simultaneously optimize the mean and variance of the response. In particular, we explore the mixture-process models with noise variables described in [7], emphasizing their importance in understanding the interaction between mixture components and process conditions. The mathematical formulation underlying the optimization of these models is discussed, aiming to evaluate the performance of classical formulation and Bayesian lasso by employing Markov chain Monte Carlo [26] and variational approximation methods [5]. In this sense, a practical application of these concepts is illustrated through a simulation study to evaluate their performances in variable selection task. Moreover, a real data example from [13] is included, discussing the effectiveness of the proposed approach in the study of mixture experiments.

The rest of the paper is organized as follows. In Section 2 we introduce the theoretical aspects in the mathematical study of mixture experiments and the proposed regularization methods. Section 3 includes the results of a simulation study to evaluate the performance of such methods. In Section 4 we expose an application for a soap production experiment. Finally, Section 5 contains some conclusions.

2 Theoretical Background

2.1 Mixture experiments with noise variables

Mixture models with process variables represent an advanced tool for analyzing systems where both component proportions and specific external conditions (process variables) influence the system’s response. These models extend traditional frameworks by incorporating additional variables that reflect the conditions under which the experiment is conducted.

The general formulation of a model including mixture components 𝐱𝐱\mathbf{x}bold_x, process variables 𝐰𝐰\mathbf{w}bold_w, and possibly noise variables 𝐳𝐳\mathbf{z}bold_z, is described by the equation:

Y=f(x,w,z)=iαixi+i<jαijxixj+ipδipxiwp+i<jpδijpxixjwp+itγitxizt+i<jtγijtxixjzt+iptηiptxiwpzt+i<jptηijptxixjwpzt+ε𝑌𝑓xwzsubscript𝑖subscript𝛼𝑖subscript𝑥𝑖subscript𝑖𝑗subscript𝛼𝑖𝑗subscript𝑥𝑖subscript𝑥𝑗subscript𝑖subscript𝑝subscript𝛿𝑖𝑝subscript𝑥𝑖subscript𝑤𝑝subscript𝑖𝑗subscript𝑝subscript𝛿𝑖𝑗𝑝subscript𝑥𝑖subscript𝑥𝑗subscript𝑤𝑝subscript𝑖subscript𝑡subscript𝛾𝑖𝑡subscript𝑥𝑖subscript𝑧𝑡subscript𝑖𝑗subscript𝑡subscript𝛾𝑖𝑗𝑡subscript𝑥𝑖subscript𝑥𝑗subscript𝑧𝑡subscript𝑖subscript𝑝subscript𝑡subscript𝜂𝑖𝑝𝑡subscript𝑥𝑖subscript𝑤𝑝subscript𝑧𝑡subscript𝑖𝑗subscript𝑝subscript𝑡subscript𝜂𝑖𝑗𝑝𝑡subscript𝑥𝑖subscript𝑥𝑗subscript𝑤𝑝subscript𝑧𝑡𝜀\begin{split}Y=&f(\textbf{x},\textbf{w},\textbf{z})=\displaystyle\sum_{i}% \alpha_{i}x_{i}+\mathop{\sum\sum}_{i<j}\alpha_{ij}x_{i}x_{j}+\displaystyle\sum% _{i}\sum_{p}\delta_{ip}x_{i}w_{p}\\ &+\mathop{\sum\sum}_{i<j}\sum_{p}\delta_{ijp}x_{i}x_{j}w_{p}+\displaystyle\sum% _{i}\sum_{t}\gamma_{it}x_{i}z_{t}+\mathop{\sum\sum}_{i<j}\sum_{t}\gamma_{ijt}x% _{i}x_{j}z_{t}\\ &+\displaystyle\sum_{i}\sum_{p}\sum_{t}\eta_{ipt}x_{i}w_{p}z_{t}+\mathop{\sum% \sum}_{i<j}\sum_{p}\sum_{t}\eta_{ijpt}x_{i}x_{j}w_{p}z_{t}+\varepsilon\end{split}start_ROW start_CELL italic_Y = end_CELL start_CELL italic_f ( x , w , z ) = ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + start_BIGOP ∑ ∑ end_BIGOP start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_i italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + start_BIGOP ∑ ∑ end_BIGOP start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_i italic_j italic_p end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_i italic_t end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + start_BIGOP ∑ ∑ end_BIGOP start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_i italic_j italic_t end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_i italic_p italic_t end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + start_BIGOP ∑ ∑ end_BIGOP start_POSTSUBSCRIPT italic_i < italic_j end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_i italic_j italic_p italic_t end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_ε end_CELL end_ROW (2.1)

where Y𝑌Yitalic_Y is the response variable; β¯=(α¯,δ¯,γ¯,η¯)¯𝛽¯𝛼¯𝛿¯𝛾¯𝜂\underline{\beta}=(\underline{\alpha},\underline{\delta},\underline{\gamma},% \underline{\eta})under¯ start_ARG italic_β end_ARG = ( under¯ start_ARG italic_α end_ARG , under¯ start_ARG italic_δ end_ARG , under¯ start_ARG italic_γ end_ARG , under¯ start_ARG italic_η end_ARG ) is the vector of coefficients modeling linear effects and interactions; and ε𝜀\varepsilonitalic_ε represents the error term, normally distributed with mean zero and variance σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

Key constraints for such models include [13]:

  • The proportions of mixture components 𝐱𝐱\mathbf{x}bold_x must sum to 1, i.e., ixi=1subscript𝑖subscript𝑥𝑖1\sum_{i}x_{i}=1∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1, ensuring that the model adequately reflects the nature of mixtures.

  • Mixture components xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and process variables wpsubscript𝑤𝑝w_{p}italic_w start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT should be selected to faithfully reflect the system under study, including only those factors that have a significant impact on the response.

  • Conveniently, the noise variables are supposed to be independent and identically distributed, with 𝔼(Zt)=0𝔼subscript𝑍𝑡0\mathbb{E}(Z_{t})=0blackboard_E ( italic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = 0 and 𝕍(Zt)=1𝕍subscript𝑍𝑡1\mathbb{V}(Z_{t})=1blackboard_V ( italic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = 1.

The usual tool to estimate parameters in (2.1) is the method of ordinary least squares, which are given by

β¯^OLS=argminβ¯(yXβ¯)t(yXβ¯)\hat{\underline{\beta}}_{OLS}={\arg\min}_{\underline{\beta}}(y-X\underline{% \beta})^{t}(y-X\underline{\beta})over^ start_ARG under¯ start_ARG italic_β end_ARG end_ARG start_POSTSUBSCRIPT italic_O italic_L italic_S end_POSTSUBSCRIPT = roman_arg roman_min start_POSTSUBSCRIPT under¯ start_ARG italic_β end_ARG end_POSTSUBSCRIPT ( italic_y - italic_X under¯ start_ARG italic_β end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ( italic_y - italic_X under¯ start_ARG italic_β end_ARG ) (2.2)

Once we obtain the fitted model for response variable Y𝑌Yitalic_Y, the objective is finding optimal configurations of mixture variables 𝐱𝐱\mathbf{x}bold_x and process and noise variables 𝐰𝐰\mathbf{w}bold_w and 𝐳𝐳\mathbf{z}bold_z, respectively, which optimize the response for the experiments. This task is completed by using the desirability function approach proposed by Derringer and Suich [11], and extensively used in the recent literature [3, 9, 10, 24].

The desirability function is defined for estimated response functions, such as the moments of Y𝑌Yitalic_Y, 𝔼(Yn)𝔼superscript𝑌𝑛\mathbb{E}(Y^{n})blackboard_E ( italic_Y start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ). The values of these functions increase as the ”desirability” of the corresponding response increases. In this sense, for instance, the desirability function of the expectation of a function of the response variable is given by

d(𝔼^(g(Y)))={0,if𝔼^(g(Y))𝔼(g(Y))[𝔼^(g(Y))𝔼(g(Y))𝔼(g(Y))𝔼(g(Y))]r,if𝔼(g(Y))<𝔼^(g(Y))<𝔼(g(Y))1,if𝔼^(g(Y))𝔼(g(Y))𝑑^𝔼𝑔𝑌cases0if^𝔼𝑔𝑌𝔼subscript𝑔𝑌superscriptdelimited-[]^𝔼𝑔𝑌𝔼subscript𝑔𝑌𝔼superscript𝑔𝑌𝔼subscript𝑔𝑌𝑟if𝔼subscript𝑔𝑌^𝔼𝑔𝑌𝔼superscript𝑔𝑌1if^𝔼𝑔𝑌𝔼superscript𝑔𝑌d\left(\widehat{\mathbb{E}}(g(Y))\right)=\begin{cases}0,&\text{if}\ \widehat{% \mathbb{E}}(g(Y))\leq\mathbb{E}(g(Y))_{*}\\ \left[\frac{\widehat{\mathbb{E}}(g(Y))-\mathbb{E}(g(Y))_{*}}{\mathbb{E}(g(Y))^% {*}-\mathbb{E}(g(Y))_{*}}\right]^{r},&\text{if}\ \mathbb{E}(g(Y))_{*}<\widehat% {\mathbb{E}}(g(Y))<\mathbb{E}(g(Y))^{*}\\ 1,&\text{if}\ \widehat{\mathbb{E}}(g(Y))\geq\mathbb{E}(g(Y))^{*}\end{cases}italic_d ( over^ start_ARG blackboard_E end_ARG ( italic_g ( italic_Y ) ) ) = { start_ROW start_CELL 0 , end_CELL start_CELL if over^ start_ARG blackboard_E end_ARG ( italic_g ( italic_Y ) ) ≤ blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL [ divide start_ARG over^ start_ARG blackboard_E end_ARG ( italic_g ( italic_Y ) ) - blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT end_ARG start_ARG blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT end_ARG ] start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , end_CELL start_CELL if blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT < over^ start_ARG blackboard_E end_ARG ( italic_g ( italic_Y ) ) < blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL 1 , end_CELL start_CELL if over^ start_ARG blackboard_E end_ARG ( italic_g ( italic_Y ) ) ≥ blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_CELL end_ROW (2.3)

The values 𝔼(g(Y))𝔼subscript𝑔𝑌\mathbb{E}(g(Y))_{*}blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT and 𝔼(g(Y))𝔼superscript𝑔𝑌\mathbb{E}(g(Y))^{*}blackboard_E ( italic_g ( italic_Y ) ) start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT give the minimum and maximum acceptable values of 𝔼(g(Y))𝔼𝑔𝑌\mathbb{E}(g(Y))blackboard_E ( italic_g ( italic_Y ) ), respectively. The parameter r𝑟ritalic_r is arbitrarily chosen. Finally, the individual desirabilities are combined using the geometric mean,

D(x,w,z)=(d(𝔼^(g1(Y)))d(𝔼^(gd(Y))))1/d𝐷xwzsuperscript𝑑^𝔼subscript𝑔1𝑌𝑑^𝔼subscript𝑔𝑑𝑌1𝑑D(\textbf{x},\textbf{w},\textbf{z})=\left(d\left(\widehat{\mathbb{E}}(g_{1}(Y)% )\right)\cdot\ldots\cdot d\left(\widehat{\mathbb{E}}(g_{d}(Y))\right)\right)^{% 1/d}italic_D ( x , w , z ) = ( italic_d ( over^ start_ARG blackboard_E end_ARG ( italic_g start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Y ) ) ) ⋅ … ⋅ italic_d ( over^ start_ARG blackboard_E end_ARG ( italic_g start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_Y ) ) ) ) start_POSTSUPERSCRIPT 1 / italic_d end_POSTSUPERSCRIPT (2.4)

This single value of D𝐷Ditalic_D is maximized to obtain the overall assessment of the desirability of the combined expected response functions. In particular, we use g1(Y)=Ysubscript𝑔1𝑌𝑌g_{1}(Y)=Yitalic_g start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Y ) = italic_Y and g2(Y)=(Y𝔼(Y))2subscript𝑔2𝑌superscript𝑌𝔼𝑌2g_{2}(Y)=-(Y-\mathbb{E}(Y))^{2}italic_g start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_Y ) = - ( italic_Y - blackboard_E ( italic_Y ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. In other words, we maximize the expectation and minimize the variance of the response variable Y𝑌Yitalic_Y.

2.2 Lasso regularization

The introduction of lasso, as proposed by Tibshirani (1996), represented a significant step forward in regression techniques. By minimizing the sum of squared residuals under a constraint on the L1superscript𝐿1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT norm of coefficients, lasso facilitates variable selection and effectively reduces model complexity. Building upon this foundation, Bayesian lasso, as introduced by Park and Casella (2008), takes regression analysis further by adopting a Bayesian framework that assigns Laplace prior distributions to regression parameters. This Bayesian perspective not only preserves the variable selection capabilities of lasso but also introduces greater flexibility in parameter estimation.

2.2.1 Classical formulation

The lasso is a form of penalized least squares that minimizes the residual sum of squares while controlling the L1superscript𝐿1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT norm of the coefficient vector β¯¯𝛽\underline{\beta}under¯ start_ARG italic_β end_ARG. The lasso estimator for a classical regression model is given by,

β¯^L=argminβ¯(yXβ¯)t(yXβ¯)+λ||β¯||1\hat{\underline{\beta}}_{L}={\arg\min}_{\underline{\beta}}(y-X\underline{\beta% })^{t}(y-X\underline{\beta})+\lambda||\underline{\beta}||_{1}over^ start_ARG under¯ start_ARG italic_β end_ARG end_ARG start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT = roman_arg roman_min start_POSTSUBSCRIPT under¯ start_ARG italic_β end_ARG end_POSTSUBSCRIPT ( italic_y - italic_X under¯ start_ARG italic_β end_ARG ) start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ( italic_y - italic_X under¯ start_ARG italic_β end_ARG ) + italic_λ | | under¯ start_ARG italic_β end_ARG | | start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (2.5)

where λ0𝜆0\lambda\geq 0italic_λ ≥ 0 is called the shrinkage parameter. In the case λ=0𝜆0\lambda=0italic_λ = 0, we have β¯^L=β¯^OLSsubscript^¯𝛽𝐿subscript^¯𝛽𝑂𝐿𝑆\hat{\underline{\beta}}_{L}=\hat{\underline{\beta}}_{OLS}over^ start_ARG under¯ start_ARG italic_β end_ARG end_ARG start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT = over^ start_ARG under¯ start_ARG italic_β end_ARG end_ARG start_POSTSUBSCRIPT italic_O italic_L italic_S end_POSTSUBSCRIPT, the ordinary least squares (OLS) estimation, and sufficiently large λ𝜆\lambdaitalic_λ reduces β¯Lsubscript¯𝛽𝐿\underline{\beta}_{L}under¯ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT to zero. The lasso has a Bayesian interpretation [25], since the lasso estimation can be seen as the mode of the posterior distribution of β¯¯𝛽\underline{\beta}under¯ start_ARG italic_β end_ARG, when double-exponential and independent prior distributions are assigned to the p𝑝pitalic_p regression coefficients,

p(β¯τ)=(τ/2)pexp(τβ¯1)𝑝conditional¯𝛽𝜏superscript𝜏2𝑝𝜏subscriptnorm¯𝛽1p(\underline{\beta}\mid\tau)=(\tau/2)^{p}\exp\left(-\tau||\underline{\beta}||_% {1}\right)italic_p ( under¯ start_ARG italic_β end_ARG ∣ italic_τ ) = ( italic_τ / 2 ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT roman_exp ( - italic_τ | | under¯ start_ARG italic_β end_ARG | | start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) (2.6)

where p(y¯β¯,σ2)=𝒩(y¯Xβ¯,σ2In)𝑝conditional¯𝑦¯𝛽superscript𝜎2𝒩conditional¯𝑦𝑋¯𝛽superscript𝜎2subscriptI𝑛p(\underline{y}\mid\underline{\beta},\sigma^{2})=\mathcal{N}(\underline{y}\mid X% \underline{\beta},\sigma^{2}\textbf{I}_{n})italic_p ( under¯ start_ARG italic_y end_ARG ∣ under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) = caligraphic_N ( under¯ start_ARG italic_y end_ARG ∣ italic_X under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ), for any fixed values of σ>0𝜎0\sigma>0italic_σ > 0 and τ>0𝜏0\tau>0italic_τ > 0, with penalty λ=2τσ2𝜆2𝜏superscript𝜎2\lambda=2\tau\sigma^{2}italic_λ = 2 italic_τ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

2.2.2 Bayesian formulation

The work [23] shows a Bayesian formulation of lasso regression. The hierarchical model is defined by:

y¯X,β¯,σ2𝒩(Xβ¯,σ2In)β¯σ2,τ¯𝒩(0¯,σ2𝐃τ)τjλExp(λ)j=1,,pformulae-sequencesimilar-toconditional¯𝑦𝑋¯𝛽superscript𝜎2conditional𝒩𝑋¯𝛽superscript𝜎2subscript𝐼𝑛¯𝛽superscript𝜎2¯𝜏similar-toconditional𝒩¯0superscript𝜎2subscript𝐃𝜏subscript𝜏𝑗𝜆similar-toExp𝜆𝑗1𝑝\begin{split}\underline{y}\mid X,\underline{\beta},\sigma^{2}&\sim\mathcal{N}(% X\cdot\underline{\beta},\sigma^{2}\cdot I_{n})\\ \underline{\beta}\mid\sigma^{2},\underline{\tau}&\sim\mathcal{N}(\underline{0}% ,\sigma^{2}\mathbf{D}_{\tau})\\ \tau_{j}\mid\lambda&\sim\operatorname{Exp}(\lambda)\quad j=1,\ldots,p\end{split}start_ROW start_CELL under¯ start_ARG italic_y end_ARG ∣ italic_X , under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL start_CELL ∼ caligraphic_N ( italic_X ⋅ under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL under¯ start_ARG italic_β end_ARG ∣ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , under¯ start_ARG italic_τ end_ARG end_CELL start_CELL ∼ caligraphic_N ( under¯ start_ARG 0 end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_D start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∣ italic_λ end_CELL start_CELL ∼ roman_Exp ( italic_λ ) italic_j = 1 , … , italic_p end_CELL end_ROW (2.7)

where 𝐃τ=diag(τ1,,τp)subscript𝐃𝜏diagsubscript𝜏1subscript𝜏𝑝\mathbf{D}_{\tau}=\operatorname{diag}(\tau_{1},\ldots,\tau_{p})bold_D start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT = roman_diag ( italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_τ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) y τjλconditionalsubscript𝜏𝑗𝜆\tau_{j}\mid\lambdaitalic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∣ italic_λ and τjsubscript𝜏𝑗\tau_{j}italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are conditionally independent for all j𝑗jitalic_j. The model can be completed with the gamma prior distributions (σ2)1Ga(a0,b0)similar-tosuperscriptsuperscript𝜎21𝐺𝑎subscript𝑎0subscript𝑏0(\sigma^{2})^{-1}\sim Ga(a_{0},b_{0})( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∼ italic_G italic_a ( italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and λGa(c0,d0)similar-to𝜆𝐺𝑎subscript𝑐0subscript𝑑0\lambda\sim Ga(c_{0},d_{0})italic_λ ∼ italic_G italic_a ( italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), where a0,b0,c0subscript𝑎0subscript𝑏0subscript𝑐0a_{0},b_{0},c_{0}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and d0subscript𝑑0d_{0}italic_d start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT the hyperparameters. Let θ¯=(β¯,σ2,τ¯,λ)¯𝜃¯𝛽superscript𝜎2¯𝜏𝜆\underline{\theta}=\left(\underline{\beta},\sigma^{2},\underline{\tau},\lambda\right)under¯ start_ARG italic_θ end_ARG = ( under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , under¯ start_ARG italic_τ end_ARG , italic_λ ) be the vector of the parameters for this model. The posterior distribution will be proportional to the model distribution times the prior distribution for the latent components and the parameters:

p(θ¯y¯,X)p(y¯X,β¯,σ2)p(β¯σ2,τ¯)p(τ¯λ)p(σ2)p(λ)proportional-to𝑝conditional¯𝜃¯𝑦𝑋𝑝conditional¯𝑦𝑋¯𝛽superscript𝜎2𝑝conditional¯𝛽superscript𝜎2¯𝜏𝑝conditional¯𝜏𝜆𝑝superscript𝜎2𝑝𝜆p(\underline{\theta}\mid\underline{y},X)\propto p(\underline{y}\mid X,% \underline{\beta},\sigma^{2})p(\underline{\beta}\mid\sigma^{2},\underline{\tau% })p(\underline{\tau}\mid\lambda)p(\sigma^{2})p(\lambda)italic_p ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ) ∝ italic_p ( under¯ start_ARG italic_y end_ARG ∣ italic_X , under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_p ( under¯ start_ARG italic_β end_ARG ∣ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , under¯ start_ARG italic_τ end_ARG ) italic_p ( under¯ start_ARG italic_τ end_ARG ∣ italic_λ ) italic_p ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_p ( italic_λ )

The posterior distribution for hierarchical model is often intractable.

2.3 Bayesian model estimation

In this section we explain the point estimator we use, the variable selection methods and the computational tools to approximate the posterior distributions. In particular, we include the Markov chain Monte Carlo (MCMC) and variational alternatives by coordinate ascent variational inference (CAVI) and automatic differentiation variational inference (ADVI).

2.3.1 Point estimation and variable selection in Bayesian lasso

We adopt the use of posterior mean, θ^=𝔼(θ¯y¯,X)^𝜃𝔼conditional¯𝜃¯𝑦𝑋\hat{\theta}=\mathbb{E}(\underline{\theta}\mid\underline{y},X)over^ start_ARG italic_θ end_ARG = blackboard_E ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ), to give point estimations. The choice of this estimator is driven by its capacity to condense all the information provided by the posterior distribution, offering an estimate that considers the diversity of possible parameter values, in contrast to the one-dimensional approach of the MAP [23].

Furthermore, Bayesian lasso provides interval estimates that can guide variable selection. Usually, for each parameter, it is used a 95% credible interval and if the interval contains the value zero, then the regression coefficient is excluded [23]. This criterion will be denoted by CI (credible interval).

However, as discussed by [20], the 95% credible intervals are usually too wide and most predictors would consequently be removed. Therefore, we use the criterion proposed by Li and Lin, that is, we consider the posterior probability of the interval [𝕍(βj)|y¯,X;𝕍(βj)|y¯,X]\left[-\sqrt{\mathbb{V}(\beta_{j})}|\underline{y},X;\sqrt{\mathbb{V}(\beta_{j}% )}|\underline{y},X\right][ - square-root start_ARG blackboard_V ( italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) end_ARG | under¯ start_ARG italic_y end_ARG , italic_X ; square-root start_ARG blackboard_V ( italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) end_ARG | under¯ start_ARG italic_y end_ARG , italic_X ]. In this sense, a regression coefficient is excluded if such probability exceeds a certain threshold and is retained otherwise. In particular, we use 0.50.50.50.5 as a threshold. This criterion is called scaled neighborhood (SN).

2.3.2 Markov chain Monte Carlo

In the field of Bayesian inference, Markov chain Monte Carlo (MCMC) is the most common method to approximate posterior distributions [26]. However, usual MCMC includes high autocorrelation and convergence is slow especially in high-dimensional spaces. To overcome this problem, the rstan package, implemented in R (see [14]), uses an advanced version of Hamiltonian Monte Carlo (HMC) known as the no-U-turn sampler (NUTS) (see [16]), optimizing the sampling process by eliminating the need to manually adjust the number of steps in each improving efficiency in parameter space exploration, reducing autocorrelation between samples and speeding up convergence.

NUTS is an extension of the HMC algorithm that solves the problem of selecting the optimal number of simulation steps. It employs a recursive strategy that automatically expands the trajectory in parameter space, stopping when a reversal or ”U-turn” is detected in the trajectory, hence its name. This enhances sampling efficiency by reducing autocorrelation between samples and optimizing the use of computational resources.

Formally, updating the parameters θ𝜃\thetaitalic_θ in NUTS can be described using Hamiltonian dynamics, where an auxiliary momentum p𝑝pitalic_p is introduced, and the system’s evolution is simulated under the Hamiltonian H(θ,p)=U(θ)+K(p)𝐻𝜃𝑝𝑈𝜃𝐾𝑝H(\theta,p)=U(\theta)+K(p)italic_H ( italic_θ , italic_p ) = italic_U ( italic_θ ) + italic_K ( italic_p ). Here, U(θ)𝑈𝜃U(\theta)italic_U ( italic_θ ) represents the negative logarithmic potential of the posterior distribution, and K(p)𝐾𝑝K(p)italic_K ( italic_p ) is the kinetic energy associated with the momentum p𝑝pitalic_p, typically defined as K(p)=12pTM1p𝐾𝑝12superscript𝑝𝑇superscript𝑀1𝑝K(p)=\frac{1}{2}p^{T}M^{-1}pitalic_K ( italic_p ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG italic_p start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_M start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_p, where M𝑀Mitalic_M is the mass (or covariance) matrix that can be adjusted to reflect parameter scales (see [16]).

θn+1,pn+1=Leapfrog(θn,pn,ϵn,Ln)subscript𝜃𝑛1subscript𝑝𝑛1Leapfrogsubscript𝜃𝑛subscript𝑝𝑛subscriptitalic-ϵ𝑛subscript𝐿𝑛\theta_{n+1},p_{n+1}=\text{Leapfrog}(\theta_{n},p_{n},\epsilon_{n},L_{n})italic_θ start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = Leapfrog ( italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_ϵ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_L start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) (2.8)

where Leapfrog()Leapfrog\text{Leapfrog}(\cdot)Leapfrog ( ⋅ ) denotes the leapfrog integration steps used for numerical simulation of Hamiltonian dynamics, ϵnsubscriptitalic-ϵ𝑛\epsilon_{n}italic_ϵ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the adaptive step size, and Lnsubscript𝐿𝑛L_{n}italic_L start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the number of leapfrog steps, determined dynamically [4].

The implementation of NUTS in rstan allows for more efficient Bayesian inference in high-dimensional models and reduces manual intervention in selecting sampler hyperparameters. The MCMC algorithms, specifically NUTS, were run until convergence, evaluated using the R^^𝑅\hat{R}over^ start_ARG italic_R end_ARG statistic (potential scale reduction), ensuring R^<1.1^𝑅1.1\hat{R}<1.1over^ start_ARG italic_R end_ARG < 1.1 (see [12]).

2.3.3 Methods based on variational inference

The concept behind variational inference methods is to propose a family of densities and find a member q𝑞qitalic_q of that family which closely approximates the target posterior p(θ¯y¯,X)𝑝conditional¯𝜃¯𝑦𝑋p(\underline{\theta}\mid\underline{y},X)italic_p ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ) [5]. In other words, instead of computing the true posterior, we endeavor to determine the parameters ϕitalic-ϕ\phiitalic_ϕ of a particular distribution qsuperscript𝑞q^{*}italic_q start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT (the approximation to our true posterior) such that

q=argmin(q(θ¯;ϕ)||p(θ¯y¯,X))q^{*}=\arg\min\mathcal{L}(q(\underline{\theta};\phi)\ ||\ p(\underline{\theta}% \mid\underline{y},X))italic_q start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_min caligraphic_L ( italic_q ( under¯ start_ARG italic_θ end_ARG ; italic_ϕ ) | | italic_p ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ) ) (2.9)

where (||)\mathcal{L}(\cdot\ ||\ \cdot)caligraphic_L ( ⋅ | | ⋅ ) denote the Kullback-Leibler divergence, given by (q(θ¯;ϕ)||p(θ¯y¯,X))=𝔼q[lnq(θ¯;ϕ)p(θ¯y¯,X)]\mathcal{L}(q(\underline{\theta};\phi)\ ||\ p(\underline{\theta}\mid\underline% {y},X))=\mathbb{E}_{q}\left[\ln\frac{q(\underline{\theta};\phi)}{p(\underline{% \theta}\mid\underline{y},X)}\right]caligraphic_L ( italic_q ( under¯ start_ARG italic_θ end_ARG ; italic_ϕ ) | | italic_p ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ) ) = blackboard_E start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT [ roman_ln divide start_ARG italic_q ( under¯ start_ARG italic_θ end_ARG ; italic_ϕ ) end_ARG start_ARG italic_p ( under¯ start_ARG italic_θ end_ARG ∣ under¯ start_ARG italic_y end_ARG , italic_X ) end_ARG ]. Therefore (2.9) is equivalent to maximizing

q=argmax{𝔼q[lnp(y¯,θ¯,X)lnq(θ¯;ϕ)]ELBO}superscript𝑞subscriptsubscript𝔼𝑞delimited-[]𝑝¯𝑦¯𝜃𝑋𝑞¯𝜃italic-ϕ𝐸𝐿𝐵𝑂q^{*}=\arg\max\{\underbrace{\mathbb{E}_{q}\left[\ln p(\underline{y},\underline% {\theta},X)-\ln q(\underline{\theta};\phi)\right]}_{ELBO}\}italic_q start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max { under⏟ start_ARG blackboard_E start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT [ roman_ln italic_p ( under¯ start_ARG italic_y end_ARG , under¯ start_ARG italic_θ end_ARG , italic_X ) - roman_ln italic_q ( under¯ start_ARG italic_θ end_ARG ; italic_ϕ ) ] end_ARG start_POSTSUBSCRIPT italic_E italic_L italic_B italic_O end_POSTSUBSCRIPT } (2.10)

This expression is called evidence lower bound (ELBO). In particular, we want to optimize the ELBO in mean field variational inference, that is, the joint distribution reduces to the product of marginal distributions, q(θ¯)=i=1pq(θi)𝑞¯𝜃superscriptsubscriptproduct𝑖1𝑝𝑞subscript𝜃𝑖q(\underline{\theta})=\prod_{i=1}^{p}q(\theta_{i})italic_q ( under¯ start_ARG italic_θ end_ARG ) = ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT italic_q ( italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ).

  • Coordinate ascent variational inference: This algorithm to solve the optimization problem was introduced by [4] and denoted by coordinate ascent variational inference (CAVI). The CAVI optimizes one factor of the mean field variational density at a time. This is defined as an iterative optimization of qjsubscript𝑞𝑗q_{j}italic_q start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for j=1,,p𝑗1𝑝j=1,\ldots,pitalic_j = 1 , … , italic_p, while the other variational distributions are fixed. The optimal qjsubscript𝑞𝑗q_{j}italic_q start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is proportional to the exponential of the log of the complete conditional distribution, is given by

    q(θj)exp{𝔼θ(j)[lnp(θjθ(j),y¯,X)]}j=1,,pformulae-sequenceproportional-to𝑞subscript𝜃𝑗subscript𝔼subscript𝜃𝑗delimited-[]𝑝conditionalsubscript𝜃𝑗subscript𝜃𝑗¯𝑦𝑋𝑗1𝑝q(\theta_{j})\propto\exp\{\mathbb{E}_{\theta_{-(j)}}[\ln p(\theta_{j}\mid% \theta_{-(j)},\underline{y},X)]\}\quad j=1,\ldots,pitalic_q ( italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ∝ roman_exp { blackboard_E start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT - ( italic_j ) end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ roman_ln italic_p ( italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∣ italic_θ start_POSTSUBSCRIPT - ( italic_j ) end_POSTSUBSCRIPT , under¯ start_ARG italic_y end_ARG , italic_X ) ] } italic_j = 1 , … , italic_p (2.11)

    where θ(j)=(θ1,,θj1.θj+1,,θp)\theta_{-(j)}=(\theta_{1},\ldots,\theta_{j-1}.\theta_{j+1},\ldots,\theta_{p})italic_θ start_POSTSUBSCRIPT - ( italic_j ) end_POSTSUBSCRIPT = ( italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_θ start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT . italic_θ start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT , … , italic_θ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ).

    In the context of CAVI, the focus lies on iteratively adjusting the parameters within the variational distribution until certain convergence standards are reached. This process entails performing analytic derivations for the updates, which may prove to be time-intensive at most and impractical in certain scenarios. The main objective is to optimize the ELBO in the mean field variational inference.

    For the Bayesian lasso model proposed in (2.7), the variational posterior for β¯¯𝛽\underline{\beta}under¯ start_ARG italic_β end_ARG and σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, is given by (for details, see [2])

    q(β¯,σ2)=𝒩(β¯mβ,σ2Cβ)Ga((σ2)1a0,b0)𝑞¯𝛽superscript𝜎2𝒩conditional¯𝛽subscript𝑚𝛽superscript𝜎2subscript𝐶𝛽𝐺𝑎conditionalsuperscriptsuperscript𝜎21subscript𝑎0subscript𝑏0\begin{split}q(\underline{\beta},\sigma^{2})&=\mathcal{N}(\underline{\beta}% \mid m_{\beta},\sigma^{2}\cdot C_{\beta})Ga((\sigma^{2})^{-1}\mid a_{0},b_{0})% \end{split}start_ROW start_CELL italic_q ( under¯ start_ARG italic_β end_ARG , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_CELL start_CELL = caligraphic_N ( under¯ start_ARG italic_β end_ARG ∣ italic_m start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ italic_C start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ) italic_G italic_a ( ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∣ italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_CELL end_ROW

    it is recognized that is a normal-gamma distribution with parameters:

    Cβ1=𝔼[Dτ1]+XtX,mβ=CβXty,aσ2=a0+12andbσ2=b0+12(ytymβtCβ1mβ).formulae-sequencesubscriptsuperscript𝐶1𝛽𝔼delimited-[]superscriptsubscript𝐷𝜏1superscript𝑋𝑡𝑋formulae-sequencesubscript𝑚𝛽subscript𝐶𝛽superscript𝑋𝑡𝑦formulae-sequencesubscript𝑎superscript𝜎2subscript𝑎012andsubscript𝑏superscript𝜎2subscript𝑏012superscript𝑦𝑡𝑦superscriptsubscript𝑚𝛽𝑡superscriptsubscript𝐶𝛽1subscript𝑚𝛽C^{-1}_{\beta}=\mathbb{E}\left[D_{\tau}^{-1}\right]+X^{t}X,\quad m_{\beta}=C_{% \beta}X^{t}y,\quad a_{\sigma^{2}}=a_{0}+\frac{1}{2}\quad\text{and}\quad b_{% \sigma^{2}}=b_{0}+\frac{1}{2}\left(y^{t}y-m_{\beta}^{t}C_{\beta}^{-1}m_{\beta}% \right).italic_C start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT = blackboard_E [ italic_D start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ] + italic_X start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_X , italic_m start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT = italic_C start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT italic_X start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_y , italic_a start_POSTSUBSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 end_ARG and italic_b start_POSTSUBSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( italic_y start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_y - italic_m start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_C start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ) .

    the variational distribution for τjsubscript𝜏𝑗\tau_{j}italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, is given by q(τj)=𝒢𝒢(τjcτ,dτ,fτj)𝑞subscript𝜏𝑗𝒢𝒢conditionalsubscript𝜏𝑗subscript𝑐𝜏subscript𝑑𝜏subscript𝑓subscript𝜏𝑗q(\tau_{j})=\mathcal{GIG}(\tau_{j}\mid c_{\tau},d_{\tau},f_{\tau_{j}})italic_q ( italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = caligraphic_G caligraphic_I caligraphic_G ( italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∣ italic_c start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , italic_f start_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ), where 𝒢𝒢𝒢𝒢\mathcal{GIG}caligraphic_G caligraphic_I caligraphic_G is a generalized inverse Gaussian distribution, with parameters cτ=12subscript𝑐𝜏12c_{\tau}=\frac{1}{2}italic_c start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG, dτ=2𝔼λ[λ]subscript𝑑𝜏2subscript𝔼𝜆delimited-[]𝜆d_{\tau}=2\mathbb{E}_{\lambda}\left[\lambda\right]italic_d start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT = 2 blackboard_E start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT [ italic_λ ] and fτj=𝔼σ2β[(σ2)1βj2].subscript𝑓subscript𝜏𝑗subscript𝔼superscript𝜎2𝛽delimited-[]superscriptsuperscript𝜎21superscriptsubscript𝛽𝑗2f_{\tau_{j}}=\mathbb{E}_{\sigma^{2}\beta}\left[(\sigma^{2})^{-1}\beta_{j}^{2}% \right].italic_f start_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = blackboard_E start_POSTSUBSCRIPT italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_β end_POSTSUBSCRIPT [ ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_β start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] . Finally, the variational distribution for λ𝜆\lambdaitalic_λ, is given by

    q(λ)=Ga(λaλ,bλ)𝑞𝜆𝐺𝑎conditional𝜆subscript𝑎𝜆subscript𝑏𝜆q(\lambda)=Ga(\lambda\mid a_{\lambda},b_{\lambda})italic_q ( italic_λ ) = italic_G italic_a ( italic_λ ∣ italic_a start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT )

    it is recognized that is a gamma distribution with parameters

    aλ=g0+pandbλ=h0+j=1p𝔼[τj].formulae-sequencesubscript𝑎𝜆subscript𝑔0𝑝andsubscript𝑏𝜆subscript0superscriptsubscript𝑗1𝑝𝔼delimited-[]subscript𝜏𝑗a_{\lambda}=g_{0}+p\quad\text{and}\quad b_{\lambda}=h_{0}+\sum_{j=1}^{p}% \mathbb{E}\left[\tau_{j}\right].italic_a start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT = italic_g start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_p and italic_b start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT = italic_h start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT blackboard_E [ italic_τ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] .
  • Automatic differentiation variational inference (ADVI): Implementing CAVI requires careful thought about the target distribution and choosing an appropriate variational family specific to the problem. Alternatively, [19] offer a way to automate variational inference. We will first assume all model parameters are continuous. In ADVI, the ELBO is first re-written as

    ELBO(y¯,ϕ):=𝔼q[lnp(y¯,T1(ζ),X)+ln|JT1(ζ)|lnq(ζ;ϕ)]assignELBO¯𝑦italic-ϕsubscript𝔼𝑞delimited-[]𝑝¯𝑦superscript𝑇1𝜁𝑋subscript𝐽superscript𝑇1𝜁𝑞𝜁italic-ϕ\operatorname{ELBO}(\underline{y},\phi):=\mathbb{E}_{q}\left[\ln p(\underline{% y},T^{-1}(\zeta),X)+\ln|J_{T^{-1}}(\zeta)|-\ln q(\zeta;\phi)\right]roman_ELBO ( under¯ start_ARG italic_y end_ARG , italic_ϕ ) := blackboard_E start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT [ roman_ln italic_p ( under¯ start_ARG italic_y end_ARG , italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_ζ ) , italic_X ) + roman_ln | italic_J start_POSTSUBSCRIPT italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ζ ) | - roman_ln italic_q ( italic_ζ ; italic_ϕ ) ]

    Here, T𝑇Titalic_T is a function that transforms θ𝜃\thetaitalic_θ to ζ𝜁\zetaitalic_ζ, where ζdim(θ)𝜁superscript𝑑𝑖𝑚𝜃\zeta\in\mathbb{R}^{dim(\theta)}italic_ζ ∈ blackboard_R start_POSTSUPERSCRIPT italic_d italic_i italic_m ( italic_θ ) end_POSTSUPERSCRIPT. That is, T:support(θ)dim(θ):𝑇support𝜃superscript𝑑𝑖𝑚𝜃T:\operatorname{support}(\theta)\rightarrow\mathbb{R}^{dim(\theta)}italic_T : roman_support ( italic_θ ) → blackboard_R start_POSTSUPERSCRIPT italic_d italic_i italic_m ( italic_θ ) end_POSTSUPERSCRIPT, identified as ζ=T(θ)𝜁𝑇𝜃\zeta=T(\theta)italic_ζ = italic_T ( italic_θ ) and JT1(ζ)subscript𝐽superscript𝑇1𝜁J_{T^{-1}}(\zeta)italic_J start_POSTSUBSCRIPT italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ζ ) is the Jacobian of the inverse of T. As all the model parameters ζ𝜁\zetaitalic_ζ have support on the real line, a suitable variational distribution for ζ𝜁\zetaitalic_ζ is a normal distribution. Using a multivariate Gaussian variational distribution q(ζ;ϕ)=N(ζ|m,LLt)𝑞𝜁italic-ϕ𝑁conditional𝜁𝑚𝐿superscript𝐿𝑡q(\zeta;\phi)=N(\zeta|m,LL^{t})italic_q ( italic_ζ ; italic_ϕ ) = italic_N ( italic_ζ | italic_m , italic_L italic_L start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) is specified for ζ𝜁\zetaitalic_ζ and the variational parameters are ϕ=(m,L)italic-ϕ𝑚𝐿\phi=(m,L)italic_ϕ = ( italic_m , italic_L ) enables us to compute the expectation and its gradient using a Monte Carlo estimate. Specifically, to estimate the ELBO, one can sample values from the variational distributions and evaluate the expression inside the expectation mentioned above. To maximize the ELBO, the gradient of the ELBO with respect to the variational parameters is required. That is

    ϕELBO(y¯,ϕ):=ϕ𝔼q[lnp(y¯,T1(ζ),X)+ln|JT1(ζ)|lnq(ζ;ϕ)]assignsubscriptitalic-ϕELBO¯𝑦italic-ϕsubscriptitalic-ϕsubscript𝔼𝑞delimited-[]𝑝¯𝑦superscript𝑇1𝜁𝑋subscript𝐽superscript𝑇1𝜁𝑞𝜁italic-ϕ\nabla_{\phi}\operatorname{ELBO}(\underline{y},\phi):=\nabla_{\phi}\mathbb{E}_% {q}\left[\ln p(\underline{y},T^{-1}(\zeta),X)+\ln|J_{T^{-1}}(\zeta)|-\ln q(% \zeta;\phi)\right]∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT roman_ELBO ( under¯ start_ARG italic_y end_ARG , italic_ϕ ) := ∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT blackboard_E start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT [ roman_ln italic_p ( under¯ start_ARG italic_y end_ARG , italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_ζ ) , italic_X ) + roman_ln | italic_J start_POSTSUBSCRIPT italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ζ ) | - roman_ln italic_q ( italic_ζ ; italic_ϕ ) ]

    Once again, we can assess this through Monte Carlo integration. However, computing the gradient of a random variate isn’t straightforward. Hence, it’s prudent to initially draw a standard normal random variable and then scale it by the variational standard deviation and mean. This way, we can incorporate the gradient within the expectation. To clarify further:

    ϕELBO(y¯,ϕ)1Ss=1Sϕ[lnp(y¯,T1(ζ),X)+ln|JT1(ζ)|lnq(ζ;ϕ)|ζ=m+Lϵ(s)]subscriptitalic-ϕELBO¯𝑦italic-ϕ1𝑆superscriptsubscript𝑠1𝑆subscriptitalic-ϕ𝑝¯𝑦superscript𝑇1𝜁𝑋subscript𝐽superscript𝑇1𝜁evaluated-at𝑞𝜁italic-ϕ𝜁𝑚𝐿superscriptitalic-ϵ𝑠\nabla_{\phi}\operatorname{ELBO}(\underline{y},\phi)\approx\frac{1}{S}\sum_{s=% 1}^{S}\nabla_{\phi}\left[\ln p(\underline{y},T^{-1}(\zeta),X)+\ln|J_{T^{-1}}(% \zeta)|-\ln q(\zeta;\phi)\textbar_{\zeta=m+L\epsilon^{(s)}}\right]∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT roman_ELBO ( under¯ start_ARG italic_y end_ARG , italic_ϕ ) ≈ divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT [ roman_ln italic_p ( under¯ start_ARG italic_y end_ARG , italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_ζ ) , italic_X ) + roman_ln | italic_J start_POSTSUBSCRIPT italic_T start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_ζ ) | - roman_ln italic_q ( italic_ζ ; italic_ϕ ) | start_POSTSUBSCRIPT italic_ζ = italic_m + italic_L italic_ϵ start_POSTSUPERSCRIPT ( italic_s ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] (2.12)

    where ϵ(s)N(0,I),s=1,,S.formulae-sequencesimilar-tosuperscriptitalic-ϵ𝑠𝑁0𝐼𝑠1𝑆\epsilon^{(s)}\sim N(0,I),s=1,\ldots,S.italic_ϵ start_POSTSUPERSCRIPT ( italic_s ) end_POSTSUPERSCRIPT ∼ italic_N ( 0 , italic_I ) , italic_s = 1 , … , italic_S . One can also easily compute the stochastic gradient approximation of (2.12)

For the variational inference methods, convergence was determined by monitoring the ELBO. Specifically, convergence was achieved when the relative change in the ELBO between successive iterations fell below a predefined tolerance threshold. This criterion ensured that the optimization process had sufficiently stabilized, indicating that the variational approximation was close to the true posterior distribution (see [19]).

3 A simulation study for variable selection

In this section we analyse the performance of lasso regularization in the variable selection task. We include the results for classical lasso and Bayesian lasso methods (CAVI, ADVI and MCMC), with variable selection criteria CI and SN.

We suppose the experiments are governed by a reduced form of the quadratic mixture model, given by (2.1), with i=1,2,3𝑖123i=1,2,3italic_i = 1 , 2 , 3, p=1𝑝1p=1italic_p = 1 and t=2𝑡2t=2italic_t = 2. We consider a model where β¯=(α¯,δ¯,η¯)¯𝛽¯𝛼¯𝛿¯𝜂\underline{\beta}=(\underline{\alpha},\underline{\delta},\underline{\eta})under¯ start_ARG italic_β end_ARG = ( under¯ start_ARG italic_α end_ARG , under¯ start_ARG italic_δ end_ARG , under¯ start_ARG italic_η end_ARG ), for which we set α¯=δ¯=1¯¯𝛼¯𝛿¯1\underline{\alpha}=\underline{\delta}=\underline{1}under¯ start_ARG italic_α end_ARG = under¯ start_ARG italic_δ end_ARG = under¯ start_ARG 1 end_ARG and η¯=0¯¯𝜂¯0\underline{\eta}=\underline{0}under¯ start_ARG italic_η end_ARG = under¯ start_ARG 0 end_ARG, and then evaluate the performance of lasso to set η¯¯𝜂\underline{\eta}under¯ start_ARG italic_η end_ARG equal zero.

3.1 Data Generation

The primary predictors x1,x2,subscript𝑥1subscript𝑥2x_{1},x_{2},italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , and x3subscript𝑥3x_{3}italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT were generated under constraints to ensure their sum is 1. Specifically, x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and x2subscript𝑥2x_{2}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT were drawn from uniform distributions U(0.2,0.8)𝑈0.20.8U(0.2,0.8)italic_U ( 0.2 , 0.8 ) and U(0.15,0.5)𝑈0.150.5U(0.15,0.5)italic_U ( 0.15 , 0.5 ) respectively, while x3subscript𝑥3x_{3}italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT was determined as 1x1x21subscript𝑥1subscript𝑥21-x_{1}-x_{2}1 - italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, ensuring it lies within the range [0.05,0.3]0.050.3[0.05,0.3][ 0.05 , 0.3 ].

Additional predictors w1,z1,subscript𝑤1subscript𝑧1w_{1},z_{1},italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , and z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT were introduced to simulate the effects of process and noise variables. The variable w1subscript𝑤1w_{1}italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT was sampled from a binary distribution taking values 0.50.50.50.5 and 1111 with equal probability. Both z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT were drawn from standard normal distributions.

The response variable Y𝑌Yitalic_Y was then generated based on the reduced model, with an added noise term ε𝜀\varepsilonitalic_ε drawn from a normal distribution with mean 0 and standard deviation σ=0.5𝜎0.5\sigma=0.5italic_σ = 0.5.

The implementation of the proposed Bayesian methods requires careful hyperparameter selection and convergence criteria. In this work, we used a prior distribution configuration that includes gamma distributions for ϕitalic-ϕ\phiitalic_ϕ and λ𝜆\lambdaitalic_λ, and exponential distributions for τ𝜏\tauitalic_τ. Initial values for the Markov chains were based on previous estimates obtained via ordinary least squares (OLS) to improve sampling efficiency. These criteria aim to ensure robust and accurate estimation of the model parameters.

3.2 Results

For each method, the frequency of variable selection across the 1000 simulations was recorded. Table 1 shows detailed results on the variable selection for the simulation study. In our simulation, methods with larger N(α)𝑁𝛼N(\alpha)italic_N ( italic_α ) and N(δ)𝑁𝛿N(\delta)italic_N ( italic_δ ) and smaller N(η)𝑁𝜂N(\eta)italic_N ( italic_η ) are considered to perform better.

The confusion matrices in Figure 1 underscore that CAVI outperforms other methods with the highest true positives and lowest false negatives, indicating superior parameter selection accuracy. Conversely, MCMC variants show notably poorer performance, highlighting their inefficacy in accurate parameter identification.

LASSO BL-MCMC BL-CAVI BL-ADVI
CI SN CI SN CI SN
N(α1)𝑁subscript𝛼1N(\alpha_{1})italic_N ( italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) 1.000 1.000 1.000 1.000 1.000 0.998 0.998
N(α2)𝑁subscript𝛼2N(\alpha_{2})italic_N ( italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) 1.000 0.905 0.995 1.000 1.000 0.998 0.998
N(α3)𝑁subscript𝛼3N(\alpha_{3})italic_N ( italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) 1.000 0.640 0.963 1.000 1.000 0.998 0.998
N(α12)𝑁subscript𝛼12N(\alpha_{12})italic_N ( italic_α start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT ) 1.000 0.527 0.850 1.000 1.000 0.991 0.995
N(α23)𝑁subscript𝛼23N(\alpha_{23})italic_N ( italic_α start_POSTSUBSCRIPT 23 end_POSTSUBSCRIPT ) 0.997 0.007 0.039 0.856 0.999 0.857 0.913
N(α13)𝑁subscript𝛼13N(\alpha_{13})italic_N ( italic_α start_POSTSUBSCRIPT 13 end_POSTSUBSCRIPT ) 0.023 0.507 0.633 0.958 1.000 0.947 0.960
N(δ11)𝑁subscript𝛿11N(\delta_{11})italic_N ( italic_δ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ) 1.000 1.000 1.000 1.000 1.000 0.998 0.998
N(δ21)𝑁subscript𝛿21N(\delta_{21})italic_N ( italic_δ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT ) 1.000 0.699 0.964 1.000 1.000 0.998 0.998
N(δ31)𝑁subscript𝛿31N(\delta_{31})italic_N ( italic_δ start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT ) 1.000 0.438 0.949 0.955 1.000 0.994 0.997
N(δ121)𝑁subscript𝛿121N(\delta_{121})italic_N ( italic_δ start_POSTSUBSCRIPT 121 end_POSTSUBSCRIPT ) 0.051 0.489 0.607 0.992 1.000 0.977 0.986
N(δ231)𝑁subscript𝛿231N(\delta_{231})italic_N ( italic_δ start_POSTSUBSCRIPT 231 end_POSTSUBSCRIPT ) 0.096 0.003 0.028 0.394 0.986 0.763 0.852
N(δ131)𝑁subscript𝛿131N(\delta_{131})italic_N ( italic_δ start_POSTSUBSCRIPT 131 end_POSTSUBSCRIPT ) 0.215 0.006 0.099 0.709 0.999 0.901 0.941
N(η111)𝑁subscript𝜂111N(\eta_{111})italic_N ( italic_η start_POSTSUBSCRIPT 111 end_POSTSUBSCRIPT ) 0.002 0.017 0.121 0.000 0.101 0.100 0.283
N(η211)𝑁subscript𝜂211N(\eta_{211})italic_N ( italic_η start_POSTSUBSCRIPT 211 end_POSTSUBSCRIPT ) 0.001 0.004 0.022 0.000 0.084 0.092 0.210
N(η311)𝑁subscript𝜂311N(\eta_{311})italic_N ( italic_η start_POSTSUBSCRIPT 311 end_POSTSUBSCRIPT ) 0.000 0.004 0.010 0.000 0.040 0.047 0.158
N(η112)𝑁subscript𝜂112N(\eta_{112})italic_N ( italic_η start_POSTSUBSCRIPT 112 end_POSTSUBSCRIPT ) 0.003 0.016 0.203 0.003 0.122 0.095 0.268
N(η212)𝑁subscript𝜂212N(\eta_{212})italic_N ( italic_η start_POSTSUBSCRIPT 212 end_POSTSUBSCRIPT ) 0.001 0.006 0.094 0.002 0.101 0.075 0.216
N(η312)𝑁subscript𝜂312N(\eta_{312})italic_N ( italic_η start_POSTSUBSCRIPT 312 end_POSTSUBSCRIPT ) 0.000 0.007 0.047 0.000 0.052 0.051 0.157
N(η1211)𝑁subscript𝜂1211N(\eta_{1211})italic_N ( italic_η start_POSTSUBSCRIPT 1211 end_POSTSUBSCRIPT ) 0.002 0.005 0.036 0.004 0.092 0.035 0.132
N(η2111)𝑁subscript𝜂2111N(\eta_{2111})italic_N ( italic_η start_POSTSUBSCRIPT 2111 end_POSTSUBSCRIPT ) 0.001 0.002 0.015 0.000 0.054 0.000 0.063
N(η1311)𝑁subscript𝜂1311N(\eta_{1311})italic_N ( italic_η start_POSTSUBSCRIPT 1311 end_POSTSUBSCRIPT ) 0.002 0.003 0.013 0.001 0.042 0.018 0.084
N(η1212)𝑁subscript𝜂1212N(\eta_{1212})italic_N ( italic_η start_POSTSUBSCRIPT 1212 end_POSTSUBSCRIPT ) 0.001 0.007 0.098 0.002 0.110 0.031 0.124
N(η2112)𝑁subscript𝜂2112N(\eta_{2112})italic_N ( italic_η start_POSTSUBSCRIPT 2112 end_POSTSUBSCRIPT ) 0.003 0.005 0.062 0.000 0.048 0.006 0.059
N(η1312)𝑁subscript𝜂1312N(\eta_{1312})italic_N ( italic_η start_POSTSUBSCRIPT 1312 end_POSTSUBSCRIPT ) 0.002 0.007 0.059 0.000 0.064 0.022 0.101
Table 1: Frequency of retaining for the regression coefficients.
Refer to caption
Figure 1: The confusion matrices for lasso regularization methods.

Complementary, to evaluate the effectiveness of lasso and Bayesian lasso methods in the context of simulations, we propose the use of the balanced accuracy index (BAI), introduced by [6]. The BAI improves evaluation in contexts where variable selection proportions may be unbalanced, it is calculated as the average of the true positive rate and the true negative rate. Specifically, it is defined as:

BAI=12(TPTP+FN+TNTN+FP),𝐵𝐴𝐼12𝑇𝑃𝑇𝑃𝐹𝑁𝑇𝑁𝑇𝑁𝐹𝑃BAI=\frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right),italic_B italic_A italic_I = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( divide start_ARG italic_T italic_P end_ARG start_ARG italic_T italic_P + italic_F italic_N end_ARG + divide start_ARG italic_T italic_N end_ARG start_ARG italic_T italic_N + italic_F italic_P end_ARG ) ,

where TP𝑇𝑃TPitalic_T italic_P is the number of truly non-zero parameters correctly selected, FP𝐹𝑃FPitalic_F italic_P indicates the number of truly zero parameters incorrectly selected, FN𝐹𝑁FNitalic_F italic_N represents the number of truly non-zero parameters incorrectly excluded, and TN𝑇𝑁TNitalic_T italic_N is the number of truly zero parameters correctly not selected.

Table 2 shows that, according to the BAI, CAVI-SN stands out as the most efficient method for variable selection in the context of simulations, closely followed by CAVI and ADVI with CI criterion, which also demonstrate high efficiency. The classical lasso and ADVI-SN methods show good performance, whereas MCMC methods, perform less effectively. These results suggest that Bayesian variants of lasso, particularly CAVI, may offer significant advantages in terms of variable selection performance, especially in datasets where the balance between sensitivity and specificity is important.

LASSO BL-MCMC BL-CAVI BL-ADVI
CI SN CI SN CI SN
BAI 0,849 0,756 0,806 0,952 0,961 0,952 0,907
Table 2: Comparison of methods performance by using the balanced accuracy index.

4 An example of soap production

We analysis the performance of the different lasso methods by applying to data from a soap processing plant, discussed in [13]. This scenario involves examining the output based on the soap mixture components (x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, x2subscript𝑥2x_{2}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, x3subscript𝑥3x_{3}italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT) under specific constraints:

0.2x10.8,0.15x20.5,0.05x30.3,x1+x2+x3=1.formulae-sequence0.2subscript𝑥10.80.15subscript𝑥20.50.05subscript𝑥30.3subscript𝑥1subscript𝑥2subscript𝑥310.2\leq x_{1}\leq 0.8,\quad 0.15\leq x_{2}\leq 0.5,\quad 0.05\leq x_{3}\leq 0.% 3,\quad x_{1}+x_{2}+x_{3}=1.0.2 ≤ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≤ 0.8 , 0.15 ≤ italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ≤ 0.5 , 0.05 ≤ italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ≤ 0.3 , italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 1 . (4.1)

The process variables of interest include the mixing time (w1subscript𝑤1w_{1}italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) and plodder temperature (z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT), and the humidity (z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) with the two latter being harder to control and considered as noise.

Consider the model in (2.1) in its linear version. The matricial formulation is given by,

x¯=[x1x2x3],w=[w],z¯=[z1z2],V=[w00w],formulae-sequence¯𝑥matrixsubscript𝑥1subscript𝑥2subscript𝑥3formulae-sequence𝑤delimited-[]𝑤formulae-sequence¯𝑧matrixsubscript𝑧1subscript𝑧2𝑉matrix𝑤00𝑤\underline{x}=\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ \end{bmatrix},\quad w=[w],\quad\underline{z}=\begin{bmatrix}z_{1}\\ z_{2}\\ \end{bmatrix},\quad V=\begin{bmatrix}w&0\\ 0&w\\ \end{bmatrix},under¯ start_ARG italic_x end_ARG = [ start_ARG start_ROW start_CELL italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , italic_w = [ italic_w ] , under¯ start_ARG italic_z end_ARG = [ start_ARG start_ROW start_CELL italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , italic_V = [ start_ARG start_ROW start_CELL italic_w end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_w end_CELL end_ROW end_ARG ] , (4.2)

and

α¯=[α1α2α3],δ¯=[δ1δ2δ3],Δ=[γ11γ12γ21γ22γ31γ32] and H=[η11η12η21η22η31η32],formulae-sequence¯𝛼matrixsubscript𝛼1subscript𝛼2subscript𝛼3formulae-sequence¯𝛿matrixsubscript𝛿1subscript𝛿2subscript𝛿3Δmatrixsubscript𝛾11subscript𝛾12subscript𝛾21subscript𝛾22subscript𝛾31subscript𝛾32 and 𝐻matrixsubscript𝜂11subscript𝜂12subscript𝜂21subscript𝜂22subscript𝜂31subscript𝜂32\underline{\alpha}=\begin{bmatrix}\alpha_{1}\\ \alpha_{2}\\ \alpha_{3}\\ \end{bmatrix},\quad\underline{\delta}=\begin{bmatrix}\delta_{1}\\ \delta_{2}\\ \delta_{3}\\ \end{bmatrix},\quad\Delta=\begin{bmatrix}\gamma_{11}&\gamma_{12}\\ \gamma_{21}&\gamma_{22}\\ \gamma_{31}&\gamma_{32}\\ \end{bmatrix}\text{ and }H=\begin{bmatrix}\eta_{11}&\eta_{12}\\ \eta_{21}&\eta_{22}\\ \eta_{31}&\eta_{32}\\ \end{bmatrix},under¯ start_ARG italic_α end_ARG = [ start_ARG start_ROW start_CELL italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_α start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , under¯ start_ARG italic_δ end_ARG = [ start_ARG start_ROW start_CELL italic_δ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_δ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , roman_Δ = [ start_ARG start_ROW start_CELL italic_γ start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL start_CELL italic_γ start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_γ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT end_CELL start_CELL italic_γ start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_γ start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT end_CELL start_CELL italic_γ start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] and italic_H = [ start_ARG start_ROW start_CELL italic_η start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL start_CELL italic_η start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT end_CELL start_CELL italic_η start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_η start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT end_CELL start_CELL italic_η start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (4.3)

where,

Y=f(x,w,z)=x¯α¯+x¯δ¯w+x¯Δz¯+x¯HVz¯+ϵ𝑌𝑓𝑥𝑤𝑧superscript¯𝑥¯𝛼superscript¯𝑥¯𝛿𝑤superscript¯𝑥Δ¯𝑧superscript¯𝑥𝐻𝑉¯𝑧italic-ϵY=f(x,w,z)=\underline{x}^{\prime}\underline{\alpha}+\underline{x}^{\prime}% \underline{\delta}w+\underline{x}^{\prime}\Delta\underline{z}+\underline{x}^{% \prime}HV\underline{z}+\epsilonitalic_Y = italic_f ( italic_x , italic_w , italic_z ) = under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT under¯ start_ARG italic_α end_ARG + under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT under¯ start_ARG italic_δ end_ARG italic_w + under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT roman_Δ under¯ start_ARG italic_z end_ARG + under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT italic_H italic_V under¯ start_ARG italic_z end_ARG + italic_ϵ (4.4)

Table 3 shows the estimation for OLS, proposed in [13], and the proposed regularization methods for all the 18 parameters in (4.3). It is worth noting that [13] uses OLS to fit the model, and then performs a statistical significance analysis by iteratively eliminating terms with p𝑝pitalic_p-values greater than 0.05 from the initial model, until all remaining terms are statistically significant. It is worth mentioning that classical lasso excludes more covariables than other methods. Besides, the CAVI Bayesian lasso fits a model with more covariables.

In Figure 2, we include the density and bloxplot for posterior distribution of parameter δ1subscript𝛿1\delta_{1}italic_δ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, which was eliminated by the BL-MCMC using CI criterion. Moreover, the parameters δ3subscript𝛿3\delta_{3}italic_δ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and γ21subscript𝛾21\gamma_{21}italic_γ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT, which were eliminated solely by the BL-ADVI using the CI criterion. Note that, the posterior densities for such parameters show that the MCMC has more variability, and CAVI is the most homogeneous one, concentrating the probability in a region that does not contain the value zero.

Given the model’s 18 parameters and the 40 observations, leave-one-out cross-validation (LOO CV) is particularly advantageous for evaluating our model ( see [27]). LOO CV optimizes data utilization and provides a precise, unbiased estimate of the generalization error, essential for reliable performance assessment in this context. Table 4 presents the LOO CV results for OLS, classical and Bayesian lasso regularization methods. These results demonstrate the comparative effectiveness of each method in minimizing the generalization error, highlighting the robustness of the Bayesian approaches, particularly the BL-CAVI method, which achieved the lowest LOO CV value.

Parameter OLS LASSO BL-MCMC BL-CAVI BL-ADVI
CI SN CI SN CI SN
α^1subscript^𝛼1\widehat{\alpha}_{1}over^ start_ARG italic_α end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 1898.99 1928.32 1900.10 1899.02 1897.50
α^2subscript^𝛼2\widehat{\alpha}_{2}over^ start_ARG italic_α end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1626.42 1699.97 1624.27 1627.03 1624.75
α^3subscript^𝛼3\widehat{\alpha}_{3}over^ start_ARG italic_α end_ARG start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 1537.79 1431.18 1541.43 1536.33 1535.70
δ^1subscript^𝛿1\widehat{\delta}_{1}over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 39.53 -    - 38.07 39.52 40.40
δ^2subscript^𝛿2\widehat{\delta}_{2}over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 285.90 170.58 288.79 285.08 284.81
δ^3subscript^𝛿3\widehat{\delta}_{3}over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT - - - 26.57 - 24.52
γ^11subscript^𝛾11\widehat{\gamma}_{11}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT 9.45 6.59 9.42 9.45 9.45
γ^21subscript^𝛾21\widehat{\gamma}_{21}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT - 2.47 - -2.08    - -1.74
γ^31subscript^𝛾31\widehat{\gamma}_{31}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT 34.60 - 31.95 34.16 33.99
γ^12subscript^𝛾12\widehat{\gamma}_{12}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT - - - - -
γ^22subscript^𝛾22\widehat{\gamma}_{22}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT -20.00 -16.87 -20.00 -20.00 -20.02
γ^32subscript^𝛾32\widehat{\gamma}_{32}over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT - - - - -
η^11subscript^𝜂11\widehat{\eta}_{11}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT 16.84 10.15 16.85 16.83 16.69
η^21subscript^𝜂21\widehat{\eta}_{21}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT 39.17 20.43 37.45 38.91 38.62
η^31subscript^𝜂31\widehat{\eta}_{31}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT -21.09 - -17.64 -20.55 -20.16
η^12subscript^𝜂12\widehat{\eta}_{12}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT - - - - -
η^22subscript^𝜂22\widehat{\eta}_{22}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT -25.00 -20.49 -24.99 -24.99 -25.02
η^32subscript^𝜂32\widehat{\eta}_{32}over^ start_ARG italic_η end_ARG start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT - - - - -
Table 3: Estimated values for coefficients in (4.3).
Refer to caption
Refer to caption
Figure 2: Posterior densities and boxplots of parameters δ1subscript𝛿1\delta_{1}italic_δ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, δ3subscript𝛿3\delta_{3}italic_δ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and γ21subscript𝛾21\gamma_{21}italic_γ start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT.
OLS LASSO BL-MCMC BL-CAVI BL-ADVI
CI SN CI SN CI SN
LOO CV 11.87 11.35 11.10 11.48 10.64 11.07 11.44
Table 4: Values of LOO-CV for OLS and Bayesian lasso versions (MCMC, CAVI and ADVI).

4.1 Optimization of the response surface by the desirability function

After fitting a combined mixture process-noise variable model, the final goal is to identify levels of the mixture components and controllable variables that simultaneously yield acceptable mean and variance response values. To address this optimization problem, we use the desirability function approach as outlined in (2.3) and (2.4). Following the methodology described by [13], we initially assume that the noise variables have zero-mean. Under this assumption, we utilize the delta method, applying a first-order Taylor series approximation around the mean of the noise variables. Therefore, the expected value and variance of Y𝑌Yitalic_Y are approximated as follows:

𝔼(Y)x¯α¯+x¯Aw¯,similar-to𝔼𝑌superscript¯𝑥¯𝛼superscript¯𝑥A¯𝑤\mathbb{E}(Y)\sim\underline{x}^{\prime}\underline{\alpha}+\underline{x}^{% \prime}\textbf{A}\underline{w},blackboard_E ( italic_Y ) ∼ under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT under¯ start_ARG italic_α end_ARG + under¯ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT A under¯ start_ARG italic_w end_ARG , (4.5)

and

𝕍(Y)[ΛX¯+VΛX¯]ΣX[ΛX¯+VHX¯].similar-to𝕍𝑌superscriptdelimited-[]superscriptΛ¯𝑋superscript𝑉Λ¯𝑋subscriptΣ𝑋delimited-[]superscriptΛ¯𝑋superscript𝑉𝐻¯𝑋\mathbb{V}(Y)\sim\left[\Lambda^{{}^{\prime}}\underline{X}+V^{\prime}\Lambda% \underline{X}\right]^{{}^{\prime}}\Sigma_{X}\left[\Lambda^{{}^{\prime}}% \underline{X}+V^{\prime}H\underline{X}\right].blackboard_V ( italic_Y ) ∼ [ roman_Λ start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT under¯ start_ARG italic_X end_ARG + italic_V start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT roman_Λ under¯ start_ARG italic_X end_ARG ] start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT [ roman_Λ start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT under¯ start_ARG italic_X end_ARG + italic_V start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT italic_H under¯ start_ARG italic_X end_ARG ] . (4.6)

These equations allow us to quantify how the proportions of the mixture components and the process conditions directly affect the variability and predictability of the response. Finally, for OLS and the proposed regularization methods, we show the optimal values in Table 5. It is evident that all the methods yield identical proportions for the mixture components, yet there are notable differences in the values of the process variable. In terms of the expected value of the response variable, the Bayesian methods utilizing the SN criterion perform better. However, when also considering the variance of the response variable, the CAVI approximation emerges as the superior choice, exhibiting the smallest coefficient of variation.

OLS LASSO BL-MCMC BL-CAVI BL-ADVI
CI SN CI SN CI SN
x1subscript𝑥1x_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT 0.45 0.45 0.45 0.45 0.45
x2subscript𝑥2x_{2}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.50 0.50 0.50 0.50 0.50
x3subscript𝑥3x_{3}italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.05 0.05 0.05 0.05 0.05
w𝑤witalic_w 0.85 0.91 0.89 0.90 0.87 0.91
μY^^subscript𝜇𝑌\widehat{\mu_{Y}}over^ start_ARG italic_μ start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT end_ARG 1881.28 1884.35 1872.76 1888.01 1901.13 1882.75 1890.29
σY^^subscript𝜎𝑌\widehat{\sigma_{Y}}over^ start_ARG italic_σ start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT end_ARG 15.59 15.78 15.62 15.62 15.50 15.66 15.61
CV𝐶𝑉CVitalic_C italic_V 0,00828 0,00837 0,00834 0,00827 0,00815 0,00831 0,008257
Table 5: Optimal values for OLS and Bayesian lasso versions (MCMC, CAVI and ADVI). CV = σY^/μY^^subscript𝜎𝑌^subscript𝜇𝑌\widehat{\sigma_{Y}}/\widehat{\mu_{Y}}over^ start_ARG italic_σ start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT end_ARG / over^ start_ARG italic_μ start_POSTSUBSCRIPT italic_Y end_POSTSUBSCRIPT end_ARG.

5 Conclusions

We proposed the use of classical and Bayesian lasso regularization for mixture experiments with noise variables. The model formulation was given in (2.1), where optimization was pursued through the desirability function method [11], aiming to simultaneously maximize the mean and minimize the variance of the response variable.

The findings from our study highlight the efficacy of proposed regularization techniques in the context of mixture experiments with noise variables. The comparative analysis, which included ordinary least squares (OLS), lasso, and various Bayesian lasso formulations (CAVI, ADVI, and MCMC), demonstrated the superior performance of the CAVI algorithm. CAVI consistently outperformed other methods in both simulation studies and real data applications, particularly in terms of variable selection accuracy and response surface optimization. In a practical application involving a soap processing plant, CAVI not only provided precise parameter estimates but also optimized the response, achieving higher expected values and lower variance. Bayesian lasso variants, especially CAVI and ADVI, proved advantageous over classical lasso in robustness and flexibility of parameter estimation. While MCMC-based methods were reliable, they faced challenges related to convergence and computational efficiency in high-dimensional spaces. Variational inference methods (CAVI and ADVI) offered efficient approximations to posterior distributions, significantly reducing computational time compared to MCMC. The application of Bayesian regularization techniques, particularly CAVI, enhances model selection, parameter estimation, and response optimization in complex systems influenced by both mixture components and process variables.

Acknowledgements

MGN was partially supported by Fondecyt Iniciación 11200500.

References

  • [1] Alhamzawi, R. & Taha Mohammad Ali, H. (2020). A new Gibbs sampler for Bayesian lasso. Commun. Stat. - Simul. 49(7), 1855-1871.
  • [2] Alves, L.C., Dias, R. & Migon, H.S. (2024) Variational Bayesian Lasso for spline regression. Comput. Stat. 39, 2039–2064.
  • [3] Azcarate, S.M., Pinto, L. & Goicoechea, H.C. (2020). Applications of mixture experiments for response surface methodology implementation in analytical methods development. J. Chemometrics, 34(12), e3246.
  • [4] Bishop, C.M.(2006) Pattern Recognition and Machine Learning. Springer.
  • [5] Blei, D.M., Kucukelbir, A. & McAuliffe, J.D. (2017). Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 112(518), 859-877.
  • [6] Brodersen, K.H., Ong, C.S., Stephan, K.E. & Buhmann, J.M.(2010). The balanced accuracy and its posterior distribution. In 2010 20th international conference on pattern recognition (3121-3124). IEEE.
  • [7] Cornell, J.A. (2002) Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data (3rd ed.). John Wiley & Sons.
  • [8] Cornell, J.A. (2011) A Primer on Experiments with Mixtures. Wiley, Hoboken, NJ.
  • [9] Costa, N.R. & Lourenço, J. (2016). Multiresponse problems: desirability and other optimization approaches. J. Chemometrics, 30, 702-714.
  • [10] Costa, N.R. & Pereira, Z.L. (2010). Multiple response optimization: a global criterion-based method. J. Chemometrics, 24(6), 333-342.
  • [11] Derringer, G. & Suich, R. (1980). Simultaneous Optimization of Several Response Variables. J. Qual. Technol. 12(4), 214-219.
  • [12] Gelman, A. & Rubin, D.B. (1992). Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci. 7(4), 457-472.
  • [13] Goldfarb, H.B., Borror, C.M. & Mongomery, D.C. (2003). Mixture-process variable experiments with noise variables. J. Qual. Technol. 35(4), 393-405.
  • [14] Guo, J., Gabry, J., Goodrich, B. & Weber, S.(2020). Package ‘rstan’. URL https://cran.r-project.org/web/packages/rstan/.
  • [15] Hans, C. (2009). Bayesian lasso regression. Biometrika, 96(4), 835-845.
  • [16] Hoffman, M.D. & Gelman, A. (2014) The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15(1), 1593-1623.
  • [17] James, G., Witten, D., Hastie, T. & Tibshirani, R. (2021) An Introduction to Statistical Learning: With Applications in R. Springer New York, NY.
  • [18] Kettaneh-Wold, N. (1992) Analysis of mixture data with partial least squares. Chemom. Intell. Lab. Syst., 14(1-3), 57-69.
  • [19] Kucukelbir. A., Tran, D., Ranganath, R., Gelman, A. & Blei, D.M. (2017) Automatic differentiation variational inference. J. Mach. Learn. Res., 18, 1-45.
  • [20] Li, Q. & Lin, N. (2010) The Bayesian elastic net. Bayesian Anal. 5(1), 151-170.
  • [21] Mallick, H. & Yi, N. (2014). A new Bayesian lasso. Stat. Interface. 7(4), 571
  • [22] Muteki, K. & MacGregor, J.F. (2007). Sequential design of mixture experiments for the development of new products. J. Chemometrics, 21, 496-505.
  • [23] Park, T. & Casella, G. (2008). The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681-686.
  • [24] Taavitsainen, V.M., Lehtovaara, A. & Lähteenmäki, M. (2010). Response surfaces, desirabilities and rational functions in optimizing sugar production. J. Chemometrics, 24(7‐8), 505-513.
  • [25] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267-288.
  • [26] Turkman, M.A.A., Paulino, C.D. & Müller, P. (2019) Computational Bayesian statistics: an introduction (Vol 11). Cambridge University Press.
  • [27] Wong, T.T. (2015). Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognit. 48(9), 2839-2846.