Posterior Robustness with Milder Conditions: Contamination Models Revisited

Yasuyuki Hamura¹, Kaoru Irie² and Shonosuke Sugasawa³

¹ Corresponding author. Graduate School of Economics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, JAPAN. E-Mail: yasu.stat@gmail.com
² Faculty of Economics, The University of Tokyo. E-Mail: irie@e.u-tokyo.ac.jp
³ Faculty of Economics, Keio University. E-Mail: sugasawa@econ.keio.ac.jp

Abstract

Robust Bayesian linear regression is a classical but essential statistical tool. Although novel robustness properties of posterior distributions have recently been proved under a certain class of error distributions, the sufficient conditions are restrictive and exclude several important situations. In this work, we revisit a classical two-component mixture model for response variables, also known as the contamination model, where one component is a light-tailed regression model and the other component is heavy-tailed. The latter component is independent of the regression parameters, which is crucial in proving posterior robustness. We obtain new sufficient conditions for posterior (non-)robustness and reveal non-trivial robustness results by using those conditions. In particular, we find that even the Student's $t$ error distribution can achieve posterior robustness in our framework. A numerical study is performed to check the Kullback-Leibler divergence between the posterior distribution based on the full data and that based on the data obtained by removing outliers.

Keywords: heavy-tailed distribution; posterior robustness; two-component mixture

Introduction

Bayesian posterior robustness (O’Hagan, 1979) and related topics have long been studied (e.g., West, 1984; Andrade and O’Hagan, 2006, 2011; O’Hagan and Pericchi, 2012). There, one of the most important objectives is to perform posterior analysis using moderate observations only and discarding outliers that are not related to the parameters of interest. Because the task of manually detecting or determining outliers is difficult in general, robust models are desired under which the effects of outliers are automatically removed.

Although many robust regression models have been proposed in the literature, few works (e.g., O’Hagan, 1979) have given theoretical justifications for those models. In fact, it is only recently that Desgagné (2013, 2015) and Gagnon et al. (2019) have proved posterior robustness for scale, location-scale, and regression models, respectively. Here, posterior densities are said to be robust if they converge to the corresponding conditional densities of parameters based only on non-outliers as the absolute values of outliers tend to infinity. Since then, posterior robustness has been established in various practically important settings; Hamura et al. (2022) obtained robustness results for regressions with shrinkage priors, whereas Hamura et al. (2021) considered the case of integer-valued observations.

In proving the posterior robustness, Gagnon et al. (2019) and Hamura et al. (2022) considered the following model; with observations $y_{1},\dots,y_{n}$ , $p$ -dimensional covariate vectors ${\text{\boldmath$x$}}_{1},\dots,{\text{\boldmath$x$}}_{n}$ , regression coefficients ${\text{\boldmath$\beta$}}\in\mathbb{R}^{p}$ and a scale parameter ${\sigma}\in(0,\infty)$ , they assume

$$y_{i}\sim f((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma},\quad i=1,\ldots,n, \tag{1}$$

for some error density $f$ and prior $({\text{\boldmath$\beta$}},{\sigma})\sim\pi({\text{\boldmath$\beta$}},{\sigma})$. In their proofs of posterior robustness, it is crucial to assume that $f$ is a log-regularly varying density. A typical tail of a log-regularly varying density is $f(y)\sim|y|^{-1}\{\log|y|\}^{-(\beta+1)}$ as $|y|\to\infty$, where $\beta>0$ is a scalar tail index (for the rigorous definition, see Desgagné (2013)). Such a distribution has no finite moments and has heavier tails than Student’s $t$-distribution. If $f$ is the Student’s $t$-density, the posterior is not robust (Gagnon and Hayashi, 2023). These theoretical findings suggest that log-regularly varying error densities are superior to Student’s $t$-distributions. However, several numerical studies have also reported that the Student’s $t$ error distribution is fairly competitive in posterior inference (Hamura et al., 2022).

In this paper, we revisit the following classical two-component mixture regression model, also known as the contamination model:

$$y_{i}\sim(1-s)f_{0}((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma}+sf_{1}(y_{i}),\quad i=1,\ldots,n, \tag{2}$$

where $({\text{\boldmath$\beta$}},{\sigma})\sim\pi({\text{\boldmath$\beta$}},{\sigma})$ and $s\in(0,1)$ is a prior probability that an observation becomes an outlier. The first density, $f_{0}$ , has thinner tails and is typically the standard normal distribution. The second density, $f_{1}$ , is a heavy-tailed distribution, such as Student’s $t$ -distribution, and expected to accommodate outliers. One notable feature of the above model is that the second term is completely independent of the parameters $({\text{\boldmath$\beta$}},{\sigma})$ . This is a significant difference from the classical two-component mixtures in Box and Tiao (1968) and subsequent research (Tak et al., 2019; Silva et al., 2020), where the second component is also scaled by observational standard error $\sigma$ . Scaling the second component by $\sigma$ is reasonable in terms of data fit, but could affect the inference on $\sigma$ in the presence of outliers. This observation motivates our research on the above model.

Under the model (2), we show that the posterior is robust if $\pi({\sigma})$, the marginal prior for ${\sigma}$, has tails sufficiently lighter than those of the error density $f_{1}$. When $f_{1}$ is log-regularly varying, most prior distributions satisfy this sufficient condition for robustness. Furthermore, we prove that the sufficient condition on the tails of $\pi({\sigma})$ is “nearly” necessary as well; if the error distribution is not log-regularly varying and has lighter tails than $\pi({\sigma})$, then the posterior is not robust. With these conditions, we can determine posterior (non-)robustness for most of the error and prior distributions used in regression models.

Our result can also explain the gap between the non-robustness of the Student’s $t$-distribution in model (1) and its success in posterior inference in numerical studies. For simplicity, assume that only the first observation, $y_{1}$, is outlying and let $|y_{1}|\to\infty$. Then, under the model (1) with $f(y)\propto|y|^{-1-{\alpha}}$ as $|y|\to\infty$ for ${\alpha}>0$ (the tail of Student’s $t$-distribution with ${\alpha}$ degrees of freedom), it holds that

$$\begin{aligned}
p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}) &\propto\pi({\text{\boldmath$\beta$}},{\sigma})\frac{f((y_{1}-{{\text{\boldmath$x$}}_{1}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{f(y_{1}){\sigma}}\prod_{i=2}^{n}\frac{f((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}} \\
&\to\pi({\text{\boldmath$\beta$}},{\sigma}){\sigma}^{{\alpha}}\prod_{i=2}^{n}\frac{f((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}
\end{aligned}$$

as $|y_{1}|\to\infty$ . This limit is the product of the posterior density without $y_{1}$ and factor ${\sigma}^{{\alpha}}$ . In other words, the Student’s $t$ -distribution can never achieve the posterior robustness. By contrast, under the model (2), we have

$$\begin{aligned}
p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}) &\propto\pi({\text{\boldmath$\beta$}},{\sigma})\Big\{\frac{1-s}{s}\frac{f_{0}((y_{1}-{{\text{\boldmath$x$}}_{1}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}f_{1}(y_{1})}+1\Big\}\prod_{i=2}^{n}\Big\{(1-s)\frac{f_{0}((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}+sf_{1}(y_{i})\Big\} \\
&\to\pi({\text{\boldmath$\beta$}},{\sigma})\prod_{i=2}^{n}\Big\{(1-s)\frac{f_{0}((y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}+sf_{1}(y_{i})\Big\}
\end{aligned}$$

as $|y_{1}|\to\infty$ , provided that $f_{1}$ has sufficiently heavier tails than $f_{0}$ (For the rigorous proof, including the computation of the ignored normalizing constant, see the proof of Theorem 1 in the Supplementary Materials). This is precisely the posterior without $y_{1}$ , for which we confirm the posterior robustness. Also, note that $f_{1}$ can be the Student’s $t$ -distribution but still can achieve the posterior robustness under this model. The main difference from the model (1) is that the second component $f_{1}$ of (2) does not involve the parameters $({\text{\boldmath$\beta$}},\sigma)$ . Thanks to this difference, outliers are not linked to the parameters in this model and therefore have no effects on the posterior distribution of $({\text{\boldmath$\beta$}},{\sigma})$ , as long as $f_{1}$ has heavier tails than $f_{0}$ . This observation applies to the general case of multiple outliers, as will be seen below.
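The two limits above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the unnormalized power-tail density $(1+|y|)^{-1-\alpha}$ as a stand-in for the Student's $t$ tail (both for $f$ in model (1) and for $f_{1}$ in model (2)), takes $f_{0}$ to be standard normal, and writes `mu` for ${{\text{\boldmath$x$}}_{1}}^{\top}{\text{\boldmath$\beta$}}$. The function names and the parameter values are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density, playing the role of f_0."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def power_tail(y, alpha):
    """Unnormalized density (1 + |y|)^(-1 - alpha) with Student-t-like tails (assumption)."""
    return (1.0 + abs(y)) ** (-1.0 - alpha)


def factor_model1(y1, mu, sigma, alpha):
    """Outlier factor under model (1): f((y1 - mu) / sigma) / (f(y1) * sigma)."""
    return power_tail((y1 - mu) / sigma, alpha) / (power_tail(y1, alpha) * sigma)


def factor_model2(y1, mu, sigma, alpha, s):
    """Outlier factor under model (2): (1 - s) / s * f0(.) / (sigma * f1(y1)) + 1."""
    ratio = normal_pdf((y1 - mu) / sigma) / (sigma * power_tail(y1, alpha))
    return (1.0 - s) / s * ratio + 1.0


if __name__ == "__main__":
    mu, sigma, alpha, s = 3.0, 2.0, 3.0, 0.1  # hypothetical values for illustration
    for y1 in (1e1, 1e2, 1e4, 1e6):
        # The first factor should approach sigma**alpha, the second should approach 1.
        print(y1, factor_model1(y1, mu, sigma, alpha), factor_model2(y1, mu, sigma, alpha, s))
```

Under these assumptions, the factor for model (1) keeps a dependence on ${\sigma}$ however large $|y_{1}|$ becomes, whereas the factor for model (2) becomes free of $({\text{\boldmath$\beta$}},{\sigma})$, which is the mechanism described in the text.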

The remainder of this paper is organized as follows. In Section 2, sufficient conditions and necessary conditions for posterior robustness are given. In Section 3, a numerical example is given, in which we see that the Kullback-Leibler divergence between the target and available posteriors can diverge or converge to $0$ in some cases. Proofs are given in the Supplementary Material.

Contamination Models and Posterior Robustness

Suppose that we observe

$$y_{i}\sim(1-s){\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})+sf_{1}(y_{i})$$

for $i=1,\dots,n$, where ${\text{\boldmath$x$}}=({{\text{\boldmath$x$}}_{i}}^{\top})_{i=1}^{n}$ is a set of continuous explanatory variables and where ${\text{\boldmath$\beta$}}=({\beta}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ and ${\sigma}\in(0,\infty)$ are parameters of interest following a prior distribution $\pi({\text{\boldmath$\beta$}},{\sigma})$. Here, $f_{1}(\cdot)$ is an error density, and $s\in(0,1)$ is the prior probability that an observation is generated from $f_{1}$.

Following the work of Desgagné (2015), let ${\cal K},{\cal L}\subset\{1,\dots,n\}$ satisfy ${\cal K}\cup{\cal L}=\{1,\dots,n\}$, ${\cal K}\cap{\cal L}=\emptyset$, and ${\cal K},{\cal L}\neq\emptyset$. Suppose that $a_{i}\in\mathbb{R}$, $b_{i}\neq 0$, and $y_{i}=a_{i}+b_{i}{\omega}$ with ${\omega}\to\infty$ for $i\in{\cal L}$, so that ${\cal L}$ represents the set of indices of outlying observations. We say that the posterior is robust to outliers under the above model if $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\to p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$ as ${\omega}\to\infty$, where ${\text{\boldmath$y$}}=(y_{i})_{i=1}^{n}$, ${\text{\boldmath$y$}}_{{\cal K}}=(y_{i})_{i\in{\cal K}}$, and ${\text{\boldmath$y$}}_{{\cal L}}=(y_{i})_{i\in{\cal L}}$.

To derive conditions for posterior robustness, we limit the class of prior distributions for $({\text{\boldmath$\beta$}},{\sigma})$ . Suppose that

$$\pi({\text{\boldmath$\beta$}}|{\sigma})=\frac{\pi({\text{\boldmath$\beta$}},{\sigma})}{\pi({\sigma})}\leq M\prod_{k=1}^{p}\Big\{\frac{1}{{\sigma}}\frac{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}}{(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big\}, \tag{3}$$

for some $\nu>0$, $0<{\kappa}\leq 1$ and $M>0$, where $\pi({\sigma})=\int_{\mathbb{R}^{p}}\pi({\text{\boldmath$\beta$}},{\sigma})d{\text{\boldmath$\beta$}}$. That is, the ratio of the prior density to some double-sided scaled-beta density (with a spike at the origin) must be bounded uniformly by some constant. This condition is satisfied by most of the conditionally independent priors that are commonly used in practice. Examples include shrinkage priors, such as the horseshoe prior (Carvalho et al., 2009, 2010), as well as the normal priors. The condition is also satisfied by some multivariate priors for dependent ${\text{\boldmath$\beta$}}$, including the multivariate normal prior.

Likewise, we assume that the error density $f_{1}$ is bounded from below as

$$f_{1}(y)\geq\frac{1}{M^{\prime}}\frac{1}{(1+|y|)^{1+{\alpha}}}\frac{1}{\{1+\log(1+|y|)\}^{1+{\gamma}}}, \tag{4}$$

for some ${\alpha}\geq 0$ , ${\gamma}\geq-1$ and $M^{\prime}>0$ . The class of distributions that satisfy this condition includes Student’s $t$ -distributions ( ${\alpha}>0$ and ${\gamma}=-1$ ) and log-regularly varying distributions ( ${\alpha}=0$ and ${\gamma}>0$ ).
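As a quick numerical illustration of condition (4) for the Student's $t$ case (${\gamma}=-1$, with ${\alpha}$ equal to the degrees of freedom), the sketch below evaluates the ratio of the $t$ density to the right-hand side of (4) without the constant $1/M^{\prime}$. The function names, the grid, and the choice of three degrees of freedom are illustrative assumptions; a grid check is only a sanity check and not a proof that the bound holds on all of $\mathbb{R}$.

```python
import math


def student_t_pdf(y, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1.0) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + y * y / df) ** (-(df + 1.0) / 2.0)


def lower_envelope(y, alpha, gamma):
    """Right-hand side of (4) without the constant 1/M'."""
    a = abs(y)
    return (1.0 + a) ** (-1.0 - alpha) * (1.0 + math.log1p(a)) ** (-1.0 - gamma)


def min_ratio(df, grid):
    """Smallest value of f_1 / envelope on the grid, with alpha = df and gamma = -1."""
    return min(student_t_pdf(y, df) / lower_envelope(y, df, -1.0) for y in grid)


# Hypothetical grid from 0 to 1e6 on a logarithmic scale.
grid = [0.0] + [10.0 ** (k / 4.0) for k in range(-8, 25)]
ratio_min = min_ratio(3.0, grid)
```

If `ratio_min` is strictly positive on the grid, then any $M^{\prime}\geq 1/\texttt{ratio\_min}$ is compatible with (4) at the grid points, which is consistent with the statement that Student's $t$-distributions belong to the class.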

The following theorem gives a sufficient condition for the posterior to be robust.

Theorem 1.

Suppose that conditions (3) and (4) are satisfied for $\nu>{\alpha}$. Also, suppose that

$$E[{\sigma}^{|{\cal L}|{\alpha}+\rho}]<\infty \tag{5}$$

for some $\rho>0$. Then the posterior is robust to outliers under our model; that is, we have

$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$$

at each $({\text{\boldmath$\beta$}},{\sigma})\in\mathbb{R}^{p}\times(0,\infty)$.

The moment condition for $\pi({\sigma})$ in (5) could be a strong requirement when $\alpha>0$ and $|{\cal L}|$ is large. We will compare this condition with those in the literature later in Table 2. Next, we prove that posterior robustness fails if this moment condition is violated and, in addition, the tails of the error density are not sufficiently heavy.

Theorem 2.

Let $h\colon\mathbb{R}^{p}\to(0,\infty)$ be a probability density and suppose that $\pi({\text{\boldmath$\beta$}}|{\sigma})=h({\text{\boldmath$\beta$}}/{\sigma})/{\sigma}^{p}$. Let ${\alpha}>0$ and suppose that

$$f_{1}(y)\leq M^{\prime}\frac{1}{(1+|y|)^{1+{\alpha}}}$$

for all $y\in\mathbb{R}$ for some $M^{\prime}>0$. Suppose that

$$\pi({\sigma})\geq(1/{\widetilde{M}})/{\sigma}^{|{\cal L}|{\alpha}+1-\rho} \tag{6}$$

for all ${\sigma}>1$ for some ${\widetilde{M}}>0$ and $0<\rho<1$. Then we have

$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=0$$

at each $({\text{\boldmath$\beta$}},{\sigma})\in\mathbb{R}^{p}\times(0,\infty)$.

Clearly, under the assumptions of Theorem 2, the posterior does not converge in the usual sense. Indeed, we see in the next section that the Kullback-Leibler divergence between $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$ and $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})$ diverges in such a situation.

From Theorems 1 and 2, we can determine whether a prior $\pi({\sigma})$ yields a robust posterior or not in most cases. Suppose that ${\text{\boldmath$\beta$}}/{\sigma}$ and ${\sigma}$ are independent (e.g., ${\text{\boldmath$\beta$}}|\sigma\sim N(0,\sigma^{2})$ and $\sigma\sim\pi(\sigma)$ ) and that (3) holds. Suppose that equality holds in (4). Then, if we use a gamma prior for ${\sigma}^{2}$ , the moment condition in (5) is always satisfied; hence the posterior is robust regardless of the choice of ${\alpha}$ . If we use an inverse gamma prior or a scaled beta prior for ${\sigma}^{2}$ , either (5) or (6) is satisfied, depending on the hyperparameters. That is, there exists a threshold separating robust and non-robust cases. These observations are summarized in Table 1.

Table 1: Priors and conditions in Theorems 1 and 2

Prior for $\sigma^{2}$	Density $\pi({\sigma})d{\sigma}$	Condition (5) for robustness	Condition (6) for non-robustness
Inverse-gamma: $\mathrm{IG}(A,B)$	$(1/{\sigma}^{2A+1})\exp(-B/{\sigma}^{2})$	$2A>|{\cal L}|{\alpha}$	$2A<|{\cal L}|{\alpha}$
Gamma: $\mathrm{Ga}(C,D)$	${\sigma}^{2C-1}\exp(-D{\sigma}^{2})$	✓	NA
Scaled-beta: $\mathrm{SB}(E,F)$	${\sigma}^{2E-1}/(1+{\sigma}^{2})^{E+F}$	$2F>|{\cal L}|{\alpha}$	$2F<|{\cal L}|{\alpha}$
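The thresholds in Table 1 can be written as a small lookup. This is only a sketch under the assumptions stated around the table (independence of ${\text{\boldmath$\beta$}}/{\sigma}$ and ${\sigma}$, condition (3), and equality in (4)); the function name, the family labels, and the "undetermined" label for the boundary case, which neither (5) nor (6) covers, are our own illustrative choices.

```python
def classify_sigma2_prior(family, tail_param, alpha, n_outliers):
    """Classify a prior for sigma^2 following the thresholds of Table 1.

    family: "inverse-gamma", "gamma", or "scaled-beta" (hypothetical labels).
    tail_param: A for IG(A, B) or F for SB(E, F); ignored for the gamma prior.
    alpha: tail index of f_1 in (4); n_outliers: |L|.
    """
    if family == "gamma":
        # Every moment of sigma is finite, so condition (5) always holds.
        return "robust"
    if family not in ("inverse-gamma", "scaled-beta"):
        raise ValueError("unknown prior family: " + repr(family))
    threshold = n_outliers * alpha
    if 2.0 * tail_param > threshold:
        return "robust"        # condition (5) holds for a small enough rho > 0
    if 2.0 * tail_param < threshold:
        return "non-robust"    # condition (6) holds for a small enough rho > 0
    return "undetermined"      # boundary case, covered by neither condition
```

For instance, with a single outlier and ${\alpha}=3$, an inverse-gamma prior with $A=1/10$ falls on the non-robust side and one with $A=2$ on the robust side.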

The sufficient conditions obtained in this study differ from those in Gagnon et al. (2019) and Hamura et al. (2022) not only in the model specification given in (1) and (2) but also in the requirements on the error and prior densities. Table 2 summarizes the sufficient conditions for posterior robustness in the literature and in Theorem 1. As pointed out in the introduction, $f_{1}$ in our model does not have to be log-regularly varying to achieve posterior robustness, which is significantly different from the settings in the literature. In exchange for allowing a wider class of error distributions for $f_{1}$, the proof of Theorem 1 requires more constraints on the choice of priors for $\sigma$. Consequently, the conditions used in the literature and in Theorem 1 are not nested in one another. For example, the conditions in Gagnon et al. (2019) cover the improper prior for $({\text{\boldmath$\beta$}},{\sigma})$.

It is also worth emphasizing that, as clarified in Table 2, no assumption is made directly on $|{\cal L}|$, the number of outliers, in Theorem 1. Note that this number is defined by the residuals; $|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|$ is outlying for $i\in{\cal L}$ and close to zero for $i\in{\cal K}$. The key result that enables the proof without any assumption on $|{\cal L}|$ is the lemma we obtained about the residuals; for details, see Lemma 1 in the Supplementary Materials.

Table 2: Sufficient conditions of model components for robustness

	Number of outliers $|{\cal L}|$	Error density tails ($f$ or $f_{1}$)	Prior $\pi({\text{\boldmath$\beta$}},{\sigma})$: density bounds	Prior $\pi({\text{\boldmath$\beta$}},{\sigma})$: moments	Prior $\pi({\text{\boldmath$\beta$}},{\sigma})$: improper
Gagnon et al. (2019)	$|{\cal K}|\geq|{\cal L}|+2p-1$	LRVD	$\max\{1,1/{\sigma}\}$	–	✓
Hamura et al. (2022)	$|{\cal K}|\geq|{\cal L}|+p$	LRVD	$\displaystyle\sup_{t\in\mathbb{R}}|t|\pi_{{\beta}}(t)<\infty$	$E[{\sigma}^{-n}]<\infty$	NA
Theorem 1 of this study	Not needed	${1\over(1+|y|)^{1+{\alpha}}}$	$\prod\limits_{k=1}^{p}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}$	$E[{\sigma}^{|{\cal L}|{\alpha}+\rho}]<\infty$	NA

Numerical Examples

Here, we consider a numerical example to illustrate posterior (non-)robustness. In doing so, we numerically evaluated the Kullback-Leibler (KL) divergence of the target posterior distribution $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$ from the available posterior distribution $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})$, or ${\rm{KL}}=\int_{\mathbb{R}^{p}\times(0,\infty)}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})[\log\{p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})/p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\}]d({\text{\boldmath$\beta$}},{\sigma})$, as well as the point estimates of parameters and predictive intervals. We used the conjugate normal-inverse gamma prior $\pi({\text{\boldmath$\beta$}},{\sigma})\propto(1/{\sigma}^{2A+1})\exp(-B/{\sigma}^{2})\times(1/{\sigma}^{p})\exp\{-\|{\text{\boldmath$\beta$}}\|^{2}/(2C^{2}{\sigma}^{2})\}$, where $A,B,C>0$. Under this prior, the posterior becomes a finite mixture of known distributions and is analytically and numerically tractable. We considered the following two error densities:

$$f_{1}^{\rm{light}}(y)=\frac{{\alpha}/2}{(1+|y|)^{1+{\alpha}}},\quad y\in\mathbb{R},\quad\text{and}\quad f_{1}^{\rm{heavy}}(y)=\frac{{\gamma}/2}{1+|y|}\frac{1}{\{1+\log(1+|y|)\}^{1+{\gamma}}},\quad y\in\mathbb{R},$$

where ${\alpha},{\gamma}>0$. The first error distribution, $f_{1}^{\rm{light}}$, is the double-sided scaled-beta distribution, whose tail behavior is equivalent to that of Student’s $t$-distribution. The second error distribution, $f_{1}^{\rm{heavy}}$, is the unfolded version of the log-Pareto distribution of Cormann and Reiss (2009).
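Both error densities have closed-form distribution functions, which makes it easy to confirm that they integrate to one and to compare their tails. The sketch below is illustrative only; the function names are our own, and the distribution functions are obtained by integrating the two densities above directly.

```python
import math


def f1_light(y, alpha):
    """Double-sided scaled-beta density: (alpha / 2) / (1 + |y|)^(1 + alpha)."""
    return (alpha / 2.0) / (1.0 + abs(y)) ** (1.0 + alpha)


def f1_heavy(y, gamma):
    """Unfolded log-Pareto density: (gamma / 2) / (1 + |y|) / {1 + log(1 + |y|)}^(1 + gamma)."""
    a = abs(y)
    return (gamma / 2.0) / (1.0 + a) / (1.0 + math.log1p(a)) ** (1.0 + gamma)


def cdf_light(y, alpha):
    """Distribution function of f1_light, using its symmetry about zero."""
    half = 0.5 * (1.0 - (1.0 + abs(y)) ** (-alpha))  # mass on (0, |y|)
    return 0.5 + half if y >= 0 else 0.5 - half


def cdf_heavy(y, gamma):
    """Distribution function of f1_heavy, using its symmetry about zero."""
    half = 0.5 * (1.0 - (1.0 + math.log1p(abs(y))) ** (-gamma))  # mass on (0, |y|)
    return 0.5 + half if y >= 0 else 0.5 - half


if __name__ == "__main__":
    # Upper tail probabilities at y = 100 for alpha = 3 and gamma = 3/2.
    print(1.0 - cdf_light(100.0, 3.0), 1.0 - cdf_heavy(100.0, 1.5))
```

Because each distribution function tends to one as $y\to\infty$, both densities are properly normalized; the much slower decay of the second tail probability reflects the logarithmic tail of $f_{1}^{\rm{heavy}}$.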

As an example, we deterministically created the dataset as

$$\begin{pmatrix}{\text{\boldmath$x$}}_{1}&\cdots&{\text{\boldmath$x$}}_{5}\end{pmatrix}=\begin{pmatrix}1&1&1&1&1\\ 1&2&3&4&5\end{pmatrix},\qquad\bm{\epsilon}^{\top}=\begin{pmatrix}0.1&-0.1&0.1&-0.1&\omega\end{pmatrix},\qquad\bm{\beta}^{\top}=\begin{pmatrix}1&1\end{pmatrix},$$

and ${\text{\boldmath$y$}}={\text{\boldmath$x$}}^{\top}\bm{\beta}+\bm{\epsilon}$. In this example, where $n=5$ and $p=2$, we considered $\omega\in\{10^{-1},10^{0},10^{1},10^{2},10^{3}\}$. In computing the KL divergence, the fifth observation, whose error is $\omega$, is viewed as an outlier; that is, ${\cal K}=\{1,\dots,4\}$ and ${\cal L}=\{5\}$. Our experiment includes the case of $\omega=0.1$ to see the performance of the robust model in the absence of outliers. For the error densities and the prior, we set ${\alpha}=3$, ${\gamma}=3/2$, $B=C=1$ and $s=1/10$, and we considered the two cases $A=1/10$ and $A=2$. Combining the two priors with the two error distributions $f_{1}^{\rm{light}}$ and $f_{1}^{\rm{heavy}}$, we have four models in total.
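For concreteness, the deterministic dataset can be generated as follows. This is a minimal sketch in plain Python; the function names are hypothetical, and row $i$ of `X` stores ${{\text{\boldmath$x$}}_{i}}^{\top}$.

```python
def make_dataset(omega):
    """Return (X, y) for the five-point example; row i of X is x_i^T = (1, i)."""
    X = [[1.0, float(k)] for k in range(1, 6)]
    beta = [1.0, 1.0]
    eps = [0.1, -0.1, 0.1, -0.1, omega]
    # y_i = x_i^T beta + eps_i
    y = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) + e for row, e in zip(X, eps)]
    return X, y


def split_outliers(y, outlier_positions=(4,)):
    """Split y into (y_K, y_L); position 4 (zero-based) is the fifth observation."""
    y_keep = [v for i, v in enumerate(y) if i not in outlier_positions]
    y_out = [v for i, v in enumerate(y) if i in outlier_positions]
    return y_keep, y_out


if __name__ == "__main__":
    for omega in (1e-1, 1e0, 1e1, 1e2, 1e3):
        X, y = make_dataset(omega)
        print(omega, y)
```

Only the fifth response depends on $\omega$, so ${\text{\boldmath$y$}}_{{\cal K}}$ is the same for every value of $\omega$ and the target posterior does not change across scenarios.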

First, we obtained the Monte Carlo approximation of the KL divergence by using 1,000 samples from the posterior distributions. The result is summarized in the left panel of Figure 1. It is clearly seen that the KL divergence does not decrease when $f_{1}=f_{1}^{\rm{light}}$ and $A=1/10$ , since the condition of Theorem 2 is satisfied and the posterior is not convergent. In the other three cases, where the sufficient condition of Theorem 1 is satisfied, the KL divergence converges to $0$ as $\omega\to\infty$ .
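The Monte Carlo approximation of a KL divergence averages the log density ratio over draws from the first distribution. The sketch below shows that estimator in generic form; it is not the authors' implementation. As a stand-in for the two posteriors, whose mixture form is not reproduced here, it uses two univariate normal distributions for which the exact value $1/2$ is known. The function names and the seed are hypothetical.

```python
import math
import random


def mc_kl(log_p, log_q, samples):
    """Monte Carlo estimate of KL(p || q) from draws of p: mean of log p - log q."""
    return sum(log_p(x) - log_q(x) for x in samples) / len(samples)


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


# Toy check: p = N(0, 1) plays the role of the target posterior p(. | y_K),
# q = N(1, 1) plays the role of the available posterior p(. | y).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
kl_hat = mc_kl(lambda x: normal_logpdf(x, 0.0, 1.0),
               lambda x: normal_logpdf(x, 1.0, 1.0),
               samples)
```

In the setting of the paper, the draws would come from $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$ and the two log densities would be the log posteriors with and without the outlier, each including its normalizing constant.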

In addition, we computed the posterior means of ${\beta}_{2}$ and $\sigma$ in each scenario, which are shown in the middle and right panels of Figure 1, respectively. The point estimates of ${\beta}_{2}$ and $\sigma$ are stable regardless of the value of $\omega$ in the three cases where the posterior robustness holds. It should also be noted that the difference of the point estimates with and without outliers (say, $\omega\geq 10^{2}$ and $\omega=10^{-1}$ ) is small under the posterior robustness. In contrast, the point estimates become unreasonable as $\omega$ increases when $f_{1}=f_{1}^{\rm{light}}$ and $A=1/10$ .

Figure 1: The KL divergence between $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$ and $p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})$ (left), posterior means of ${\beta}_{2}$ (center) and $\sigma$ (right) under ${\omega}=10^{-1},10^{0},10^{1},\ldots,10^{5}$.

Next, under the same setting, we computed the posterior and predictive distributions of ${\beta}_{1}+{\tilde{x}}_{2}{\beta}_{2}$ and ${\tilde{y}}\sim(1-s)f_{0}(\{{\tilde{y}}-({\beta}_{1}+{\tilde{x}}_{2}{\beta}_{2})\}/{\sigma})/{\sigma}+sf_{1}({\tilde{y}})$ given ${\text{\boldmath$y$}}$ with ${\omega}\in\{10^{-1},10^{2}\}$. The lengths of the resulting interval estimates are reported in Table 3. When $f_{1}=f_{1}^{\rm{light}}$ and $A=1/10$, the credible intervals become extremely wide since the posterior robustness does not hold and the posterior of $({\text{\boldmath$\beta$}},{\sigma})$ converges to zero. When $A=2$ and the posterior robustness holds even for $f_{1}^{\rm{light}}$, the lengths of the interval estimates become reasonable. The interval lengths obtained under $f_{1}^{\rm{light}}$ and $f_{1}^{\rm{heavy}}$ are similar but slightly different, reflecting the difference between their error density tails.

Table 3: The lengths of $95\%$ credible and prediction intervals of $\beta_{1}+\tilde{x}\beta_{2}$ and $\tilde{y}$, respectively, evaluated at $\tilde{x}=2$ and $4$. Here we consider $\omega=10^{-1}$ and $10^{2}$.

	$\beta_{1}+\tilde{x}\beta_{2}$ (regression value at $\tilde{x}$ )				$\tilde{y}$ (unobserved data at $\tilde{x}$ )
$(\omega,\tilde{x})$	$(10^{-1},2)$	$(10^{-1},4)$	$(10^{2},2)$	$(10^{2},4)$	$(10^{-1},2)$	$(10^{-1},4)$	$(10^{2},2)$	$(10^{2},4)$
heavy ( $A=1/10$ )	1.87	2.27	2.35	3.97	5.91	8.00	6.96	9.21
heavy ( $A=2$ )	1.32	1.62	1.35	2.37	4.63	6.84	5.02	7.09
light ( $A=1/10$ )	1.90	2.27	60.7	82.8	5.13	7.42	145	157
light ( $A=2$ )	1.24	1.56	1.44	2.39	4.51	6.78	4.48	6.91

Acknowledgments

Research of the authors was supported in part by JSPS KAKENHI Grant Numbers 22K20132, 19K11852, 17K17659, and 21H00699 from the Japan Society for the Promotion of Science.

References

Andrade, J. A. A. and A. O’Hagan (2006). Bayesian robustness modeling using regularly varying distributions. Bayesian Analysis 1(1), 169–188.
Andrade, J. A. A. and A. O’Hagan (2011). Bayesian robustness modelling of location and scale parameters. Scandinavian Journal of Statistics 38(4), 691–711.
Box, G. E. and G. C. Tiao (1968). A Bayesian approach to some outlier problems. Biometrika 55(1), 119–129.
Carvalho, C. M., N. G. Polson, and J. G. Scott (2009). Handling sparsity via the horseshoe. In AISTATS, Volume 5, pp. 73–80.
Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480.
Cormann, U. and R.-D. Reiss (2009). Generalizing the Pareto to the log-Pareto model and statistical inference. Extremes 12(1), 93–105.
Desgagné, A. (2013). Full robustness in Bayesian modelling of a scale parameter. Bayesian Analysis 8, 187–220.
Desgagné, A. (2015). Robustness to outliers in location–scale parameter model using log-regularly varying distributions. The Annals of Statistics 43(4), 1568–1595.
Gagnon, P., A. Desgagné, and M. Bédard (2019). A new Bayesian approach to robustness against outliers in linear regression. Bayesian Analysis 15(2), 389–414.
Gagnon, P. and Y. Hayashi (2023). Theoretical properties of Bayesian Student-$t$ linear regression. Statistics and Probability Letters 193.
Hamura, Y., K. Irie, and S. Sugasawa (2021). Robust hierarchical modeling of counts under zero-inflation and outliers. arXiv preprint arXiv:2106.10503.
Hamura, Y., K. Irie, and S. Sugasawa (2022). Log-regularly varying scale mixture of normals for robust regression. Computational Statistics & Data Analysis 173, 107517.
O’Hagan, A. (1979). On outlier rejection phenomena in Bayes inference. Journal of the Royal Statistical Society: Series B 41(3), 358–367.
O’Hagan, A. and L. Pericchi (2012). Bayesian heavy-tailed models and conflict resolution: A review. Brazilian Journal of Probability and Statistics 26, 372–401.
Silva, N., M. Prates, and F. Gonçalves (2020). Bayesian linear regression models with flexible error distributions. Journal of Statistical Computation and Simulation 90, 2571–2591.
Tak, H., J. A. Ellis, and S. K. Ghosh (2019). Robust and accurate inference via a mixture of Gaussian and Student’s $t$ errors. Journal of Computational and Graphical Statistics 28(2), 415–426.
West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. Journal of the Royal Statistical Society: Series B (Methodological) 46(3), 431–439.

Supplementary Material for “Posterior Robustness with Milder Conditions: Contamination Models Revisited”

A Basic Lemma

Lemma 1 is used in the proof of Theorem 1. If $m,\tilde{m}\in\mathbb{N}$ satisfy $\tilde{m}\leq m$ , we write ${\text{\boldmath$e$}}_{\tilde{m}}^{(m)}$ for the $\tilde{m}$ th unit vector in $\mathbb{R}^{m}$ , namely the $\tilde{m}$ th column of the $m\times m$ identity matrix.

Lemma 1.

Let $n,p\in\mathbb{N}$ . Let ${\text{\boldmath$x$}}_{1},\dots,{\text{\boldmath$x$}}_{n}\in\mathbb{R}^{p}$ be continuous variables. Let $(a_{1},b_{1}),\dots,(a_{n},b_{n})\in\mathbb{R}^{2}$ . Let ${\cal K}=\{i=1,\dots,n|b_{i}=0\}$ and ${\cal L}=\{1,\dots,n\}\setminus{\cal K}$ . Let ${\cal J}=\{-1,\dots,-p\}\cup{\cal K}$ . Let $a_{j}=0$ and ${\text{\boldmath$x$}}_{j}={\text{\boldmath$e$}}_{-j}^{(p)}$ for $j=-1,\dots,-p$ . Suppose that ${\cal L}\neq\emptyset$ . Suppose that $\{b_{i}|i\in{\cal L}\}$ and $\{a_{i}|i\in{\cal K}\}$ are continuous variables.

(i)

Let $1\leq l\leq|{\cal L}|$ . Let $i_{1},\dots,i_{l}\in{\cal L}$ satisfy $i_{1}<\dots<i_{l}$ . Let $\overline{{\varepsilon}}>0$ and $\underline{{\omega}}>0$ be arbitrary. Then there exist $\eta>0$ , ${\delta}>0$ , $0<{\varepsilon}<\overline{{\varepsilon}}$ , and $M>\underline{{\omega}}$ such that for all ${\omega}\geq M$ and all ${\text{\boldmath$\beta$}}=({\beta}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ , the condition that

$$|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$x$}}_{i_{1}}}^{\top}{\text{\boldmath$\beta$}}|,\dots,|a_{i_{l}}+b_{i_{l}}{\omega}-{{\text{\boldmath$x$}}_{i_{l}}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}$$

implies the following conditions:

(a)

There exist distinct indices $k_{1},\dots,k_{l}=1,\dots,p$ such that $|{\beta}_{k_{1}}|,\dots,|{\beta}_{k_{l}}|>{\delta}{\omega}$ .
(b)

There exist distinct indices $j_{1},\dots,j_{|{\cal K}|+l}\in{\cal J}$ such that $|a_{j_{1}}-{{\text{\boldmath$x$}}_{j_{1}}}^{\top}{\text{\boldmath$\beta$}}|,\dots,|a_{j_{|{\cal K}|+l}}-{{\text{\boldmath$x$}}_{j_{|{\cal K}|+l}}}^{\top}{\text{\boldmath$\beta$}}|>\eta$.

(ii)

Let $\overline{{\varepsilon}}>0$ and $\underline{{\omega}}>0$ be arbitrary. Then there exist $\eta>0$ , ${\delta}>0$ , $0<{\varepsilon}<\overline{{\varepsilon}}$ , and $M>\underline{{\omega}}$ such that for all ${\omega}\geq M$ ,

$$\begin{aligned}
\mathbb{R}^{p} &\subset\Big(\bigcap_{i\in{\cal L}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big) \\
&\quad\cup\bigcup_{l=1}^{\min\{|{\cal L}|,p\}}\bigcup_{\begin{subarray}{c}i_{1},\dots,i_{l}\in{\cal L}\\ i_{1}<\dots<i_{l}\end{subarray}}\Big(\Big(\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big) \\
&\quad\cap\Big(\bigcap_{i\in{\cal L}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big) \\
&\quad\cap\Big\{\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big\} \\
&\quad\cap\Big[\Big(\bigcap_{j\in{\cal J}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big) \\
&\quad\cup\bigcup_{1\leq q\leq p-l}\bigcup_{\begin{subarray}{c}j_{1},\dots,j_{q}\in{\cal J}\\ j_{1}<\dots<j_{q}\end{subarray}}\Big\{\Big(\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|\leq\eta\}\Big) \\
&\quad\cap\Big(\bigcap_{j\in{\cal J}\setminus\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\,|\,|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big)\Big\}\Big]\Big).
\end{aligned}$$

Proof.

Part (ii) follows from part (i). For part (i), fix $\eta,{\delta},{\varepsilon},M>0$ , ${\omega}\geq M$ , and ${\text{\boldmath$\beta$}}=({\beta}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ . Suppose that $|a_{i_{h}}+b_{i_{h}}{\omega}-{{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}$ for all $h=1,\dots,l$ . Then

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=1}^{l}{\text{\boldmath$\beta$}}\in(a_{i_{h}})_{h=1}^{l}+(b_{i_{h}})_{h=1}^{l}{\omega}+{\omega}[\pm{\varepsilon}]^{l}\text{.}

If $M>0$ is sufficiently large,

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=1}^{l}{\text{\boldmath$\beta$}}\in(b_{i_{h}})_{h=1}^{l}{\omega}+{\omega}[\pm 2{\varepsilon}]^{l}\text{.}\tag{S1}
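The passage to (S1) only uses that the fixed intercepts are eventually dominated by ${\varepsilon}{\omega}$ . A brief sketch of that step follows; the threshold $M_{0}$ is a name introduced here for illustration and does not appear in the original argument.

```latex
% Sketch: absorbing the intercepts a_i into the error box.
% M_0 is a hypothetical name for the threshold; any M >= M_0 works,
% and taking the maximum over all of L covers every index subset at once.
\begin{gather*}
M_{0}=\varepsilon^{-1}\max_{i\in\mathcal{L}}|a_{i}|
\quad\Longrightarrow\quad
|a_{i_{h}}|\le\varepsilon\omega
\quad\text{for all }\omega\ge M\ge M_{0}\text{ and } h=1,\dots,l,\\
(a_{i_{h}})_{h=1}^{l}+\omega[\pm\varepsilon]^{l}
\subset\omega[\pm 2\varepsilon]^{l}.
\end{gather*}
```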

Now, suppose that $l\geq p+1$ . Then

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=1}^{p+1}{\text{% \boldmath$\beta$}}/{\omega}\in(b_{i_{h}})_{h=1}^{p+1}+[\pm 2{\varepsilon}]^{p+% 1}\text{.}{}

Since $({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=1}^{p}$ is invertible by assumption,

\displaystyle{{\text{\boldmath$x$}}_{i_{p+1}}}^{\top}(({{\text{\boldmath$x$}}_% {i_{h}}}^{\top})_{h=1}^{p})^{-1}((b_{i_{h}})_{h=1}^{p}+{\text{\boldmath$t$}})=% b_{i_{p+1}}+t{}

for some ${\text{\boldmath$t$}}\in[\pm 2{\varepsilon}]^{p}$ and $t\in[\pm 2{\varepsilon}]$ . This is a contradiction if ${\varepsilon}>0$ is sufficiently small, since ${{\text{\boldmath$x$}}_{i_{p+1}}}^{\top}(({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=1}^{p})^{-1}(b_{i_{h}})_{h=1}^{p}\neq b_{i_{p+1}}$ by assumption. Thus $l\leq p$ whenever ${\varepsilon}$ is sufficiently small, and we assume $l\leq p$ in what follows.
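How small ${\varepsilon}$ must be can be made explicit. The following is only a sketch: the shorthand $X$ , $\boldsymbol{b}$ and $c$ , and the use of the $1$-norm of a row vector, are introduced here for illustration.

```latex
% Shorthand (not in the original):
%   X = (x_{i_h}^T)_{h=1}^p,  b = (b_{i_h})_{h=1}^p,
%   c = x_{i_{p+1}}^T X^{-1} b - b_{i_{p+1}},  which is nonzero by assumption.
% Rearranging the displayed identity gives c = t - x_{i_{p+1}}^T X^{-1} \boldsymbol{t}, hence
\begin{align*}
|c|
 &=\bigl|\,t-\boldsymbol{x}_{i_{p+1}}^{\top}X^{-1}\boldsymbol{t}\,\bigr|
  \le 2\varepsilon\bigl(1+\|\boldsymbol{x}_{i_{p+1}}^{\top}X^{-1}\|_{1}\bigr),
\end{align*}
% which is impossible as soon as
\[
\varepsilon<\frac{|c|}{2\bigl(1+\|\boldsymbol{x}_{i_{p+1}}^{\top}X^{-1}\|_{1}\bigr)}.
\]
```

Since there are only finitely many choices of the indices $i_{1},\dots,i_{p+1}$ , taking the minimum of these thresholds gives a single ${\varepsilon}$ that works for all of them.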

For part (a), suppose that there exist distinct indices $k_{1},\dots,k_{p-l+1}\in\{1,\dots,p\}$ such that $|{\beta}_{k_{1}}|,\dots,|{\beta}_{k_{p-l+1}}|\leq{\delta}{\omega}$ . Then

\displaystyle(({\text{\boldmath$e$}}_{k_{h}}^{(p)})^{\top})_{h=1}^{p-l+1}{\text{\boldmath$\beta$}}\in{\omega}[\pm{\delta}]^{p-l+1}\text{.}\tag{S2}

Let $k_{1}^{\prime},\dots,k_{l-1}^{\prime}\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{p-l+1}\}$ be such that $k_{1}^{\prime}<\dots<k_{l-1}^{\prime}$ . Let ${\text{\boldmath$E$}}=(({\text{\boldmath$e$}}_{k_{h}^{\prime}}^{(p)})^{\top})_{h=1}^{l-1}$ . Then, if ${\delta}>0$ is sufficiently small, by (S1) and (S2),

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}}^{% \top})_{h=1}^{l}{\text{\boldmath$E$}}{\text{\boldmath$\beta$}}\in(b_{i_{h}})_{% h=1}^{l}{\omega}+{\omega}[\pm 3{\varepsilon}]^{l}{}

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}}^{% \top})_{h=1}^{l}{\text{\boldmath$E$}}{\text{\boldmath$\beta$}}/{\omega}\in(b_{% i_{h}})_{h=1}^{l}+[\pm 3{\varepsilon}]^{l}\text{.}{}

Let $s={\rm rank\,}({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}}^{% \top})_{h=1}^{l}\leq l-1$ . Then there exist invertible matrices ${\text{\boldmath$F$}}\in\mathbb{R}^{l\times l}$ and ${\text{\boldmath$G$}}\in\mathbb{R}^{(l-1)\times(l-1)}$ such that

\displaystyle{\text{\boldmath$F$}}({{\text{\boldmath$x$}}_{i_{h}}}^{\top})_{h=% 1}^{l}{\text{\boldmath$E$}}^{\top}{\text{\boldmath$G$}}=\begin{pmatrix}{\text{% \boldmath$I$}}^{(s)}&{\text{\boldmath$O$}}^{(s,l-1-s)}\\ {\text{\boldmath$O$}}^{(l-s,s)}&{\text{\boldmath$O$}}^{(l-s,l-1-s)}\end{% pmatrix}\text{.}{}

Therefore,

\displaystyle{\text{\boldmath$F$}}^{-1}\begin{pmatrix}{\text{\boldmath$I$}}^{(% s)}&{\text{\boldmath$O$}}^{(s,l-1-s)}\\ {\text{\boldmath$O$}}^{(l-s,s)}&{\text{\boldmath$O$}}^{(l-s,l-1-s)}\end{% pmatrix}{\text{\boldmath$G$}}^{-1}{\text{\boldmath$E$}}{\text{\boldmath$\beta$% }}/{\omega}\in(b_{i_{h}})_{h=1}^{l}+[\pm 3{\varepsilon}]^{l}\text{.}{}

Thus, there exists ${\text{\boldmath$\gamma$}}\in\mathbb{R}^{s}$ such that

\displaystyle{\text{\boldmath$F$}}^{-1}\begin{pmatrix}{\text{\boldmath$I$}}^{(% s)}\\ {\text{\boldmath$O$}}^{(l-s,s)}\end{pmatrix}{\text{\boldmath$\gamma$}}\in(b_{i% _{h}})_{h=1}^{l}+[\pm 3{\varepsilon}]^{l}{}

\displaystyle\begin{pmatrix}{\text{\boldmath$\gamma$}}\\ \bm{0}^{(l-s)}\end{pmatrix}\in{\text{\boldmath$F$}}(b_{i_{h}})_{h=1}^{l}+{% \text{\boldmath$F$}}[\pm 3{\varepsilon}]^{l}\text{,}{}

which is a contradiction if ${\varepsilon}>0$ is sufficiently small since $0\neq({\text{\boldmath$e$}}_{l}^{(l)})^{\top}{\text{\boldmath$F$}}(b_{i_{h}})_% {h=1}^{l}$ by assumption. This proves part (a).
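In the simplest case, part (a) reduces to a one-line estimate. The sketch below assumes $p=l=1$ and $x_{i_{1}}=1$ (an illustrative normalization, not taken from the statement) together with $b_{i_{1}}\neq 0$ , so that the claim is simply $|\beta|\geq{\delta}{\omega}$ on the set where $|a_{i_{1}}+b_{i_{1}}{\omega}-\beta|\leq{\varepsilon}{\omega}$ .

```latex
% Illustrative special case p = l = 1, x_{i_1} = 1 (assumption made for this sketch).
\begin{align*}
|\beta|
 &\ge|a_{i_{1}}+b_{i_{1}}\omega|-|a_{i_{1}}+b_{i_{1}}\omega-\beta|\\
 &\ge(|b_{i_{1}}|-\varepsilon)\,\omega-|a_{i_{1}}|\\
 &\ge\tfrac{1}{4}|b_{i_{1}}|\,\omega
 \qquad\text{if }\varepsilon<|b_{i_{1}}|/2\text{ and }\omega\ge 4|a_{i_{1}}|/|b_{i_{1}}|.
\end{align*}
```

In this case one may therefore take ${\delta}=|b_{i_{1}}|/4$ ; the general argument replaces this triangle-inequality step by the rank argument above.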

For part (b), suppose that there exist distinct indices $j_{1},\dots,j_{p-l+1}\in{\cal J}$ such that $|a_{j_{1}}-{{\text{\boldmath$x$}}_{j_{1}}}^{\top}{\text{\boldmath$\beta$}}|,% \dots,|a_{j_{p-l+1}}-{{\text{\boldmath$x$}}_{j_{p-l+1}}}^{\top}{\text{% \boldmath$\beta$}}|\leq\eta$ . Then

\displaystyle({{\text{\boldmath$x$}}_{j_{h}}}^{\top})_{h=1}^{p-l+1}{\text{% \boldmath$\beta$}}\in(a_{j_{h}})_{h=1}^{p-l+1}+[\pm\eta]^{p-l+1}\text{.}{}

Let $q_{1},q_{2}\geq 0$ , $j_{1}^{\prime},\dots,j_{q_{1}}^{\prime}\in\{-1,\dots,-p\}$ , and $j_{1}^{\prime\prime},\dots,j_{q_{2}}^{\prime\prime}\in{\cal K}$ be such that $q_{1}+q_{2}=p-l+1$ , $j_{1}^{\prime}<\dots<j_{q_{1}}^{\prime}<j_{1}^{\prime\prime}<\dots<j_{q_{2}}^{\prime\prime}$ , and $\{j_{1}^{\prime},\dots,j_{q_{1}}^{\prime}\}\cup\{j_{1}^{\prime\prime},\dots,j_{q_{2}}^{\prime\prime}\}=\{j_{1},\dots,j_{p-l+1}\}$ . Then

\displaystyle({\beta}_{-j_{h}^{\prime}})_{h=1}^{q_{1}}\in[\pm\eta]^{q_{1}}% \text{.}{}

Let $1\leq k_{1}<\dots<k_{p-q_{1}}\leq p$ be such that $\{k_{1},\dots,k_{p-q_{1}}\}=\{1,\dots,p\}\setminus\{-j_{1}^{\prime},\dots,-j_{% q_{1}}^{\prime}\}$ and let ${\text{\boldmath$E$}}=({\text{\boldmath$e$}}_{k_{1}}^{(p)},\dots,{\text{% \boldmath$e$}}_{k_{p-q_{1}}}^{(p)})$ . Then if $M>0$ is sufficiently large, we have, by (S1),

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}})_{h=1}^{l}{\text{\boldmath$E$}}^{\top}{\text{\boldmath$\beta$}}\in(b_{i_{h}})_{h=1}^{l}{\omega}+{\omega}[\pm 3{\varepsilon}]^{l}\text{.}\tag{S3}

Also,

\displaystyle({{\text{\boldmath$x$}}_{j_{h}^{\prime\prime}}}^{\top}{\text{% \boldmath$E$}})_{h=1}^{q_{2}}{\text{\boldmath$E$}}^{\top}{\text{\boldmath$% \beta$}}\in(a_{j_{h}^{\prime\prime}})_{h=1}^{q_{2}}+[\pm(1+q_{1}A)\eta]^{q_{2}% }\text{,}{}

where $A=\max_{1\leq i\leq n}\max_{1\leq k\leq p}|{{\text{\boldmath$x$}}_{i}}^{\top}{% \text{\boldmath$e$}}_{k}^{(p)}|$ . If $\eta>0$ is sufficiently small, the matrix $({{\text{\boldmath$x$}}_{j_{h}^{\prime\prime}}}^{\top}{\text{\boldmath$E$}})_{% h=1}^{q_{2}}$ has rank $q_{2}$ since otherwise $0\in({\text{\boldmath$e$}}_{q_{2}}^{(q_{2})})^{\top}{\text{\boldmath$C$}}(a_{j% _{h}^{\prime\prime}})_{h=1}^{q_{2}}+({\text{\boldmath$e$}}_{q_{2}}^{(q_{2})})^% {\top}{\text{\boldmath$C$}}[\pm(1+q_{1}A)\eta]^{q_{2}}$ for some invertible matrix ${\text{\boldmath$C$}}\in\mathbb{R}^{q_{2}\times q_{2}}$ . Therefore, there exists an invertible matrix ${\text{\boldmath$D$}}\in\mathbb{R}^{(p-q_{1})\times(p-q_{1})}$ such that

\displaystyle({\text{\boldmath$I$}}^{(q_{2})},{\text{\boldmath$O$}}^{(q_{2},p-q_{1}-q_{2})}){\text{\boldmath$\gamma$}}=({{\text{\boldmath$x$}}_{j_{h}^{\prime\prime}}}^{\top}{\text{\boldmath$E$}})_{h=1}^{q_{2}}{\text{\boldmath$D$}}{\text{\boldmath$D$}}^{-1}{\text{\boldmath$E$}}^{\top}{\text{\boldmath$\beta$}}\in(a_{j_{h}^{\prime\prime}})_{h=1}^{q_{2}}+[\pm(1+q_{1}A)\eta]^{q_{2}}\text{,}\tag{S4}

where ${\text{\boldmath$\gamma$}}=({\gamma}_{h})_{h=1}^{p-q_{1}}={\text{\boldmath$D$}% }^{-1}{\text{\boldmath$E$}}^{\top}{\text{\boldmath$\beta$}}$ . It follows from (S3) and (S4) that if $M>0$ is sufficiently large,

\displaystyle({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}})_{h=% 1}^{l}{\text{\boldmath$D$}}({\text{\boldmath$e$}}_{q_{2}+1}^{(p-q_{1})},\dots,% {\text{\boldmath$e$}}_{p-q_{1}}^{(p-q_{1})})({\gamma}_{h})_{h=q_{2}+1}^{p-q_{1% }}/{\omega}\in(b_{i_{h}})_{h=1}^{l}+[\pm 4{\varepsilon}]^{l}\text{.}{}

Thus, since the rank of the matrix $({{\text{\boldmath$x$}}_{i_{h}}}^{\top}{\text{\boldmath$E$}})_{h=1}^{l}{\text{% \boldmath$D$}}({\text{\boldmath$e$}}_{q_{2}+1}^{(p-q_{1})},\dots,{\text{% \boldmath$e$}}_{p-q_{1}}^{(p-q_{1})})$ is less than or equal to $p-q_{1}-q_{2}=l-1$ ,

\displaystyle 0\in({\text{\boldmath$e$}}_{l}^{(l)})^{\top}{\text{\boldmath$B$}}(b_{i_{h}})_{h=1}^{l}+({\text{\boldmath$e$}}_{l}^{(l)})^{\top}{\text{\boldmath$B$}}[\pm 4{\varepsilon}]^{l}

for some invertible matrix ${\text{\boldmath$B$}}\in\mathbb{R}^{l\times l}$ , which is a contradiction if ${\varepsilon}>0$ is sufficiently small. This completes the proof. ∎

Proof of Theorem 1

Here, we prove Theorem 1.

Proof of Theorem 1. The posterior is

\displaystyle p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})

\displaystyle=\frac{\displaystyle g({\text{\boldmath$\beta$}},{\sigma};{\omega% })}{\displaystyle\int_{\mathbb{R}^{p}\times(0,\infty)}g({\text{\boldmath$\beta% $}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})}\text{,}{}

where

\displaystyle g({\text{\boldmath$\beta$}},{\sigma};{\omega})

\displaystyle=\pi({\text{\boldmath$\beta$}},{\sigma})\Big{[}\prod_{i\in{\cal K% }}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text% {\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}\Big{]}\prod_{i% \in{\cal L}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{% \top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}\text% {.}{}

Since

\displaystyle\lim_{{\omega}\to\infty}\prod_{i\in{\cal L}}\Big{\{}{1-s\over s}{% {\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{% \sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}=1\text{,}{}

it is sufficient to show that

\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}g({% \text{\boldmath$\beta$}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})

\displaystyle=\int_{\mathbb{R}^{p}\times(0,\infty)}\pi({\text{\boldmath$\beta$% }},{\sigma})\Big{[}\prod_{i\in{\cal K}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{% \text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f% _{1}(y_{i})}+1\Big{\}}\Big{]}d({\text{\boldmath$\beta$}},{\sigma})\text{.}{}

Since for all ${\varepsilon}>0$ and all $i\in{\cal L}$ , $|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\geq{\varepsilon}{\omega}$ and $|y_{i}|\geq 1$ imply

	$\displaystyle{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}$	$\displaystyle\leq{M^{\prime}\over\sqrt{2\pi}}{(1+|y_{i}|)^{1+{\alpha}}\{1+\log(1+|y_{i}|)\}^{1+{\gamma}}\over{\sigma}\exp\{(|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|/{\sigma})^{2}/2\}}$
		$\displaystyle\leq{M^{\prime}\over\sqrt{2\pi}}2^{1+{\alpha}}{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}{(|y_{i}|/{\sigma})^{1+{\alpha}}\{1+\log(1+|y_{i}|/{\sigma})\}^{1+{\gamma}}\over\exp\{{\varepsilon}^{2}({\omega}/{\sigma})^{2}/2\}}$
		$\displaystyle\leq M_{1}{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}$

for some $M_{1}>0$ , it follows from the dominated convergence theorem that

	$\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{\{}\prod_{i\in{\cal L}}1(|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\geq{\varepsilon}{\omega})\Big{\}}g({\text{\boldmath$\beta$}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle=\int_{\mathbb{R}^{p}\times(0,\infty)}\pi({\text{\boldmath$\beta$}},{\sigma})\Big{[}\prod_{i\in{\cal K}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}\Big{]}d({\text{\boldmath$\beta$}},{\sigma})$

for all $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ . Thus, since

\displaystyle g({\text{\boldmath$\beta$}},{\sigma};{\omega})

\displaystyle=\sum_{\widetilde{{\cal K}}\subset{\cal K}}\sum_{\widetilde{{\cal L% }}\subset{\cal L}}\pi({\text{\boldmath$\beta$}},{\sigma})\Big{(}{1-s\over s}% \Big{)}^{|\widetilde{{\cal K}}|+|\widetilde{{\cal L}}|}\Big{\{}\prod_{i\in% \widetilde{{\cal K}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{% \boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}\prod_{i\in% \widetilde{{\cal L}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{% \boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\text{,}{}

it suffices to prove that for all $\widetilde{{\cal K}}\subset{\cal K}$ and all $\widetilde{{\cal L}}\subset{\cal L}$ , there exists $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ such that

\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}h_{% \widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};% {\omega};{\varepsilon})d({\text{\boldmath$\beta$}},{\sigma})=0\text{,}{}

where

\displaystyle h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})

\displaystyle=\Big{\{}1-\prod_{i\in{\cal L}}1(|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\geq{\varepsilon}{\omega})\Big{\}}\pi({\text{\boldmath$\beta$}},{\sigma})\Big{\{}\prod_{i\in\widetilde{{\cal K}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}\prod_{i\in\widetilde{{\cal L}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\text{.}

This clearly holds for $\widetilde{{\cal L}}=\emptyset$ and all $\widetilde{{\cal K}}\subset{\cal K}$ .
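For reference, the three-step bound on the ratio ${\rm{N}}/f_{1}$ displayed earlier in this proof can be unpacked as follows. This is a sketch under stated assumptions: it takes the tail condition on $f_{1}$ in the form $1/f_{1}(y)\leq M^{\prime}(1+|y|)^{1+{\alpha}}\{1+\log(1+|y|)\}^{1+{\gamma}}$ , which is how the first inequality reads, it assumes $1+{\alpha}>0$ and $1+{\gamma}\geq 0$ , it uses $y_{i}=a_{i}+b_{i}{\omega}$ for $i\in{\cal L}$ , and the constant $C$ is a name introduced here.

```latex
% Step 1 -> 2: for |y_i| >= 1 and |y_i - x_i^T beta| >= eps * omega,
\begin{align*}
(1+|y_{i}|)^{1+\alpha}
 &\le 2^{1+\alpha}\,\sigma^{1+\alpha}\,(|y_{i}|/\sigma)^{1+\alpha},\\
1+\log(1+|y_{i}|)
 &\le\{1+\log(1+\sigma)\}\{1+\log(1+|y_{i}|/\sigma)\},\\
\exp\{-(|y_{i}-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}|/\sigma)^{2}/2\}
 &\le\exp\{-\varepsilon^{2}(\omega/\sigma)^{2}/2\}.
\end{align*}
% Step 2 -> 3: with |y_i| <= C * omega for omega >= 1, where
% C = max_i (|a_i| + |b_i|) is a hypothetical constant, and u = omega / sigma,
\[
\sup_{u>0}\,(Cu)^{1+\alpha}\{1+\log(1+Cu)\}^{1+\gamma}
 e^{-\varepsilon^{2}u^{2}/2}<\infty .
\]
```

The second line uses $1+{\sigma}v\leq(1+{\sigma})(1+v)$ with $v=|y_{i}|/{\sigma}$ , and the supremum is finite because the Gaussian factor dominates any polynomial-logarithmic growth in $u$ .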

First, fix $\emptyset\neq\widetilde{{\cal L}}\subset{\cal L}$ and let $\widetilde{{\cal K}}=\emptyset$ . Then, by Lemma 1, there exist ${\delta}>0$ , $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ , and $M>0$ such that for all ${\omega}\geq M$ ,

	$\displaystyle\mathbb{R}^{p}$	$\displaystyle\subset\Big{(}\bigcap_{i\in\widetilde{{\cal L}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{l=1}^{\min\{|\widetilde{{\cal L}}|,p\}}\bigcup_{\begin{subarray}{c}i_{1},\dots,i_{l}\in\widetilde{{\cal L}}\\ i_{1}<\dots<i_{l}\end{subarray}}\Big{\{}\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big{\}}\text{.}$

Since

\displaystyle h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$% \beta$}},{\sigma};{\omega};{\varepsilon})

\displaystyle=\Big{\{}1-\prod_{i\in{\cal L}}1(|y_{i}-{{\text{\boldmath$x$}}_{i% }}^{\top}{\text{\boldmath$\beta$}}|\geq{\varepsilon}{\omega})\Big{\}}\pi({% \text{\boldmath$\beta$}},{\sigma})\prod_{i\in\widetilde{{\cal L}}}{{\rm{N}}(y_% {i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})% \over f_{1}(y_{i})}{}

for all ${\omega}>0$ , clearly

\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}1\Big{(}{\text{\boldmath$\beta$}}\in\bigcap_{i\in\widetilde{{\cal L}}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>{\varepsilon}{\omega}\}\Big{)}h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})d({\text{\boldmath$\beta$}},{\sigma})

\displaystyle=0{}

by the dominated convergence theorem. Fix $1\leq l\leq\min\{|\widetilde{{\cal L}}|,p\}$ , $i_{1},\dots,i_{l}\in\widetilde{{\cal L}}$ with $i_{1}<\dots<i_{l}$ , and $1\leq k_{1}<\dots<k_{l}\leq p$ and let

	$\displaystyle A({\omega};{\varepsilon},{\delta})$	$\displaystyle=\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\text{.}$

Then

	$\displaystyle 1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})$
	$\displaystyle\leq M\,1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})\Big{\{}\prod_{k=1}^{p}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\prod_{i\in\widetilde{{\cal L}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}$
	$\displaystyle\leq M_{2}\,1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{1\over f_{1}(y_{i})}\Big{\}}\Big{\{}\prod_{k\in\{k_{1},\dots,k_{l}\}}{({\delta}{\omega}/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+{\delta}{\omega}/{\sigma})^{{\kappa}+\nu}}\Big{\}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$
	$\displaystyle\leq M_{3}\pi({\sigma})\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}{\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$

for some $M_{2},M_{3}>0$ and any ${\alpha}<{{\alpha}}^{\prime}\leq\nu$ . Choosing ${{\alpha}}^{\prime}$ sufficiently close to ${\alpha}$ , we therefore obtain

\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})d({\text{\boldmath$\beta$}},{\sigma})=0\text{.}
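The last inequality in the chain above trades the prior factor evaluated at ${\delta}{\omega}$ against the heavy tail of $f_{1}$ . A sketch of that bookkeeping for a single pair $(i,k)$ follows. It assumes ${\kappa}+{{\alpha}}^{\prime}\geq 0$ and the tail bound $1/f_{1}(y)\leq M^{\prime}(1+|y|)^{1+{\alpha}}\{1+\log(1+|y|)\}^{1+{\gamma}}$ used earlier in the proof; the shorthand $u={\delta}{\omega}/{\sigma}$ and the constant $C$ are introduced here for illustration.

```latex
% u = delta * omega / sigma and C are hypothetical shorthand for this sketch.
\begin{align*}
\frac{u^{\kappa-1}/\sigma}{(1+u)^{\kappa+\nu}}
 &=\frac{1}{\delta\omega}\,u^{-\alpha'}
   \Bigl(\frac{u}{1+u}\Bigr)^{\kappa+\alpha'}(1+u)^{\alpha'-\nu}
  \le\frac{\sigma^{\alpha'}}{(\delta\omega)^{1+\alpha'}},\\
\frac{1}{f_{1}(y_{i})}
 &\le C\,\omega^{1+\alpha}(\log\omega)^{1+\gamma}
  \qquad(\omega\text{ large},\ y_{i}=a_{i}+b_{i}\omega),\\
\frac{1}{f_{1}(y_{i})}\cdot\frac{u^{\kappa-1}/\sigma}{(1+u)^{\kappa+\nu}}
 &\le\frac{C}{\delta^{1+\alpha'}}\,
   \frac{\omega^{\alpha}(\log\omega)^{1+\gamma}}{\omega^{\alpha'}}\,
   \sigma^{\alpha'} .
\end{align*}
```

Multiplying over the $l$ pairs gives the factor ${\sigma}^{l{{\alpha}}^{\prime}}$ together with $l$ copies of ${\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}/{\omega}^{{{\alpha}}^{\prime}}$ , each of which tends to $0$ because ${{\alpha}}^{\prime}>{\alpha}$ .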

Next, fix $\emptyset\neq\widetilde{{\cal L}}\subset{\cal L}$ and $\emptyset\neq\widetilde{{\cal K}}\subset{\cal K}$ . Let $\widetilde{{\cal J}}=\{-1,\dots,-p\}\cup\widetilde{{\cal K}}$ . Let $y_{j}=0$ and ${\text{\boldmath$x$}}_{j}={\text{\boldmath$e$}}_{-j}^{(p)}$ for $j=-1,\dots,-p$ . Then, by Lemma 1, there exist $\eta>0$ , ${\delta}>0$ , $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ , and $M>0$ such that for all ${\omega}\geq M$ ,

	$\displaystyle\mathbb{R}^{p}$	$\displaystyle\subset\Big{(}\bigcap_{i\in\widetilde{{\cal L}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{l=1}^{\min\{|\widetilde{{\cal L}}|,p\}}\bigcup_{\begin{subarray}{c}i_{1},\dots,i_{l}\in\widetilde{{\cal L}}\\ i_{1}<\dots<i_{l}\end{subarray}}\Big{(}\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{\{}\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big{\}}$
		$\displaystyle\quad\cap\Big{[}\Big{(}\bigcap_{j\in\widetilde{{\cal J}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{1\leq q\leq p-l}\bigcup_{\begin{subarray}{c}j_{1},\dots,j_{q}\in\widetilde{{\cal J}}\\ j_{1}<\dots<j_{q}\end{subarray}}\Big{\{}\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|\leq\eta\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}\Big{\}}\Big{]}\Big{)}\text{.}$

Clearly,

\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}1\Big{(}{\text{\boldmath$\beta$}}\in\bigcap_{i\in\widetilde{{\cal L}}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>{\varepsilon}{\omega}\}\Big{)}h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})d({\text{\boldmath$\beta$}},{\sigma})=0\text{.}

Fix $1\leq l\leq\min\{|\widetilde{{\cal L}}|,p\}$ , $i_{1},\dots,i_{l}\in\widetilde{{\cal L}}$ with $i_{1}<\dots<i_{l}$ , and $1\leq k_{1}<\dots<k_{l}\leq p$ . Let

	$\displaystyle A({\omega};{\varepsilon},{\delta})$	$\displaystyle=\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{\{}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big{\}}$
		$\displaystyle\quad\cap\Big{[}\Big{(}\bigcap_{j\in\widetilde{{\cal J}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{1\leq q\leq p-l}\bigcup_{\begin{subarray}{c}j_{1},\dots,j_{q}\in\widetilde{{\cal J}}\\ j_{1}<\dots<j_{q}\end{subarray}}\Big{\{}\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|\leq\eta\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}\Big{\}}\Big{]}\text{.}$

As in the previous case, for some ${\alpha}<{{\alpha}}^{\prime}\leq\nu$ that is sufficiently close to ${\alpha}$ ,

	$\displaystyle 1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})$
	$\displaystyle\leq M_{4}\,1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})$
	$\displaystyle\quad\times\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{\{}\prod_{i\in\widetilde{{\cal K}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}{\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$

for some $M_{4}>0$ . Therefore,

	$\displaystyle\int_{\mathbb{R}^{p}\times(0,\infty)}1\Big{(}{\text{\boldmath$\beta$}}\in\bigcap_{j\in\widetilde{{\cal J}}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>\eta\}\Big{)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\leq M_{5}\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{(}\pi({\sigma})\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}{\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}\Big{)}d({\text{\boldmath$\beta$}},{\sigma})\to 0$

as ${\omega}\to\infty$ . Now, suppose that $p\geq l+1$ and fix $1\leq q\leq p-l$ and $j_{1},\dots,j_{q}\in\widetilde{{\cal J}}$ with $j_{1}<\dots<j_{q}$ . Then if ${\omega}>\eta/{\delta}$ ,

	$\displaystyle\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{\{}1\Big{(}{\text{\boldmath$\beta$}}\in\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|\leq\eta\}\Big{)}$
	$\displaystyle\quad\cap\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>\eta\}\Big{)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega};{\varepsilon})\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\leq M_{4}\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{\{}1\Big{(}{\text{\boldmath$\beta$}}\in\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|\leq\eta\}\Big{)}$
	$\displaystyle\quad\cap\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>\eta\}\Big{)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})$
	$\displaystyle\quad\times\Big{\{}\prod_{k\in(\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\})\cap\{-j_{1},\dots,-j_{q}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{\{}\prod_{i\in\widetilde{{\cal K}}\cap\{j_{1},\dots,j_{q}\}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}$
	$\displaystyle\quad\times\Big{\{}\prod_{k\in(\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\})\setminus\{-j_{1},\dots,-j_{q}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{\{}\prod_{i\in\widetilde{{\cal K}}\setminus\{j_{1},\dots,j_{q}\}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}$
	$\displaystyle\quad\times\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}{\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\leq M_{5}\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{\{}1\Big{(}{\text{\boldmath$\beta$}}\in\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|\leq\eta\}\Big{)}$
	$\displaystyle\quad\cap\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>\eta\}\Big{)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma}){\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$
	$\displaystyle\quad\times\Big{\{}\prod_{j\in(\{-1,\dots,-p\}\setminus\{-k_{1},\dots,-k_{l}\})\cap\{j_{1},\dots,j_{q}\}}{(|{\beta}_{-j}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{-j}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{\{}\prod_{i\in\widetilde{{\cal K}}\cap\{j_{1},\dots,j_{q}\}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}$
	$\displaystyle\quad\times\Big{\{}\prod_{j\in(\{-1,\dots,-p\}\setminus\{-k_{1},\dots,-k_{l}\})\setminus\{j_{1},\dots,j_{q}\}}{(|{\beta}_{-j}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{-j}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle=M_{5}\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}\int_{\mathbb{R}^{p}\times(0,\infty)}\Big{\{}1\Big{(}{\text{\boldmath$\beta$}}\in\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|\leq\eta\}\Big{)}$
	$\displaystyle\quad\cap\bigcap_{j\in\widetilde{{\cal J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\widetilde{\text{\boldmath$\beta$}}}\in\mathbb{R}^{p}\mid|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|>\eta\}\Big{)}1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma}){\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$
	$\displaystyle\quad\times\Big{\{}\prod_{j\in\{-1,\dots,-p\}\cap\{j_{1},\dots,j_{q}\}}{(|{\beta}_{-j}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{-j}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{\{}\prod_{i\in\widetilde{{\cal K}}\cap\{j_{1},\dots,j_{q}\}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}$
	$\displaystyle\quad\times\Big{\{}\prod_{j\in\{-1,\dots,-p\}\setminus(\{-k_{1},\dots,-k_{l}\}\cup\{j_{1},\dots,j_{q}\})}{(|{\beta}_{-j}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{-j}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})\text{,}$

where the equality follows since there is no point ${\widetilde{\text{\boldmath$\beta$}}}=({\tilde{\beta}}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ satisfying both $|{\tilde{\beta}}_{-j}|\geq{\delta}{\omega}$ and $|y_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\widetilde{\text{\boldmath$\beta$}}}|\leq\eta$ for some $j=-1,\dots,-p$. The right-hand side converges to $0$ as ${\omega}\to\infty$ regardless of whether $\{-k_{1},\dots,-k_{l}\}\cap\{j_{1},\dots,j_{q}\}=\emptyset$ or not. This completes the proof. $\Box$

Remark 1.

Although we assume for simplicity that $f_{0}$ is the standard normal density, similar results about posterior robustness can be established for other choices of $f_{0}$ as well. The most important property of $f_{0}$ used throughout the above proof is that $f_{0}$ has thinner tails than $f_{1}$.
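The tail comparison in this remark can be illustrated numerically. The following is a minimal sketch, not part of the proof: it assumes a Student-$t$ density with 3 degrees of freedom as a stand-in for $f_{1}$, and it uses a Laplace density as a hypothetical alternative $f_{0}$; neither choice is taken from the theorems above. It only shows that the ratio $f_{0}(y)/f_{1}(y)$ decays as $|y|$ grows for both light-tailed candidates.

```python
import math

def normal_pdf(y):
    """Standard normal density (the f_0 used in the paper)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def laplace_pdf(y):
    """Standard Laplace density (hypothetical alternative f_0, an assumption)."""
    return 0.5 * math.exp(-abs(y))

def student_t_pdf(y, df):
    """Student-t density with df degrees of freedom (stand-in for f_1)."""
    c = math.gamma((df + 1.0) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + y * y / df) ** (-(df + 1.0) / 2.0)

def tail_ratios(f0, df=3.0, points=(5.0, 10.0, 20.0, 40.0)):
    """Ratio f0(y) / f1(y) at increasingly extreme values of y."""
    return [f0(y) / student_t_pdf(y, df) for y in points]

normal_ratios = tail_ratios(normal_pdf)
laplace_ratios = tail_ratios(laplace_pdf)

for y, rn, rl in zip((5.0, 10.0, 20.0, 40.0), normal_ratios, laplace_ratios):
    print(f"y = {y:5.1f}   normal/t = {rn:.3e}   laplace/t = {rl:.3e}")
```

Both ratio sequences shrink toward zero, the normal one much faster. This is only a tail-ratio check; whether the full robustness argument carries over to a given $f_{0}$ still has to be verified case by case.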

Proof of Theorem 2

Here, we prove Theorem 2.

Proof of Theorem 2. As in the proof of Theorem 1, we have

$\displaystyle p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=g({\text{\boldmath$\beta$}},{\sigma};{\omega})\Big{/}\int_{\mathbb{R}^{p}\times(0,\infty)}g({\text{\boldmath$\beta$}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})\text{,}$

where

$\displaystyle g({\text{\boldmath$\beta$}},{\sigma};{\omega})=\pi({\text{\boldmath$\beta$}},{\sigma})\Big{[}\prod_{i\in{\cal K}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}\Big{]}\prod_{i\in{\cal L}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}\text{,}$

and

$\displaystyle\lim_{{\omega}\to\infty}g({\text{\boldmath$\beta$}},{\sigma};{\omega})=\pi({\text{\boldmath$\beta$}},{\sigma})\prod_{i\in{\cal K}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}+1\Big{\}}<\infty\text{.}$

Now, if ${\omega}$ is large enough that $|y_{i}|\leq 2|b_{i}|{\omega}$ for all $i\in{\cal L}$, then

	$\displaystyle\int_{\mathbb{R}^{p}\times(0,\infty)}g({\text{\boldmath$\beta$}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\geq\int_{\mathbb{R}^{p}\times(0,\infty)}\pi({\text{\boldmath$\beta$}},{\sigma})\prod_{i\in{\cal L}}\Big{\{}{1-s\over s}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\geq{1\over M_{1}}{\omega}^{|{\cal L}|(1+{\alpha})}\int_{\mathbb{R}^{p}\times(0,\infty)}\pi({\sigma}){1\over{\sigma}^{|{\cal L}|}}\pi({\text{\boldmath$\beta$}}|{\sigma})\Big{[}\prod_{i\in{\cal L}}\exp\Big{\{}-{2(|y_{i}|^{2}+|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|^{2})\over 2{\sigma}^{2}}\Big{\}}\Big{]}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\geq{1\over M_{1}}{\omega}^{|{\cal L}|(1+{\alpha})}\int_{\{({\widetilde{\text{\boldmath$\beta$}}},{\tilde{\sigma}})\in\mathbb{R}^{p}\times(0,\infty)\mid\|{\widetilde{\text{\boldmath$\beta$}}}\|\leq{\tilde{\sigma}},\,{\tilde{\sigma}}\geq{\omega}\}}\pi({\sigma}){1\over{\sigma}^{|{\cal L}|}}{h({\text{\boldmath$\beta$}}/{\sigma})\over{\sigma}^{p}}\Big{\{}\prod_{i\in{\cal L}}\exp\Big{(}-{4|b_{i}|^{2}{\omega}^{2}\over{\sigma}^{2}}-\|{\text{\boldmath$x$}}_{i}\|^{2}\Big{)}\Big{\}}d({\text{\boldmath$\beta$}},{\sigma})$
	$\displaystyle\geq{1\over M_{2}}{\omega}^{|{\cal L}|(1+{\alpha})}\int_{{\omega}}^{\infty}{1\over{\sigma}^{|{\cal L}|{\alpha}+1-\rho}}{1\over{\sigma}^{|{\cal L}|}}\exp\Big{(}-\sum_{i\in{\cal L}}{4|b_{i}|^{2}{\omega}^{2}\over{\sigma}^{2}}\Big{)}d{\sigma}\text{.}$

Therefore, by making the change of variables ${\sigma}={\omega}t$, we obtain

$\displaystyle\lim_{{\omega}\to\infty}\int_{\mathbb{R}^{p}\times(0,\infty)}g({\text{\boldmath$\beta$}},{\sigma};{\omega})d({\text{\boldmath$\beta$}},{\sigma})\geq\lim_{{\omega}\to\infty}{{\omega}^{\rho}\over M_{2}}\int_{1}^{\infty}{1\over t^{|{\cal L}|(1+{\alpha})+1-\rho}}\exp\Big{(}-\sum_{i\in{\cal L}}{4|b_{i}|^{2}\over t^{2}}\Big{)}dt=\infty\text{,}$

where the integral on the right-hand side is positive and does not depend on ${\omega}$.

This completes the proof. $\Box$
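The power counting behind the last step of the proof of Theorem 2 can be checked numerically. The sketch below is illustrative only: the values $|{\cal L}|=2$, ${\alpha}=1$, $\rho=0.5$ and $b_{i}=1$ are assumptions chosen for the example, the constant $M_{2}$ is set to $1$, and `t` and `u` are hypothetical names for the rescaled variable and its reciprocal. It evaluates the ${\sigma}$-integral directly and compares it with ${\omega}^{\rho}$ times the rescaled, ${\omega}$-free integral.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Illustrative values (assumptions, not taken from the paper).
n_outliers = 2                          # |L|
alpha = 1.0
rho = 0.5
b = [1.0, 1.0]                          # slopes b_i of the outlying responses
c = n_outliers * (1.0 + alpha)          # |L| (1 + alpha)
B = sum(4.0 * bi ** 2 for bi in b)      # sum over i in L of 4 |b_i|^2

def sigma_integral(omega, v_max=10.0):
    """omega^c * int_omega^inf sigma^-(c+1-rho) exp(-B omega^2 / sigma^2) d sigma,
    evaluated on the grid sigma = omega * exp(v) and truncated at v = v_max."""
    def f(v):
        sigma = omega * math.exp(v)
        # The trailing factor sigma is the Jacobian d sigma / d v.
        return (omega ** c * sigma ** (-(c + 1.0 - rho))
                * math.exp(-B * omega ** 2 / sigma ** 2) * sigma)
    return midpoint(f, 0.0, v_max)

# Rescaled integral int_1^inf t^-(c+1-rho) exp(-B / t^2) dt, computed via t = 1/u.
rescaled_constant = midpoint(
    lambda u: u ** (c - 1.0 - rho) * math.exp(-B * u ** 2), 0.0, 1.0)

omegas = [1.0, 10.0, 100.0]
direct = [sigma_integral(w) for w in omegas]
scaled = [w ** rho * rescaled_constant for w in omegas]

for w, d, s_val in zip(omegas, direct, scaled):
    print(f"omega = {w:6.1f}   direct = {d:.6e}   omega^rho * const = {s_val:.6e}")
```

Under these assumed values the two columns agree and grow like ${\omega}^{\rho}$, which is the behaviour that forces the normalizing constant to diverge.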

	$\displaystyle\mathbb{R}^{p}$	$\displaystyle\subset\Big{(}\bigcap_{i\in{\cal L}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{l=1}^{\min\{|{\cal L}|,p\}}\bigcup_{\begin{subarray}{c}i_{1},\dots,i_{l}\in{\cal L}\\ i_{1}<\dots<i_{l}\end{subarray}}\Big{(}\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in{\cal L}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{\{}\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big{\}}$
		$\displaystyle\quad\cap\Big{[}\Big{(}\bigcap_{j\in{\cal J}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{1\leq q\leq p-l}\bigcup_{\begin{subarray}{c}j_{1},\dots,j_{q}\in{\cal J}\\ j_{1}<\dots<j_{q}\end{subarray}}\Big{\{}\Big{(}\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|\leq\eta\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{j\in{\cal J}\setminus\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{j}-{{\text{\boldmath$x$}}_{j}}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big{)}\Big{\}}\Big{]}\Big{)}\text{.}$

	$\displaystyle{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}$	$\displaystyle\leq{M^{\prime}\over\sqrt{2\pi}}{(1+|y_{i}|)^{1+{\alpha}}\{1+\log(1+|y_{i}|)\}^{1+{\gamma}}\over{\sigma}\exp\{(|y_{i}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|/{\sigma})^{2}/2\}}$
		$\displaystyle\leq{M^{\prime}\over\sqrt{2\pi}}2^{1+{\alpha}}{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}{(|y_{i}|/{\sigma})^{1+{\alpha}}\{1+\log(1+|y_{i}|/{\sigma})\}^{1+{\gamma}}\over\exp\{{\varepsilon}^{2}({\omega}/{\sigma})^{2}/2\}}$
		$\displaystyle\leq M_{1}{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}$

	$\displaystyle\mathbb{R}^{p}$	$\displaystyle\subset\Big{(}\bigcap_{i\in\widetilde{{\cal L}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cup\bigcup_{l=1}^{\min\{|\widetilde{{\cal L}}|,p\}}\bigcup_{\begin{subarray}{c}i_{1},\dots,i_{l}\in\widetilde{{\cal L}}\\ i_{1}<\dots<i_{l}\end{subarray}}\Big{\{}\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big{\}}\text{.}$

	$\displaystyle A({\omega};{\varepsilon},{\delta})$	$\displaystyle=\Big{(}\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\Big{(}\bigcap_{i\in\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big{)}$
		$\displaystyle\quad\cap\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\text{.}$

	$\displaystyle 1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))h_{\widetilde{{\cal K}},\widetilde{{\cal L}}}({\text{\boldmath$\beta$}},{\sigma};{\omega})$
	$\displaystyle\leq M\,1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})\Big{\{}\prod_{k=1}^{p}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\prod_{i\in\widetilde{{\cal L}}}{{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\over f_{1}(y_{i})}$
	$\displaystyle\leq M_{2}\,1({\text{\boldmath$\beta$}}\in A({\omega};{\varepsilon},{\delta}))\pi({\sigma})\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{1\over f_{1}(y_{i})}\Big{\}}\Big{\{}\prod_{k\in\{k_{1},\dots,k_{l}\}}{({\delta}{\omega}/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+{\delta}{\omega}/{\sigma})^{{\kappa}+\nu}}\Big{\}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$
	$\displaystyle\leq M_{3}\pi({\sigma})\Big{\{}\prod_{k\in\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l}\}}{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}/{\sigma}\over(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big{\}}\Big{(}\prod_{i\in\{i_{1},\dots,i_{l}\}}{\rm{N}}(y_{i}|{{\text{\boldmath$x$}}_{i}}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})\Big{)}$
	$\displaystyle\quad\times\Big{\{}\prod_{i\in\{i_{1},\dots,i_{l}\}}{{\omega}^{{\alpha}}(\log{\omega})^{1+{\gamma}}\over{\omega}^{{{\alpha}}^{\prime}}}\Big{\}}{\sigma}^{l{{\alpha}}^{\prime}}[{\sigma}^{{\alpha}}\{1+\log(1+{\sigma})\}^{1+{\gamma}}]^{|\widetilde{{\cal L}}\setminus\{i_{1},\dots,i_{l}\}|}$