Appendix A: Simulation for a nonlinear boundary case
Here, we provide an example for a simple nonlinear boundary case to test the performance of SVCMA. The data generating process is described below.
DGP3: \(\Pr (Y=1)=1-\Pr (Y=-1)=0.8I(\text {sign}(\textbf{x}^{\top }{\varvec{\beta }})=1)+0.2I(\text {sign}(\textbf{x}^{\top }{\varvec{\beta }})=-1)\), where \(I(\cdot )\) is the indicator function, \(\textbf{x}=(x_1, x_2, x_2^2, x_2^3, \ldots , x_2^{959}, x_3, x_4, \ldots , x_{p-958})\), \(x_1\sim U(0,1)\), \(x_2, \ldots , x_{p-958} {\mathop {\sim }\limits ^{i.i.d.}} U(-1, 1)\) and \({\varvec{\beta }}=(2, 0, -2, 0, -2, 0, 0, \ldots , 0)\).
In this example, the training data contain all the covariates used in DGP3, and we set the dimension \(p=1000\), the training size \(n=500\) and the testing size \(n_{\text {test}}=10{,}000\). The error rates are shown in Fig. 13(a), and the fitted boundaries of all methods are plotted in Fig. 13(b). These two figures show that SVCMA performs best in the presence of a nonlinear boundary. Additionally, most of the methods appear to fit a boundary close to the true boundary.
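The data generating process above can be sketched in a few lines. Under DGP3 the score reduces to \(\textbf{x}^{\top }{\varvec{\beta }}=2x_1-2x_2^2-2x_2^4\), so the true boundary is \(x_1=x_2^2+x_2^4\). The following is a minimal illustration only, not the code used for the reported results; the function name `simulate_dgp3`, the seed, and the treatment of a zero score as the negative side are assumptions made for this sketch.

```python
import numpy as np

def simulate_dgp3(n, p=1000, n_powers=959, seed=0):
    """Draw (X, y) from DGP3. Names and seed handling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, size=(n, 1))                 # x_1 ~ U(0, 1)
    x2 = rng.uniform(-1.0, 1.0, size=(n, 1))                # x_2 ~ U(-1, 1)
    others = rng.uniform(-1.0, 1.0, size=(n, p - 1 - n_powers))  # x_3, ..., x_{p-958}
    powers = x2 ** np.arange(1, n_powers + 1)               # x_2, x_2^2, ..., x_2^959
    X = np.hstack([x1, powers, others])                     # n x p design matrix
    beta = np.zeros(p)
    beta[[0, 2, 4]] = [2.0, -2.0, -2.0]                     # (2, 0, -2, 0, -2, 0, ..., 0)
    score = X @ beta
    # Pr(Y = 1) is 0.8 on the positive side of the boundary and 0.2 otherwise.
    # A score of exactly zero has probability zero; it is assigned to the negative side here.
    prob_pos = np.where(score > 0, 0.8, 0.2)
    y = np.where(rng.uniform(size=n) < prob_pos, 1, -1)
    return X, y, beta
```

Calling `simulate_dgp3(500)` would give a training set of the size used above, and `simulate_dgp3(10_000, seed=1)` a test set drawn with a different (arbitrary) seed.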
For general nonlinear boundaries, SVCs with different kernels are useful tools. Hence, in future work, we plan to use model averaging to combine estimators based on different kernels, which avoids having to select a single kernel and may improve the performance of SVCs.
Appendix B: Proofs
In what follows, all limiting processes correspond to \(n\rightarrow \infty \) unless stated otherwise.
B.1 Proof of Lemma 1
Proof of Lemma 1
Part of this proof follows from Zhang et al. (2016b), but there are some differences and the conclusion is also different from that of Zhang et al. (2016b).
We first prove (9). Recall that \({\hat{{\varvec{\beta }}}}_{(s)}=\arg \min _{{\varvec{\beta }}_{(s)}}\{n^{-1}\sum _{i=1}^{n}(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}_{(s)})_++2^{-1}\lambda _n\Vert {\varvec{\beta }}_{(s)}^+\Vert ^2\}\). We will show that, for any \(0<\eta <1\), there exist a large constant \(\triangle >0\) and an integer N such that, when \(n>N\), we have
$$\begin{aligned}{} & {} \Pr \left\{ \min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle } \left\{ l_s\left( {\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}\right) \right. \right. \nonumber \\{} & {} \qquad \left. \left. -l_s({\varvec{\beta }}^*_{(s)})\right\}>0 \right\} >1-\eta , \end{aligned}$$
(B1)
where \(\textbf{u}_{(s)}\in {\mathcal {R}}^{p_s}\) and \(l_s({\varvec{\beta }}_{(s)})=n^{-1}\sum _{i=1}^{n}(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}_{(s)})_++2^{-1}\lambda _n\Vert {\varvec{\beta }}_{(s)}^+\Vert ^2\). As the hinge loss is convex, this implies that with probability greater than \(1-\eta \), \(\max _{1\le s\le S_n}\Vert {\hat{{\varvec{\beta }}}}_{(s)}-{\varvec{\beta }}^*_{(s)}\Vert \le \triangle \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \). Hence equation (9) in Lemma 1 holds.
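For concreteness, the objective \(l_s\) can be evaluated as in the sketch below. It is only an illustration under two assumptions that are not stated in this appendix: the first column of the design matrix is an intercept column of ones, and \({\varvec{\beta }}_{(s)}^+\) denotes \({\varvec{\beta }}_{(s)}\) without its intercept. The function name `svm_objective` is hypothetical.

```python
import numpy as np

def svm_objective(beta, X, y, lam):
    """Evaluate l_s(beta) = mean hinge loss + (lam / 2) * ||beta^+||^2.

    Assumptions for this sketch: X[:, 0] is an intercept column of ones and
    beta^+ is beta with the intercept removed.
    """
    margins = 1.0 - y * (X @ beta)              # 1 - y_i x_i' beta
    hinge = np.maximum(margins, 0.0).mean()     # n^{-1} sum_i (1 - y_i x_i' beta)_+
    penalty = 0.5 * lam * np.sum(beta[1:] ** 2) # 2^{-1} lambda_n ||beta^+||^2
    return hinge + penalty
```

Both terms are convex in `beta`, which is the property the argument relies on: if the objective is larger everywhere on a sphere around \({\varvec{\beta }}^*_{(s)}\) than at its centre, the minimizer must lie inside that sphere.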
Note that \(l_s\left( {\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}\right) -l_s\left( {\varvec{\beta }}^*_{(s)}\right) \) can be expressed as
$$\begin{aligned}{} & {} l_s\left( {\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}\right) -l_s\left( {\varvec{\beta }}^*_{(s)}\right) \nonumber \\{} & {} \quad =n^{-1} \sum _{i=1}^{n}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+\right. \nonumber \\{} & {} \qquad \left. -\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+ \right\} \nonumber \\{} & {} \qquad + 2^{-1}\lambda _n \left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+\right\| ^2\nonumber \\{} & {} \qquad -2^{-1}\lambda _n \Vert {\varvec{\beta }}^{*+}_{(s)}\Vert ^2. \end{aligned}$$
(B2)
It is shown that
$$\begin{aligned}{} & {} \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\bigg |\left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+ \right\| ^2-\left\| {\varvec{\beta }}^{*+}_{(s)}\right\| ^2 \bigg |\nonumber \\{} & {} \quad \le \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\left( \left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+ \right\| +\left\| {\varvec{\beta }}^{*+}_{(s)}\right\| \right) \nonumber \\{} & {} \qquad \times \left\| \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+ \right\| \nonumber \\{} & {} \quad \le 2\triangle C_2{p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})}+\triangle n^{-1}{p_{\max }}\log ({p_{\max }})\nonumber \\{} & {} \quad =O(\triangle {p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})}), \end{aligned}$$
(B3)
where the last inequality is obtained from Condition 3. Hence the order of difference of penalty terms in (B2) is \(O(\triangle \lambda _n{p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})})\).
Denote
$$\begin{aligned} g_{s,i}(\textbf{u}_{(s)})&=\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1} {p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+\\&\quad -\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+\\&\quad +\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \\&\quad -\mathbb {E}\left[ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)})\right) _+ \right] \\&\quad +\mathbb {E}\left[ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+ \right] . \end{aligned}$$
It can be verified that \(\mathbb {E}[g_{s,i}(\textbf{u}_{(s)})]=0\), \(s=1,2,\ldots ,S_n\), by the definition of \({\varvec{\beta }}^*_{(s)}\) and \({\textbf{J}}_{(s)}({\varvec{\beta }}^*_{(s)})=0\). Note that the hinge loss part of (B2) can be further decomposed as
$$\begin{aligned}&n^{-1}\sum _{i=1}^{n}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+\right. \\&\quad \left. -\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+ \right\} \\&\equiv n^{-1} (A_{s,n}+B_{s,n} ), \end{aligned}$$
where
$$\begin{aligned} A_{s,n}=\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}) \end{aligned}$$
and
$$\begin{aligned} B_{s,n}&=\sum _{i=1}^{n}\Big [ -\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i \textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}\nonumber \\&\quad \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \nonumber \\&\quad + \mathbb {E}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+ \right\} \nonumber \\&\quad -\mathbb {E}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+\right\} \Big ]. \end{aligned}$$
(B4)
The remainder of the proof consists of three steps. In Step 1, we demonstrate that
$$\begin{aligned} \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }|A_{s,n}|=\triangle ^{3/2}{p_{\max }}o_{p}(1). \end{aligned}$$
(B5)
In Step 2, it is shown that \(\min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }B_{s,n}\) dominates the terms of order \(\triangle ^{3/2}{p_{\max }}o_{p}(1)\) and is larger than zero. In Step 3, we use the results from the previous steps to prove (B1).
Step 1: We use the covering number introduced by van der Vaart and Wellner (1996) to prove the uniform rate in (B5). It suffices to show, for any \(\epsilon >0\), that
$$\begin{aligned} \Pr \left( \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle } p_s^{-1}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}) \bigg |>\triangle ^{3/2}\epsilon \right) \rightarrow 0. \end{aligned}$$
(B6)
Note that the hinge loss satisfies the Lipschitz condition and \(\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \le C_1\sqrt{p_s}\), \(\max _{1\le i \le n}\mathbb {E}\Vert \textbf{x}_{{(s)},i}\Vert \le C_1\sqrt{p_s}\) from Condition 2. It is shown that
$$\begin{aligned} |g_{s,i}(\textbf{u}_{(s)})|&\le 3\triangle \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \nonumber \\&\quad \max \left\{ \max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert ,\max _{1\le i \le n}\mathbb {E}\Vert \textbf{x}_{{(s)},i}\Vert \right\} \nonumber \\&\le 3C_1\triangle {p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})} \end{aligned}$$
(B7)
and thus \(\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }p_s^{-1}|g_{s,i}(\textbf{u}_{(s)})|=o(1)\) by Condition 6. By Lemma 2.5 of van de Geer (2000), the ball \(\{\textbf{u}_{(s)}:\Vert \textbf{u}_{(s)}\Vert \le \triangle \}\) in \({\mathcal {R}}^{p_s+1}\) can be covered by \(N_s\) balls with radius \(\zeta _s\), where \(N_s\le \{(4\triangle +\zeta _s)/\zeta _s\}^{p_s+1}\). Denote \({\textbf {u}}_{(s)}^{1},\ldots ,{\textbf {u}}_{(s)}^{N_s}\) as the centers of the \(N_s\) balls, let \(\zeta _s=(nM_1)^{-1} p_s\) (for some large constant \(M_1>0\) ) and denote \( {\mathcal {U}}_s^{k}=\{\textbf{u}_{(s)}: \Vert \textbf{u}_{(s)}-\textbf{u}_{(s)}^{k}\Vert \le \zeta _s \& \Vert \textbf{u}_{(s)}\Vert =\triangle \}\). For any \(\epsilon >0\), we have
$$\begin{aligned}&\max _{1\le s\le S_n}\max _{1\le k\le N_s} \sup _{\textbf{u}_{(s)}\in {\mathcal {U}}_s^{(k)}} p_s^{-1}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)})-\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^k)\bigg |\nonumber \\&\quad \le \max _{1\le s\le S_n}\max _{1\le k\le N_s} \sup _{\textbf{u}_{(s)}\in {\mathcal {U}}_s^{k}} p_s^{-1}\sum _{i=1}^{n}\bigg |g_{s,i}(\textbf{u}_{(s)})- g_{s,i}(\textbf{u}_{(s)}^k)\bigg |\nonumber \\&\quad \le \max _{1\le s\le S_n}\max _{1\le k\le N_s} \sup _{\textbf{u}_{(s)}\in {\mathcal {U}}_s^{k}} np_s^{-1}\nonumber \\&\qquad \quad \Big \{2\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\Vert \textbf{x}_{{(s)},i}\Vert \Vert \textbf{u}_{(s)}-\textbf{u}_{(s)}^{k}\Vert \nonumber \\&\qquad +\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\Vert \textbf{u}_{(s)}-\textbf{u}_{(s)}^{k}\Vert \mathbb {E}\Vert \textbf{x}_{{(s)},i}\Vert \Big \} \nonumber \\&\quad \le \max _{1\le s\le S_n}3\triangle n p_s^{-1} \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\nonumber \\&\qquad \max \left\{ \max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert ,\max _{1\le i \le n}\mathbb {E}\Vert \textbf{x}_{{(s)},i}\Vert \right\} \zeta _s\nonumber \\&\quad \le 3C_1 M_1^{-1}\triangle {p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})}\nonumber \\&\quad =o( \triangle ^{3/2}p_{\min }\epsilon /2), \end{aligned}$$
(B8)
where the last equality follows from Condition 6. From (B8), it can be shown that
$$\begin{aligned}&\Pr \left( \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle } p_s^{-1}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}) \bigg |>\triangle ^{3/2} \epsilon \right) \nonumber \\&\quad \le \Pr \Bigg (\max _{1\le s\le S_n}\max _{1\le k\le N_s} \sup _{\textbf{u}_{(s)}\in {\mathcal {U}}_s^{(k)}}p_s^{-1}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)})-\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k})\bigg |\nonumber \\&\qquad +\max _{1\le s\le S_n}\max _{1\le k \le N_s}p_s^{-1}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k}) \bigg |>\triangle ^{3/2}\epsilon \Bigg )\nonumber \\&\quad \le \Pr \Bigg (\max _{1\le s\le S_n}\max _{1\le k\le N_s} \sup _{\textbf{u}_{(s)}\in {\mathcal {U}}_s^{(k)}}\bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)})\nonumber \\&\qquad -\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k})\bigg |>\triangle ^{3/2}p_{\min }\epsilon /2\Bigg )\nonumber \\&\qquad +\sum _{s=1}^{S_n}\sum _{k=1}^{N_s}\Pr \left( \bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k}) \bigg |>\triangle ^{3/2}p_s\epsilon /2 \right) \nonumber \\&\quad = \sum _{s=1}^{S_n}\sum _{k=1}^{N_s}\Pr \left( \bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k}) \bigg |> \triangle ^{3/2}p_s\epsilon /2 \right) +o(1) \end{aligned}$$
(B9)
and \(\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k})\) is a sum of independent zero-mean random variables.
By the bounded conditional density under Conditions 1 and 4, and recognising that \(\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \le C_1\sqrt{p_s}\), we have
$$\begin{aligned}&\Pr \left( |1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|\le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \right) \nonumber \\&\quad =\Pr \Big (\pm 1-\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \le \textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\nonumber \\&\quad \le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \pm 1\Big \vert y_i=\pm 1\Big )\nonumber \\&\quad \le 2C_3\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \nonumber \\&\quad \le 2\triangle C_1 C_3\sqrt{n^{-1} p_s{p_{\max }}\log ({p_{\max }})}. \end{aligned}$$
(B10)
Note that when \(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}<\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}\) and \(\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}<0\), or when \(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}>\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}\) and \(\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}>0\),
$$\begin{aligned}&\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+-(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)})_+\nonumber \\&\quad +\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}(1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0)=0. \end{aligned}$$
(B11)
Furthermore, equation (B11) holds when \(|1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|>\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \) as \(\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle >\bigg |\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}\bigg |\).
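The identity (B11) is easy to check numerically. Writing \(a=1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\) and \(b=\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}\), the left-hand side of (B11) is \((a-b)_+-a_++b\,{\textbf{1}}(a\ge 0)\), which vanishes whenever \(|a|>|b|\). The sketch below uses these shorthand symbols `a` and `b` and a hypothetical function name; it is only a sanity check, not part of the proof.

```python
import numpy as np

def b11_residual(a, b):
    """Left-hand side of (B11) in the shorthand a = 1 - y x'beta*, b = perturbation term."""
    return np.maximum(a - b, 0.0) - np.maximum(a, 0.0) + b * (a >= 0)

# When |a| > |b| the hinge does not change regime, so the residual is zero.
rng = np.random.default_rng(0)
b = rng.uniform(-1.0, 1.0, size=1000)
a = rng.choice([-1.0, 1.0], size=1000) * (np.abs(b) + rng.uniform(0.01, 1.0, size=1000))
residual_far = b11_residual(a, b)

# When |a| <= |b| the residual can be nonzero, e.g. a = 0.5, b = 1.0.
residual_near = b11_residual(0.5, 1.0)
```

Only observations with \(|a|\le |b|\) can therefore contribute to \(g_{s,i}\) beyond its centring terms, which is why the probability bound (B10) controls the second moment.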
Hence we can write
$$\begin{aligned}&\sum _{i=1}^{n}\mathbb {E}\{g_{s,i}^2(\textbf{u}_{(s)}^{k})\}\nonumber \\&\quad \le \sum _{i=1}^{n}\mathbb {E}\Bigg [\Bigg \{\Big |\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^{k}) \right) _+-\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+\Big |\nonumber \\&\qquad +\Big |\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}^{k} \Big |\Bigg \}^2{\textbf{1}}\Big (|1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|\le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \Big )\Bigg ]\nonumber \\&\quad \le \sum _{i=1}^{n}\mathbb {E}\Bigg \{ \left( 2\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}^{k}\right) ^2{\textbf{1}}\Big (|1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|\le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \Big )\Bigg \}\nonumber \\&\quad \le \left( 2\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \right) ^2\sum _{i=1}^{n}\mathbb {E}{\textbf{1}}\Big (|1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|\le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \Big )\nonumber \\&\quad \le 4 C_1^2\triangle ^2 n^{-1} p_s {p_{\max }}\log ({p_{\max }})\sum _{i=1}^{n}\mathbb {E}{\textbf{1}}\Big (|1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}|\le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \triangle \Big )\nonumber \\&\quad \le 4 C_1^2\triangle ^2 n^{-1} p_s {p_{\max }}\log ({p_{\max }})\times 2n\triangle C_1 C_3\sqrt{n^{-1} p_s{p_{\max }}\log ({p_{\max }})}\nonumber \\&\quad = 8\triangle ^3 C_1^3C_3n^{-1/2}p_s^{3/2}p^{3/2}_{\max }\log ^{3/2}({p_{\max }}) , \end{aligned}$$
(B12)
where the second-to-last inequality arises from \(\max _{1\le i \le n}\Vert \textbf{x}_{{(s)},i}\Vert \le C_1\sqrt{p_s}\) and the last inequality is from (B10). Finally, by Bernstein’s inequality and recognising (B7) and (B12), we can write
$$\begin{aligned}&\sum _{s=1}^{S_n}\sum _{k=1}^{N_s}\Pr \left( \bigg |\sum _{i=1}^{n}g_{s,i}(\textbf{u}_{(s)}^{k}) \bigg |>\triangle ^{3/2}p_s\epsilon /2 \right) \nonumber \\&\quad \le \sum _{s=1}^{S_n}\sum _{k=1}^{N_s}2\exp \left( -\frac{\triangle ^{3}p_s^2\epsilon ^2/4}{\sum _{i=1}^{n}\mathbb {E}\{g_{s,i}^2(\textbf{u}_{(s)}^{k})\}+3\triangle ^{5/2} C_1p_s{p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})} \epsilon /2 }\right) \nonumber \\&\quad \le 2\sum _{s=1}^{S_n}\left( \frac{4\triangle +(nM_1)^{-1}p_s}{(nM_1)^{-1}p_s} \right) ^{p_s+1}\exp \left( -\frac{\triangle ^{3}p_s^2\epsilon ^2/4}{ 8\triangle ^3 C_1^3 C_3 n^{-1/2}p_s^{3/2}p^{3/2}_{\max }\log ^{3/2}({p_{\max }})+ 3\triangle ^{5/2} C_1p_s{p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})} \epsilon /2 }\right) \nonumber \\&\quad \le 2S_n \left( \frac{4\triangle M_1n}{p_{\min }}+1 \right) ^{{p_{\max }}+1} \exp \left( -\frac{\triangle ^{3}p_{\min }^{1/2}\epsilon ^2/4}{ 16\triangle ^3 C_1^3 C_3 n^{-1/2}p^{3/2}_{\max }\log ^{3/2}({p_{\max }}) }\right) \nonumber \\&\quad =O(1)\exp \Big \{\log (S_n)+({p_{\max }}+1)\log (4\triangle nM_1p^{-1}_{\min }+1)-64^{-1}C_1^{-3}C_3^{-1}\epsilon ^2n^{1/2}p_{\min }^{1/2}p^{-3/2}_{\max }\log ^{-3/2}({p_{\max }})\Big \}\nonumber \\&\quad =o(1), \end{aligned}$$
(B13)
where the last equality is due to Condition 6 and \(S_n=O\{\exp (n^{\tau })\}\) for \(\tau \in (0, 1/2-3\kappa /2)\). The proof of (B6) is complete by combining (B9) and (B13).
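To see how the three terms in the exponent of (B13) compete, the sketch below evaluates that exponent for growing \(n\). Everything beyond the formula itself is an assumption made purely for illustration: Condition 6 is not restated in this appendix, so the sketch takes \(p_{\min }=p_{\max }=n^{\kappa }\) and \(S_n=\exp (n^{\tau })\), with hypothetical values \(\kappa =0.1\), \(\tau =0.2\) and all constants set to one.

```python
import math

def b13_exponent(n, kappa=0.1, tau=0.2, delta=1.0, eps=1.0, C1=1.0, C3=1.0, M1=1.0):
    """Exponent in the last line of (B13) under illustrative growth assumptions.

    Assumed (not taken from the paper): p_min = p_max = n**kappa, S_n = exp(n**tau).
    """
    p = n ** kappa
    log_Sn = n ** tau                                            # log(S_n)
    covering = (p + 1.0) * math.log(4.0 * delta * n * M1 / p + 1.0)
    # eps^2 n^{1/2} p_min^{1/2} p_max^{-3/2} log^{-3/2}(p_max) / (64 C1^3 C3)
    bernstein = eps ** 2 * math.sqrt(n * p) / (64.0 * C1 ** 3 * C3 * (p * math.log(p)) ** 1.5)
    return log_Sn + covering - bernstein

exponents = {n: b13_exponent(n) for n in (1e6, 1e16, 1e20)}
```

With these particular choices the exponent is still positive at \(n=10^{6}\) and only becomes negative for much larger \(n\), which is a reminder that (B13) is an asymptotic statement and says nothing about how quickly the bound becomes useful.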
Step 2: Let us rewrite \(B_{s,n}\) as \(B_{s,n}\equiv B_{s,n1}+B_{s,n2}\), where
$$\begin{aligned} B_{s,n1}&= -\sum _{i=1}^{n}\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i \textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)} {\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) ,\\ \text {and} \\ B_{s,n2}&= \sum _{i=1}^{n}\Big [\mathbb {E}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+ \right\} \\&\quad -\mathbb {E}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\right) _+\right\} \Big ]. \end{aligned}$$
To analyse \(B_{s,n1}\), we observe that
$$\begin{aligned}&\bigg |\sum _{i=1}^{n}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad =\bigg |\sum _{j=0}^{p_s}\sum _{i=1}^{n}y_ix_{{(s)},ij}u_{{(s)},j}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad \le \sum _{j=0}^{p_s}|u_{{(s)},j}|\max _{0\le j\le p_s}\bigg |\sum _{i=1}^{n}y_i x_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad \le \sqrt{\sum _{j=0}^{p_s}u_{{(s)},j}^2}\sqrt{\sum _{j=0}^{p_s}1}\max _{0\le j\le p_s}\bigg |\sum _{i=1}^{n}y_ix_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad \le \sqrt{p_s+1} \triangle \max _{0\le j\le p_s}\bigg |\sum _{i=1}^{n}y_i x_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |. \end{aligned}$$
(B14)
By the definition of \({\textbf{J}}_{(s)}({\varvec{\beta }}^*_{(s)})\), note that \(\mathbb {E}\left[ y_ix_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \right] =0\) for \(0\le j \le p_s\). By Lemma 14.24 in Bühlmann and van de Geer (2011) (the Nemirovski moment inequality),
$$\begin{aligned}&\mathbb {E}\left\{ \max _{0\le j\le p_s}\bigg |\sum _{i=1}^{n}y_ix_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\right\} \nonumber \\&\quad \le \sqrt{8\log (2p_s+2)}\mathbb {E}\left( \max _{0\le j\le p_s}\sum _{i=1}^{n}y_i^2x^2_{{(s)},ij}\right) ^{1/2}\nonumber \\&\quad \le \sqrt{ 8\log (2p_s+2)}\sqrt{nC_1^2}\nonumber \\&\quad =O(\sqrt{n\log (p_s)}), \end{aligned}$$
(B15)
where the last inequality is established by Condition 2. Additionally, using Markov’s inequality and by (B15), we obtain
$$\begin{aligned}&\max _{0\le j\le p_s}\bigg |\sum _{i=1}^{n}y_ix_{{(s)},ij}{\textbf{1}}\left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad = O_{p}(\sqrt{n \log (p_s)}). \end{aligned}$$
(B16)
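The order \(\sqrt{n\log (p_s)}\) in (B15) and (B16) can be illustrated with a small Monte Carlo experiment. The sketch below is a simplified stand-in, not the setting of the proof: it replaces the summands \(y_ix_{{(s)},ij}{\textbf{1}}(\cdot )\) by independent mean-zero variables bounded by \(C_1=1\), and the function name, sample sizes and seed are all assumptions chosen for illustration.

```python
import numpy as np

def nemirovski_check(n=200, p=50, reps=200, seed=0):
    """Compare a Monte Carlo estimate of E max_j |sum_i z_ij| with sqrt(8 log(2p)) * sqrt(n).

    The z_ij are i.i.d. U(-1, 1), a simplified stand-in for the bounded,
    mean-zero summands in (B15) with C_1 = 1.
    """
    rng = np.random.default_rng(seed)
    maxima = np.empty(reps)
    for r in range(reps):
        z = rng.uniform(-1.0, 1.0, size=(n, p))
        maxima[r] = np.abs(z.sum(axis=0)).max()   # max over coordinates of the column sums
    bound = np.sqrt(8.0 * np.log(2 * p)) * np.sqrt(n)
    return maxima.mean(), bound

mean_max, bound = nemirovski_check()
```

In this toy setting the simulated mean of the maximum sits well below the bound, consistent with the inequality being a conservative moment bound rather than a sharp constant.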
Combining (B14) and (B16), we have
$$\begin{aligned}&\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }|B_{s,n1}|\nonumber \\&\quad {=}\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\bigg |\sum _{i=1}^{n}{-}\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}y_i \textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}\nonumber \\&\qquad \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad =\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\bigg |\sum _{i=1}^{n}y_i\textbf{x}_{{(s)},i}^{\top }\textbf{u}_{(s)}{\textbf{1}}\nonumber \\&\qquad \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\ge 0 \right) \bigg |\nonumber \\&\quad \le \sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})} \triangle O_p\left\{ \sqrt{{p_{\max }}+1}\sqrt{n\log ({p_{\max }})}\right\} \nonumber \\&\quad = O_{p}(\triangle {p_{\max }}\log ({p_{\max }})). \end{aligned}$$
(B17)
Turning to \(B_{s,n2}\), under Conditions 5 and 6 and according to Koo et al. (2008), \({\textbf{H}}_{(s)}({\varvec{\beta }}_{(s)})\) is element-wise continuous at \({\varvec{\beta }}^*_{(s)}\). By Taylor expansion of the hinge loss at \({\varvec{\beta }}^*_{(s)}\), we have
$$\begin{aligned} {\textbf{H}}_{(s)}\left( {\varvec{\beta }}^*_{(s)}+t\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}\right) ={\textbf{H}}_{(s)}({\varvec{\beta }}^*_{(s)})+o(1). \end{aligned}$$
(B18)
Hence, it is shown that
$$\begin{aligned}&\min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }B_{s,n2}\nonumber \\&\quad =\min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\sum _{i=1}^{n}\nonumber \\&\qquad \Bigg [\mathbb {E}\left\{ \left( 1-y_i\textbf{x}_{{(s)},i}^{\top }({\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}) \right) _+ \right\} \nonumber \\&\qquad -\mathbb {E}\left\{ (1-y_i\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)})_+\right\} \Bigg ]\nonumber \\&\quad = \min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle } 2^{-1}{p_{\max }}\log ({p_{\max }}) \textbf{u}_{(s)}^{\top }{\textbf{H}}_{(s)}\nonumber \\&\qquad \left( {\varvec{\beta }}^*_{(s)}+t\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}\right) \textbf{u}_{(s)}\nonumber \\&\quad \ge 2^{-1}\triangle ^2 c_0{p_{\max }}\log ({p_{\max }}), \end{aligned}$$
(B19)
for some \(0<t<1\), where the last inequality is due to (B18) and Condition 5. It can be readily shown by (B4), (B17), (B19) and Condition 6 that when \(\triangle \) is sufficiently large, \(2^{-1}\triangle ^2 c_0{p_{\max }}\log ({p_{\max }})(>0)\) dominates the other terms in \(B_{s,n}\). This completes the proof of Step 2.
Step 3: Combining (B3), (B6), (B17) and (B19), when n and \(\triangle \) are sufficiently large, we have
$$\begin{aligned}&\min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle } \left\{ l_s\left( {\varvec{\beta }}^*_{(s)}+\sqrt{n^{-1}p_{\max }\log ({p_{\max }})}\textbf{u}_{(s)}\right) -l_s({\varvec{\beta }}^*_{(s)})\right\} \nonumber \\&\quad =\min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\Big \{ n^{-1}(A_{s,n}+B_{s,n})+2^{-1}\lambda _n \left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+\right\| ^2 -2^{-1}\lambda _n \Vert {\varvec{\beta }}^{*+}_{(s)}\Vert ^2\Big \}\nonumber \\&\quad \ge \min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }\Big \{ n^{-1}B_{s,n}-n^{-1}|A_{s,n}|-2^{-1}\lambda _n \Big |\left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+\right\| ^2- \Vert {\varvec{\beta }}^{*+}_{(s)}\Vert ^2\Big |\Big \}\nonumber \\&\quad \ge \min _{1\le s\le S_n}\inf _{\Vert \textbf{u}_{(s)}\Vert =\triangle }n^{-1} B_{s,n2}-\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }n^{-1}|B_{s,n1}|-\max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle }n^{-1}|A_{s,n}|\nonumber \\&\qquad -2^{-1}\lambda _n \max _{1\le s\le S_n}\sup _{\Vert \textbf{u}_{(s)}\Vert =\triangle } \Big |\left\| {\varvec{\beta }}^{*+}_{(s)}+\sqrt{n^{-1}{p_{\max }}\log ({p_{\max }})}\textbf{u}_{(s)}^+\right\| ^2- \Vert {\varvec{\beta }}^{*+}_{(s)}\Vert ^2\Big |\nonumber \\&\quad \ge 2^{-1}n^{-1}\triangle ^2 c_0{p_{\max }}\log ({p_{\max }})- O_{p}(\triangle n^{-1} {p_{\max }}\log ({p_{\max }}))-\triangle ^{3/2}n^{-1}{p_{\max }}o_{p}(1)\nonumber \\&\qquad -O(\triangle \lambda _n{p_{\max }}\sqrt{n^{-1}\log ({p_{\max }})})\nonumber \\&\quad >0, \end{aligned}$$
(B20)
where the last inequality is obtained from Conditions 5–6 and \(\lambda _n=O(\sqrt{n^{-1}\log ({p_{\max }})})\). This completes the proof of (B1).
Equation (10) can be proved in a similar way. Note that \(n-\lfloor n/J \rfloor \sim n\) and each sample from \({\mathcal {D}}_n\) is drawn independently from an identical distribution. Hence \({\widetilde{{\varvec{\beta }}}}^{[-j]}_{(s)}\) converges to \({\varvec{\beta }}^*_{(s)}\) at the same rate as \({\hat{{\varvec{\beta }}}}_{(s)}\) for each \(j=1,2,\ldots ,J\), i.e.,
$$\begin{aligned} \max _{1\le j\le J} \max _{1\le s\le S_n}\left\| {\widetilde{{\varvec{\beta }}}}_{(s)}^{[-j]}-{\varvec{\beta }}^*_{(s)}\right\| =O_p\left( \sqrt{\frac{{p_{\max }}\log ({p_{\max }})}{n}}\right) . \end{aligned}$$
(B21)
B.2 Proof of Theorem 1
We first introduce Lemma 2, which facilitates the proof of Theorem 1.
Lemma 2
Assume that Condition 7 and
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\frac{\textrm{CV}(\textbf{w})-R_n(\textbf{w})}{R_n(\textbf{w})}\bigg |=o_p(1) \end{aligned}$$
(B22)
hold. Then
$$\begin{aligned} \frac{R_n({\hat{\textbf{w}}})}{\inf _{\textbf{w}\in {\mathcal {W}}} R_n(\textbf{w})}\rightarrow 1 \end{aligned}$$
(B23)
in probability, where \({\hat{\textbf{w}}}\) is the optimal solution from (8).
Proof of Lemma 2
By the definition of infimum, there exist a sequence \(\vartheta _n\) and a vector sequence \(\textbf{w}_n\in {{\mathcal {W}}}\) such that as \(n\rightarrow \infty \), \(\vartheta _n\rightarrow 0\) and
$$\begin{aligned} \inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})=R_n(\textbf{w}_n)-\vartheta _n. \end{aligned}$$
(B24)
From Condition 7, we have
$$\begin{aligned} \frac{R_n(\textbf{w}_n)}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}&\ge \frac{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}=1, \end{aligned}$$
(B25)
and
$$\begin{aligned} \frac{\vartheta _n}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}&=o_p(1). \end{aligned}$$
(B26)
Taking (B22), (B25) and (B26) together, for any \(\delta >0\),
$$\begin{aligned}&\Pr \left\{ \bigg |\frac{\inf _{\textbf{w}\in {\mathcal {W}}} R_n(\textbf{w})}{R_n({\hat{\textbf{w}}})}-1\bigg |>\delta \right\} \nonumber \\&\quad =\Pr \left\{ \frac{R_n({\hat{\textbf{w}}})-\inf _{\textbf{w}\in {\mathcal {W}}} R_n(\textbf{w})}{R_n({\hat{\textbf{w}}})}>\delta \right\} \nonumber \\&\quad =\Pr \left\{ \frac{R_n({\hat{\textbf{w}}})-{\text {CV}}({\hat{\textbf{w}}})+{\text {CV}}({\hat{\textbf{w}}})-R_n(\textbf{w}_n)+\vartheta _n}{R_n({\hat{\textbf{w}}})}>\delta \right\} \nonumber \\&\quad \le \Pr \left\{ \frac{R_n({\hat{\textbf{w}}})-{\text {CV}}({\hat{\textbf{w}}})+{\text {CV}}(\textbf{w}_n)-R_n(\textbf{w}_n)+\vartheta _n}{R_n({\hat{\textbf{w}}})}>\delta \right\} \nonumber \\&\quad \le \Pr \left\{ \frac{|R_n({\hat{\textbf{w}}})-{\text {CV}}({\hat{\textbf{w}}})|}{R_n({\hat{\textbf{w}}})} +\frac{|{\text {CV}}(\textbf{w}_n)-R_n(\textbf{w}_n)|}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}+\frac{\vartheta _n}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}>\delta \right\} \nonumber \\&\quad \le \Pr \left\{ \sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\frac{R_n(\textbf{w})-{\text {CV}}(\textbf{w})}{R_n(\textbf{w})}\bigg |+\frac{|{\text {CV}}(\textbf{w}_n)-R_n(\textbf{w}_n)|/R_n(\textbf{w}_n)}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})/R_n(\textbf{w}_n)}+\frac{\vartheta _n}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}>\delta \right\} \nonumber \\&\quad \le \Pr \Bigg \{\sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\frac{R_n(\textbf{w})-{\text {CV}}(\textbf{w})}{R_n(\textbf{w})}\bigg |+\sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\frac{{\text {CV}}(\textbf{w})-R_n(\textbf{w})}{R_n(\textbf{w})}\bigg |\frac{R_n(\textbf{w}_n)}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}+\frac{\vartheta _n}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}>\delta \Bigg \}\nonumber \\&\quad \rightarrow 0, \end{aligned}$$
(B27)
which implies that (B23) is valid.
Proof of Theorem 1
Let
$$\begin{aligned} T_n(\textbf{w})=\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w})\right) _+. \end{aligned}$$
(B28)
By Lemma 2 and the triangle inequality, it suffices to verify that
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{|{\text {CV}}(\textbf{w})-T_n(\textbf{w})|}{R_n(\textbf{w})}=o_p(1), \end{aligned}$$
(B29)
and
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{|T_n(\textbf{w})-R_n(\textbf{w})|}{R_n(\textbf{w})}=o_p(1). \end{aligned}$$
(B30)
For (B29), we have
$$\begin{aligned}&|{\text {CV}}(\textbf{w})-T_n(\textbf{w})|=\bigg |\frac{1}{n}\sum _{j=1}^{J}\sum _{i\in \mathcal{A}(j)}\left\{ (1-y_i\textbf{x}_i^{\top }{\widetilde{{\varvec{\beta }}}}^{[-j]}(\textbf{w}))_+\right. \nonumber \\&\qquad \left. -(1-y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}))_+\right\} \bigg |\nonumber \\&\quad \le \frac{1}{n}\sum _{j=1}^{J}\sum _{i\in \mathcal{A}(j)}\bigg |\int _{y_i\textbf{x}_i^{\top }{\widetilde{{\varvec{\beta }}}}^{[-j]}(\textbf{w})}^{y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w})}I(t\le 1)\textrm{d}t \bigg |\nonumber \\&\quad \le \frac{1}{n}\sum _{j=1}^{J}\sum _{i\in \mathcal{A}(j)}\bigg |y_i\textbf{x}_i^{\top }\left( {\widetilde{{\varvec{\beta }}}}^{[-j]}(\textbf{w})-{\hat{{\varvec{\beta }}}}(\textbf{w})\right) \bigg |\nonumber \\&\quad \le \frac{1}{n}\sum _{j=1}^{J}\sum _{i\in \mathcal{A}(j)}\sum _{s=1}^{S_n}w_s\Vert \textbf{x}_{{(s)},i}\Vert \left\| {\widetilde{{\varvec{\beta }}}}_{(s)}^{[-j]}-{\hat{{\varvec{\beta }}}}_{(s)}\right\| \nonumber \\&\quad \le \frac{1}{n}\sum _{i=1}^{n}\max _{1\le s\le S_n}\Vert \textbf{x}_{{(s)},i}\Vert \max _{1\le j\le J}\max _{1\le s\le S_n}\left\| {\widetilde{{\varvec{\beta }}}}_{(s)}^{[-j]}-{\hat{{\varvec{\beta }}}}_{(s)}\right\| \nonumber \\&\quad \le C_1\sqrt{{p_{\max }}}\max _{1\le j\le J}\max _{1\le s\le S_n}\left\| {\widetilde{{\varvec{\beta }}}}_{(s)}^{[-j]}-{\hat{{\varvec{\beta }}}}_{(s)}\right\| \nonumber \\&\quad =O_p\left( \frac{{p_{\max }}\sqrt{\log ({p_{\max }})}}{\sqrt{n}}\right) \nonumber \\&\quad =o_p(1), \end{aligned}$$
(B31)
where the second-to-last equality follows from Lemma 1, and the last equality follows from Condition 6. Combining (B31) with Condition 7, we obtain (B29).
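The second and third steps of (B31) rest on the fact that the hinge loss is 1-Lipschitz. As a brief check, for any real numbers \(a\) and \(b\) (generic symbols introduced here only for illustration),
$$\begin{aligned} (1-a)_+-(1-b)_+=\int _{a}^{b}I(t\le 1)\textrm{d}t, \qquad \text {and hence}\qquad \left| (1-a)_+-(1-b)_+\right| \le |a-b|. \end{aligned}$$
Taking \(a=y_i\textbf{x}_i^{\top }{\widetilde{{\varvec{\beta }}}}^{[-j]}(\textbf{w})\) and \(b=y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w})\) gives the corresponding bound in (B31). The same argument applies wherever two hinge losses evaluated at different coefficient vectors are compared.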
To prove (B30), note that
$$\begin{aligned}&|T_n(\textbf{w})-R_n(\textbf{w})|=\bigg |\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}) \right) _+\nonumber \\&\qquad -\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}) )_+\mid {\mathcal {D}}_n\right\} \bigg |\nonumber \\&\quad \le \bigg |\frac{1}{n}\sum _{i=1}^{n}\left( 1{-}y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}) \right) _+{-}\frac{1}{n}\sum _{i=1}^{n}\left( 1{-}y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}) \right) _+\bigg |\nonumber \\&\qquad +\bigg |\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}) \right) _+-\mathbb {E}\left( 1-y\textbf{x}^{\top }{\varvec{\beta }}^*(\textbf{w})\right) _+\bigg |\nonumber \\&\qquad {+}\bigg |\mathbb {E}\left( 1{-}y\textbf{x}^{\top }{\varvec{\beta }}^*(\textbf{w})\right) _+{-}\mathbb {E}\left\{ (1{-}y\textbf{x}^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}))_+ \mid {\mathcal {D}}_n\right\} \bigg |\nonumber \\&\quad \equiv |\Omega _1(\textbf{w})|+|\Omega _2(\textbf{w})|+|\Omega _3(\textbf{w})|. \end{aligned}$$
(B32)
Using Lemma 1 together with Conditions 3 and 6, it can be shown that
$$\begin{aligned}&\sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _1(\textbf{w})|\le \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{1}{n}\sum _{i=1}^{n}\nonumber \\&\qquad \bigg |\left( 1-y_i\textbf{x}_i^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}) \right) _+- \left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}) \right) _+\bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{1}{n}\sum _{i=1}^{n}\bigg |y_i\textbf{x}_i^{\top }\left( {\varvec{\beta }}^*(\textbf{w})-{\hat{{\varvec{\beta }}}}(\textbf{w})\right) \bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{1}{n}\sum _{i=1}^{n}\sum _{s=1}^{S_n}w_s \Vert \textbf{x}_{{(s)},i}\Vert \left\| {\varvec{\beta }}^*_{(s)}-{\hat{{\varvec{\beta }}}}_{(s)}\right\| \nonumber \\&\quad \le \max _{1\le s\le S_n}\left\| {\varvec{\beta }}^*_{(s)}-{\hat{{\varvec{\beta }}}}_{(s)}\right\| \max _{1\le i \le n}\max _{1\le s\le S_n}\Vert \textbf{x}_{{(s)},i}\Vert \nonumber \\&\quad =O_p\left( \frac{{p_{\max }}\sqrt{\log ({p_{\max }})}}{\sqrt{n}}\right) \nonumber \\&\quad =o_p(1). \end{aligned}$$
(B33)
Define
$$\begin{aligned} |\textbf{w}-\textbf{w}'|_1=\sum _{s=1}^{S_n}|w_s-w'_s|, \end{aligned}$$
(B34)
for any \(\textbf{w}=(w_1,\ldots ,w_{S_n})\in {{\mathcal {W}}}\) and \(\textbf{w}'=(w'_1,\ldots ,w'_{S_n})\in {{\mathcal {W}}}\). Let \(h_n=1/( {p_{\max }}\log n)\) and form a grid of regions \({{\mathcal {W}}}^{(l)}=\{\textbf{w}:|\textbf{w}-\textbf{w}^{(l)}|_1\le h_n\}\) centred at points \(\textbf{w}^{(l)}\in {{\mathcal {W}}}\). By the notion of the \(\epsilon \)-covering number introduced by van der Vaart and Wellner (1996), \({{\mathcal {W}}}\) can be covered with \(N=O(1/h_n^{S_n-1})\) such regions \({{\mathcal {W}}}^{(l)}\), \(l=1,\ldots ,N\).
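To see how the size of this grid enters the later bounds, it may help to take logarithms. The following is a sketch that assumes \({{\mathcal {W}}}\) is the unit simplex in \(\mathbb {R}^{S_n}\), so that it has \(S_n-1\) free coordinates (an assumption about the definition given in the main text), and writes \(C\) for a generic positive constant introduced here:
$$\begin{aligned} \log N\le \log C+(S_n-1)\log (1/h_n)=\log C+(S_n-1)\left\{ \log ({p_{\max }})+\log \log (n)\right\} . \end{aligned}$$
This is the source of the terms \(S_n\log ({p_{\max }})+S_n\log \log (n)\) that appear in the exponent of (B39).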
Note that
$$\begin{aligned}&\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}|\Omega _2(\textbf{w})-\Omega _2(\textbf{w}^{(l)})|\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\bigg |\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}) \right) _+\nonumber \\&\qquad -\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}^{(l)}) \right) _+\bigg |\nonumber \\&\qquad +\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\bigg |\mathbb {E}\left( 1-y\textbf{x}^{\top }{\varvec{\beta }}^*(\textbf{w})\right) _+\nonumber \\&\qquad -\mathbb {E}\left( 1-y\textbf{x}^{\top }{\varvec{\beta }}^*(\textbf{w}^{(l)})\right) _+\bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\frac{1}{n}\sum _{i=1}^{n}\bigg |y_i\textbf{x}_i^{\top }\{{\varvec{\beta }}^*(\textbf{w}^{(l)})-{\varvec{\beta }}^*(\textbf{w})\} \bigg |\nonumber \\&\qquad +\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\mathbb {E}\bigg |y\textbf{x}^{\top }\{{\varvec{\beta }}^*(\textbf{w}^{(l)})-{\varvec{\beta }}^*(\textbf{w})\} \bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\frac{1}{n}\sum _{i=1}^{n}\sum _{s=1}^{S_n}|w_s-w^{(l)}_s|\bigg |\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\bigg |\nonumber \\&\qquad +\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}\sum _{s=1}^{S_n}|w_s-w^{(l)}_s|\mathbb {E}\bigg |\textbf{x}_{{(s)},i}^{\top }{\varvec{\beta }}^*_{(s)}\bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}|\textbf{w}-\textbf{w}^{(l)}|_1\max _{1\le s\le S_n}\Vert {\varvec{\beta }}^*_{(s)}\Vert \nonumber \\&\qquad \left( \max _{1\le i \le n}\max _{1\le s\le S_n}\Vert \textbf{x}_{{(s)},i}\Vert + \max _{1\le i \le n}\max _{1\le s\le S_n}\mathbb {E}\Vert \textbf{x}_{{(s)},i}\Vert \right) \nonumber \\&\quad \le \frac{C_2\sqrt{{p_{\max }}}}{{p_{\max }}\log (n)}\cdot 2C_1\sqrt{{p_{\max }}}\nonumber \\&\quad =O_p( \log ^{-1}(n))\nonumber \\&\quad =o_p(1), \end{aligned}$$
(B35)
where the bound holds uniformly in \(l\). Hence we have
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _2(\textbf{w})|&=\max _{1\le l\le N}\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}|\Omega _2(\textbf{w})|\nonumber \\&\le \max _{1\le l\le N}|\Omega _2(\textbf{w}^{(l)})|\nonumber \\&\quad +\max _{1\le l\le N}\sup _{\textbf{w}\in {{\mathcal {W}}}^{(l)}}|\Omega _2(\textbf{w})-\Omega _2(\textbf{w}^{(l)})|\nonumber \\&=\max _{1\le l\le N}|\Omega _2({\textbf{w}}^{(l)})|+o_p(1). \end{aligned}$$
(B36)
Furthermore, for any \(\epsilon >0\),
$$\begin{aligned}&\Pr \left\{ \max _{1\le l\le N}|\Omega _2({\textbf{w}}^{(l)})|> 3\epsilon \right\} \nonumber \\&\quad =\Pr \Bigg [\max _{1\le l\le N}\Big \vert \frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+{\textbf{1}}\nonumber \\&\qquad \left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|<{p_{\max }}n^{0.1}\right) \nonumber \\&\qquad +\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+\nonumber \\&\qquad {\textbf{1}}\left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \nonumber \\&\qquad -\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \nonumber \\&\qquad \left. \left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|< {p_{\max }}n^{0.1}\right) \right\} \nonumber \\&\qquad -\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \nonumber \\&\qquad \left. \left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \right\} \Big \vert> 3\epsilon \Bigg ]\nonumber \\&\quad \le \Pr \Bigg [\max _{1\le l\le N}\Big \vert \frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+{\textbf{1}}\nonumber \\&\qquad \left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|<{p_{\max }}n^{0.1}\right) \nonumber \\&\qquad -\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \nonumber \\&\qquad \left. 
\left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|< {p_{\max }}n^{0.1}\right) \right\} \Big \vert>\epsilon \Bigg ]\nonumber \\&\qquad +\Pr \Bigg [\max _{1\le l\le N}\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+{\textbf{1}}\nonumber \\&\qquad \left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right)>\epsilon \Bigg ]\nonumber \\&\qquad +\Pr \Bigg [\max _{1\le l\le N}\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \nonumber \\&\qquad \left. \left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \right\} > \epsilon \Bigg ]\nonumber \\&\quad \equiv \Xi _1+\Xi _2+\Xi _3. \end{aligned}$$
(B37)
Clearly,
$$\begin{aligned}&\sum _{i=1}^{n}\mathbb {E}\left\{ (1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+\right\} ^2\nonumber \\&\quad \le \sum _{i=1}^{n}\mathbb {E}\bigg |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\bigg |^2\nonumber \\&\quad \le \sum _{i=1}^{n}\mathbb {E}\left( 1+2|\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}^{(l)})|+\textbf{x}_i^{\top }{\varvec{\beta }}^*(\textbf{w}^{(l)}) {\varvec{\beta }}^{*\top }(\textbf{w}^{(l)})\textbf{x}_i\right) \nonumber \\&\quad \le \sum _{i=1}^{n}\mathbb {E}\left( 1+2\max _{1\le s\le S_n}\Vert \textbf{x}_{{(s)},i}\Vert \Vert {\varvec{\beta }}^*_{(s)}\Vert \right. \nonumber \\&\quad \left. +\max _{1\le s\le S_n}\Vert {\varvec{\beta }}^*_{(s)}\Vert ^2\Vert \textbf{x}_{{(s)},i}\Vert ^2 \right) \nonumber \\&\quad \le 4C^2_1C^2_2 n p^2_{\max }. \end{aligned}$$
(B38)
Using Boole’s and Bernstein’s inequalities and by (B38),
$$\begin{aligned} \Xi _1&\le \sum _{l=1}^N\Pr \Bigg [\Big \vert \frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+{\textbf{1}}\nonumber \\&\quad \left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|<{p_{\max }}n^{0.1}\right) \nonumber \\&\quad -\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \nonumber \\&\quad \left. \left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|< {p_{\max }}n^{0.1}\right) \right\} \Big \vert >\epsilon \Bigg ]\nonumber \\&\le N\exp \left( - \frac{n^2\epsilon ^2/2}{4C_1^2C_2^2 np_{\max }^2+\epsilon {p_{\max }}n^{0.1}/3}\right) \nonumber \\&\le ({p_{\max }}\log n)^{S_n-1}\exp \left( {-} \frac{n^2\epsilon ^2/2}{4C_1^2C_2^2 np_{\max }^2{+}\epsilon {p_{\max }}n^{0.1}/3}\right) \nonumber \\&=O\left\{ \exp \left( -\epsilon ^2 n p^{-2}_{\max }{+} S_n \log ({p_{\max }})+S_n\log \log (n)\right) \right\} \nonumber \\&=o(1), \end{aligned}$$
(B39)
where the last equality is established from Condition 6 and the condition that \(S_n=O(n^\tau )\) for \(\tau \in (0,1-2\kappa )\). Additionally, we can write
$$\begin{aligned}&\Xi _2=\Pr \Bigg \{\max _{1\le l\le N}\frac{1}{n}\sum _{i=1}^{n}\left( 1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})\right) _+{\textbf{1}}\nonumber \\&\qquad \left( |1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) >\epsilon \Bigg \}\nonumber \\&\quad \le \Pr \left( \max _{1\le l\le N}\max _{1\le i \le n}|1-y_i\textbf{x}_i^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \nonumber \\&\quad \le \Pr \left\{ \max _{1\le l\le N}\max _{1\le i \le n}\sum _{s=1}^{S_n}{w^{(l)}_s}\left( 1+\Vert \textbf{x}_{{(s)},i}\Vert \Vert {\varvec{\beta }}^*_{(s)}\Vert \right) \right. \nonumber \\&\quad \left. \ge {p_{\max }}n^{0.1}\right\} \nonumber \\&\quad \le \Pr \left\{ \left( 1+\max _{1\le i \le n}\max _{1\le s\le S_n}\Vert \textbf{x}_{{(s)},i}\Vert \max _{1\le s\le S_n}\Vert {\varvec{\beta }}^*_{(s)}\Vert \right) \right. \nonumber \\&\quad \left. \ge {p_{\max }}n^{0.1}\right\} \nonumber \\&\quad = o(1), \end{aligned}$$
(B40)
where the last equality holds because of Conditions 2 and 3. Similarly,
$$\begin{aligned} \Xi _3&=\Pr \left[ \max _{1\le l\le N}\mathbb {E}\left\{ (1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)}))_+{\textbf{1}}\right. \right. \nonumber \\&\quad \left. \left. \left( |1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \right\} > \epsilon \right] \nonumber \\&\le \Pr \left( \max _{1\le l\le N}\mathbb {E}|1-y\textbf{x}^{\top }{\varvec{\beta }}^*({\textbf{w}}^{(l)})|\ge {p_{\max }}n^{0.1}\right) \nonumber \\&\le \Pr \left\{ \left( 1+\max _{1\le s\le S_n}\mathbb {E}\Vert \textbf{x}_{(s)}\Vert \max _{1\le s\le S_n}\Vert {\varvec{\beta }}^*_{(s)}\Vert \right) \ge {p_{\max }}n^{0.1}\right\} \nonumber \\&=o(1). \end{aligned}$$
(B41)
Combining (B37) with (B39)–(B41), we obtain \(\max _{1\le l\le N}|\Omega _2({\textbf{w}}^{(l)})|=o_p(1)\). Then, by (B36), we have
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _2(\textbf{w})|=o_p(1). \end{aligned}$$
(B42)
Finally, note that \((y,\textbf{x})\) and \(({\tilde{y}}, {\tilde{\textbf{x}}})\) are independently and identically distributed. By Lemma 1, we have
$$\begin{aligned}&\sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _3(\textbf{w})|=\sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\mathbb {E}\left( 1-y\textbf{x}^{\top }{\varvec{\beta }}^*(\textbf{w})\right) _+\nonumber \\&\qquad -\mathbb {E}\left\{ (1-{\tilde{y}}{\tilde{\textbf{x}}}^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}))_+ \mid {\mathcal {D}}_n\right\} \bigg |\nonumber \\&\quad =\sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\mathbb {E}\left( 1-{\tilde{y}}{\tilde{\textbf{x}}}^{\top }{\varvec{\beta }}^*(\textbf{w})\right) _+\nonumber \\&\qquad -\mathbb {E}\left\{ (1-{\tilde{y}}{\tilde{\textbf{x}}}^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w}))_+ \mid {\mathcal {D}}_n\right\} \bigg |\nonumber \\&\quad = \sup _{\textbf{w}\in {{\mathcal {W}}}}\bigg |\mathbb {E}\left\{ \int _{{\tilde{y}}{\tilde{\textbf{x}}}^{\top }{\varvec{\beta }}^*(\textbf{w})}^{{\tilde{y}}{\tilde{\textbf{x}}}^{\top }{\hat{{\varvec{\beta }}}}(\textbf{w})}I(t\le 1)\textrm{d}t\,\bigg \vert \, {\mathcal {D}}_n\right\} \bigg |\nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}}\mathbb {E}\left\{ \bigg |{\tilde{y}}{\tilde{\textbf{x}}}^{\top }\left( {\hat{{\varvec{\beta }}}}(\textbf{w})-{\varvec{\beta }}^*(\textbf{w})\right) \bigg |\big \vert {\mathcal {D}}_n\right\} \nonumber \\&\quad \le \sup _{\textbf{w}\in {{\mathcal {W}}}}\sum _{s=1}^{S_n}w_s\mathbb {E}\left\{ \bigg |{\tilde{\textbf{x}}}_{(s)}^{\top }\left( {\hat{{\varvec{\beta }}}}_{(s)}-{\varvec{\beta }}^*_{(s)}\right) \bigg |\big \vert {\mathcal {D}}_n\right\} \nonumber \\&\quad \le \max _{1\le s\le S_n}\left\| {\hat{{\varvec{\beta }}}}_{(s)}-{\varvec{\beta }}^*_{(s)}\right\| \max _{1\le s\le S_n}\mathbb {E}\Vert {\tilde{\textbf{x}}}_{(s)}\Vert \nonumber \\&\quad =O_p\left( \frac{ {p_{\max }}\sqrt{\log ({p_{\max }})}}{\sqrt{n}}\right) \nonumber \\&\quad =o_p(1), \end{aligned}$$
(B43)
where the last equality holds due to Condition 6. Putting (B32), (B33), (B42) and (B43) together, we complete the proof of (B30).
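For completeness, the pieces can be assembled as follows. This is only a sketch: it assumes that \(\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})\) is bounded away from zero in probability, which is presumably the role played by Condition 7 in the step from (B31) to (B29); that condition is not restated here.
$$\begin{aligned} \sup _{\textbf{w}\in {{\mathcal {W}}}}\frac{|T_n(\textbf{w})-R_n(\textbf{w})|}{R_n(\textbf{w})}\le \frac{\sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _1(\textbf{w})|+\sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _2(\textbf{w})|+\sup _{\textbf{w}\in {{\mathcal {W}}}}|\Omega _3(\textbf{w})|}{\inf _{\textbf{w}\in {{\mathcal {W}}}}R_n(\textbf{w})}=o_p(1). \end{aligned}$$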