Part of Advances in Neural Information Processing Systems 21 (NIPS 2008)
Sahand Negahban, Martin J. Wainwright
We consider the following instance of transfer learning: given a pair of regression problems, suppose that the regression coefficients share a partially common support, parameterized by the overlap fraction $\alpha$ between the two supports. This set-up suggests the use of $\ell_{1,\infty}$-regularized linear regression for recovering the support sets of both regression vectors. Our main contribution is to provide a sharp characterization of the sample complexity of this $\ell_{1,\infty}$ relaxation, exactly pinning down the minimal sample size $n$ required for joint support recovery as a function of the model dimension $p$, support size $s$, and overlap $\alpha \in [0,1]$. For measurement matrices drawn from standard Gaussian ensembles, we prove that the joint $\ell_{1,\infty}$-regularized method undergoes a phase transition characterized by the order parameter $\theta(n, p, s, \alpha) = \frac{n}{(4 - 3\alpha)\, s \log(p - (2 - \alpha)s)}$. More precisely, the probability of successfully recovering both supports converges to $1$ for scalings such that $\theta > 1$, and converges to $0$ for scalings such that $\theta < 1$. An implication of this threshold is that the use of $\ell_{1,\infty}$-regularization yields gains in sample complexity if the overlap parameter is large enough ($\alpha > 2/3$), but performs worse than a naive approach if $\alpha < 2/3$. We illustrate the close agreement between these theoretical predictions and the actual behavior observed in simulations. Thus, our results illustrate both the benefits and the dangers associated with block-$\ell_{1,\infty}$ regularization in high-dimensional inference.
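As a quick numerical illustration of the stated threshold (not part of the paper), the sketch below evaluates the order parameter $\theta$ and the sample size at which $\theta = 1$, i.e. the predicted phase-transition point; the function and variable names (order_parameter, critical_sample_size, n, p, s, alpha) are assumptions chosen for this example.

import math

def order_parameter(n, p, s, alpha):
    # theta(n, p, s, alpha) = n / ((4 - 3*alpha) * s * log(p - (2 - alpha)*s)),
    # as given in the abstract above.
    return n / ((4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s))

def critical_sample_size(p, s, alpha):
    # Sample size obtained by setting theta = 1: the predicted threshold for
    # successful joint support recovery.
    return (4.0 - 3.0 * alpha) * s * math.log(p - (2.0 - alpha) * s)

# Example: dimension p = 1024, support size s = 32, overlap alpha in {0, 2/3, 1}.
for alpha in (0.0, 2.0 / 3.0, 1.0):
    n_crit = critical_sample_size(p=1024, s=32, alpha=alpha)
    theta = order_parameter(n_crit, p=1024, s=32, alpha=alpha)
    print(f"alpha = {alpha:.2f}: predicted threshold n ~ {n_crit:.0f} (theta = {theta:.2f})")

Consistent with the abstract, the prefactor $(4 - 3\alpha)$ decreases as the overlap grows, so the predicted threshold sample size is smallest when the two supports coincide ($\alpha = 1$).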