
Better Locally Private Sparse Estimation Given Multiple Samples Per User

Yuheng Ma    Ke Jia    Hanfang Yang
Abstract

Previous studies yielded discouraging results for item-level locally differentially private linear regression with $s^{*}$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d/nm\varepsilon^{2})$. This can be challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse linear regression. We show that with $n$ users each contributing $m$ samples, the linear dependency of dimension $d$ can be eliminated, yielding an error upper bound of $\mathcal{O}(s^{*2}/nm\varepsilon^{2})$. We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space, which is extendable to general sparse estimation problems with tight error bounds. Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods. Both the theoretical and empirical results suggest that, with the same number of samples, locally private sparse estimation is better conducted when multiple samples per user are available.

User Level Local Differential Privacy, Sparse Linear Regression

1 Introduction

Local differential privacy (LDP) (Kairouz et al., 2014; Duchi et al., 2018), a variant of differential privacy (DP) (Dwork et al., 2006), has gained considerable attention in recent years. LDP assumes that each sample is possessed by a data holder, who privatizes their data before it is collected by the curator. Offering a stronger sense of privacy protection compared to central DP, learning under LDP often encounters challenges such as slow convergence, high demand for local machine capacity, and limited accessibility to basic techniques (Duchi et al., 2018; Tramèr et al., 2022; Ma et al., 2024b), which obstruct the theoretical analysis and practical implementation of LDP learning.

Fortunately, in some scenarios, each user may possess multiple samples, which can serve as a way to overcome these difficulties. This is known as user-level LDP (ULDP) (Acharya et al., 2023; Bassily & Sun, 2023). Research has demonstrated performance improvement in intentionally designed models when each user has multiple samples, from both the central DP perspective (Liu et al., 2020; Ghazi et al., 2021; Levy et al., 2021; Narayanan et al., 2022; Ghazi et al., 2023) and the LDP perspective (Girgis et al., 2022; Acharya et al., 2023; Bassily & Sun, 2023). In most cases (for ULDP), the improvement lies in the effective sample size: if there are $n$ users with $m$ samples and privacy budget $\varepsilon$, the problem is as tractable as having $nm$ users with one sample and privacy budget $\varepsilon$. See Table 1 for a summary.

We proceed to ask the following question: Besides effective sample size, does having multiple samples per user offer benefits? If the answer to this question is affirmative, it holds practical significance. For instance, when designing data collection schemes, the primary focus should be on users capable and willing to provide multiple samples. Moreover, if a significant number of users lack trust in the data collector but are willing to share information within small groups (such as family or company), then better mechanisms can be devised for conducting the learning process.

In this work, we offer an affirmative response to the question from the perspective of sparse estimation. Sparse estimation stands as a crucial task in modern machine learning, especially when dealing with high-dimensional data where structured assumptions like sparsity can significantly enhance performance. In particular, we study sparse linear regression. We first elucidate why the minimax lower bound fails to hold when each user possesses multiple samples and provide a lower bound for ULDP (Theorem 2.4). Subsequently, we introduce an algorithm structured as follows: half of the users perform local variable selection and aggregate their findings to identify the support of non-zero variables. Under mild assumptions, we establish theoretical guarantees for both local selection (Proposition 3.2) and aggregation (Proposition 3.3). Then, to conduct estimation on the narrowed space, we propose a sub-optimal multi-round protocol (Theorem 3.4) and a two-round protocol (Theorem 3.6). The latter achieves an estimation error of $\mathcal{O}(s^{*2}/nm\varepsilon^{2})$. Compared to the minimax error rate $\mathcal{O}(ds^{*}/nm\varepsilon^{2})$ under LDP, our rate improves by a factor of $s^{*}/d$, which can be significant for high-dimensional data. Furthermore, we demonstrate how the latter protocol straightforwardly extends to other sparse estimation problems (Theorem 3.7).
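The size of the $s^{*}/d$ improvement can be checked with a small numeric sketch. The function names and parameter values below are hypothetical choices for illustration only, and all constants hidden by the $\mathcal{O}(\cdot)$ notation are ignored.

```python
def ldp_rate(d, s_star, n, m, eps):
    """Item-level LDP minimax rate d * s / (n m eps^2), constants ignored."""
    return d * s_star / (n * m * eps ** 2)


def uldp_rate(s_star, n, m, eps):
    """User-level LDP upper bound s^2 / (n m eps^2), constants ignored."""
    return s_star ** 2 / (n * m * eps ** 2)


# Hypothetical problem sizes, chosen only to make the arithmetic easy to follow.
d, s_star, n, m, eps = 10_000, 10, 1_000, 100, 1.0

ldp = ldp_rate(d, s_star, n, m, eps)    # 10000 * 10 / 100000
uldp = uldp_rate(s_star, n, m, eps)     # 100 / 100000
ratio = uldp / ldp                      # equals s_star / d

print(f"LDP rate:  {ldp:.4g}")
print(f"ULDP rate: {uldp:.4g}")
print(f"ratio:     {ratio:.4g}")
```

With these values the LDP rate does not fall below a constant, while the ULDP rate is smaller by exactly the factor $s^{*}/d$.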

We summarize our contributions as follows.

  • We formalize, for the first time, the advantage of ULDP over LDP by considering the sparse assumption. Our findings reveal that the rates of sparse problems, such as sparse linear regression and sparse mean estimation, do not scale linearly in $d$ under ULDP, which contrasts with previous negative results for LDP.

  • We provide a general framework for ULDP sparse estimation. Moreover, focusing on linear regression, we devise tailored methods that achieve tight upper bounds. The precise estimation procedures serve as solutions to low-dimensional ULDP linear regression, which are of independent interest.

  • We conduct experiments on both synthetic and real datasets, with convincing results demonstrating the superiority of our methods.

The article is structured as follows: In Section 2, we discuss related literature, preliminary knowledge, and minimax results of ULDP sparse linear regression. In Section 3, we present our solutions. In Section 4, we provide experiment results. All technical proofs, detailed algorithms, and additional experiment results are included in the appendix.

Table 1: Comparison of error rate between non-private, ULDP, and LDP results. Results assume the true parameter lies within $\ell_{\infty}$ unit ball. Here, we consider sparse regression with beta-min condition, which improves a $\log d$ over the usual case.

| Problem | Non-private ($nm$ samples) | $\varepsilon$-ULDP ($n$ users, $m$ samples) | $\varepsilon$-LDP ($nm$ samples) |
|---|---|---|---|
| discrete distribution [1] | $\sqrt{\frac{k}{nm}}$ | $\sqrt{\frac{k^{2}}{nm\varepsilon^{2}}}$ | $\sqrt{\frac{k^{2}}{nm\varepsilon^{2}}}$ |
| mean estimation [2] | $\frac{d}{nm}$ | $\frac{d^{2}}{nm\varepsilon^{2}}$ | $\frac{d^{2}}{nm\varepsilon^{2}}$ |
| sparse regression [3] | $\frac{s^{*}}{nm}$ | $\mathbf{\frac{s^{*2}}{nm\varepsilon^{2}}}$ (ours) | $\frac{ds^{*2}}{nm\varepsilon^{2}}$ |

[1] Kairouz et al. (2016); Acharya et al. (2023)
[2] Duchi et al. (2018); Bassily & Sun (2023)
[3] Ndaoud (2019); Zhu et al. (2023)

2 ULDP Sparse Linear Regression

2.1 Preliminaries

We introduce necessary notations. For any vector $x$, let $x^{i}$ denote the $i$-th element of $x$. Let $x^{\{i_{1},\cdots,i_{j}\}}$ be a slicing vector of $x$, whose $j$-th element is $x^{i_{j}}$. Let $\|x\|_{p}$ be the $\ell_{p}$ norm of $x$ for $0\leq p\leq\infty$. We will evaluate the estimation error by the squared loss, i.e. $\|\widehat{\beta}-\beta^{*}\|_{2}^{2}$. For matrix $A$, let $\lambda_{i}(A)$ denote the $i$-th largest singular value of $A$.
Throughout this paper, we use the notation $a_{n}\lesssim b_{n}$ and $a_{n}\gtrsim b_{n}$ to denote that there exist positive constants $c$ and $c^{\prime}$ such that $a_{n}\leq cb_{n}$ and $a_{n}\geq c^{\prime}b_{n}$, respectively, for all $n\in\mathbb{N}$. We use $a=\mathcal{O}(b)$ if $a\lesssim b$. We denote $a_{n}\asymp b_{n}$ if $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$. Let $a\vee b=\max(a,b)$ and $a\wedge b=\min(a,b)$.
Besides, for any set $A\subset\mathbb{R}^{d}$, the diameter of $A$ is defined by $\mathrm{diam}(A):=\sup_{x,x^{\prime}\in A}\|x-x^{\prime}\|_{2}$.

Suppose we have $n$ users. The $i$-th user has $m$ i.i.d. samples $(X_{i},y_{i})=\{(X_{i,j},y_{i,j}),j=1,\cdots,m\}$ from distribution $\mathrm{P}$ on domain $\mathcal{X}\times\mathcal{Y}\subseteq\mathbb{R}^{d}\times\mathbb{R}$. We consider the classical sparse linear regression. Let each $X_{i,j}$ be i.i.d. sub-Gaussian. Moreover, let $\Sigma=\mathbb{E}[XX^{\top}]$ denote the covariance matrix of the marginal distribution. Assume $C_{X}^{-1}\leq\lambda_{d}(\Sigma)\leq\lambda_{1}(\Sigma)\leq C_{X}$ for some constant $C_{X}>1$. For a mean-zero sub-Gaussian random variable $\sigma$, the conditional distribution $\mathrm{P}_{Y|X}$ and its coefficients $\beta^{*}$ are described by

$$y=X\beta^{*}+\sigma,\quad\beta^{*}\in\Omega_{s^{*},a}^{d}=\biggl\{\|\beta^{*}\|_{0}\leq s^{*},\ \|\beta^{*}\|_{\infty}\leq 1,\ \max_{\beta^{*j}>0}|\beta^{*j}|\geq a\biggr\},\tag{1}$$

which has $s^{*}$-sparsity and non-zero entries bounded away from 0. Without loss of generality, we assume the first $s^{*}$ elements of $\beta^{*}$ are non-zero.
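The data model (1) can be sketched as a small synthetic generator. This is a minimal illustration under stated assumptions, not the experimental setup of this paper: the design is standard Gaussian (so $\Sigma=I_{d}$), the noise is Gaussian, every non-zero coefficient has magnitude in $[a,1]$, and the function names and sizes are hypothetical.

```python
import numpy as np


def sample_sparse_beta(d, s_star, a, rng):
    """Draw beta* whose first s_star entries are non-zero with magnitude in [a, 1]."""
    beta = np.zeros(d)
    signs = rng.choice([-1.0, 1.0], size=s_star)
    beta[:s_star] = signs * rng.uniform(a, 1.0, size=s_star)
    return beta


def generate_user_data(n, m, beta, noise_std, rng):
    """Return X with shape (n, m, d) and y with shape (n, m): m samples per user."""
    d = beta.shape[0]
    X = rng.standard_normal((n, m, d))               # assumed design: Sigma = I_d
    noise = noise_std * rng.standard_normal((n, m))  # assumed Gaussian noise
    y = X @ beta + noise                             # y = X beta* + sigma, per sample
    return X, y


rng = np.random.default_rng(0)
beta_star = sample_sparse_beta(d=50, s_star=5, a=0.5, rng=rng)
X, y = generate_user_data(n=20, m=30, beta=beta_star, noise_std=0.1, rng=rng)
print(X.shape, y.shape)
```

Indexing `X[i]` and `y[i]` gives the $m$ samples held by user $i$, which is the unit that the privacy definition below protects.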

We adopt the following setting for privacy constraints. Any estimation of $\beta^{*}$ is considered as a random variable, while its construction process with respect to the data is user-level locally differentially private (ULDP). We consider the sequential interactive case where the private observation $U_{i}$ is decided only by its local samples $(X_{i},y_{i})$ and previous observations $U_{1},\cdots,U_{i-1}$. The rigorous definition of (pure) ULDP is as follows.

Definition 2.1 (User-level local differential privacy).

Given data $\{(X_{i},y_{i})\}_{i=1}^{n}$, each $(X_{i},y_{i})$ is mapped to privatized information $U_{i}$ which is a random variable on $\mathcal{U}$. Let $\sigma(\mathcal{U})$ be the $\sigma$-field on $\mathcal{U}$. $U_{i}$ is drawn conditional on $(X_{i},y_{i})$ via the distribution $\mathrm{R}\left(U_{i}\mid X_{i}=x,Y_{i}=y,U_{1:(i-1)}=u_{1:(i-1)}\right)$. Then the mechanism $\mathrm{R}$ provides $\varepsilon$-user-level local differential privacy ($\varepsilon$-ULDP) if

$$\frac{\mathrm{R}\left(U_{i}\in U\mid X_{i}=x,Y_{i}=y,U_{1:(i-1)}=u_{1:(i-1)}\right)}{\mathrm{R}\left(U_{i}\in U\mid X_{i}=x^{\prime},Y_{i}=y^{\prime},U_{1:(i-1)}=u_{1:(i-1)}\right)}\leq e^{\varepsilon}$$

for all $1\leq i\leq n$, $U\in\sigma(\mathcal{U})$, $x,x^{\prime}\in\mathcal{X}^{m}$, and $y,y^{\prime}\in\mathcal{Y}^{m}$.

ULDP reduces to the conventional item-level LDP for $m=1$. Besides being more practically reasonable (Cummings et al., 2022), ULDP is also a more stringent definition than item-level LDP. To achieve $\varepsilon$-ULDP by trivially using group privacy, each item must use a significantly smaller budget $\varepsilon/m$. Conversely, on the curator side, inference of any single item is no easier than inference of the whole user, which means each item is as safe as $\varepsilon$-LDP against the curator. As a compromise, each item should expose information to its group mates. The requirement is typically acceptable, such as when each user has multiple records on a personal cellphone, or when data sources can be clustered into small groups where secret information is safely shared.
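To make the definition concrete, the sketch below shows one simple non-interactive mechanism that satisfies $\varepsilon$-ULDP: each user compresses all $m$ local samples into a single bit and releases it through binary randomized response. Because the output depends on the user's data only through that bit, the probability ratio in Definition 2.1 is at most $e^{\varepsilon}$ for any two local datasets. This is an illustration only, not the protocol proposed in this paper; the function names and the `summarize` callback are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report `bit` truthfully with probability e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def user_report(samples, summarize, epsilon, rng):
    """Compress all local samples of one user into one bit, then privatize it."""
    bit = 1 if summarize(samples) else 0
    return randomized_response(bit, epsilon, rng)


def debias(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit equals 1."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    mean_report = sum(reports) / len(reports)
    return (mean_report - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


# Toy usage: 70% of users hold samples with a positive sum.
rng = random.Random(0)
eps = 1.0
users = [[1.0] * 5 if i % 10 < 7 else [-1.0] * 5 for i in range(20000)]
reports = [user_report(u, lambda s: sum(s) > 0, eps, rng) for u in users]
print(debias(reports, eps))
```

The privacy cost here is $\varepsilon$ per user regardless of $m$, in contrast to the group-privacy route that spends $\varepsilon/m$ on each item.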

2.2 Related Work

Extensive studies have been conducted focusing on the central DP setting for linear regression model in low dimensions (Wang, 2018; Avella-Medina et al., 2023; Alabi et al., 2022; Arora et al., 2022; Amin et al., 2023) and high dimensions (Kifer et al., 2012; Talwar et al., 2015; Kumar & Deisenroth, 2019; Zhang & Zhang, 2021; Cai et al., 2021; Hu et al., 2022; Khanna et al., 2023a, b; Raff et al., 2023). Despite variations in settings and assumptions, state-of-the-art results (Liu et al., 2022; Varshney et al., 2022; Cai et al., 2023) indicate a general error rate of $\mathcal{O}(s^{*}\log d/(n\varepsilon^{2}))$ for squared loss, where the dependency on the dimension of feature space is $\log d$. Thus, $d$ can be exponentially large in $n\varepsilon^{2}$ to ensure consistent estimation.

This is not the case in the local setting, which is still less well understood than the central one. Several works addressed the problem focusing on the optimization error (Smith et al., 2017; Zheng et al., 2017). Both works assumed $\mathrm{diam}(\mathcal{X})\leq 1$ and do not generalize to many practical settings, such as when all features are i.i.d. and therefore $\mathrm{diam}(\mathcal{X})=\mathcal{O}(\sqrt{d})$. As for statistical estimation, Duchi et al. (2018) showed the matching upper and lower bounds for low-dimensional, non-interactive linear regression are $\mathcal{O}(d/(n\varepsilon^{2}))$. Wang & Xu (2019) first provided the lower bound $\mathcal{O}(d/(n\varepsilon^{2}))$ for LDP linear regression with 1-sparsity, which was then generalized to $s$-sparsity by Zhu et al. (2023). In summary, these prohibitive results indicate that there exists no meaningful approach when $d\asymp n\varepsilon^{2}$, which is often the case in practice.

Our approach utilizes a selection-estimation strategy, which has been shown to be advantageous in many situations. In the non-private setting, Wang et al. (2011) and Liang et al. (2023) select candidate variables by aggregating Lasso fits on random subsamples, an approach that has also been adopted with privacy (Kifer et al., 2012). More recently, the strategy has been used for communication-constrained learning (Duchi & Rogers, 2019; Barik & Honorio, 2020; Acharya et al., 2021). Note that our method is also communication efficient, as each user sends only $1$ bit of information. Acharya et al. (2021) tackled LDP sparse discrete distribution estimation by selecting the support variables. However, this is only feasible for such specific problems where users can provide useful information about which variables are potentially non-zero given only one sample. Their result does not generalize to other problems.

During the preparation of the camera-ready version of this paper, Kent et al. (2024) appeared online and analyzed sparse mean estimation under ULDP. We share some results with their conclusions, including the negative results for $m\leq s^{*}\log d$, the established rates, and a support estimation type estimator. However, we primarily consider the case where $s^{*},\log d\lesssim n\varepsilon^{2},m\lesssim d$, whereas their analysis is more comprehensive, considering other regions and, more importantly, identifying the phase transition.

2.3 Minimax Lower Bound

We introduce the related minimax results of locally private sparse linear regression. For any loss function $\ell$ (squared loss in our case), the minimax convergence rate is

$$\inf_{\beta}\sup_{\mathrm{P}\in\mathcal{H}}\mathbb{E}_{\mathrm{P}}\left[\ell(\beta^{*},\beta(X,y))\right]$$

where $\mathcal{H}$ is the hypothesis distribution class and $\beta$ is any estimator of $\beta^{*}$. The minimax lower bound for sparse linear regression under LDP is well explored in Wang & Xu (2019) and Zhu et al. (2023).

Proposition 2.2 (LDP lower bound).

Let $\mathcal{H}$ be distribution class satisfying (1) for $0\leq a\leq 1$. Let data $\{(X_{i},y_{i})\}_{i=1}^{n}$ be generated from (1) with $n=n^{\prime}m^{\prime}$ and $m=1$. For $0<\varepsilon\leq 1$, let $\beta_{\varepsilon}$ be any $\varepsilon$-LDP estimator of $\beta^{*}$. Then we have

$$\inf_{\beta_{\varepsilon}}\sup_{\mathrm{P}\in\mathcal{H}}\mathbb{E}_{\mathrm{P}}\left[\left\|\beta^{*}-\beta_{\varepsilon}\right\|_{2}^{2}\right]\gtrsim\frac{ds^{*}}{n^{\prime}m^{\prime}\varepsilon^{2}}.$$

The above result implies that for $d\gtrsim nm$, any attempt at LDP sparse linear regression is futile, as the estimation error does not even converge. In fact, a similar negative result also holds when $m$ is small yet larger than 1.

Proposition 2.3 (Necessity of sufficiently large $\mathbf{m}$).

Suppose $s^{*2}\leq n\varepsilon^{2}\lesssim\sqrt{d}$ and $m\leq s^{*}\log d$. Let $\mathcal{H}$ be distribution class satisfying (1) with some constant $a\in[0,1]$. Let data $\{(X_{i},y_{i})\}_{i=1}^{n}$ be generated from (1). For $0<\varepsilon\leq 1$, let $\beta_{\varepsilon}$ be any $\varepsilon$-LDP estimator of $\beta^{*}$. Then we have

$$\inf_{\beta_{\varepsilon}}\sup_{\mathrm{P}\in\mathcal{H}}\mathbb{E}_{\mathrm{P}}\left[\left\|\beta^{*}-\beta_{\varepsilon}\right\|_{2}^{2}\right]\gtrsim\frac{1}{s^{*}}.$$

Proposition 2.3 shows that for $m\leq s^{*}\log d$, the error does not converge to zero as $n$ grows. However, this is not the case for user-level LDP if $m$ is sufficiently large. The rigorous counterargument is the upper bound established in Theorem 3.6, which is $\mathcal{O}(s^{*2}/nm\varepsilon^{2})$. We now explain why the item-level lower bound fails to generalize. Its proof involves the construction of a function class $\mathrm{P}_{Z}$ and a distribution of $Z$ such that the mutual information between $Z$ and the private views $U_{1},\cdots,U_{n}$ is bounded from above and below. The upper bound no longer holds given $m$ samples per user, since the mutual information grows exponentially in $m$. By carefully bounding this quantity, we establish the following lower bound for ULDP.

Theorem 2.4 (ULDP lower bound).

Suppose $n\varepsilon^{2}\geq s^{*2}$, $m\leq d$, and $n\varepsilon^{2}\leq d$. Let $\mathcal{H}$ be the distribution class satisfying (1) with $a=\sqrt{\frac{s^{*}}{m}}$. Let data $\{(X_{i},y_{i})\}_{i=1}^{n}$ be generated from (1). For $0<\varepsilon\leq 1$, let $\beta_{\varepsilon}$ be any $\varepsilon$-ULDP estimator of $\beta^{*}$. Then we have

$$\inf_{\beta_{\varepsilon}}\sup_{\mathrm{P}\in\mathcal{H}}\mathbb{E}_{\mathrm{P}}\left[\left\|\beta^{*}-\beta_{\varepsilon}\right\|_{2}^{2}\right]\gtrsim\frac{s^{*2}}{nm\varepsilon^{2}}.$$

The result shows that any ULDP estimator incurs an error of order at least $1/nm\varepsilon^{2}$. Thus, a possible improvement for ULDP over LDP lies in replacing $d$ with $s^{*}$.
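To make the potential gain concrete, the following worked comparison divides the item-level rate $\mathcal{O}(s^{*}d/nm\varepsilon^{2})$ quoted in the abstract by the ULDP rate of Theorem 2.4, ignoring constants and logarithmic factors. The numerical values $d=10^{4}$ and $s^{*}=10$ are illustrative assumptions, not settings taken from the paper.

```latex
\frac{s^{*} d / (nm\varepsilon^{2})}{s^{*2} / (nm\varepsilon^{2})} = \frac{d}{s^{*}},
\qquad \text{e.g. } d = 10^{4},\ s^{*} = 10 \ \Longrightarrow\ \frac{d}{s^{*}} = 10^{3}.
```

Under these assumed values, the best achievable improvement in squared error is therefore a factor of about one thousand for the same total number of samples $nm$.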

3 An Algorithm

We begin by outlining our approach to solving the ULDP sparse linear regression problem. A key observation is that with $m$ samples, each user can obtain a rough estimate of the parameter from its local samples. The central challenge then lies in how to aggregate these rough estimates privately. As depicted in Figure 1, our proposed solution operates in two stages. In the initial stage, users within the first group independently identify the non-zero elements of $\beta^{*}$ from their local data and transmit privatized information accordingly. By aggregating this information, we determine the $s$ most frequent elements, which serve as the candidate variables for our estimation process. On the narrowed parameter space, we estimate the parameter using the remaining users. Subsequently, we present the candidate variable selection, final estimation, and extension to general sparse estimation in Sections 3.1, 3.2, and 3.3, respectively.

Figure 1: Illustration of the proposed sparse estimation framework.

3.1 Candidate Variable Selection

In this section, we elucidate the steps for candidate variable selection. First, each user prepares a piece of information $v_{i}\in[d]$, indicating a variable that is selected by user $i$ and probably belongs to the true variable set. Then, a curator privately aggregates the $v_{i}$s and outputs the candidate variables.

To formalize $v_{i}$, each user $i$ adopts a local selector $\mathcal{S}_{i}:(\mathcal{X}\times\mathcal{Y})^{m}\to[d]$. For each $i$, $\mathcal{S}_{i}$ can be any plug-in method and may be chosen differently based on the constraints of sample size, computational power, and prior information, as long as it produces a good selection result in the following sense.

Definition 3.1 ($\alpha$-Good selector).

Consider user $i$ and its i.i.d. samples $(X_{i},y_{i})\in(\mathcal{X}\times\mathcal{Y})^{m}$ from $\mathrm{P}$. For $0<\alpha<1$, an $\alpha$-good selector is an algorithm $\mathcal{S}$ such that for all $v\in\{1,\cdots,s^{*}\}$, there holds

$$\mathrm{Pr}\left(v=\mathcal{S}\left(X_{i},y_{i}\right)\right)\geq\frac{\alpha}{s^{*}}. \tag{2}$$

Here, the probability is taken w.r.t. the randomness of both $(X_{i},y_{i})$ and $\mathcal{S}$.

(2) requires a lower bound on the probability that each true variable is selected. To induce such selectors, we consider first conducting a local variable selection using $(X_{i},y_{i})$ and then uniformly sampling a $v_{i}$ from the locally selected variables. The following proposition demonstrates that obtaining such a selector is feasible given mild assumptions on the distribution $\mathrm{P}$, leveraging well-developed variable selection methods.
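The select-then-sample construction described above can be sketched as follows. This is a minimal illustration, not one of the algorithms of Appendix B.1: marginal screening (ranking coordinates by the absolute empirical covariance with the response) is an assumed stand-in for the local variable selection step, and the names `local_selector` and `demo_selector`, the Gaussian design, and all numerical settings are hypothetical. Indices are zero-based here, whereas the text uses $[d]=\{1,\cdots,d\}$.

```python
import numpy as np


def local_selector(X, y, s, rng):
    """Hypothetical local selector: screen s variables, then report one.

    X: (m, d) local covariates, y: (m,) local responses.
    Returns a single zero-based variable index v_i.
    """
    m = X.shape[0]
    scores = np.abs(X.T @ y) / m      # marginal screening statistic per variable
    top = np.argsort(scores)[-s:]     # the s locally selected variables
    return int(rng.choice(top))       # uniformly sample one of them


def demo_selector(seed=0):
    """Count how often each variable is reported across simulated users."""
    rng = np.random.default_rng(seed)
    d, s_star, m, users = 50, 3, 400, 300   # illustrative sizes (assumed)
    beta = np.zeros(d)
    beta[:s_star] = 1.0                     # true variables are 0, 1, 2
    counts = np.zeros(d, dtype=int)
    for _ in range(users):
        X = rng.standard_normal((m, d))
        y = X @ beta + 0.1 * rng.standard_normal(m)
        counts[local_selector(X, y, s_star, rng)] += 1
    return counts
```

In this simulated setting the reported indices concentrate on the true variables, each being reported by roughly a $1/s^{*}$ fraction of users, which is the behaviour that (2) asks for with a constant $\alpha$.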

Proposition 3.2 (Existence of good selectors).

Under model (1), if either of the following conditions holds, there exists an $\alpha$-good selector with a constant $\alpha$: (i) $\max_{i\neq j}|\Sigma_{ij}|\leq 3/s^{*}$, $a\gtrsim\sqrt{1/m}$, and $m\gtrsim s^{*2}\log d$; (ii) $1\geq a\gtrsim\sqrt{s^{*}/m}\vee\sqrt{\log m\log d/m}$.

See Appendix B.1 for examples of precise algorithms and detailed proofs. Conditions (i) and (ii) are examples of sufficient conditions that are relatively easy to satisfy. They require mild correlations among covariates, a strong signal (the minimum absolute value of the non-zero entries of $\beta^{*}$), and an adequate number of local samples. Similar conditions are standard in high-dimensional statistics (Fan & Li, 2001; Zhao & Yu, 2006). Though the lower bound on $a$ is considerable and leads to an improved minimax rate in the non-private case (Ndaoud, 2019), it is not the key reason why ULDP sparse linear regression is advantageous over its LDP counterpart. This is because the function classes constructed in Wang & Xu (2019); Zhu et al. (2023) for the lower bound proofs are all covered by the assumptions. Additionally, the sample size requirement remains polynomial in $s^{*}$ and $\log d$, which is theoretically reasonable.

Given the local information $v_{i}$, we conduct a private vote to identify the frequently appearing variables $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$. Suppose we use the first $n/2$ users for identification, although the proportion is arbitrary and can be any constant. Considering the large size $d$ of the variable universe compared to the number of available users, this task is closely related to the problem of heavy hitter detection (Bassily et al., 2020; Acharya et al., 2021). We solve the identification problem in the standard manner (Bassily et al., 2020), although any tailored approach can be adopted. Specifically, we encode the $d$ variables into binary strings using $\lceil\log d\rceil$ bits. Next, we traverse a binary prefix tree from level $1$ to $\lceil\log d\rceil$ and eliminate nodes that cannot serve as prefixes of heavy hitters, namely those with frequencies lower than a certain threshold $\rho$. The key advantage of this method is its ability to identify frequent elements with frequencies above $\sqrt{n\log d\log n/\varepsilon^{2}}$, which overcomes the polynomial dependency on $d$ in LDP discrete density estimation (Kairouz et al., 2016; Duchi et al., 2018). The detailed procedure (HeavyHitter) is provided in Appendix B.2.
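The prefix-tree traversal can be illustrated with the simplified sketch below. It is not the HeavyHitter procedure of Appendix B.2: as assumptions made for brevity, users are split evenly across the $\lceil\log d\rceil$ levels, each user reports once through generalized randomized response over the current candidate prefixes plus an "other" symbol, and $\rho$ is treated as a relative frequency. The function names, the zero-based variable encoding, and this choice of frequency oracle are hypothetical.

```python
import numpy as np


def grr_frequencies(labels, k, eps, rng):
    """Generalized randomized response over k categories (eps-LDP per user).

    Returns unbiased estimates of the relative frequency of each category.
    """
    n = len(labels)
    p = np.exp(eps) / (np.exp(eps) + k - 1)   # probability of a truthful report
    q = 1.0 / (np.exp(eps) + k - 1)           # probability of each other category
    keep = rng.random(n) < p
    shift = rng.integers(1, k, size=n)        # uniform over the k-1 other categories
    reports = np.where(keep, labels, (labels + shift) % k)
    counts = np.bincount(reports, minlength=k)
    return (counts / n - q) / (p - q)         # debias the observed frequencies


def prefix_tree_heavy_hitters(values, d, eps, rho, rng):
    """Return values in {0, ..., d-1} whose estimated frequency is >= rho."""
    values = np.asarray(values)
    n_bits = max(1, int(np.ceil(np.log2(d))))
    # Each user contributes to exactly one level of the tree.
    groups = np.array_split(rng.permutation(len(values)), n_bits)
    survivors = [0]                            # the root (empty prefix)
    for level, idx in enumerate(groups, start=1):
        # Children of the surviving prefixes are the candidates at this level.
        candidates = [2 * node + b for node in survivors for b in (0, 1)]
        lookup = {c: j for j, c in enumerate(candidates)}
        other = len(candidates)                # label for "not a candidate"
        prefixes = values[idx] >> (n_bits - level)
        labels = np.array([lookup.get(int(x), other) for x in prefixes])
        freq = grr_frequencies(labels, other + 1, eps, rng)
        # Prune prefixes that cannot lead to a heavy hitter.
        survivors = [c for j, c in enumerate(candidates) if freq[j] >= rho]
        if not survivors:
            break
    return sorted(v for v in survivors if v < d)
```

Because the candidate set at each level has at most twice the number of surviving prefixes, the alphabet seen by the randomizer stays small even when $d$ is large, which is the intuition behind avoiding a polynomial dependence on $d$.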

In the first part of Algorithm 1, we summarize the pipeline for candidate variable selection. The following proposition demonstrates its effectiveness by establishing that, provided local good selectors exist, the curator can select a set of variables of size $s\asymp s^{*}$ containing the true variables. This property, known as perfect selection or consistent selection (Zhao & Yu, 2006; Belloni & Chernozhukov, 2013), plays a crucial role in the theoretical properties of subsequent operations.

Proposition 3.3.

Let $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$ be the selected variables in Algorithm 1. Suppose that all $\mathcal{S}_{i}$ are $\alpha$-good selectors with $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$. If we take $\alpha/8s^{*}\leq\rho\leq\alpha/4s^{*}$, then with probability $1-1/n^{2}$, we have (i) $\{1,\cdots,s^{*}\}\subseteq\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$, (ii) $s\leq 32s^{*}/\alpha$.

Note that our scheme samples only one locally selected variable and disregards the others. This select-one-and-aggregate approach has been demonstrated to be as effective as if each user had only one variable (Zhu et al., 2020; Cohen et al., 2023). To fully utilize the information of the selected variables, we can leverage set-valued heavy hitters (Qin et al., 2016; Zhu et al., 2020; Wang et al., 2023). However, this only results in an improvement of $O(\sqrt{s^{*}})$ in the threshold, which is not our primary focus.

3.2 Coefficient Estimation

Given the selected variables $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$, the problem is reduced to low-dimensional linear regression. Efficient algorithms and fundamental limits have been established (Duchi et al., 2018; Wang & Xu, 2019) for item-level LDP. Leveraging these algorithms, one can ignore all but one sample from each user and obtain an error bound depending polynomially on $s^{*}$ instead of $d$. However, we would like to explore the benefits brought by having multiple samples per user, as is addressed in the advanced research on ULDP.

We introduce the notation needed to define the learning problem on the selected subspace. Given Proposition 3.3, we assume the selected variables contain the true ones in the following analysis. Without loss of generality, let $(\widehat{v}_{1},\cdots,\widehat{v}_{s})=(1,\cdots,s)$. We put a hat over the quantities on the selected space. Let $\widehat{\mathrm{P}}$ be the marginal distribution on the selected space $\widehat{\mathcal{X}}\times\mathcal{Y}=\mathbb{R}^{s+1}$, where $\widehat{\mathrm{P}}_{\widehat{X}}=\mathrm{P}_{X^{1:s}}$. Define the data on the selected space as $\widehat{X}_{i,j}=X_{i,j}^{1:s}$ and $\widehat{X}_{i}=\{X_{i,j}^{1:s}\}_{j=1}^{m}$. The underlying coefficients become $\widehat{\beta}^{*}=\beta^{*1:s}$.

3.2.1 A Multi-round Protocol via SCO

At first glance, we can directly find $\beta\in\mathbb{R}^{s}$ through the following ULDP stochastic convex optimization problem on the selected space

$$\operatorname*{arg\,min}_{\|\widehat{\beta}\|_{\infty}\leq 1}\left(F(\widehat{\beta})=\int_{\widehat{\mathcal{X}}\times\mathcal{Y}}\left(x^{\top}\widehat{\beta}-y\right)^{2}d\widehat{\mathrm{P}}(x,y)\right). \tag{3}$$

A recent study (Bassily & Sun, 2023) provided methodology and established theory with respect to smooth loss functions. We borrow their algorithm, presented in Appendix C.1, which is a private variant of accelerated mini-batch gradient descent. It utilizes the fact that the gradient of a local batch concentrates at rate $\sqrt{1/m}$ to reduce the magnitude of the noise added to the gradients. While the methodology remains the same, we improve the theoretical analysis in Bassily & Sun (2023) to accommodate the squared loss, which possesses strong convexity and leads to faster convergence.

Theorem 3.4 (Informal).

Let data $\{(X_{i},y_{i})\}_{i=1}^{n}$ be generated as in (1). Suppose $\{\mathcal{S}_{i}\}_{i=1}^{n/2}$ are $\alpha$-good selectors with $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$. Then with correct parameter choice, solving (3) leads to an estimate $\beta$ such that

$$\mathbb{E}\left[\left\|\beta^{*}-\beta\right\|_{2}^{2}\right]\lesssim\frac{s^{*9}\log^{6}n}{nm\varepsilon^{2}\alpha^{9}}+\frac{s^{*4}\log n}{nm\alpha^{4}}.$$

The result stated in Theorem 3.4 holds in expectation, unlike the other conclusions, which hold with high probability. This distinction arises from the formulation of the technical lemma we borrowed. Both terms of the bound involve $\alpha$, indicating a degradation associated with variable selection performance. However, according to Proposition 3.2, $\alpha$ is merely a constant given a sufficiently large $m$. The higher-order terms in $s^{*}$ encompass various overheads, including the private mean estimation error and the Lipschitz constant of the squared loss over the $\|\cdot\|_{\infty}$ ball.

3.2.2 A Two Round Protocol

The multi-round protocol is disadvantageous from two perspectives. Firstly, as a gradient-based method, it necessitates $\mathcal{O}(\sqrt{nm\varepsilon^{2}})$ rounds of communication, which can be prohibitively slow in practice due to network latency (Smith et al., 2017; Zheng et al., 2017). Secondly, compared to Theorem 2.4, the upper bound provided in Theorem 3.4 is far from tight concerning $s^{*}$. We question whether these drawbacks can be mitigated for the specific problem of linear regression. In this section, we provide an affirmative answer. Our main inspiration stems from the following observation.

Proposition 3.5.

There exist estimators on the selected variables $\widehat{\beta}_{n/2+1},\cdots,\widehat{\beta}_{n}$, such that for all $\widehat{\beta}_{i}\in\mathbb{R}^{s}$, we have $\mathbb{E}_{\mathrm{P}}\left[\widehat{\beta}_{i}\right]=\widehat{\beta}^{*}$ and $\|\widehat{\beta}_{i}-\widehat{\beta}^{*}\|_{2}\lesssim\sqrt{s\log n/m}$ with probability $1-1/n^{2}$. Moreover, if either condition in Proposition 3.2 holds, the bound improves to $\|\widehat{\beta}_{i}-\widehat{\beta}^{*}\|_{2}\lesssim\sqrt{s^{*}\log n/m}$.
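One natural candidate for such a local estimator is ordinary least squares restricted to the selected coordinates, which is unbiased under a linear model when the noise has zero mean given the covariates, the selected set contains the true variables, and the restricted design has full column rank. The sketch below is a minimal illustration under those assumptions; the paper's own construction may differ, and the function name `local_restricted_ols` and the zero-based index convention are hypothetical.

```python
import numpy as np


def local_restricted_ols(X, y, selected):
    """Least-squares fit of y on the columns of X listed in `selected`.

    X: (m, d) local covariates, y: (m,) local responses,
    selected: zero-based indices of the candidate variables.
    Returns a coefficient vector of length len(selected).
    """
    X_sel = X[:, selected]                            # restrict to the selected space
    coef, *_ = np.linalg.lstsq(X_sel, y, rcond=None)  # minimise ||X_sel b - y||_2
    return coef
```

Each user in the second half of the population would run such a fit on its own $m$ samples, so no raw data leaves the local machine before privatization.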

Since the mean of $\widehat{\beta}_{i}$ is $\widehat{\beta}^{*}$, an ideal estimator would be the mean of the $\widehat{\beta}_{i}$s. Moreover, Proposition 3.5 indicates that $\widehat{\beta}_{i}$ concentrates as $m$ increases, suggesting that we can confine $\widehat{\beta}_{i}$ to a restricted area to enhance estimation accuracy. We propose a two-stage estimation similar to Girgis et al. (2022). First, using the users with indices $n/2+1\leq i\leq 3n/4$, we designate a histogram bin on $\mathbb{R}^{s}$ in which almost all the $\widehat{\beta}_{i}$ values will fall. Then, the last group of users project their $\widehat{\beta}_{i}$ onto the bin and add Laplace noise. Given the reduced sensitivity of the projected coefficients, the noise magnitude diminishes significantly. We provide the detailed methodology (ULDPMean) in Appendix C and summarize the pipeline in Algorithm 1.
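The locate-then-project idea can be sketched as follows. This is a simplified stand-in for ULDPMean, whose details are in Appendix C and are not reproduced here. As assumptions for illustration: the bin is located coordinate by coordinate with generalized randomized response over a grid on $[-1,1]$ (using the $\|\widehat{\beta}\|_{\infty}\leq 1$ constraint of (3) as the range), the projection is a coordinate-wise clip onto a box, and the Laplace scale is calibrated to the $\ell_{1}$ diameter of that box. The function name and its arguments are hypothetical.

```python
import numpy as np


def two_stage_private_mean(locators, reporters, tau, eps, rng, bound=1.0):
    """Two-stage eps-LDP mean of concentrated vectors (illustrative sketch).

    locators:  (n1, s) local estimates used only to locate a bin.
    reporters: (n2, s) local estimates that are clipped, noised and averaged.
    tau:       assumed concentration radius of each coordinate.
    """
    locators = np.asarray(locators, dtype=float)
    reporters = np.asarray(reporters, dtype=float)
    s = locators.shape[1]
    k = max(2, int(np.ceil(bound / tau)))        # number of bins, width <= 2 * tau
    edges = np.linspace(-bound, bound, k + 1)
    width = 2.0 * bound / k
    p = np.exp(eps) / (np.exp(eps) + k - 1)      # truthful-report probability
    center = np.zeros(s)
    for j in range(s):
        col = locators[j::s, j]                  # users assigned to coordinate j
        bins = np.clip(np.digitize(col, edges) - 1, 0, k - 1)
        keep = rng.random(len(bins)) < p
        shift = rng.integers(1, k, size=len(bins))
        reports = np.where(keep, bins, (bins + shift) % k)   # randomized response
        best = np.bincount(reports, minlength=k).argmax()    # most reported bin
        center[j] = edges[best] + width / 2
    # Stage 2: clip onto a box around the located bin, then add Laplace noise.
    radius = 2.0 * tau + width / 2               # margin for values near bin edges
    clipped = np.clip(reporters, center - radius, center + radius)
    scale = 2.0 * radius * s / eps               # l1 diameter of the box over eps
    noisy = clipped + rng.laplace(0.0, scale, size=clipped.shape)
    return noisy.mean(axis=0)
```

The noise scale is proportional to `radius`, which is of order $\tau$ rather than of order the full range `bound`; this is how concentration of the local estimates translates into less noise in the sketch.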

Algorithm 1 Two-round ULDP sparse estimation.
  Input: Local data sets $\{(X_{i},y_{i})\}_{i=1}^{n}$, selectors $\{\mathcal{S}_{i}\}_{i=1}^{n/2}$, privacy budget $\varepsilon$, threshold $\rho$, concentration radius $\tau$.
  Initialization: let $\beta\in\mathbb{R}^{d}$ be a zero vector.
  # candidate variable selection
  # on local machine
  for $i$ in $1,\cdots,n/2$ do
     $v_{i}=\mathcal{S}_{i}(X_{i},y_{i})$.
  end for
  # $\lceil\log d\rceil$ round communication
  $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$ = HeavyHitter($\{v_{i}\}_{i=1}^{n/2}$, $\varepsilon$, $\rho$).
  # coefficient estimation
  # on local machine
  for $i$ in $n/2+1,\cdots,n$ do
     Fit $\widehat{\beta}_{i}$ according to $\left(\widehat{v}_{1},\cdots,\widehat{v}_{s}\right)$.
  end for
  # 2 round communication
  $\widehat{\beta}$ = ULDPMean($\{\widehat{\beta}_{i}\}_{i=n/2+1}^{3n/4}$, $\{\widehat{\beta}_{i}\}_{i=3n/4+1}^{n}$, $\tau$, $\varepsilon$).
  $\beta^{\widehat{v}_{1}:\widehat{v}_{s}}=\widehat{\beta}$.
  Output: $\beta$.

The entire protocol requires a reasonable $\log d+2$ rounds of communication, with each user sending 1 bit of information. The $\log d$ communication rounds are necessary for HeavyHitter, which can be substituted by any other customized identification method for improved efficiency. In the coefficient estimation stage, our method takes two rounds of communication, which is quite efficient. Fully utilizing multiple samples necessitates sequential interactivity (Acharya et al., 2023; Bassily & Sun, 2023).

We now present the main result, which is the error upper bound of the estimator summarized in Algorithm 1.

Theorem 3.6.

Let data $\{(X_i,y_i)\}_{i=1}^{n}$ be generated as in (1). Suppose $\{\mathcal{S}_i\}_{i=1}^{n/2}$ are $\alpha$-good selectors with $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$. Suppose we let $\alpha/8s^{*}\leq\rho\leq\alpha/4s^{*}$ and $\tau\asymp\sqrt{\log^{2}n/m}$. Let $\beta$ be the output of Algorithm 1. Then we have (i) Algorithm 1 is $\varepsilon$-ULDP. (ii) there holds

$$\left\|\beta^{*}-\beta\right\|_{2}^{2}\lesssim\frac{s^{*}\log n}{nm\alpha}+\frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}\alpha^{2}} \tag{4}$$

with probability at least $1-4/n^{2}$. Moreover, if either condition in Proposition 3.2 holds, the bound improves to

$$\left\|\beta^{*}-\beta\right\|_{2}^{2}\lesssim\frac{s^{*}\log n}{nm}+\frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}\alpha}. \tag{5}$$

The upper bound in (4) consists of two parts. Both terms include additional factors of $\alpha$ and $\log n$, which are inevitable due to selection degradation and the overhead of utilizing multiple local samples. The $\log n$ factors are due to the union bound arguments, while $\alpha$ is merely a constant by Proposition 3.2. Ignoring $\alpha$ and $\log n$, the first part recovers the rate of non-private linear regression on an $\mathcal{O}(s^{*})$-dimensional space. The second part corresponds to privacy. When $\varepsilon\gtrsim\sqrt{s^{*}}$, this part is negligible, and Algorithm 1 achieves the same error as if it were non-private. It is worth noting that in most cases (see e.g. Table 1), a locally private algorithm matches its non-private counterpart when $\varepsilon\gtrsim\sqrt{s^{*}}$. The improvement of (5) over (4) is based on the existence of sparse oracles that achieve error $s^{*}/m$ locally, instead of $s/m$.

We observe that, unlike common high-dimensional results (Wang & Xu, 2019; Cai et al., 2023), our bound does not involve a $\log d$ term. This phenomenon is also noted in Ndaoud (2019), where the $\log d$ disappears if we leverage the beta-min condition in Proposition 3.2. We will observe in the experiments that if $m$ is large enough, our method is more robust to changes in $d$. However, this is not to say that we can deal with arbitrarily large $d$. The logarithmic dependence is still contained in $\alpha$, which poses a requirement of $m\gtrsim\log d$ as in Proposition 3.2. Moreover, omitting the log factors, the privacy error is decided by the total number of samples $mn$ for sufficiently large $m$ and $n$. Thus, we can achieve the same estimation error with fewer users if there are more local samples per user, while retaining the same level of privacy for each user since $\varepsilon$ is fixed. On the contrary, if there are $mn$ users with one sample each, the error is inevitably $\mathcal{O}(ds^{*}/nm\varepsilon^{2})$ (Proposition 2.2). This comparison illustrates the advantage of having both sufficient users and local samples compared to having abundant users and only one local sample. Note that this distinction holds only between sequential-interactive ULDP and LDP. It is unclear whether the lower bound holds under non-interactive ULDP, since most ULDP methods require sequential interactivity (Acharya et al., 2023; Bassily & Sun, 2023).
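To make the comparison concrete, the following minimal sketch evaluates the privacy term of (5) against the item-level rate, with constants and logarithmic factors dropped and $\alpha$ treated as a constant. The function names are hypothetical, and the parameter values are illustrative assumptions chosen to mirror the default simulation setting of Section 4.1.

```python
def user_level_privacy_term(s_star, n, m, eps):
    """Privacy term of bound (5), dropping constants, log factors, and alpha."""
    return s_star ** 2 / (n * m * eps ** 2)


def item_level_privacy_term(s_star, d, n, m, eps):
    """Item-level LDP rate d * s^* / (n m eps^2) for nm single-sample users."""
    return d * s_star / (n * m * eps ** 2)


# Illustrative values (assumed), mirroring the default simulation setting.
s_star, d, n, m, eps = 8, 256, 400, 100, 4.0

user_term = user_level_privacy_term(s_star, n, m, eps)
item_term = item_level_privacy_term(s_star, d, n, m, eps)

# The ratio of the two terms is d / s^*, independent of n, m, and eps.
ratio = item_term / user_term
print(f"user-level: {user_term:.2e}, item-level: {item_term:.2e}, ratio: {ratio:.1f}")
```

Under these simplifications the gap between the two rates is exactly $d/s^{*}$, so the benefit of grouping samples by user grows with the ambient dimension and shrinks as the signal becomes less sparse.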

3.3 Extension to Sparse Estimation

In this section, we show that our framework can be applied to various sparse problems through reduction to non-private learners. We consider estimation of $\beta^{*}$ from data $\{X_i\}_{i=1}^{n}\in\mathcal{X}^{mn}$, which is generated from a distribution $\mathrm{P}_{\beta^{*}}$ parameterized by $\beta^{*}$. The parameter $\beta^{*}$ is assumed to be in $\Omega_{s,a}^{d}$. The assumptions include linear regression as a special case. It is important to note that Algorithm 1 depends on the particular problem form via two steps: (i) the selector $\mathcal{S}_i$ and (ii) the estimator $\widehat{\beta}_i$. Both components depend on a non-private estimator of $\beta^{*}$. The following theorem demonstrates that, given a qualified estimator, our framework achieves fast convergence rates for the general problem of sparse estimation.

Theorem 3.7 (Informal).

Let data $\{X_i\}_{i=1}^{n}$ be generated by $\mathrm{P}_{\beta^{*}}$ for $\beta^{*}\in\Omega_{s,a}^{d}$. Suppose we have non-private estimators: (i) estimator $\tilde{\beta}_i$ with $\|\tilde{\beta}_i-\beta^{*}\|_{2}\leq\nu_{1}$ for all $1\leq i\leq n/2$ and (ii) estimator $\widehat{\beta}_i$ on selected variables with $\mathbb{E}\left[\widehat{\beta}_i\right]=\widehat{\beta}^{*}$ and $\|\widehat{\beta}_i-\widehat{\beta}^{*}\|_{2}\leq\nu_{2}$ for all $n/2+1\leq i\leq n$. Then, for any $a\gtrsim\nu_{1}$, there exists an $\varepsilon$-ULDP algorithm whose output $\beta$ has

$$\left\|\beta^{*}-\beta\right\|_{2}^{2}\lesssim\frac{\nu_{2}^{2}}{n}+\frac{\nu_{2}^{2}s^{*}\log^{2}n}{n\varepsilon^{2}\alpha} \tag{6}$$

with probability at least $1-3/n^{2}$. Moreover, for the $\ell_{1}$ norm, there holds

$$\left\|\beta^{*}-\beta\right\|_{1}\lesssim\sqrt{\frac{\nu_{2}^{2}s^{*}}{n\alpha}}+\sqrt{\frac{\nu_{2}^{2}s^{*2}\log^{2}n}{n\varepsilon^{2}\alpha^{2}}} \tag{7}$$

with probability at least $1-3/n^{2}$.

We also present a result for the $\ell_{1}$ norm. Comparing (7) to (6), the difference arises from the $\sqrt{s}$ discrepancy between the $\ell_{1}$ and $\ell_{2}$ norms, given that we only have $s$ non-zero elements in our sparse estimation problem. We now discuss the implications of Theorem 3.7. Consider sparse mean estimation (Duchi et al., 2018; Zhou et al., 2022), where the non-private estimator achieves $\nu_{2}=\mathcal{O}(\sqrt{s^{*}\log n/m})$ (Johnstone, 1994) under mild conditions. Then the bound (6) becomes identical to (5), which eliminates the linear dependency on $d$ in LDP (Duchi et al., 2018). For sparse discrete distribution estimation, Acharya et al. (2021) removed the linear dependency on $d$. With $\nu_{2}=\mathcal{O}(\sqrt{s^{*}\log n/m})$, our bound (7) is $\sqrt{s^{*}}$ larger than theirs in the $\ell_{1}$ sense.
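As a quick check of the claim that (6) reduces to (5) for sparse mean estimation, one can substitute $\nu_{2}^{2}\asymp s^{*}\log n/m$ into the two terms of (6). The display below is only this term-by-term substitution, with constants ignored:

```latex
\begin{align*}
\frac{\nu_{2}^{2}}{n}
  &\asymp \frac{s^{*}\log n}{nm}, \\
\frac{\nu_{2}^{2}\,s^{*}\log^{2}n}{n\varepsilon^{2}\alpha}
  &\asymp \frac{s^{*}\log n}{m}\cdot\frac{s^{*}\log^{2}n}{n\varepsilon^{2}\alpha}
  = \frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}\alpha}.
\end{align*}
```

The two right-hand sides are exactly the statistical and privacy terms of (5).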

It is worth mentioning that when $d$ is small, our upper bound matches the lower bound for $m=1$. In this scenario, the selector provides no useful information and is equivalent to a random selection, i.e. $\alpha\leq\mathrm{P}\left(v=\mathcal{S}(X_i,y_i)\right)\cdot s^{*}=s^{*}/d$. If $\alpha=s^{*}/d\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$, then (5) becomes

$$\frac{s^{*}\log n}{nm}+\frac{ds^{*}\log^{3}n}{nm\varepsilon^{2}}.$$

Up to logarithmic factors, the second term matches the lower bound established in Zhu et al. (2023) for sparse linear regression and Duchi et al. (2018) for sparse mean estimation.
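For completeness, the privacy term in this small-$d$ regime follows from (5) by the single substitution $\alpha=s^{*}/d$, while the first term of (5) does not involve $\alpha$ and is unchanged:

```latex
\begin{equation*}
\frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}\alpha}
  \;\Big|_{\alpha = s^{*}/d}
  = \frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}}\cdot\frac{d}{s^{*}}
  = \frac{d\,s^{*}\log^{3}n}{nm\varepsilon^{2}}.
\end{equation*}
```

Dropping the $\log^{3}n$ factor leaves $ds^{*}/nm\varepsilon^{2}$, the item-level rate quoted from Proposition 2.2.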

4 Experiment Results

We conduct experiments on both synthetic and real datasets to show the superiority of the proposed methods and to validate our theoretical findings. The tested methods include: (i) 2-SLR: The proposed two-round ULDP sparse linear regression method outlined in Algorithm 1; (ii) M-SLR: The proposed multi-round version in Algorithm 6. The competing methods are: (iii) LDPPROX: The non-interactive LDP proxy estimator in Zhu et al. (2023); (iv) LDPIHT: The LDP iterative hard thresholding in Wang & Xu (2019); Zhu et al. (2023). Both comparison methods receive $nm$ samples with budget $\varepsilon$ each. Additionally, we report the performance of non-privately fitting (v) Lasso using $m$ samples, representing an alternative for each user to rely solely on their local information. Implementation details are provided in Appendix E. For each model, we report the best result over its parameter grids, with the best result determined based on the average of at least 30 replications. The size of the parameter grids is selected based on running time to ensure that each method incurs an equal amount of computation. All experiments are conducted on a machine with 72-core Intel Xeon 2.60GHz and 128GB of main memory. The code is publicly available on GitHub (https://github.com/Karlmyh/ULDP-SL).

Figure 2: Experiments w.r.t. $d$ and $\varepsilon$. Panels: (a) $d$ vs. F1 score; (b) $d$ vs. $\ell_{2}$ error with $m=100$; (c) $d$ vs. $\ell_{2}$ error with $m=200$; (d) $\varepsilon$ vs. $\ell_{2}$ error. We plot the quantiles over 30 repetitions with $95\%$ coverage. We exclude LDPPROX in the last three figures since it is highly unstable and does not fit into our plot scale.

4.1 Simulation

We conduct experiments on synthetic data to validate the theoretical findings. Two sets of parallel experiments are conducted for independent and correlated marginal distributions, respectively; results of the latter are presented in Appendix E. We draw each $X_{i,j}^{k}$ and $\sigma_{i,j}$ independently from the standard Gaussian distribution. For $\beta^{*}$, we randomly select $s^{*}=8$ coordinates to be $0.2$ and let the others be zero. By default, we set $n=400$, $m=100$, $d=256$, and $\varepsilon=4$, while varying one of them to observe how the evaluated metric changes. We use squared error to evaluate the estimated coefficients and F1 score to evaluate variable selection.
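The data-generating process for the independent case can be sketched as follows. This is a minimal standard-library illustration, not the released implementation: the function name and arguments are hypothetical, and it assumes model (1) has the usual linear form, i.e. each response is the inner product of the covariates with $\beta^{*}$ plus standard Gaussian noise.

```python
import random


def generate_user_data(n, m, d, s_star, signal=0.2, seed=0):
    """Draw n users with m samples each from y = <x, beta*> + noise (assumed form of model (1))."""
    rng = random.Random(seed)
    # Sparse ground truth: s_star random coordinates equal `signal`, the rest are zero.
    support = sorted(rng.sample(range(d), s_star))
    beta_star = [0.0] * d
    for j in support:
        beta_star[j] = signal
    users = []
    for _ in range(n):
        # Covariates and noise are independent standard Gaussians.
        X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
        y = [sum(x[j] * signal for j in support) + rng.gauss(0.0, 1.0) for x in X]
        users.append((X, y))
    return users, beta_star, support


# Small sizes for a quick check; the setting described above uses n=400, m=100, d=256.
users, beta_star, support = generate_user_data(n=4, m=5, d=16, s_star=8)
print(len(users), len(users[0][0]), len(users[0][0][0]), support)
```

Each element of `users` holds one user's local design matrix and responses, which matches the user-level view in which privacy is applied to the whole local dataset rather than to single samples.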

We conduct experiments w.r.t. $d$. We first analyze the variable selection performance. For $d\in\{16,32,\cdots,1024\}$, we compute the averaged F1 scores of the proposed candidate variable selection (represented by 2-SLR) and other methods. As shown in Figure 2(a), the selection performance of 2-SLR is superior to that of the variables selected by other methods. Particularly noteworthy is that 2-SLR achieves higher F1 scores than Lasso. This observation aligns with Wang et al. (2011); Liang et al. (2023), where aggregating Lasso fitted on random subsamples leads to performance gains in both selection and prediction.
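For reference, the F1 score of a selected variable set against the true support can be computed as in the sketch below. The helper name and the example sets are hypothetical; the paper's exact evaluation code may differ, for instance in how an empty selection is scored.

```python
def selection_f1(selected, true_support):
    """F1 score of a selected variable set against the true support."""
    selected, true_support = set(selected), set(true_support)
    true_positives = len(selected & true_support)
    if true_positives == 0:
        # Convention assumed here: no correct selection gives a score of zero.
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(true_support)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 8 true variables, 6 recovered, plus 2 false positives.
true_support = range(8)
selected = [0, 1, 2, 3, 4, 5, 100, 101]
score = selection_f1(selected, true_support)
print(score)
```

In this example precision and recall are both 6/8, so the F1 score is 0.75. The metric penalizes both missed true variables and spurious selections, which is why it is sensitive to growth in $d$.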

Next, we analyze the estimation performance with respect to $d$. In Figures 2(b) and 2(c), we plot the curve of $\ell_{2}$ error w.r.t. $d$. Given a large $m=200$, the proposed methods are less sensitive to $d$ compared to LDPIHT and Lasso. This observation is compatible with the rate in (5), which is independent of $d$. Conversely, for smaller $m=100$, the local selectors cannot provide a constant $\alpha$ for exponentially larger $d$. As a result, the trend of our methods is steeper.

We examine the privacy-utility trade-offs by investigating performances under different $\varepsilon$. In Figure 2(d), the error decreases as $\varepsilon$ increases for all private methods. Moreover, the error of 2-SLR is consistently lower than that of Lasso, while the error of M-SLR quickly drops below that of Lasso at medium privacy levels ($\varepsilon\geq 2$). This shows the superiority of our methods compared to fitting Lasso using only local information.

Figure 3: Experiments w.r.t. $m$ and $\ell_{2}$ error. Panels: (a) $n=100$; (b) $n=400$; (c) $n=800$.

Finally, we analyze the impact of sample sizes. We conducted experiments with varying $m$ (ranging from 50 to 200) under different $n$, comparing the performance of our methods with Lasso on local samples. The results for varying $m$ are presented in Figure 3. We observe that given a sufficiently large $n=800$, 2-SLR always outperforms Lasso, and M-SLR performs comparably even for $\varepsilon=1$. If $n=400$, only M-SLR with $\varepsilon=1$ performs worse than Lasso. However, given an insufficient $n=100$, Lasso performs comparably to 2-SLR with $\varepsilon=2$. Similarly, in Figures 4(a) and 4(b), the $\ell_{2}$ error decreases as $n$ increases for all $\varepsilon$. The results indicate that our methods outperform Lasso under various $(n,m,\varepsilon)$ settings, except for M-SLR with $\varepsilon=1$. This observation is reasonable and aligns with phenomena commonly observed in ULDP learning, federated learning, or transfer learning, where incorporating information from other data sources may not necessarily improve estimation if the quality of that additional information is low due to factors such as privacy constraints, data heterogeneity, or data compression.

Moreover, we set $nm=400\times 100$ and varied the ratio $n/m$. In Figure 4(c), we observe that, for each $\varepsilon$, the error of 2-SLR remains stable when $n\approx m$, while it slightly increases when either $n$ or $m$ is too small, which is consistent with Theorem 3.6. Furthermore, the performance of M-SLR is more sensitive to $n$ becoming small. This is attributed to its gradient-based nature, which requires a large number of users.

Figure 4: Experiments w.r.t. $n$ and $n/m$. Panels: (a) $n$ vs. $\ell_{2}$ error; (b) $m$ vs. $\ell_{2}$ error; (c) $n/m$ vs. $\ell_{2}$ error.

4.2 Real Data

Table 2: Real data performances. To ensure significance, we employ the Wilcoxon signed-rank test (Wilcoxon, 1992) with a significance level of 0.05 to determine if a result is significantly better. The best results are bolded and those significantly better than the remaining results are marked with *.

| Budget | Dataset | NP-2-SLR | NP-M-SLR | Lasso | 2-SLR | M-SLR | LDPPROX | LDPIHT |
|---|---|---|---|---|---|---|---|---|
| $\varepsilon=1$ | Airline | 1.01 | 0.82 | 1.02 | 1.02 | 0.98* | 1.38 | 1.85 |
| | Loan | 0.97 | 0.88 | 0.99 | 0.98 | 0.97 | 5.27 | 2.00 |
| | MIP | 1.00 | 0.96 | 1.65 | 1.00 | 0.98* | 2.54 | 1.87 |
| | Taxi | 0.95 | 0.01 | 1.04 | 0.96 | 0.01* | 1.20 | 1.02 |
| | Wine | 1.19 | 1.17 | 1.14* | 1.34 | 1.37 | 7.71 | 2.30 |
| | Yolanda | 1.10 | 1.14 | 1.19 | 1.19 | 1.22 | 1.90 | 2.36 |
| $\varepsilon=4$ | Airline | 1.01 | 0.82 | 1.02 | 1.02 | 0.88* | 1.15 | 1.02 |
| | Loan | 0.97 | 0.88 | 0.99 | 0.98 | 0.90* | 2.05 | 1.65 |
| | MIP | 1.00 | 0.96 | 1.65 | 1.01 | 0.96* | 3.30 | 1.82 |
| | Taxi | 0.95 | 0.01 | 1.04 | 0.95 | 0.01* | 1.16 | 1.88 |
| | Wine | 1.19 | 1.17 | 1.14* | 1.19 | 1.27 | 5.39 | 1.74 |
| | Yolanda | 1.10 | 1.14 | 1.19 | 1.11* | 1.18 | 1.79 | 2.03 |
| Rank sum | | - | - | 31 | 24 | 19 | 56 | 50 |

We conduct experiments on six real datasets with various sample sizes and dimensionalities. Among the datasets, Airline and Taxi are the most suitable for our setting, where each user possesses a small number of local samples with large dimensions. The datasets contain sensitive information and have been used in privacy research (Ma et al., 2024b). The other datasets are manually grouped to fit our framework. See Appendix E.3 for a description of the datasets.

We first compute the mean squared error over 30 random train-test splits for $\varepsilon=1$ and $\varepsilon=4$. To standardize the scale across datasets, we report the MSE ratio relative to non-private fitting with Lasso over all samples. The results are displayed in Table 2. For both high privacy ($\varepsilon=1$) and medium privacy ($\varepsilon=4$), the proposed methods significantly outperform competitors in terms of both average performance (rank sum) and the number of best results achieved. It is worth noting that in most cases, Lasso fitted on local datasets outperforms LDP competitors, which highlights the ineffectiveness of LDP sparse regression. Moreover, the running time of the methods is displayed in Appendix E.3. The results show that, if properly parallelized, our methods are quite efficient.
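The rank-sum summary can be illustrated with a short sketch. The helper below is hypothetical and assumes that, within each row, methods are ranked by MSE ratio (1 is best) and that ties share the average rank; the paper's exact ranking protocol is not stated here. The three rows are the tie-free Loan, Wine, and Yolanda entries of Table 2 at $\varepsilon=4$, so the output is a partial illustration rather than a reproduction of the full 12-row rank sum.

```python
def rank_sums(rows, methods):
    """Sum per-row ranks (1 = smallest MSE ratio) per method; ties share the average rank."""
    totals = {name: 0.0 for name in methods}
    for row in rows:
        for name in methods:
            smaller = sum(1 for other in methods if row[other] < row[name])
            tied = sum(1 for other in methods if row[other] == row[name])
            # Average rank over the tied block that starts at position smaller + 1.
            totals[name] += smaller + (tied + 1) / 2
    return totals


methods = ["Lasso", "2-SLR", "M-SLR", "LDPPROX", "LDPIHT"]
# Rows taken from Table 2 at eps = 4: Loan, Wine, Yolanda.
rows = [
    dict(zip(methods, [0.99, 0.98, 0.90, 2.05, 1.65])),
    dict(zip(methods, [1.14, 1.19, 1.27, 5.39, 1.74])),
    dict(zip(methods, [1.19, 1.11, 1.18, 1.79, 2.03])),
]
totals = rank_sums(rows, methods)
print(totals)
```

A lower rank sum means a method is more often among the best in a row. With five methods each row contributes ranks summing to 15, which gives a simple sanity check on any such table.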

We observe that our methods (2-SLR and M-SLR) can sometimes outperform non-private Lasso on the whole data. This is somewhat expected. As explained in previous literature (Ndaoud, 2019), given strong signal strength ($\min_{\beta^{j}>0}|\beta^{j}|$ is large), the optimal error can actually be improved, and simply performing Lasso does not achieve this optimality. Moreover, methodological works (Wang et al., 2011; Liang et al., 2023) showed the effectiveness of selecting candidate variables by aggregating Lasso fitted on random subsamples. Intuitively, even with strong signal strength, fitting Lasso does not guarantee the selection of all true variables due to randomness, while aggregating variables selected on random subsamples is more likely to identify true variables. We validated our conjecture by running non-private SLRs ($\varepsilon=1024$). The results are presented in Table 2. We observe that 2-SLR and M-SLR never outperform their non-private counterparts, while non-private SLRs occasionally outperform Lasso on some datasets.

We also observe that 2-SLR outperforms M-SLR in simulation, while the opposite is true on real data. This phenomenon is attributed to implicit regularization. In synthetic data, where the data is neatly generated, estimations tend to converge well. However, real data often contains more noise, leading to potentially unstable estimations. In such cases, using zero coefficients as the initial point yields a regularized estimator (Ali et al., 2019), which is biased yet stable.
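The shrinkage effect behind this explanation can be seen in a toy example. The sketch below runs plain, non-private gradient descent on a least-squares objective from a zero initial point and tracks the norm of the iterates. It is not M-SLR (there is no privacy noise and no variable selection), and the data, step size, and function names are hypothetical choices made only for illustration.

```python
def gradient_descent_path(X, y, step, num_steps):
    """Run gradient descent on (1/2m) * ||y - X b||^2 from b = 0 and return all iterates."""
    m, d = len(X), len(X[0])
    b = [0.0] * d
    path = [list(b)]
    for _ in range(num_steps):
        residual = [sum(X[i][j] * b[j] for j in range(d)) - y[i] for i in range(m)]
        grad = [sum(X[i][j] * residual[i] for i in range(m)) / m for j in range(d)]
        b = [b[j] - step * grad[j] for j in range(d)]
        path.append(list(b))
    return path


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return sum(x * x for x in v) ** 0.5


# Tiny assumed design with orthogonal columns; the step size is below
# 1 / (largest eigenvalue of X^T X / m), so the iteration is stable.
X = [[1.0, 0.5], [0.5, 1.0], [1.0, -1.0], [0.0, 1.0]]
y = [1.0, 0.2, 0.9, -0.3]
path = gradient_descent_path(X, y, step=0.5, num_steps=30)
norms = [l2_norm(b) for b in path]
print(norms[0], norms[5], norms[30])
```

The iterate norms grow from zero toward the norm of the least-squares solution, so stopping after a limited number of rounds returns a shrunken estimate. This is the sense in which a zero initial point acts as a regularizer.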

5 Discussion

In this work, we investigate ULDP sparse linear regression. By proposing a two-phase solution, we show the theoretical advantage of having multiple samples per user, which is then validated by extensive experiments.

It is worth mentioning that we do not explore scenarios where $m$ is small, such as $m\leq s^{*}\log d$. Our experiments, particularly with the MIP dataset, demonstrate that even with few local samples, satisfactory results can be achieved. However, dealing with small $m$ may require a comprehensive distributional analysis of variable selection, which could be a promising avenue for future research. We also hope to establish a tight minimax lower bound of sparse estimation under ULDP.

Currently, we consider a support estimation based algorithm. As suggested by the reviewers, an interesting topic would be an algorithm that simultaneously learns the sparse coefficients and optimizes the model, potentially with a Lasso-type optimization objective. Directly solving such a problem is ineffective (Bassily & Sun, 2023). Each update step will involve updating $d-s$ redundant parameters, whose information needs to be protected under differential privacy. Thus, excessive random noise is injected. By utilizing support estimation, we circumvent this issue in the second phase, leading to improved final rates. A private analog for algorithms with limited message passing each round is promising, such as least-angle regression (LARS) or coordinate gradient descent.

Impact Statement

We believe that it is difficult to clearly foresee the societal consequences of the present work, which has a primary focus on machine learning theory and methodology. We believe this work can serve as a step toward closing the gap between the theoretical study of LDP and practical situations.

Acknowledgement

We would like to thank the reviewers for their help and advice, which led to a significant improvement of the article. We also thank Yifan Gu for providing discussion on variable selection issues. The research is supported by the Special Funds of the National Natural Science Foundation of China (Grant No. 72342010). Yuheng Ma is supported by the Outstanding Innovative Talents Cultivation Funded Programs 2023 of Renmin University of China. This research is also supported by Public Computing Cloud, Renmin University of China.

References

  • Acharya et al. (2020) Acharya, J., Canonne, C. L., Sun, Z., and Tyagi, H. Unified lower bounds for interactive high-dimensional estimation under information constraints. arXiv preprint arXiv:2010.06562, 2020.
  • Acharya et al. (2021) Acharya, J., Kairouz, P., Liu, Y., and Sun, Z. Estimating sparse discrete distributions under privacy and communication constraints. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research. PMLR, 16–19 Mar 2021.
  • Acharya et al. (2023) Acharya, J., Liu, Y., and Sun, Z. Discrete distribution estimation under user-level local differential privacy. In International Conference on Artificial Intelligence and Statistics, pp.  8561–8585. PMLR, 2023.
  • Alabi et al. (2022) Alabi, D., McMillan, A., Sarathy, J., Smith, A., and Vadhan, S. Differentially private simple linear regression. Proceedings on Privacy Enhancing Technologies, 2022.
  • Ali et al. (2019) Ali, A., Kolter, J. Z., and Tibshirani, R. J. A continuous-time view of early stopping for least squares regression. In The 22nd international conference on artificial intelligence and statistics, pp.  1370–1378. PMLR, 2019.
  • Amin et al. (2023) Amin, K., Joseph, M., Ribero, M., and Vassilvitskii, S. Easy differentially private linear regression. In The Eleventh International Conference on Learning Representations, 2023.
  • Arora et al. (2022) Arora, R., Bassily, R., Guzmán, C., Menart, M., and Ullah, E. Differentially private generalized linear models revisited. Advances in Neural Information Processing Systems, 35:22505–22517, 2022.
  • Avella-Medina et al. (2023) Avella-Medina, M., Bradshaw, C., and Loh, P.-L. Differentially private inference via noisy optimization. The Annals of Statistics, 51(5):2067–2092, 2023.
  • Barik & Honorio (2020) Barik, A. and Honorio, J. Exact support recovery in federated regression with one-shot communication. arXiv preprint arXiv:2006.12583, 2020.
  • Bassily & Sun (2023) Bassily, R. and Sun, Z. User-level private stochastic convex optimization with optimal rates. In International Conference on Machine Learning, pp. 1838–1851. PMLR, 2023.
  • Bassily et al. (2020) Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. Practical locally private heavy hitters. The Journal of Machine Learning Research, 21(1):535–576, 2020.
  • Belloni & Chernozhukov (2013) Belloni, A. and Chernozhukov, V. Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2):521, 2013.
  • Bergdoll (2019) Bergdoll, R.-D. Mip-2016-regression, 2019. URL https://www.openml.org/search?type=data&status=active&id=41702.
  • Cai et al. (2021) Cai, T. T., Wang, Y., and Zhang, L. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. The Annals of Statistics, 49(5):2825–2850, 2021.
  • Cai et al. (2023) Cai, Z., Li, S., Xia, X., and Zhang, L. Private estimation and inference in high-dimensional regression with FDR control. arXiv preprint arXiv:2310.16260, 2023.
  • Cohen et al. (2023) Cohen, E., Lyu, X., Nelson, J., Sarlos, T., and Stemmer, U. Hot pate: Private aggregation of distributions for diverse task. arXiv preprint arXiv:2312.02132, 2023.
  • Cortez et al. (2009) Cortez, P., Cerdeira, A., Almeida, F., Matos, T., and Reis, J. Modeling wine preferences by data mining from physicochemical properties. Decision support systems, 47(4):547–553, 2009.
  • Cummings et al. (2022) Cummings, R., Feldman, V., McMillan, A., and Talwar, K. Mean estimation with user-level privacy under data heterogeneity. Advances in Neural Information Processing Systems, 35:29139–29151, 2022.
  • Dieuleveut et al. (2017) Dieuleveut, A., Flammarion, N., and Bach, F. Harder, better, faster, stronger convergence rates for least-squares regression. The Journal of Machine Learning Research, 18(1):3520–3570, 2017.
  • DrivenData (2021a) DrivenData. Loan default prediction - imperial college london, 2021a. URL https://www.kaggle.com/c/loan-default-prediction/data.
  • DrivenData (2021b) DrivenData. Differential privacy temporal map challenge: Sprint 3 (prescreened arena), 2021b. URL https://www.drivendata.org/competitions/77/deid2-sprint-3-prescreened/page/332/.
  • Duchi & Rogers (2019) Duchi, J. and Rogers, R. Lower bounds for locally private estimation via communication complexity. In Conference on Learning Theory, pp.  1161–1191. PMLR, 2019.
  • Duchi et al. (2018) Duchi, J., Jordan, M., and Wainwright, M. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521):182–201, 2018.
  • Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp.  265–284. Springer, 2006.
  • Fan & Li (2001) Fan, J. and Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association, 96(456):1348–1360, 2001.
  • Fan & Lv (2008) Fan, J. and Lv, J. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70(5):849–911, 2008.
  • Fan & Lv (2011) Fan, J. and Lv, J. Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57(8):5467–5484, 2011.
  • Fan et al. (2014) Fan, J., Xue, L., and Zou, H. Strong oracle optimality of folded concave penalized estimation. Annals of statistics, 42(3):819, 2014.
  • Ghazi et al. (2021) Ghazi, B., Kumar, R., and Manurangsi, P. User-level differentially private learning via correlated sampling. Advances in Neural Information Processing Systems, 34:20172–20184, 2021.
  • Ghazi et al. (2023) Ghazi, B., Kamath, P., Kumar, R., Manurangsi, P., Meka, R., and Zhang, C. On user-level private convex optimization. In International Conference on Machine Learning, pp. 11283–11299. PMLR, 2023.
  • Girgis et al. (2022) Girgis, A. M., Data, D., and Diggavi, S. Distributed user-level private mean estimation. In 2022 IEEE International Symposium on Information Theory (ISIT), pp.  2196–2201. IEEE, 2022.
  • Guyon et al. (2019) Guyon, I., Sun-Hosoya, L., Boullé, M., Escalante, H. J., Escalera, S., Liu, Z., Jajetic, D., Ray, B., Saeed, M., Sebag, M., et al. Analysis of the automl challenge series. Automated Machine Learning, 177, 2019.
  • Hsu et al. (2012) Hsu, D., Kakade, S. M., and Zhang, T. A tail inequality for quadratic forms of subgaussian random vectors. Electronic Communications in Probability, 17:1, 2012.
  • Hu et al. (2022) Hu, L., Ni, S., Xiao, H., and Wang, D. High dimensional differentially private stochastic optimization with heavy-tailed data. In Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp.  227–236, 2022.
  • Johnstone (1994) Johnstone, I. M. On minimax estimation of a sparse normal mean vector. The Annals of Statistics, pp.  271–289, 1994.
  • Kairouz et al. (2014) Kairouz, P., Oh, S., and Viswanath, P. Extremal mechanisms for local differential privacy. Advances in neural information processing systems, 27, 2014.
  • Kairouz et al. (2016) Kairouz, P., Bonawitz, K., and Ramage, D. Discrete distribution estimation under local privacy. In International Conference on Machine Learning, pp. 2436–2444. PMLR, 2016.
  • Kent et al. (2024) Kent, A., Berrett, T. B., and Yu, Y. Rate optimality and phase transition for user-level local differential privacy. arXiv preprint arXiv:2405.11923, 2024.
  • Khanna et al. (2023a) Khanna, A., Lu, F., and Raff, E. The challenge of differentially private screening rules. arXiv preprint arXiv:2303.10303, 2023a.
  • Khanna et al. (2023b) Khanna, A., Lu, F., and Raff, E. Sparse private lasso logistic regression. arXiv preprint arXiv:2304.12429, 2023b.
  • Kifer et al. (2012) Kifer, D., Smith, A., and Thakurta, A. Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory, pp.  25–1. JMLR Workshop and Conference Proceedings, 2012.
  • Kumar & Deisenroth (2019) Kumar, K. and Deisenroth, M. P. Differentially private empirical risk minimization with sparsity-inducing norms. arXiv preprint arXiv:1905.04873, 2019.
  • LeDell (2020) LeDell, E. Airlines depdelay 10m, 2020. URL https://www.openml.org/search?type=data&status=active&id=42728.
  • Levy et al. (2021) Levy, D., Sun, Z., Amin, K., Kale, S., Kulesza, A., Mohri, M., and Suresh, A. T. Learning with user-level privacy. Advances in Neural Information Processing Systems, 34:12466–12479, 2021.
  • Liang et al. (2023) Liang, J., Wang, C., Zhang, D., Xie, Y., Zeng, Y., Li, T., Zuo, Z., Ren, J., and Zhao, Q. Vsolassobag: a variable-selection oriented lasso bagging algorithm for biomarker discovery in omic-based translational research. Journal of Genetics and Genomics, 50(3):151–162, 2023.
  • Liu et al. (2022) Liu, X., Kong, W., and Oh, S. Differential privacy and robust statistics in high dimensions. In Conference on Learning Theory, pp.  1167–1246. PMLR, 2022.
  • Liu et al. (2020) Liu, Y., Suresh, A. T., Yu, F. X. X., Kumar, S., and Riley, M. Learning discrete distributions: user vs item-level privacy. Advances in Neural Information Processing Systems, 33:20965–20976, 2020.
  • Ma & Yang (2024) Ma, Y. and Yang, H. Optimal locally private nonparametric classification with public data. Journal of Machine Learning Research, 2024.
  • Ma et al. (2024a) Ma, Y., Jia, K., and Yang, H. Locally private estimation with public features. arXiv preprint arXiv:2405.13481, 2024a.
  • Ma et al. (2024b) Ma, Y., Zhang, H., Cai, Y., and Yang, H. Decision tree for locally private estimation with public data. Advances in Neural Information Processing Systems, 36, 2024b.
  • Narayanan et al. (2022) Narayanan, S., Mirrokni, V., and Esfandiari, H. Tight and robust private mean estimation with few users. In International Conference on Machine Learning, pp. 16383–16412. PMLR, 2022.
  • Ndaoud (2019) Ndaoud, M. Interplay of minimax estimation and minimax support recovery under sparsity. In Algorithmic Learning Theory, pp.  647–668. PMLR, 2019.
  • Papernot & Steinke (2021) Papernot, N. and Steinke, T. Hyperparameter tuning with Rényi differential privacy. In International Conference on Learning Representations, 2021.
  • Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • Qin et al. (2016) Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., and Ren, K. Heavy hitter estimation over set-valued data with local differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp.  192–203, 2016.
  • Raff et al. (2023) Raff, E., Khanna, A. A., and Lu, F. Scaling up differentially private lasso regularized logistic regression via faster frank-wolfe iterations. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Smith et al. (2017) Smith, A., Thakurta, A., and Upadhyay, J. Is interaction necessary for distributed private learning? In 2017 IEEE Symposium on Security and Privacy (SP), pp. 58–77. IEEE, 2017.
  • Talwar et al. (2015) Talwar, K., Guha Thakurta, A., and Zhang, L. Nearly optimal private lasso. Advances in Neural Information Processing Systems, 28, 2015.
  • Tibshirani (1996) Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996.
  • Tramèr et al. (2022) Tramèr, F., Kamath, G., and Carlini, N. Considerations for differentially private learning with large-scale public pretraining. arXiv preprint arXiv:2212.06470, 2022.
  • Varshney et al. (2022) Varshney, P., Thakurta, A., and Jain, P. (nearly) optimal private linear regression for sub-gaussian data via adaptive clipping. volume 178 of Proceedings of Machine Learning Research, pp. 1126–1166. PMLR, 2022.
  • Wainwright (2019) Wainwright, M. J. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
  • Wang & Xu (2019) Wang, D. and Xu, J. On sparse linear regression in the local differential privacy model. In International Conference on Machine Learning, pp. 6628–6637. PMLR, 2019.
  • Wang et al. (2011) Wang, S., Nan, B., Rosset, S., and Zhu, J. Random lasso. The annals of applied statistics, 5(1):468, 2011.
  • Wang et al. (2023) Wang, S., Li, Y., Zhong, Y., Chen, K., Wang, X., Zhou, Z., Peng, F., Qian, Y., Du, J., and Yang, W. Locally private set-valued data analyses: Distribution and heavy hitters estimation. IEEE Transactions on Mobile Computing, 2023.
  • Wang (2018) Wang, Y. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pp.  93–103, 2018.
  • Warner (1965) Warner, S. L. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
  • Wilcoxon (1992) Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in statistics, pp.  196–202. Springer, 1992.
  • Zhang & Zhang (2021) Zhang, Z. and Zhang, L. High-dimensional differentially-private EM algorithm: Methods and near-optimal statistical guarantees. arXiv preprint arXiv:2104.00245, 2021.
  • Zhao & Yu (2006) Zhao, P. and Yu, B. On model selection consistency of lasso. The Journal of Machine Learning Research, 7:2541–2563, 2006.
  • Zheng et al. (2017) Zheng, K., Mou, W., and Wang, L. Collect at once, use effectively: Making non-interactive locally private learning possible. In International Conference on Machine Learning, pp. 4130–4139. PMLR, 2017.
  • Zhou et al. (2022) Zhou, M., Wang, T., Chan, T. H., Fanti, G., and Shi, E. Locally differentially private sparse vector aggregation. In 2022 IEEE Symposium on Security and Privacy (SP), pp. 422–439. IEEE, 2022.
  • Zhu et al. (2023) Zhu, L., Ding, M., Aggarwal, V., Xu, J., and Wang, D. Improved analysis of sparse linear regression in local differential privacy model. arXiv preprint arXiv:2310.07367, 2023.
  • Zhu et al. (2020) Zhu, W., Kairouz, P., McMahan, B., Sun, H., and Li, W. Federated heavy hitters discovery with differential privacy. In International Conference on Artificial Intelligence and Statistics, pp.  3837–3847. PMLR, 2020.

In this appendix, we provide the omitted content: the minimax lower bound (Appendix A), the algorithm and theoretical results for candidate variable selection (Appendix B), the algorithm and theoretical results for coefficient estimation (Appendix C), an extension of our framework to general problems (Appendix D), and experimental details together with additional results (Appendix E).

Appendix A Minimax Lower Bound

We first borrow assumptions and definitions from Acharya et al. (2020). Let $Z=\left(Z_{1},\ldots,Z_{d}\right)$ be a random variable over $\mathcal{Z}=\{-1,+1\}^{d}$ such that $\mathbb{P}\left[Z_{i}=1\right]=\tau$ for all $i\in[d]$ and the $Z_{i}$ are all independent; we denote this distribution by $\operatorname{Rad}(\tau)^{\otimes d}$. For $z\in\mathcal{Z}$, we denote by $z^{\oplus i}\in\mathcal{Z}$ the vector obtained by flipping the sign of the $i$-th coordinate of $z$.

Condition A.1.

For every $z\in\mathcal{Z}$ and $i\in[d]$ it holds that $\mathrm{P}_{z^{\oplus i}}\ll\mathrm{P}_{z}$ (we refer to $\mathrm{P}_{\beta_{z}}$ simply as $\mathrm{P}_{z}$), and there exist measurable functions $\phi_{z,i}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ such that

$$
\frac{\mathrm{d}\mathrm{P}_{z^{\oplus i}}}{\mathrm{d}\mathrm{P}_{z}}=1+\phi_{z,i}.
$$
Condition A.2.

There exists some $\alpha^{2}\geq 0$ such that, for all $z\in\mathcal{Z}$ and distinct $i,j\in[d]$, $\mathbb{E}_{\mathrm{P}_{z}}\left[\phi_{z,i}\cdot\phi_{z,j}\right]=0$ and $\mathbb{E}_{\mathrm{P}_{z}}\left[\phi_{z,i}^{2}\right]\leq\alpha^{2}$.

Condition A.3.

For every $z,z^{\prime}\in\mathcal{Z}=\{-1,+1\}^{d}$,

$$
\ell_{2}\left(\theta_{z},\theta_{z^{\prime}}\right)=4\nu\left(\frac{\mathrm{d}_{\mathrm{Ham}}\left(z,z^{\prime}\right)}{\tau d}\right)^{1/2},
$$

where $\mathrm{d}_{\mathrm{Ham}}\left(z,z^{\prime}\right):=\sum_{i=1}^{d}\boldsymbol{1}\left\{z_{i}\neq z_{i}^{\prime}\right\}$ denotes the Hamming distance, $\tau=s^{*}/2d$, and $s^{*}$ and $\nu$ denote the sparsity and the error rate, respectively.
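To make Condition A.3 concrete, the following minimal sketch evaluates the Hamming distance and the right-hand side of the condition for two sign vectors. The function names `hamming_distance` and `required_separation` and the numerical values are illustrative assumptions, not part of the original analysis.

```python
import math
from typing import Sequence


def hamming_distance(z: Sequence[int], z_prime: Sequence[int]) -> int:
    """Number of coordinates where two sign vectors differ."""
    if len(z) != len(z_prime):
        raise ValueError("sign vectors must have the same length")
    return sum(a != b for a, b in zip(z, z_prime))


def required_separation(z: Sequence[int], z_prime: Sequence[int],
                        nu: float, s_star: int) -> float:
    """Right-hand side of Condition A.3 with tau = s_star / (2 d)."""
    d = len(z)
    tau = s_star / (2 * d)
    return 4 * nu * math.sqrt(hamming_distance(z, z_prime) / (tau * d))


# Hypothetical example values, chosen only for illustration.
z = [1, -1, -1, 1, -1, -1]
z_flip = [1, -1, 1, 1, -1, -1]  # z with its third coordinate flipped
print(hamming_distance(z, z_flip))
print(required_separation(z, z_flip, nu=0.05, s_star=2))
```

Since $\tau d=s^{*}/2$, a single sign flip corresponds to a separation of $4\sqrt{2}\nu/\sqrt{s^{*}}$, which is the magnitude of one nonzero coordinate in the construction used in the proof of Theorem 2.4.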

Proof of Theorem 2.4.

First, suppose $X^{j}$ is uniformly distributed on $\{-1,1\}$ for $1\leq j\leq d$. Let

$$
\beta_{Z,j}^{*}=\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\cdot\frac{Z_{j}+1}{2}
$$

for $1\leq j\leq d$, where the $Z_{j}$ are i.i.d. random variables with

$$
\mathrm{Pr}\left[Z_{i}=+1\right]=\frac{s^{*}}{2d},\quad\mathrm{Pr}\left[Z_{i}=-1\right]=1-\frac{s^{*}}{2d}.
$$
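The construction of $\beta^{*}_{Z}$ can be simulated directly. The sketch below is only an illustration under assumed values of $d$, $s^{*}$, and $\nu$; the helper name `sample_beta` is hypothetical and does not appear in the paper.

```python
import numpy as np


def sample_beta(d: int, s_star: int, nu: float, rng: np.random.Generator):
    """Draw Z with Pr[Z_j = +1] = s_star / (2 d) and map it to beta*_Z."""
    z = np.where(rng.random(d) < s_star / (2 * d), 1, -1)
    scale = 4 * np.sqrt(2) * nu / np.sqrt(s_star)
    beta = scale * (z + 1) / 2  # equals `scale` where z = +1, and 0 where z = -1
    return z, beta


# Illustrative parameter choices (assumptions, not values from the paper).
rng = np.random.default_rng(0)
z, beta = sample_beta(d=1000, s_star=20, nu=0.05, rng=rng)
print("nonzeros:", np.count_nonzero(beta), "max |beta_j|:", np.abs(beta).max())
```

Each coordinate is nonzero only when $Z_{j}=+1$, so the expected number of nonzero coordinates is $s^{*}/2$, and every nonzero coordinate equals $4\sqrt{2}\nu/\sqrt{s^{*}}$.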

By Fact 1 in Acharya et al. (2020), $\beta^{*}_{Z}$ satisfies $\|\beta^{*}_{Z}\|_{\infty}\leq 1$ and $\|\beta^{*}_{Z}\|_{0}\leq s^{*}$ with probability $1-s^{*}/2d$. Next, for each $Z$ we let

$$
\sigma_{Z}=\begin{cases}1-\left\langle X,\beta^{*}_{Z}\right\rangle & \text{w.p. }\ \frac{1+\left\langle X,\beta^{*}_{Z}\right\rangle}{2},\\[4pt] -1-\left\langle X,\beta^{*}_{Z}\right\rangle & \text{w.p. }\ \frac{1-\left\langle X,\beta^{*}_{Z}\right\rangle}{2}.\end{cases}
$$
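The noise defined above can be sampled as in the following sketch. It assumes that model (1) is the linear model $Y=\langle X,\beta^{*}\rangle+\sigma$ and that $|\langle X,\beta^{*}_{Z}\rangle|\leq 1$, so that the two probabilities are valid; the helper `sample_pair` and the particular `beta` are hypothetical.

```python
import numpy as np


def sample_pair(beta: np.ndarray, rng: np.random.Generator):
    """Draw one (X, Y, sigma) from the hard instance for a fixed beta."""
    x = rng.choice([-1, 1], size=beta.shape[0])  # uniform covariates on {-1, 1}
    t = float(x @ beta)                          # inner product <X, beta>
    if abs(t) > 1:
        raise ValueError("need |<X, beta>| <= 1 for valid probabilities")
    y = 1 if rng.random() < (1 + t) / 2 else -1  # Y = +1 w.p. (1 + t) / 2
    sigma = y - t                                # noise such that Y = t + sigma
    return x, y, sigma


# Illustrative sparse coefficient vector (an assumption for this example).
rng = np.random.default_rng(1)
beta = np.zeros(50)
beta[:5] = 0.04
draws = [sample_pair(beta, rng) for _ in range(1000)]
print({y for _, y, _ in draws})
```

With $t=\langle X,\beta^{*}_{Z}\rangle$, the noise has conditional mean $(1-t)\frac{1+t}{2}+(-1-t)\frac{1-t}{2}=0$, so the construction is a centered-noise linear model with a binary response.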

Thus, $Y\in\{-1,1\}$. The above distribution satisfies (1) with probability $1-s^{*}/2d$. The distribution $\mathrm{P}_{Z}$ has density function $\left(1+Y\left\langle X,\beta^{*}_{Z}\right\rangle\right)/2^{d+1}$ for $(X,Y)\in\{+1,-1\}^{d+1}$. Then, for the $i$-th user who has the data sample $\left(X_{i},y_{i}\right)$ from the distribution $\mathrm{P}_{Z}^{m}$, it sends its information through a private algorithm $\mathcal{S}$ after getting messages $S_{1},\cdots,S_{i-1}$. By definition, for $1\leq j\leq m$, we have

$$
\frac{d\mathrm{P}_{z^{\oplus k}}}{d\mathrm{P}_{z}}
=\prod_{j=1}^{m}\frac{1+y_{i,j}\left\langle X_{i,j},\beta_{z^{\oplus k}}\right\rangle}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}
=\prod_{j=1}^{m}\left(1+\frac{y_{i,j}\left\langle X_{i,j},\beta_{z^{\oplus k}}-\beta_{z}\right\rangle}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\right)
=\prod_{j=1}^{m}\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)
\tag{8}
$$

where the last step follows from Zhu et al. (2023). If we take $\nu$ small enough, we can guarantee that $|y_{i,j}\langle X_{i,j},\beta_{z}\rangle|\leq 1/2$ for each $z$ and $\left|\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right|\leq 1/2$. Taking the logarithm of the quantity in (8) gives $\sum_{j=1}^{m}\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)$. For each $j$, we bound the expectation by Jensen's inequality:

$$
\mathbb{E}\left[\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\right]
\leq\log\left(1-\mathbb{E}\left[\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\right]\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right).
\tag{9}
$$

For each $k$, we have

$$
\left|\mathbb{E}\left[\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\right]\right|
\leq\left|\frac{1}{2+8\sqrt{2s^{*}}\nu}-\frac{1}{2-8\sqrt{2s^{*}}\nu}\right|
\leq 8\sqrt{2s^{*}}\nu.
\tag{10}
$$

Substituting (10) into (9) leads to

$$\mathbb{E}\left[\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\right]\leq\log\left(1+64\nu^{2}\right).$$
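For completeness, the two elementary steps behind (10) and the bound $\log(1+64\nu^{2})$ can be spelled out as follows. The shorthand $t=8\sqrt{2s^{*}}\nu$ is introduced here only for readability, and $t\leq\sqrt{2}$ is assumed, which holds for sufficiently small $\nu$.

```latex
\begin{align*}
\left|\frac{1}{2+t}-\frac{1}{2-t}\right| &= \frac{2t}{4-t^{2}} \leq t
  \quad\text{for } 0\leq t\leq\sqrt{2},\\
8\sqrt{2s^{*}}\,\nu\cdot\frac{4\sqrt{2}\,\nu}{\sqrt{s^{*}}} &= 64\nu^{2}.
\end{align*}
```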

As a result, summing over $j=1,\ldots,m$, the expected sum of the log terms satisfies

$$\mathbb{E}\left[\sum_{j=1}^{m}\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\right]\leq m\cdot\log\left(1+64\nu^{2}\right). \tag{11}$$

Moreover, since $1-5|x|\leq\log(1+x)\leq 1+5|x|$ for $|x|\leq 1/2$, we have

$$1-\frac{10\sqrt{2}\nu}{\sqrt{s^{*}}}\leq\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\leq 1+\frac{10\sqrt{2}\nu}{\sqrt{s^{*}}}.$$

Recall that $\left|\left\{z\in\{-1,1\}^{d}\,\middle|\,\sum_{j}\boldsymbol{1}\{z^{j}=1\}\leq s^{*}\right\}\right|\leq d^{s^{*}}$. Thus, applying Hoeffding's inequality with a union bound yields

$$\begin{aligned}
&\left|\sum_{j=1}^{m}\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)-\mathbb{E}\left[\sum_{j=1}^{m}\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\right]\right|\\
\leq{}&\frac{20\nu\sqrt{m(\log n+s^{*}\log d)}}{\sqrt{s^{*}}}\leq\frac{20\nu\sqrt{md}}{s^{*}}
\end{aligned} \tag{12}$$

for all $1\leq i\leq n$ and all $z\in\{-1,1\}^{d}$ with $\sum_{j}\boldsymbol{1}\{z^{j}=1\}\leq s^{*}$, with probability at least $1-2/n^{2}$. As a result, plugging (11) and (12) into (8) yields

$$\begin{aligned}
\frac{d\mathrm{P}_{z^{\oplus k}}}{d\mathrm{P}_{z}}={}&\exp\left(\sum_{j=1}^{m}\log\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\right)\\
\leq{}&\exp\left(m\cdot\log\left(1+64\nu^{2}\right)+\frac{20\nu\sqrt{md}}{s^{*}}\right).
\end{aligned}$$
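The deviation term in (12) comes from Hoeffding's inequality for $m$ independent bounded summands, followed by a union bound over users and sparse sign vectors. The sketch below computes such a radius from first principles. The function name, the failure-probability target `2 / n**2`, and the range width `20*sqrt(2)*nu/sqrt(s)` are assumptions made for illustration, and the resulting constants are not claimed to match those in (12).

```python
import math

def hoeffding_union_radius(m, n, d, s, nu):
    """Deviation t such that a union bound over n users and at most d**s
    sign vectors leaves total failure probability at most 2 / n**2.

    Assumes each of the m summands lies in an interval of width
    w = 20*sqrt(2)*nu/sqrt(s) (an illustrative choice).
    Hoeffding: P(|S - E S| >= t) <= 2*exp(-2*t**2 / (m*w**2)).
    """
    w = 20.0 * math.sqrt(2.0) * nu / math.sqrt(s)
    # Solve 2 * n * d**s * exp(-2 t^2 / (m w^2)) = 2 / n**2 for t.
    log_events = 3.0 * math.log(n) + s * math.log(d)
    return w * math.sqrt(m * log_events / 2.0)

t = hoeffding_union_radius(m=50, n=1000, d=200, s=5, nu=0.01)
print(t)
```

The radius grows like $\sqrt{m}$, while the sum itself has $m$ terms, which is why the deviation is of lower order for large $m$.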

Since $n\varepsilon^{2}\geq s^{*2}$, for sufficiently small $\nu$, one can verify Conditions A.1 and A.2 for the defined $\mathrm{P}_{z}$, with $\alpha^{2}\asymp\frac{\nu^{2}md}{s^{*2}}$. Applying Corollary 1 in Acharya et al. (2020) leads to

$$\left(\frac{1}{d}\sum_{i=1}^{d}\mathrm{d}_{\mathrm{TV}}\left(\mathrm{P}_{+i}^{S^{n}},\mathrm{P}_{-i}^{S^{n}}\right)\right)^{2}\lesssim\frac{nm\nu^{2}\varepsilon^{2}}{s^{*2}}. \tag{13}$$

Note that this result, as well as Lemma 3 of Acharya et al. (2020) used below, is developed for $X_{i}$ being a single sample. Both extend to $X_{i}$ consisting of multiple samples, since we can apply the original conclusion to the $m(d+1)$-dimensional vector formed by stacking the $X_{i,j}$'s. Next, we focus on the lower bound of the total variation distance. Since

$$\left\|\beta_{z}-\beta_{z^{\prime}}\right\|_{2}=\sqrt{\frac{32\nu^{2}}{s^{*}}\sum_{i=1}^{d}\boldsymbol{1}\left\{Z_{i}\neq\hat{Z}_{i}\right\}}=4\nu\left(\frac{d_{\operatorname{Ham}}(z,\hat{z})}{\tau d}\right)^{1/2},$$

i.e., Condition A.3 holds, applying Lemma 3 of Acharya et al. (2020) leads to

$$\frac{1}{d}\sum_{i=1}^{d}\mathrm{d}_{\mathrm{TV}}\left(\mathrm{P}_{+i}^{S^{n}},\mathrm{P}_{-i}^{S^{n}}\right)\geq\frac{1}{4}. \tag{14}$$

Combining (13) and (14) leads to the desired conclusion.
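Spelled out, the final step squares (14) and compares it with (13). The following display is a sketch of that computation with absolute constants suppressed.

```latex
\begin{align*}
\frac{1}{16}
\leq\left(\frac{1}{d}\sum_{i=1}^{d}\mathrm{d}_{\mathrm{TV}}\left(\mathrm{P}_{+i}^{S^{n}},\mathrm{P}_{-i}^{S^{n}}\right)\right)^{2}
\lesssim\frac{nm\nu^{2}\varepsilon^{2}}{s^{*2}}
\quad\Longrightarrow\quad
\nu^{2}\gtrsim\frac{s^{*2}}{nm\varepsilon^{2}}.
\end{align*}
```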

Proof of Proposition 2.3.

We follow the same construction as in the proof of Theorem 2.4 while adopting a different strategy to bound $\frac{d\mathrm{P}_{z^{\oplus k}}}{d\mathrm{P}_{z}}$. Namely, we have

$$\frac{d\mathrm{P}_{z^{\oplus k}}}{d\mathrm{P}_{z}}=\prod_{j=1}^{m}\left(1-\frac{y_{i,j}X_{i,j}^{k}z_{k}}{1+y_{i,j}\left\langle X_{i,j},\beta_{z}\right\rangle}\cdot\frac{4\sqrt{2}\nu}{\sqrt{s^{*}}}\right)\leq\left(1+\frac{8\sqrt{2}\nu}{\sqrt{s^{*}}}\right)^{m}. \tag{15}$$

Then one can verify Conditions A.1 and A.2 for the defined $\mathrm{P}_{z}$, with

$$\alpha^{2}\asymp\left(\left(1+\frac{8\sqrt{2}\nu}{\sqrt{s^{*}}}\right)^{m}-1\right)^{2}.$$

Applying Corollary 1 in Acharya et al. (2020) leads to

$$\left(\frac{1}{d}\sum_{i=1}^{d}\mathrm{d}_{\mathrm{TV}}\left(\mathrm{P}_{+i}^{S^{n}},\mathrm{P}_{-i}^{S^{n}}\right)\right)^{2}\lesssim\frac{n\varepsilon^{2}}{d}\cdot\left(\left(1+\frac{8\sqrt{2}\nu}{\sqrt{s^{*}}}\right)^{m}-1\right)^{2}. \tag{16}$$

Similarly, there holds

$$\frac{1}{d}\sum_{i=1}^{d}\mathrm{d}_{\mathrm{TV}}\left(\mathrm{P}_{+i}^{S^{n}},\mathrm{P}_{-i}^{S^{n}}\right)\geq\frac{1}{4}. \tag{17}$$

Combining (16) and (17) leads to

$$\exp\left(\frac{\nu m}{\sqrt{s^{*}}}\right)\asymp\left(1+\frac{8\sqrt{2}\nu}{\sqrt{s^{*}}}\right)^{m}\gtrsim 1+\sqrt{\frac{d}{n\varepsilon^{2}}},$$

which yields

$$\nu^{2}\gtrsim\frac{s^{*}}{m^{2}}\log^{2}\left(1+\sqrt{\frac{d}{n\varepsilon^{2}}}\right).$$
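To get a feel for the scale of this bound, the sketch below evaluates its right-hand side with the hidden constant set to one. The function name and the unit constant are assumptions for illustration only.

```python
import math

def nu_sq_lower_bound(s, m, d, n, eps):
    """Right-hand side (s / m^2) * log^2(1 + sqrt(d / (n * eps^2))),
    with the constant hidden by the asymptotic symbol set to 1 (an assumption)."""
    ratio = math.sqrt(d / (n * eps ** 2))
    return s / m ** 2 * math.log1p(ratio) ** 2

# The bound shrinks as each user contributes more samples (larger m).
few = nu_sq_lower_bound(s=5, m=2, d=10_000, n=100, eps=1.0)
many = nu_sq_lower_bound(s=5, m=8, d=10_000, n=100, eps=1.0)
print(few, many)
```

With all other quantities fixed, the value scales as $1/m^{2}$, so quadrupling $m$ reduces it by a factor of 16.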

Note that if $n\varepsilon^{2}\lesssim\sqrt{d}$ and $m\leq\log d$, there holds

$$\nu^{2}\gtrsim\frac{s^{*}}{m^{2}}\log^{2}\left(1+\sqrt{\frac{d}{n\varepsilon^{2}}}\right)\gtrsim\frac{s^{*}}{m^{2}}\log^{2}\left(1+d^{1/4}\right)\gtrsim\frac{\log^{2}d}{s^{*}\log^{2}d}=\frac{1}{s^{*}}$$

which yields the desired result. Note that in this case, the constructed function class satisfies the beta-min condition with $a=\nu/\sqrt{s^{*}}\gtrsim 1$, which is a constant in $[0,1]$.

Appendix B Candidate Variable Selection

B.1 Good Selectors

B.1.1 Plug-in High Dimensional Variable Selection

In the following, we provide some example selectors and demonstrate that, under mild assumptions, they serve as components of a good selector. We introduce commonly used variable selection approaches along with their associated theoretical results. Our goal is twofold. First, we want the true variables to be selected. Second, as few redundant variables as possible should be selected. We derive this from the perfect selection property (also known as strong oracle or consistent selection), which asserts that both goals are achieved with high probability. The primary conditions we impose on the potential distributions fall into two categories:

  • Beta-min conditions, which require that $\min_{\beta^{*j}>0}|\beta^{*j}|$ is greater than a specified threshold. Under this condition, the signal from the regression function is strong enough for the selector to identify the true variables.

  • Mild correlation conditions, which require that the correlation between the true and redundant variables is weak enough for the selectors to distinguish.

In this section, we omit the user index $i$ and write $(X,y)$ for the data $(X_{i},y_{i})$ of a single user, since the results in this section consider one local dataset at a time.

Example B.1 (Lasso (Tibshirani, 1996)).

Lasso, or Least Absolute Shrinkage and Selection Operator, is a regularization technique in statistical learning that adds a penalty term to the linear regression objective function, effectively promoting sparsity by encouraging some of the model coefficients to be exactly zero. Specifically, Lasso solves the regularized optimization problem

$$\min_{\beta\in\mathbb{R}^{d}}\left\{\frac{1}{n}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{1}\right\}. \tag{18}$$

When used for variable selection, Lasso takes the indices of the non-zero entries of the solution as the selected variables.
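As an illustration of how (18) can be used as a selector, the sketch below minimizes the same objective with plain cyclic coordinate descent and reads off the non-zero coordinates. The solver choice, the function names, and the toy data are assumptions made here for illustration; the text does not prescribe a particular algorithm.

```python
def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/n)*||y - X b||_2^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of d floats); y is a list of n floats.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def selected_variables(beta, tol=1e-10):
    """Indices of the non-zero coefficients, i.e. the selected variables."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]

# Toy orthogonal design: only the first column carries signal.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=1.0)
print(selected_variables(beta))
```

On this toy design the first coefficient is shrunk from 2 to $2-\lambda/2=1.5$ and the second stays at zero, so only the first variable is selected.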

To study the selection consistency of Lasso, Zhao & Yu (2006) proposed a general condition called the Irrepresentable Condition. Specifically, for $\widehat{\Sigma}=X^{\top}X/n$, consider the block decomposition

$$\widehat{\Sigma}=\begin{pmatrix}\widehat{\Sigma}_{11}&\widehat{\Sigma}_{12}\\ \widehat{\Sigma}_{21}&\widehat{\Sigma}_{22}\end{pmatrix}.$$

Here $\widehat{\Sigma}_{11}$ is an $s^{*}\times s^{*}$ matrix, corresponding to the covariance matrix of the true variables. The Irrepresentable Condition states that there exists a positive constant vector $\eta$ such that

$$\left|\widehat{\Sigma}_{21}\left(\widehat{\Sigma}_{11}\right)^{-1}\operatorname{sign}\left(\beta^{*1:s^{*}}\right)\right|\leq\mathbf{1}-\eta, \tag{19}$$

where $\mathbf{1}-\eta$ is a $(d-s^{*})$-dimensional vector whose entries are $1-\eta$, and the inequality holds elementwise. The following result holds under the Irrepresentable Condition.

Lemma B.2.

Under our assumptions, let $\beta_{LASSO}$ be the solution of (18) when it is used as the selector. Suppose (19) holds, together with the following conditions: (i) $m\gtrsim s^{*2}\log d$; (ii) $\min_{\beta^{*j}>0}|\beta^{*j}|\gtrsim\sqrt{1/m}$. Then there exists a constant $C_{p}<1$ such that, for sufficiently large $m$, with probability at least $C_{p}$, there holds

$$\beta_{LASSO}^{j}\neq 0\;\;\text{ for }\;\;j=1,\cdots,s^{*}\quad\text{ and }\quad\beta_{LASSO}^{j}=0\;\;\text{ for }\;\;j=s^{*}+1,\cdots,d.$$

Moreover, for $\Sigma=\mathbb{E}XX^{\top}$, if $|\Sigma_{ij}|\leq 1/(3s^{*})$ for $i\neq j$, then the Irrepresentable Condition holds.

Proof of Lemma B.2.

Since we assume sub-Gaussian noise, every $k$-th moment of the noise exists, i.e., $k$ can be taken arbitrarily large. As a result, any $\lambda\gtrsim\sqrt{m}$ implies $(\lambda/\sqrt{m})^{2k}/d\to\infty$ for some $k$. By Theorem 3 in Zhao & Yu (2006), for sufficiently large $m$, the probability of

$$\mathrm{sign}(\beta_{LASSO}^{j})=\mathrm{sign}(\beta^{*j})\quad\text{ for }j=1,\cdots,d$$

is larger than some constant $C_p$, given that the conditions (5), (6), (7), and (8) are satisfied. Thus it suffices to verify the conditions. Conditions (5) and (6) hold naturally due to our assumption of i.i.d. designs and the boundedness of the covariance matrix norm. (7) and (8) are in our assumptions. As for the last statement, Zhao & Yu (2006) provide several commonly seen sufficient conditions for the irrepresentable condition to hold, such as when $|\widehat{\Sigma}_{ij}|\leq 1/(2s^{*}-1)$. If $|\Sigma_{ij}|\leq 1/(3s^{*})$, then $|\widehat{\Sigma}_{ij}|\leq|\Sigma_{ij}|+|\Sigma_{ij}-\widehat{\Sigma}_{ij}|\leq 1/(3s^{*})+c/\sqrt{m}\leq 1/(2s^{*}-1)$ for some constant $c$ and sufficiently large $m\gtrsim s^{*}$. This bound holds for all users and all positions $i$, $j$ if we apply the union bound, where we need $\log d/m\lesssim 1/s^{*2}$, i.e. $m\gtrsim s^{*2}\log d$. Note that here we assumed $d\gtrsim n$. Thus the lemma is proved. ∎

Example B.3 (SCAD (Fan & Li, 2001)).

SCAD, or smoothly clipped absolute deviation, is a non-convex penalty function used in statistical learning and regression analysis. It is designed to address limitations of traditional L1 regularization methods like Lasso by providing a smooth and more robust penalty on regression coefficients, promoting sparsity while mitigating some of the biases associated with sharp discontinuities in penalty functions. Specifically, SCAD solves the regularized optimization problem

$$\min_{\beta\in\mathbb{R}^{d}}\left\{\frac{1}{n}\|y-X\beta\|_{2}^{2}+\lambda\sum_{j=1}^{d}\psi_{\lambda}\left(\beta_{j}\right)\right\}\text{ where }\psi_{\lambda}^{\prime}(t)=\lambda I_{\{t\leq\lambda\}}+\frac{(a\lambda-t)_{+}}{a-1}I_{\{t>\lambda\}}\;\;\text{ for some }a>2.\tag{20}$$

Used for variable selection, SCAD identifies the non-zero elements of the optimization solution as the selected variables.
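The derivative in (20) determines the penalty itself up to integration. As a minimal sketch (not the authors' implementation), the following Python snippet evaluates $\psi_{\lambda}^{\prime}$ and the closed form obtained by integrating it from $0$. The function names `scad_derivative` and `scad_penalty`, as well as the default $a=3.7$ (a value commonly used with SCAD), are assumptions made for illustration.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """psi'_lambda(|t|) from (20): lam on [0, lam], then (a*lam - |t|)_+ / (a - 1)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def scad_penalty(t, lam, a=3.7):
    """Closed form of the integral of scad_derivative from 0 to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                            # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))    # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                                # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))
```

The penalty grows linearly near zero (like Lasso) and is constant beyond $a\lambda$, which is why large coefficients are not shrunk further.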

The following lemma, which is a straightforward implication of Fan & Lv (2011), states that the essential condition for the SCAD estimator to consistently select the variables is the Beta-min condition, given that the sample size is relatively large.

Lemma B.4.

Under our assumptions, when using (20) as selector, let $\beta_{SCAD}$ be the solution. Suppose the following conditions hold: (i) The sparsity $s^{*}$ is $\mathcal{O}(1)$. (ii) $m\gtrsim s^{*}\vee\log m\log d$. (iii) $\min_{\beta^{*j}>0}|\beta^{*j}|\gtrsim\sqrt{s^{*}/m}\vee\sqrt{\log d\log m/m}$. Then there exists a constant $C_p<1$ and a suitable choice of $\lambda_m$ such that, for sufficiently large $m$, with probability $C_p$, there holds

$$\beta_{SCAD}^{j}\neq 0\;\;\text{ for }\;\;j=1,\cdots,s^{*}\quad\text{ and }\;\beta_{SCAD}^{j}=0\text{ for }j=s^{*}+1,\cdots,d.$$

Proof of Lemma B.4.

By Theorem 3 in Fan & Lv (2011), for sufficiently large $m$, the probability of

$$\|\beta_{SCAD}-\beta^{*}\|_{2}\lesssim\sqrt{\frac{s^{*}}{m}}\;\;\text{ for }\;\;j=1,\cdots,s^{*}\quad\text{ and }\;\beta_{SCAD}^{j}=0\text{ for }j=s^{*}+1,\cdots,d$$

is larger than some constant $C_p$, given that the regularity conditions in the theorem are satisfied. Note that, for $1\leq j\leq s^{*}$, condition (iii) gives $|\beta_{SCAD}^{j}|\geq|\beta^{*j}|-\sqrt{s^{*}/m}\geq|\beta^{*j}|/2>0$. Thus it suffices to verify the conditions. Condition 1 is satisfied by the SCAD penalty. Condition 5 is satisfied by our setting of sample size (note that $\log d\lesssim n^{\alpha^{\prime}}$ for $\alpha^{\prime}$ defined in their context). (26) and (28) of Condition 2 follow from our assumptions on the upper and lower bounds of $\|\mathbb{E}XX^{\top}\|_{2}$ and the estimation error of the covariance matrix, which is $\mathcal{O}(\sqrt{s^{*}/m})$ (Wainwright, 2019). (27) comes from $s^{*}=\mathcal{O}(1)$. ∎

B.1.2 Proof of Proposition 3.2

Proof of Proposition 3.2.

Under the two conditions, using Lemmas B.2 and B.4, we can show that there exists a variable selection method that perfectly selects the true variables with a positive probability $C_p$. Then by sampling among the selected variables, the probability can be computed as

$$\mathrm{Pr}\left(\mathcal{S}(X_{i},y_{i})=v\right)\geq C_{p}\,\mathrm{Pr}\left(v=j\;\text{ for }\;v\sim\text{Unif}\left(1,\cdots,s^{*}\right)\right)\geq\frac{C_{p}}{s^{*}}$$

for $1\leq v\leq s^{*}$. This yields the desired conclusion. ∎

B.1.3 Computational Issue

For SCAD, the non-convex penalty is effective in attaining coefficient sparsity while maintaining oracle properties. Nonetheless, non-convexity introduces a challenge: uniqueness of the solution is no longer guaranteed, and multiple local optima may exist. Consequently, the stability of the results may be compromised. Fan et al. (2014) introduce an additional concave parameter to ensure consistency, which increases computational complexity and further hampers the computational efficiency of SCAD. As a result, Lasso is preferable in practice. We next introduce another technique that can be useful for improving computational efficiency.

Example B.5 (Screening (Fan & Lv, 2008)).

Sure Independence Screening (SIS) is a feature selection method in statistical learning that aims to identify relevant variables in high-dimensional datasets. It does so by assessing the correlation between each predictor and the response variable, and selecting a subset with the highest scores. Specifically, for $(X,y)\in(\mathcal{X}\times\mathcal{Y})^{m}$, let

$$w=X^{\top}y.$$

Then the positions of the $s$ largest entries of $w$ are identified as the selected variables. Screening can be a valuable pre-processing step for other selection methods: it quickly identifies and retains a subset of potentially important features, reducing the dimensionality of the data before more computationally intensive or elaborate feature selection techniques are applied.
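The screening step can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `sis_select` is hypothetical, and ranking by the magnitude $|w_j|$ (rather than the signed value) is an assumption that follows the usual SIS convention. The synthetic data below are made up purely to exercise the function.

```python
import numpy as np

def sis_select(X, y, s):
    """Return the sorted indices of the s features with the largest |w_j|, w = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = X.T @ y                                        # marginal scores, one per feature
    top = np.argsort(-np.abs(w), kind="stable")[:s]    # s largest scores in magnitude
    return np.sort(top)

# Synthetic example: only features 3 and 7 carry signal.
rng = np.random.default_rng(0)
m, d = 500, 50
X = rng.standard_normal((m, d))
beta = np.zeros(d)
beta[[3, 7]] = [2.0, -2.0]
y = X @ beta + 0.1 * rng.standard_normal(m)
selected = sis_select(X, y, s=2)
```

Because the scores cost only one matrix-vector product, a user can screen down to a moderate number of candidates before running Lasso or SCAD on the reduced design.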

B.2 Aggregation of Local Selected Variables

In this section, we present the omitted algorithm and technical proofs for the aggregation step after local variable selection. In B.2.1, we introduce the detailed variable selection algorithm. In B.2.2, we present proofs omitted in Section 3.1.

B.2.1 Heavy Hitter Algorithm

First, we introduce necessary definitions. Let $\mathcal{V}$ be a collection of binary prefixes. Then define ChildSet $=\{v+0,v+1\text{ for }v\in\mathcal{V}\}$, where $v+b$ denotes appending the bit $b$ to $v$. We define several sources of public randomness that will be shared among users. See Bassily et al. (2020, Section 3.1) for details. Let $\overline{\mathcal{V}}=\{v\in\{0,1\}^{\ell}\text{ for some }\ell\in[\log d]\}$. Define the integers $t=3\log(n)$ and $k=O(\sqrt{n/(3\log(n))})$. We will consider a set of $t$ pairs of hash functions $\{(h_{1},g_{1}),\ldots,(h_{t},g_{t})\}$, where for each $i\in[t]$, $h_{i}:\overline{\mathcal{V}}\rightarrow[k]$ and $g_{i}:\overline{\mathcal{V}}\rightarrow\{-1,+1\}$ are independently and uniformly chosen pairwise independent hash functions. We assume that the server creates a random partition $\Pi:[n]\rightarrow[\log d]\times[t]$ that assigns to each user $i\in[n]$ a random pair $(\ell_{i},j_{i})\leftarrow[\log(d)]\times[t]$, as in the initialization of Algorithm 4. We also have another random function $\mathcal{Q}:[n]\rightarrow[k]$ that assigns to each user $i$ a uniformly random index $r_{i}\leftarrow[k]$. We assume that such random indices $\ell_{i},j_{i},r_{i}$ are shared between the server and each user. Finally, we adopt shared encoding and decoding schemes for the bijection between $[d]$ and binary strings of length $\lceil\log d\rceil$, denoted as Encoding and Decoding, respectively.

Before presenting HeavyHitter, we first introduce the functions it uses. The following algorithm generates a private report for a single user. We seal the information of each $v_i$ into a binary value that is the Hadamard transform of hashes of its prefix. The information is privatized using the randomized response mechanism (Warner, 1965) and sent to the curator.

Algorithm 2 LocalRnd (Bassily et al., 2020)
  Input: Privacy budget $\varepsilon$, input $v_i$.
  Compute $\tilde{v}_i=\texttt{Encoding}(v_i)$, the binary string encoding.
  Use public randomness to get $(\ell_i,j_i)$ and $r_i$.
  Let $s_i:=g_{j_i}\left(\tilde{v}_i[1:\ell_i]\right)$ and $c_i:=h_{j_i}\left(\tilde{v}_i[1:\ell_i]\right)$. Here $v[1:\ell]$ denotes the $\ell$-bit prefix of $v$.
  Compute $x_i=s_i\cdot W_{r_i,c_i}$. Here $W_{r,c}$ denotes the sign of the $(r,c)$ entry of the Hadamard matrix of size $k$.
  Randomly perturb $x_i$ with
  $$y_i=\begin{cases}x_i&\text{ w.p. }\frac{e^{\varepsilon}}{e^{\varepsilon}+1}\\-x_i&\text{ w.p. }\frac{1}{e^{\varepsilon}+1}\end{cases}$$
  Output: $y_i$.
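The last step of LocalRnd is plain randomized response on a sign. A minimal Python sketch of that step alone is given below; it omits the encoding, hashing, and Hadamard steps, and the function names `randomized_response` and `debias` are hypothetical. The rescaling factor $(e^{\varepsilon}+1)/(e^{\varepsilon}-1)$ is the one that appears in the frequency estimate of FreqOracle, and it makes the report unbiased because $\mathbb{E}[y_i]=x_i\,(e^{\varepsilon}-1)/(e^{\varepsilon}+1)$.

```python
import math
import random

def randomized_response(x, eps, rng=random):
    """Keep the sign x in {-1, +1} w.p. e^eps / (e^eps + 1); flip it otherwise."""
    if x not in (-1, 1):
        raise ValueError("x must be -1 or +1")
    keep_prob = math.exp(eps) / (math.exp(eps) + 1.0)
    return x if rng.random() < keep_prob else -x

def debias(y, eps):
    """Rescale a report so that its expectation equals the original sign."""
    return y * (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)

# Averaging many debiased reports of the same sign recovers it approximately.
rng = random.Random(0)
eps = 1.0
reports = [randomized_response(+1, eps, rng) for _ in range(20000)]
estimate = sum(debias(y, eps) for y in reports) / len(reports)
```

The ratio of the two output probabilities is exactly $e^{\varepsilon}$, which is what gives each single report its $\varepsilon$-LDP guarantee.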

The following algorithm shows how LocalRnd is invoked multiple times to scan the prefix tree.

Algorithm 3 FreqOracle (Bassily et al., 2020)
  Input: Prefix length $\ell$, a subset of $\ell$-bit prefixes $\widehat{\mathcal{V}}\subseteq\{0,1\}^{\ell}$, collection of $t$ disjoint subsets of users $\left\{\tilde{\mathcal{I}}_{j}:j\in[t]\right\}$, privacy budget $\varepsilon$.
  for $\widehat{v}\in\widehat{\mathcal{V}}$ do
     for hash index $j=1$ to $t$ do
        Let $s:=g_{j}(\widehat{v})$ and $c:=h_{j}(\widehat{v})$.
        for $i\in\tilde{\mathcal{I}}_{j}$ do
           $y_i=$ LocalRnd($\varepsilon$, $v_i$).
        end for
        Compute the $j$-th estimate of the frequency of $\widehat{v}$: $\widehat{f}_{j}(\widehat{v})=t\log d\cdot\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}\sum_{i\in\tilde{\mathcal{I}}_{j}}y_{i}\cdot s\cdot W_{r_{i},c}$.
     end for
     The final estimate for $\widehat{v}$: $\widehat{f}(\widehat{v}):=\operatorname{Median}\left(\left\{\widehat{f}_{j}(\widehat{v}):j\in[t]\right\}\right)$.
  end for
  FreqList $=\{(\widehat{v},\widehat{f}(\widehat{v})):\widehat{v}\in\widehat{\mathcal{V}}\}$.
  Output: FreqList

The final algorithm is presented in Algorithm 4. We modify the algorithm in Bassily et al. (2020) by removing the second phase of frequency estimation, since we only want to identify the heavy hitters and do not care about their frequencies. This saves $\varepsilon/2$ of the privacy budget.

Algorithm 4 HeavyHitter
  Input: User values $\mathcal{V}=\{v_i\in[d]\}$, privacy budget $\varepsilon$, threshold $\rho$.
  Initialization: Prefixes $=\{\}$, public randomness pairs $\Gamma=\{(\ell_i,j_i)\in[\log d]\times[3\log n]\text{ for }1\leq i\leq n\}$, partition $\mathcal{I}_{\ell,j}=\{i:(\ell_i,j_i)=(\ell,j)\}$.
  for $\ell$ in $1,\cdots,\lceil\log d\rceil$ do
     $\{(\widehat{v},\widehat{f}(\widehat{v})):\widehat{v}\in\texttt{ChildSet}(\text{Prefixes})\}=\texttt{FreqOracle}\left(\ell,\texttt{ChildSet}(\text{Prefixes}),\left\{\mathcal{I}_{\ell,j}:j\in[3\log n]\right\},\varepsilon\right)$.
     Let NewPrefixes $=\{\}$.
     for $\widehat{v}\in\texttt{ChildSet}(\text{Prefixes})$ do
        if $\widehat{f}(\widehat{v})\geq\rho n$ then
           Add $\widehat{v}$ to NewPrefixes.
        end if
     end for
     if $|\text{NewPrefixes}|=0$ then
        Add $\arg\max_{\widehat{v}}\widehat{f}(\widehat{v})$ to NewPrefixes. # Ensure NewPrefixes is non-empty.
     end if
     Prefixes $\leftarrow$ NewPrefixes.
  end for
  Output: $\{\texttt{Decoding}(v)\text{ for }v\in\text{Prefixes}\}$.
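The level-by-level traversal of the prefix tree can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the private FreqOracle is replaced by an exact, non-private counter (`exact_freq_oracle`, a hypothetical stand-in), the traversal is started from the empty prefix so that the first ChildSet is $\{0,1\}$, and Encoding maps $v\in[d]$ to the binary expansion of $v-1$. All function names are illustrative.

```python
from collections import Counter

def child_set(prefixes):
    """All one-bit extensions v+0, v+1 of the given binary prefixes."""
    return [v + b for v in prefixes for b in "01"]

def exact_freq_oracle(encoded, candidates):
    """Non-private stand-in for FreqOracle: exact user counts per candidate prefix."""
    length = len(candidates[0])
    counts = Counter(code[:length] for code in encoded)
    return {v: counts.get(v, 0) for v in candidates}

def heavy_hitter_tree(values, d, rho):
    """Keep, at each level, the prefixes whose count is at least rho * n."""
    n = len(values)
    n_bits = max(1, (d - 1).bit_length())                      # ceil(log2 d) bits
    encoded = [format(v - 1, f"0{n_bits}b") for v in values]   # Encoding: [d] -> bits
    prefixes = [""]                                            # assumed starting point
    for _ in range(n_bits):
        freqs = exact_freq_oracle(encoded, child_set(prefixes))
        new_prefixes = [v for v, f in freqs.items() if f >= rho * n]
        if not new_prefixes:                                   # keep the list non-empty
            new_prefixes = [max(freqs, key=freqs.get)]
        prefixes = new_prefixes
    return sorted(int(v, 2) + 1 for v in prefixes)             # Decoding: bits -> [d]

values = [3] * 40 + [6] * 35 + list(range(1, 9)) * 3
print(heavy_hitter_tree(values, d=8, rho=0.2))  # prints [3, 6]
```

Since at most $1/\rho$ prefixes can pass the threshold at any level, the number of candidates examined stays small even when $d$ is large, which is the point of searching over prefixes instead of all of $[d]$.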

B.2.2 Proof Related to Section 3.1

To give the proof of Proposition 3.3, we need the following necessary technical result.

Lemma B.6.

Algorithm 4 is $\varepsilon$-ULDP. Moreover, if $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n}/\varepsilon$, then with probability at least $1-1/n^{2}$, the output list of the HeavyHitter protocol satisfies the following properties given sufficiently large $n$: (i) it contains all items $v\in\mathcal{V}$ whose true frequencies are above $2\rho n$. (ii) it does not contain any item $v\in\mathcal{V}$ whose true frequency is below $\rho n/2$.

Proof of Lemma B.6.

Lemma 5.3 in Bassily et al. (2020) yields that the variables $v$ retained in Prefixes in Algorithm 4 have $|\widehat{f}(v)-f(v)|\lesssim\sqrt{n\log n\log d}/\varepsilon$. Since $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n}/\varepsilon$, we have $\sqrt{n\log n\log d}/\varepsilon\leq\rho n/2$ for sufficiently large $n$. Then for any $v$ in Prefixes, we have $f(v)\gtrsim\rho n-\sqrt{n\log n\log d}/\varepsilon\geq\rho n/2$. Conversely, if $f(v)\geq 2\rho n$, then $\widehat{f}(v)\gtrsim 2\rho n-\sqrt{n\log n\log d}/\varepsilon\geq\rho n$, so $v$ will be included in Prefixes. ∎

Proof of Proposition 3.3.

For notational simplicity, we denote the number of users and selectors used in the selection as $n$ instead of $n/2$ throughout this proof. We compute the frequency of variable $j$, namely $\sum_{i=1}^{n}\boldsymbol{1}\left(v_{i}=j\right)$. By Hoeffding's inequality, we have

$$\mathrm{Pr}\left(\left|\sum_{i=1}^{n}\boldsymbol{1}\left(v_{i}=j\right)-\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j\right)\right|\geq\sqrt{n\log(nd)}\right)\leq 2\exp\left(-2(\log n+\log d)\right).$$

Applying the union bound, we get

$$\begin{aligned}\mathrm{Pr}\left(\left|\sum_{i=1}^{n}\boldsymbol{1}\left(v_{i}=j\right)-\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j\right)\right|\geq\sqrt{n\log(nd)}\quad\text{ for some }1\leq j\leq d\right)&\leq 2d\exp\left(-2(\log n+\log d)\right)\\&<\exp\left(-2\log n\right)=1/n^{2}.\end{aligned}\tag{21}$$

For conclusion (i), since $v_i$ is generated by a good selector, Definition 3.1 yields that

$$\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j\right)\geq\frac{n\alpha}{s^{*}}$$

for $j=1,\cdots,s^{*}$. This together with (21) leads to

$$\sum_{i=1}^{n}\boldsymbol{1}\left(v_{i}=j\right)\geq\frac{n\alpha}{s^{*}}-\sqrt{n\log nd}\geq\frac{n\alpha}{2s^{*}}$$

for any $1\leq j\leq s^{*}$ and sufficiently large $n$. Then for any $\rho\leq\alpha/4s^{*}$, by Lemma B.6, we have $\widehat{f}(j)\geq\rho n$. This means the frequency of any true variable must be large enough to be detected as a heavy hitter. Next, we show (ii). Suppose that there are $s$ variables $j_{1},\cdots,j_{s}$, each satisfying $\sum_{i=1}^{n}\boldsymbol{1}\left(v_{i}=j_{k}\right)\geq\rho n/2$, i.e., variables that can potentially be identified as heavy hitters by Lemma B.6. Then by applying (21), there holds

$$\sum_{k=1}^{s}\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j_{k}\right)\geq s\cdot\frac{\rho n}{2}-s\sqrt{n\log nd}\geq s\cdot\frac{\rho n}{4}$$

for sufficiently large $n$, with probability at least $1-1/n^{2}$. However, there holds

$$s\cdot\frac{\rho n}{4}\leq\sum_{k=1}^{s}\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j_{k}\right)\leq\sum_{j=1}^{d}\sum_{i=1}^{n}\mathrm{Pr}\left(v_{i}=j\right)=n,$$

which indicates that $s\leq 4/\rho\leq 32s^{*}/\alpha$.
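The two conclusions of this proof can be checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes a hypothetical selector output distribution in which each of the first $s^*$ variables receives mass $\alpha/s^*$ and the remaining mass is spread uniformly, and the values of `n`, `d`, `s_star` and `alpha` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s_star, alpha = 2000, 100, 3, 0.6  # illustrative values (assumptions)

# Hypothetical selector distribution: true variables are indices 0..s_star-1,
# each with extra mass alpha / s_star; the rest is uniform over all d variables.
p = np.full(d, (1 - alpha) / d)
p[:s_star] += alpha / s_star

v = rng.choice(d, size=n, p=p)           # v_i: variable reported by user i
counts = np.bincount(v, minlength=d)     # sum_i 1(v_i = j) for every j

# Event controlled by (21): all frequencies stay within sqrt(n log(nd)) of their mean.
deviation = np.abs(counts - n * p).max()
bound = np.sqrt(n * np.log(n * d))

# Conclusion (ii): the number of variables with count >= rho * n / 2 is small.
rho = alpha / (4 * s_star)
num_heavy = int((counts >= rho * n / 2).sum())

print(f"max deviation {deviation:.1f} vs bound {bound:.1f}")
print(f"{num_heavy} candidate heavy hitters, limit 4/rho = {4 / rho:.0f}")
```

Because the counts sum to $n$, the number of variables whose count reaches $\rho n/2$ can never exceed $2/\rho$, so the check on `num_heavy` holds deterministically; the deviation check holds with high probability, as in (21).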

Appendix C Coefficient Estimation

C.1 The Multiple Round Protocol

C.1.1 SCO Algorithm

We use the same algorithm as in Bassily & Sun (2023) while adopting a different set of default values for its parameters. These changes are due to the different technical requirements of the theoretical analysis under strong convexity. The algorithm also requires a solution to user-level locally differentially private mean estimation (ULDPMean), which is presented later in Section C.2.2. For notational simplicity, we denote the number of users and selectors used in the selection as $n$ instead of $n/2$ in this section.

Algorithm 5 ULDPSCO
  Input: Local data sets $\{(X_i,y_i)\}_{i=1}^{n}$, number of iterations $T$, concentration radius $\tau$, privacy budget $\varepsilon$.
  Initialization: $\beta_{0}=\overrightarrow{0}$, $\beta_{0}^{ag}=\beta_{0}$, and $\left\{\eta_{t},\gamma_{t}\right\}_{t\in[T]}$ as in Lemma C.2.
  for $t=0,1,\cdots,T-1$ do
     Compute $\beta_{t}^{md}=\gamma_{t}^{-1}\beta_{t}+\left(1-\gamma_{t}^{-1}\right)\beta_{t}^{ag}$.
     Choose two fresh batches $S_{t,1}$ and $S_{t,2}$ of $n_{0}=\lfloor n/2T\rfloor$ users, respectively.
     Compute the average gradient of each user at $\beta_{t}^{md}$, $g_{i}\left(\beta_{t}^{md}\right)=\frac{1}{mL}\sum_{j=1}^{m}\left(X_{i,j}^{\top}\beta_{t}^{md}-y_{i,j}\right)X_{i,j}$ for $i\in S_{t,1}\cup S_{t,2}$.
     Compute the average gradients $\tilde{\nabla}F\left(\beta_{t}^{md}\right)=\texttt{ULDPMean}(\{g_{i}\left(\beta_{t}^{md}\right)\}_{i\in S_{t,1}},\{g_{i}\left(\beta_{t}^{md}\right)\}_{i\in S_{t,2}},\tau,\varepsilon)$.
     Update $\beta_{t+1}=\beta_{t}^{md}-\eta_{t}\cdot L\cdot\tilde{\nabla}F\left(\beta_{t}^{md}\right)$.
     Compute $\beta_{t+1}^{ag}=\gamma_{t}^{-1}\beta_{t+1}+\left(1-\gamma_{t}^{-1}\right)\beta_{t}^{ag}$.
  end for
  Output: $\beta_{T}^{ag}$.
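The loop of Algorithm 5 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `noisy_mean` is a hypothetical stand-in for ULDPMean (a plain average plus Laplace noise of scale $6\tau/\varepsilon$, mirroring the surrogate used in the proof of Lemma C.2), the schedule $\gamma_t^{-1}=2/(t+2)$ and the constant step size `eta` are assumed rather than taken from the lemma, and users are consumed in index order to form fresh batches.

```python
import numpy as np

def noisy_mean(grads, tau, eps, rng):
    """Hypothetical stand-in for ULDPMean: average plus Laplace(6*tau/eps) noise."""
    mean = grads.mean(axis=0)
    if np.isinf(eps):                      # eps = inf gives the non-private run
        return mean
    return mean + rng.laplace(0.0, 6.0 * tau / eps, size=mean.shape)

def uldp_sco_sketch(X, y, T, tau, eps, L=1.0, eta=0.5, seed=0):
    """X: (n, m, s) covariates, y: (n, m) responses. Returns beta_T^{ag}."""
    rng = np.random.default_rng(seed)
    n, m, s = X.shape
    n0 = n // (2 * T)                      # users per batch
    assert n0 >= 1, "need at least 2T users"
    beta = np.zeros(s)
    beta_ag = beta.copy()
    for t in range(T):
        w = 2.0 / (t + 2)                  # assumed value of gamma_t^{-1}
        beta_md = w * beta + (1 - w) * beta_ag
        # Fresh users for S_{t,1} and S_{t,2}, taken in index order.
        batch = slice(2 * t * n0, 2 * (t + 1) * n0)
        Xb, yb = X[batch], y[batch]
        resid = Xb @ beta_md - yb          # shape (2*n0, m)
        # Per-user average gradient g_i, scaled by 1/(m*L) as in Algorithm 5.
        g = np.einsum("ij,ijk->ik", resid, Xb) / (m * L)
        grad = noisy_mean(g, tau, eps, rng)
        beta = beta_md - eta * L * grad
        beta_ag = w * beta + (1 - w) * beta_ag
    return beta_ag

# Synthetic usage example (all values are illustrative assumptions).
data_rng = np.random.default_rng(1)
n, m, s = 400, 50, 3
beta_star = np.array([0.5, -0.3, 0.2])
X = data_rng.normal(size=(n, m, s))
y = X @ beta_star + 0.1 * data_rng.normal(size=(n, m))

beta_nonprivate = uldp_sco_sketch(X, y, T=20, tau=1.0, eps=np.inf)
beta_private = uldp_sco_sketch(X, y, T=20, tau=0.1, eps=1.0)
print(np.linalg.norm(beta_nonprivate - beta_star))
```

Each user appears in exactly one iteration, which is the disjointness that the privacy argument relies on. The actual ULDPMean procedure is more involved than the noisy average used here.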

For the algorithm, we have the following result. Note that Algorithm 5 uses disjoint mini-batches when computing the gradients, while Lemma C.1 was established for plain stochastic gradient descent. Nevertheless, the theoretical analysis generalizes straightforwardly.

Lemma C.1 (Theorem 3 of Dieuleveut et al. (2017)).

Consider the stochastic convex optimization problem (3). Suppose each $\tilde{\nabla}F(\beta)$ is an unbiased stochastic oracle for $\nabla F(\beta)$ with variance $\nu^{2}$. Let $\beta_{T}^{\prime ag}$ be the associated non-private output of Algorithm 5 ($\varepsilon=\infty$). There exist settings of $\left\{\eta_{t},\gamma_{t}\right\}_{t\in[T]}$ such that

$$\mathbb{E}\left[F\left(\beta_{T}^{\prime ag}\right)-\min_{\beta}F(\beta)\right]\lesssim\frac{s\nu^{2}}{T}+\frac{\|\beta^{*}\|_{2}^{2}\lambda_{n}\left(\mathbb{E}\left[XX^{\top}\right]\right)^{-1}}{T^{2}}.$$

For clarity, we additionally include the full multi-round protocol.

Algorithm 6 Multi-round ULDP sparse linear regression.
  Input: Local data sets $\{(X_i,y_i)\}_{i=1}^{n}$, selectors $\{\mathcal{S}_{i}\}_{i=1}^{n/2}$, privacy budget $\varepsilon$, threshold $\rho$.
  Initialization: let $\beta\in\mathbb{R}^{d}$ be a zero vector.
  # candidate variable selection
  # on local machine
  for $i$ in $1,\cdots,n/2$ do
     $v_{i}=\mathcal{S}_{i}(X_{i},y_{i})$.
  end for
  # $\lceil\log d\rceil$ round communication
  $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$ = HeavyHitter($\{v_{i}\}_{i=1}^{n/2}$, $\varepsilon$, $\rho$).
  # coefficient estimation
  # $n\wedge\sqrt{nm\varepsilon^{2}}$ round communication
  $\widehat{\beta}$ = ULDPSCO($\{(X_i,y_i)\}_{i=n/2+1}^{n}$, $T$, $\tau$, $\varepsilon$).
  $\beta^{\widehat{v}_{1}:\widehat{v}_{s}}=\widehat{\beta}$.
  Output: $\beta$.
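The select-then-estimate structure of Algorithm 6 can be illustrated with a small non-private sketch. Everything below is an assumption made for illustration: `select_variable` is a hypothetical local selector (a uniform draw among the `k` coordinates with the largest $|X_i^\top y_i|$), a plain frequency threshold stands in for HeavyHitter, and ordinary least squares stands in for ULDPSCO. None of these stand-ins provide privacy.

```python
import numpy as np

def select_variable(X, y, k, rng):
    """Hypothetical selector: uniform draw among the k largest |X^T y| entries."""
    score = np.abs(X.T @ y)
    top_k = np.argsort(score)[-k:]
    return int(rng.choice(top_k))

def sparse_regression_sketch(X, y, rho, k, seed=0):
    """X: (n, m, d), y: (n, m). Returns (beta estimate in R^d, selected indices)."""
    rng = np.random.default_rng(seed)
    n, m, d = X.shape
    half = n // 2
    beta = np.zeros(d)
    # Stage 1: candidate variable selection on the first n/2 users.
    votes = [select_variable(X[i], y[i], k, rng) for i in range(half)]
    counts = np.bincount(votes, minlength=d)
    selected = np.flatnonzero(counts >= rho * half)  # stand-in for HeavyHitter
    if selected.size == 0:
        return beta, selected
    # Stage 2: estimation on the remaining users, restricted to selected columns.
    Xs = X[half:, :, selected].reshape(-1, selected.size)
    ys = y[half:].reshape(-1)
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)   # stand-in for ULDPSCO
    beta[selected] = coef
    return beta, selected

# Synthetic usage example (illustrative values only).
data_rng = np.random.default_rng(1)
n, m, d = 200, 200, 50
beta_star = np.zeros(d)
beta_star[:3] = [0.8, -0.6, 0.5]
X = data_rng.normal(size=(n, m, d))
y = X @ beta_star + 0.1 * data_rng.normal(size=(n, m))

beta_hat, selected = sparse_regression_sketch(X, y, rho=0.1, k=3)
print(selected, np.linalg.norm(beta_hat - beta_star))
```

The point of the split is that the second stage works in dimension `selected.size` rather than `d`, which is what removes the linear dependence on $d$ in the estimation step.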

C.1.2 Proof of Theorem 3.4

We need the following technical result which states the effectiveness of optimization procedures in Algorithm 5.

Lemma C.2.

Consider the stochastic convex optimization problem (3). Let $T=n\wedge\sqrt{nm\varepsilon^{2}}$, and $\left\{\eta_{t},\gamma_{t}\right\}_{t\in[T]}$ as in Lemma C.2, $L=6s^{3}\log n$, and $\tau\asymp L\sqrt{\log n\log\left(n\vee m\right)\log T/m}$. Then Algorithm 5 is $\varepsilon$-ULDP and has

$$\mathbb{E}\left[\left\|\beta^{ag}_{T}-\widehat{\beta}^{*}\right\|_{2}^{2}\right]\lesssim\mathbb{E}\left[F\left(\beta_{T}^{ag}\right)-F(\widehat{\beta}^{*})\right]\lesssim\frac{s^{9}\log^{6}n}{nm\varepsilon^{2}}+\frac{s^{4}\log n}{nm}.$$
Proof of Lemma C.2.

The privacy guarantee comes from the privacy of Algorithm 9 and the fact that the batches of users are disjoint. For the utility guarantee, consider the mean squared error

$$F(\beta)=\int_{\widehat{\mathcal{X}}\times\mathcal{Y}}\left(x^{\top}\beta-y\right)^{2}d\mathrm{P}(x,y)=\mathbb{E}\left[\sigma^{2}\right]+\left(\beta-\beta^{*}\right)^{\top}\Sigma\left(\beta-\beta^{*}\right).$$

By assumption on $\Sigma=\mathbb{E}\left(XX^{\top}\right)$, we have

$$F(\beta)-\inf_{\beta}F(\beta)=F(\beta)-F(\widehat{\beta}^{*})=\left(\beta-\beta^{*}\right)^{\top}\Sigma\left(\beta-\beta^{*}\right)\geq C_{X}^{-1}\|\beta-\beta^{*}\|_{2}^{2}.$$
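The inequality above is the usual strong-convexity bound for a quadratic form. As a quick numerical illustration, the sketch below builds an arbitrary positive definite $\Sigma$ and checks that the excess risk dominates the smallest eigenvalue times the squared distance; it assumes that $C_X^{-1}$ plays the role of a lower bound on that smallest eigenvalue, and all numerical values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
s = 4
A = rng.normal(size=(s, s))
Sigma = A @ A.T / s + 0.5 * np.eye(s)     # positive definite, eigenvalues >= 0.5

beta_star = rng.normal(size=s)
beta = rng.normal(size=s)
delta = beta - beta_star

excess_risk = delta @ Sigma @ delta       # (beta - beta*)^T Sigma (beta - beta*)
lam_min = np.linalg.eigvalsh(Sigma)[0]    # eigvalsh returns ascending eigenvalues
lower = lam_min * (delta @ delta)         # lambda_min * ||beta - beta*||_2^2

print(excess_risk, lower)
```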

Thus, it suffices to bound $\mathbb{E}\left[F(\beta)-F(\widehat{\beta}^{*})\right]$, since the estimation error is bounded by a quantity of the same order. Note that under the assumption $\|\beta^{*}\|_{2}\leq 1$, the squared loss function $\ell(\beta)$ constrained to the unit ball satisfies

$$\|\nabla\ell(\beta)\|_{2}\leq\|(x^{\top}\beta-y)x\|_{2}\lesssim s^{3}\log n,$$

i.e., it is $s^{3}\log n$-Lipschitz. Let $\left(\beta_{1}^{ag},\ldots,\beta_{T}^{ag}\right)$ be the parameter trajectory of Algorithm 5. Let $\left(\beta_{1}^{\prime ag},\ldots,\beta_{T}^{\prime ag}\right)$ be the parameter trajectory of another algorithm which replaces the gradient estimate $\tilde{\nabla}F\left(\beta_{t}^{md}\right)$ by

$$\tilde{\nabla}F^{\prime}\left(\beta_{t}^{md}\right)\sim\frac{1}{n_{0}}\sum_{i\in S_{t,1}\cup S_{t,2}}g_{i}\left(\beta_{t}^{md}\right)+\mathrm{Lap}\left(0,\frac{6\tau}{\varepsilon}I_{d}\right).$$

By an analysis analogous to the proof of Lemma C.5, if we take $\tau\asymp L\sqrt{\log n\log\left(n\vee m\right)\log T/m}$, there holds

$$\beta_{t}^{ag}\stackrel{\mathcal{D}}{=}\beta_{t}^{\prime ag}$$

with probability $1-1/nm$ for all $1\leq t\leq T$, where $\stackrel{\mathcal{D}}{=}$ stands for equality in distribution. Hence we have

$$\mathbb{E}\left[F\left(\beta_{T}^{ag}\right)\right]\leq\mathbb{E}\left[F\left(\beta_{T}^{\prime ag}\right)\right]+\frac{s^{4}\log n}{nm}.\tag{22}$$

For $\mathbb{E}\left[F\left(\beta_{T}^{\prime ag}\right)\right]$, we use the fact that $\mathbb{E}\left[\tilde{\nabla}F^{\prime}\left(\beta_{t}^{md}\right)\right]=\nabla F\left(\beta_{t}^{md}\right)$ and, by Lemma C.5,

$$\mathbb{E}\left[\left\|\tilde{\nabla}F^{\prime}\left(\beta_{t}^{md}\right)-\nabla F\left(\beta_{t}^{md}\right)\right\|_{2}^{2}\right]\lesssim\frac{L^{2}s^{2}\log^{3}n\log T}{n_{0}m\varepsilon^{2}}+\frac{s\log n}{n_{0}m}.$$

Applying Lemma C.1, we have

$$\mathbb{E}\left[F\left(\beta_{T}^{\prime ag}\right)-\min_{\beta}F(\beta)\right]\lesssim\frac{s^{9}\log^{5}n\log T}{nm\varepsilon^{2}}+\frac{s^{2}\log n}{nm}+\frac{s}{T^{2}}.$$
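The substitution behind the first two terms of this bound can be spelled out. The following is a sketch of the arithmetic only, assuming $n_{0}\asymp n/T$ (from $n_{0}=\lfloor n/2T\rfloor$) and $L=6s^{3}\log n$, with $\nu^{2}$ denoting the variance bound obtained from Lemma C.5; the third term $s/T^{2}$ is taken as stated in the display.

```latex
\begin{align*}
\nu^{2}
  &\lesssim \frac{L^{2}s^{2}\log^{3}n\,\log T}{n_{0}m\varepsilon^{2}}
            + \frac{s\log n}{n_{0}m}
   \asymp T\left(\frac{s^{8}\log^{5}n\,\log T}{nm\varepsilon^{2}}
            + \frac{s\log n}{nm}\right), \\
\frac{s\nu^{2}}{T}
  &\lesssim \frac{s^{9}\log^{5}n\,\log T}{nm\varepsilon^{2}}
            + \frac{s^{2}\log n}{nm}.
\end{align*}
```

Since $T\leq n$ by the choice $T\asymp n\wedge\sqrt{nm\varepsilon^{2}}$, we have $\log T\leq\log n$, which is how $\log^{5}n\log T$ becomes $\log^{6}n$ in the final bound.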

Taking $T\asymp n\wedge\sqrt{nm\varepsilon^{2}}$, this together with (22) leads to

$$\mathbb{E}\left[F\left(\beta_{T}^{ag}\right)-F(\widehat{\beta}^{*})\right]\lesssim\frac{s^{9}\log^{6}n}{nm\varepsilon^{2}}+\frac{s^{4}\log n}{nm}.$$

Theorem C.3 (Formal version of Theorem 3.4).

Let data $\{(X_{i},y_{i})\}_{i=1}^{n}$ be generated as in (1). Suppose $\{\mathcal{S}_{i}\}_{i=1}^{n}$ are $\alpha$-good selectors with $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$. Suppose we let $\alpha/8s^{*}\leq\rho\leq\alpha/4s^{*}$, $T=n\wedge\sqrt{nm\varepsilon^{2}}$, and $\left\{\eta_{t},\gamma_{t}\right\}_{t\in[T]}$ as in Lemma C.2, $L=6s^{3}\log n$, $\tau\asymp L\sqrt{\log n\log\left(n\vee m\right)\log T/m}$. Let $\beta$ be the output of Algorithm 6. Then we have (i) Algorithm 6 is $\varepsilon$-ULDP. (ii) there holds

$$\mathbb{E}\left[\left\|\beta^{*}-\beta\right\|_{2}^{2}\right]\lesssim\frac{s^{9}\log^{6}n}{nm\varepsilon^{2}}+\frac{s^{4}\log n}{nm}.$$
Proof of Theorem C.3.

By Lemmas B.6 and C.2, both HeavyHitter and ULDPSCO are $\varepsilon$-ULDP. Since the two procedures are run on disjoint sets of users, Algorithm 6 is also $\varepsilon$-ULDP. As for (ii), by Proposition 3.3, all the non-zero variables of $\beta^{*}$ are included in $\{\widehat{v}_{1},\cdots,\widehat{v}_{s}\}$ with probability $1-1/n^{2}$. Thus, on this event, we have

$$\left\|\beta^{*}-\beta\right\|_{2}^{2}=\left\|\widehat{\beta}^{*}-\widehat{\beta}\right\|_{2}^{2}.$$

Applying Lemma C.2, this leads to

$$\mathbb{E}\left[\left\|\beta^{*}-\beta\right\|_{2}^{2}\right]\lesssim\mathbb{E}\left[\left\|\beta^{ag}_{T}-\widehat{\beta}^{*}\right\|_{2}^{2}\right]\lesssim\frac{s^{9}\log^{6}n}{nm\varepsilon^{2}}+\frac{s^{4}\log n}{nm}+\frac{1}{n^{2}}\lesssim\frac{s^{*9}\log^{6}n}{nm\varepsilon^{2}\alpha^{9}}+\frac{s^{*4}\log n}{nm\alpha^{4}},$$

where in the last step we used $s\lesssim s^{*}/\alpha$ as in Proposition 3.3. The additional term $1/n^{2}$ is due to the failure probability of Proposition 3.3 and is omitted since it is adjustable to any level with a constant multiplicative cost on the other terms. ∎

C.2 The Two Round Protocol

C.2.1 Proof of Proposition 3.5

Proof of Proposition 3.5.

For the first conclusion, consider the local OLS estimator on the selected variables of user $i$, which is $\widehat{\beta}_{i}=(\widehat{X}_{i}^{\top}\widehat{X}_{i})^{-1}\widehat{X}_{i}^{\top}y_{i}$. Given the fact that $m\geq s$, $\widehat{X}_{i}^{\top}\widehat{X}_{i}$ is invertible and we have

$$\widehat{\beta}_{i}=(\widehat{X}_{i}^{\top}\widehat{X}_{i})^{-1}\widehat{X}_{i}^{\top}\widehat{y}_{i}=(\widehat{X}_{i}^{\top}\widehat{X}_{i})^{-1}\widehat{X}_{i}^{\top}(\widehat{X}_{i}\widehat{\beta}^{*}+\sigma_{i})=\widehat{\beta}^{*}+(\widehat{X}_{i}^{\top}\widehat{X}_{i})^{-1}\widehat{X}_{i}^{\top}\sigma_{i},$$

where $\sigma_{i,j}$ are i.i.d. sub-Gaussian random variables for $1\leq j\leq m$. Therefore, the first conclusion follows from

$$\mathbb{E}[\widehat{\beta}_{i}]=\widehat{\beta}^{*}+(\widehat{X}_{i}^{\top}\widehat{X}_{i})^{-1}\widehat{X}_{i}^{\top}\mathbb{E}[\sigma_{i}]=\widehat{\beta}^{*}.$$

By implication of Hsu et al. (2012, Theorem 2.1), we have

$$\mathrm{Pr}\left(\|\widehat{\beta}_{i}-\widehat{\beta}^{*}\|_{2}\geq\sqrt{3\log n\cdot\mathrm{tr}\left[\left(\widehat{X}_{i}^{\top}\widehat{X}_{i}\right)^{-1}\right]\cdot\mathbb{E}[\sigma_{i,j}^{2}]}\right)\leq\frac{1}{n^{3}}.$$

This together with covariance matrix estimation bounds (e.g., Wainwright (2019, Theorem 6.5)) leads to

$$\|\widehat{\beta}_{i}-\widehat{\beta}^{*}\|_{2}\lesssim\sqrt{\frac{\mathrm{tr}[\widehat{\Sigma}^{-1}]\log n}{m}}\lesssim\sqrt{\frac{s\log n}{m}}\tag{23}$$

with probability $1-1/n^{3}$. Applying the union bound, (23) holds for all $i=n/2+1,\cdots,n$ with probability at least $1-1/n^{2}$. For the second statement, if either condition in Proposition 3.2 holds, we can adopt Lasso (or SCAD) on the selected variables. See Examples B.1 and B.3. The oracle results in Belloni & Chernozhukov (2013) (or Fan & Lv (2011)) yield the concentration bound with the true sparsity parameter

$$\|\widehat{\beta}_{i}-\widehat{\beta}^{*}\|_{2}\lesssim\sqrt{\frac{s^{*}\log n}{m}}$$

for all $i=n/2+1,\cdots,n$ with probability at least $1-1/n^{2}$. ∎
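
The first conclusion can be illustrated numerically. The following is a minimal simulation sketch, not the authors' experiment: it assumes standard Gaussian designs and unit-variance Gaussian noise (one sub-Gaussian instance), and the helper names `solve_linear`, `local_ols` and `simulate` are hypothetical. It compares the largest per-user error with the radius $\sqrt{s\log n/m}$ from (23) and checks that the average of the local estimators is close to $\widehat{\beta}^{*}$.

```python
import math
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x

def local_ols(X, y):
    """OLS on the selected columns: solve (X^T X) beta = X^T y."""
    s = len(X[0])
    gram = [[sum(row[p] * row[q] for row in X) for q in range(s)] for p in range(s)]
    xty = [sum(row[p] * yj for row, yj in zip(X, y)) for p in range(s)]
    return solve_linear(gram, xty)

def simulate(n_users=50, m=200, s=4, seed=0):
    """Each user draws m samples (assumed Gaussian design and noise) and runs OLS."""
    rng = random.Random(seed)
    beta_star = [rng.uniform(-1, 1) for _ in range(s)]
    estimates = []
    for _ in range(n_users):
        X = [[rng.gauss(0, 1) for _ in range(s)] for _ in range(m)]
        y = [sum(x * b for x, b in zip(row, beta_star)) + rng.gauss(0, 1) for row in X]
        estimates.append(local_ols(X, y))
    return beta_star, estimates

beta_star, estimates = simulate()
errors = [math.dist(est, beta_star) for est in estimates]
average = [sum(col) / len(estimates) for col in zip(*estimates)]
radius = math.sqrt(4 * math.log(50) / 200)  # sqrt(s log n / m) for the defaults above
print("largest per-user l2 error:", max(errors))
print("sqrt(s log n / m):        ", radius)
print("error of the average:     ", math.dist(average, beta_star))
```

The per-user errors stay within a constant multiple of the radius, while the averaged estimator is considerably more accurate, which is what the aggregation step exploits.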

C.2.2 ULDP Mean Estimation

We borrow the idea from Girgis et al. (2022) with slight modifications. The estimation is conducted in two stages. In the first stage, a histogram partition of $\widehat{\mathcal{X}}$ with bin width $\sqrt{\log^{2}n/m}$ is created. The server privately estimates the range in which the means $\widehat{\beta}_{i}$ lie with high probability (Algorithm 7). In the second stage, each user projects its $\widehat{\beta}_{i}$ onto the range determined in the first stage. Then, all users send the LDP versions of their projected $\widehat{\beta}_{i}$ to the curator (Algorithm 8). Both steps are scalar operations. In the vector case, instead of applying them to each dimension separately, random rotation (Levy et al., 2021) is adopted to eliminate a superfluous factor of $\mathcal{O}(\sqrt{s})$. The full algorithm is summarized in Algorithm 9. We only consider pure differential privacy here and use Laplace noise instead of the Gaussian noise in Girgis et al. (2022).

Algorithm 7 Range
  Input: Scalars $\{y_{i}\}$, concentration radius $\tau$, privacy budget $\varepsilon$.
  # user side
  All users divide the interval $[-1,1]$ into $k=1/\tau$ disjoint intervals, each with width $2\tau$. Let $\mathcal{T}:=\{a_{1},a_{2},\ldots,a_{k}\}$ be the index set of middle points of intervals.
  for $y$ in $\{y_{i}\}$ do
     Compute $\nu=\arg\min_{a_{j}\in\mathcal{T}}\left|y-a_{j}\right|$.
     Uniformly sample $j\in[k]$.
     Compute $p=H_{k}^{\top j}\cdot e_{\nu}/\sqrt{k}$, where $e_{\nu}$ denotes the basis vector corresponding to $\nu$ and $H_{k}$ is a size $k$ Hadamard matrix.
     Compute vector $z_{i}$:
     $$z_{i}=\begin{cases}+H_{k}^{\top j}\cdot\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}&\text{ w.p. }\frac{1}{2}+\frac{\sqrt{k}\cdot p}{2}\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\\-H_{k}^{\top j}\cdot\frac{e^{\varepsilon}+1}{e^{\varepsilon}-1}&\text{ w.p. }\frac{1}{2}-\frac{\sqrt{k}\cdot p}{2}\frac{e^{\varepsilon}-1}{e^{\varepsilon}+1}\end{cases}$$
  end for
  # curator side
  $\overline{z}=\sum z_{i}$ and $\ell=\arg\max_{j}\overline{z}^{j}$.
  Output: Bin $[a_{\ell}-3\tau,a_{\ell}+3\tau]$.
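
The steps of Algorithm 7 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: $k$ is taken to be a power of two so that a Sylvester-type Hadamard matrix exists, $\tau$ is set to $1/k$, and the user transmits the pair (row index $j$, privatized sign) from which the curator rebuilds $z_{i}$. The function names `hadamard_entry`, `randomize_bin` and `private_range` are hypothetical.

```python
import math
import random

def hadamard_entry(j, v):
    """Entry (j, v) of the Sylvester Hadamard matrix H_k, always +1 or -1."""
    return -1 if bin(j & v).count("1") % 2 else 1

def randomize_bin(nu, k, eps, rng):
    """User side: sample a row j and report the sign sqrt(k) * p, flipped w.p. 1/(e^eps + 1)."""
    j = rng.randrange(k)
    sign = hadamard_entry(j, nu)  # equals sqrt(k) * p in the notation of Algorithm 7
    if rng.random() > math.exp(eps) / (math.exp(eps) + 1.0):
        sign = -sign
    return j, sign

def private_range(ys, k, eps, seed=0):
    """Return [a_l - 3 tau, a_l + 3 tau] around the privately selected heaviest bin."""
    rng = random.Random(seed)
    tau = 1.0 / k  # k bins of width 2 * tau cover [-1, 1]
    scale = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    z_bar = [0.0] * k
    for y in ys:
        # index of the bin containing y, i.e. the bin with the nearest midpoint
        nu = min(k - 1, max(0, int((y + 1.0) / (2.0 * tau))))
        j, sign = randomize_bin(nu, k, eps, rng)
        # curator side: accumulate z_i = sign * scale * (row j of H_k)
        for v in range(k):
            z_bar[v] += scale * sign * hadamard_entry(j, v)
    ell = max(range(k), key=lambda v: z_bar[v])
    center = -1.0 + (2 * ell + 1) * tau
    return center - 3.0 * tau, center + 3.0 * tau

data_rng = random.Random(1)
ys = [0.3 + data_rng.uniform(-0.02, 0.02) for _ in range(5000)]
low, high = private_range(ys, k=16, eps=1.0)
print("estimated range:", (low, high))
```

Averaging over the uniformly sampled row $j$, each $z_{i}$ is an unbiased estimate of the indicator vector $e_{\nu}$ because $H_{k}^{\top}H_{k}=kI$, so $\overline{z}$ estimates the bin counts and its largest coordinate points to the most populated bin.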

Let the standard Laplace random variable have probability density function $e^{-|x|}/2$ for $x\in\mathbb{R}$.

Algorithm 8 Mean
  Input: Scalars $\{y_{i}\}_{i=1}^{n}$, concentration range $[a,b]$, privacy budget $\varepsilon$.
  # user side
  for $i$ in $1,\cdots,n$ do
     Let $\tilde{y}_{i}=\Pi_{[a,b]}y_{i}+\textrm{Lap}(0,|b-a|/\varepsilon)$, where $\Pi_{[a,b]}$ is the projection onto $[a,b]$.
  end for
  # curator side
  Output: $\sum\tilde{y}_{i}/n$.
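
Algorithm 8 is a projected Laplace mechanism followed by averaging. A minimal sketch is given below; the helper names `laplace` and `private_mean` are hypothetical, and the Laplace variable is sampled as the difference of two independent unit exponentials, which has density $e^{-|x|}/2$.

```python
import random

def laplace(scale, rng):
    """Sample Lap(0, scale) as scale times the difference of two unit exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(ys, a, b, eps, seed=0):
    """Each user clips y to [a, b] and adds Lap(0, |b - a| / eps); the curator averages."""
    rng = random.Random(seed)
    scale = abs(b - a) / eps  # the clipped value changes by at most |b - a|
    noisy = [min(b, max(a, y)) + laplace(scale, rng) for y in ys]
    return sum(noisy) / len(noisy)

data_rng = random.Random(1)
ys = [0.3 + data_rng.uniform(-0.02, 0.02) for _ in range(20000)]
estimate = private_mean(ys, a=0.125, b=0.5, eps=1.0)
print("private mean:", estimate, "non-private mean:", sum(ys) / len(ys))
```

Because the noise scale is proportional to $|b-a|$, a tighter range from the first stage directly reduces the error of the private mean.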
Algorithm 9 ULDPMean
  Input: Two groups of local coefficients $\mathcal{B}_{1}=\{\beta_{i}\}_{i=1}^{n/2}$ and $\mathcal{B}_{2}=\{\beta_{i}\}_{i=n/2+1}^{n}$, concentration radius $\tau$, privacy budget $\varepsilon$.
  Initialization: Let $D=\mathrm{Diag}(w)$ and $U=H_{s}D/\sqrt{s}$, where $w_{i}\sim\mathrm{Unif}\{-1,1\}$ and $H_{s}$ is a size $s$ Hadamard matrix. Let $z$ be an $s$-dimensional zero vector.
  # histogram selection
  for $\ell$ in $1,\cdots,s$ do
     for $\beta_{i}$ in $\mathcal{B}_{1}$ do
        $y_{\ell,i}=(U\beta_{i})^{\ell}$.
     end for
     $R_{\ell}=\mathtt{Range}(\{y_{\ell,i}\}_{i=1}^{n/2},\tau,\varepsilon/s)$.
  end for
  # coefficient estimation
  for $\beta_{i}\in\mathcal{B}_{2}$ do
     $\ell=i\text{ mod }s$.
     $y_{\ell,i}=(U\beta_{i})^{\ell}$.
  end for
  for $j$ in $1,\cdots,s$ do
     $z^{j}=s\cdot\texttt{Mean}(\{y_{\ell,i}\text{ such that }\ell=j\},R_{j},\varepsilon)$.
  end for
  Output: $U^{-1}z$.
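
The role of the random rotation $U=H_{s}D/\sqrt{s}$ in the initialization and output steps can be illustrated with a short sketch. It assumes $s$ is a power of two and uses a Sylvester-type Hadamard matrix; the names `random_rotation`, `matvec` and `transpose` are hypothetical. Since $U$ is orthogonal, $U^{-1}=U^{\top}$, and a vector concentrated on one coordinate is spread evenly over all $s$ coordinates.

```python
import math
import random

def hadamard_entry(j, v):
    """Entry (j, v) of the Sylvester Hadamard matrix H_s, always +1 or -1."""
    return -1 if bin(j & v).count("1") % 2 else 1

def random_rotation(s, rng):
    """U = H_s D / sqrt(s), with D a diagonal matrix of independent random signs."""
    w = [rng.choice((-1, 1)) for _ in range(s)]
    return [[hadamard_entry(r, c) * w[c] / math.sqrt(s) for c in range(s)]
            for r in range(s)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

rng = random.Random(0)
s = 64
U = random_rotation(s, rng)
v = [1.0] + [0.0] * (s - 1)                # all mass on a single coordinate
rotated = matvec(U, v)                     # every entry has magnitude 1 / sqrt(s)
recovered = matvec(transpose(U), rotated)  # U^{-1} = U^T undoes the rotation
print("largest coordinate before:", max(abs(x) for x in v))
print("largest coordinate after: ", max(abs(x) for x in rotated))
```

After rotation each coordinate is of order $\|v\|_{2}/\sqrt{s}$ up to logarithmic factors, so the scalar routines Range and Mean can be run with a correspondingly smaller concentration radius.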

The following lemma is a modified version of Theorem 2 of Girgis et al. (2022) under pure differential privacy.

Lemma C.4.

Let $\widehat{\beta}^{*}$ be the true underlying coefficient, and $\widehat{\beta}_{i}$s be the coefficients estimated by each user. Suppose $\mathbb{E}\left[\widehat{\beta}_{i}\right]=\beta^{*}$ and $\|\widehat{\beta}_{i}-\beta^{*}\|_{2}\leq\tau$ with probability $1-1/n^{2}$ for all $i$. Then with probability $1-1/n^{2}$, we have

$$\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_{i}-\mathtt{ULDPMean}(\{\widehat{\beta}_{i}\}_{i=1}^{n/2},\{\widehat{\beta}_{i}\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|^{2}_{2}\lesssim\frac{s\tau^{2}\log^{2}n}{n\varepsilon^{2}}$$
Proof of Lemma C.4.

We know the $\widehat{\beta}_{i}$s satisfy Definition 2 in Girgis et al. (2022) with parameter $(\tau,1/n^{2})$. By Levy et al. (2021), we have

$$\|U\widehat{\beta}_{i}-U\widehat{\beta}^{*}\|_{\infty}\lesssim\sqrt{\frac{\tau^{2}\log(sn^{2})}{s}}.$$

If we choose $\tau^{\prime}\asymp\sqrt{\tau^{2}\log(sn^{2})/s}\asymp\sqrt{\tau^{2}\log n/s}$, then the $y_{i}$ satisfy Definition 2 in Girgis et al. (2022) with parameter $(\tau^{\prime},1/n^{2})$. Then Lemma 1 of Girgis et al. (2022) implies that $\Pi_{[a,b]}y_{i}=y_{i}$ with probability $1-1/n^{2}$ in Algorithm 8. Then the squared error for Mean is

$$\left|\frac{2s}{n}\sum_{i=1}^{n/2s}\tilde{y}_{i}-\frac{2s}{n}\sum_{i=1}^{n/2s}y_{i}\right|^{2}=\left|\frac{|b-a|s}{n\varepsilon}\sum_{i=1}^{n/2s}\gamma_{i}\right|^{2}\leq\frac{288s\tau^{\prime 2}\log n}{n\varepsilon^{2}}$$

where the inequality follows from (2.18) in Wainwright (2019). Since $\|\cdot\|_{2}$ is upper bounded by $\sqrt{s}$ times the infinity norm, there holds

$$
\begin{aligned}
&\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \\
={}& \left\|\frac{2}{n}U\sum_{i=n/2+1}^{n}\widehat{\beta}_i - z\right\|_{2}^{2} \leq s\cdot\left\|\frac{2}{n}U\sum_{i=n/2+1}^{n}\widehat{\beta}_i - z\right\|_{\infty}^{2} \leq \frac{288s^{2}\tau'^{2}\log n}{n\varepsilon^{2}} \lesssim \frac{s\tau^{2}\log^{2}n}{n\varepsilon^{2}}.
\end{aligned}
$$
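The clip-then-perturb step analysed above can be illustrated with a minimal sketch. It is not the paper's Algorithm 8: the function names, the scalar setting, and the noise scale $|b-a|/\varepsilon$ (the standard Laplace mechanism for a quantity with sensitivity $|b-a|$) are assumptions made for illustration only.

```python
import random

def clip(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(values, low, high, epsilon, rng):
    """Hypothetical helper: each user clips to [low, high] and adds Laplace
    noise locally; the curator averages the privatized reports."""
    scale = (high - low) / epsilon  # assumed scale: sensitivity |b - a| over epsilon
    reports = [clip(v, low, high) + laplace_noise(scale, rng) for v in values]
    return sum(reports) / len(reports)

rng = random.Random(0)
# Concentrated inputs: almost no value falls outside [0.0, 0.6], so clipping is inactive.
values = [rng.gauss(0.3, 0.05) for _ in range(20000)]
estimate = private_mean(values, low=0.0, high=0.6, epsilon=1.0, rng=rng)
true_mean = sum(values) / len(values)
```

When the inputs are concentrated inside $[a,b]$, the projection leaves them unchanged and the only error is the averaged Laplace noise, which shrinks as the interval narrows. This is why a tight concentration radius matters in the bound.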

The following lemma is the key technical result to prove Theorem 3.6.

Lemma C.5 (Privacy and utility of Algorithm 9).

Let $\widehat{\beta}^{*}$ be the true underlying coefficient, and let the $\widehat{\beta}_i$ be the coefficients estimated by each user. Then Algorithm 9 is $\varepsilon$-ULDP. Moreover, there exists some $\tau \asymp \sqrt{\log^{2}n/m}$ such that, with probability $1 - 2/n^{2}$, we have

$$
\left\|\widehat{\beta}^{*} - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \lesssim \frac{s^{2}\log^{3}n}{nm\varepsilon^{2}} + \frac{s\log n}{nm}
$$
Proof of Lemma C.5.

We first show the privacy property of Algorithm 9. Since the users of Range and Mean do not overlap, it suffices to show that both algorithms are $\varepsilon$-ULDP. The privacy of Range follows from Lemma 1 of Girgis et al. (2022). The privacy of Mean follows directly from the property of the Laplace mechanism, given that the sensitivity of $\Pi_{[a,b]}y$ is $|b-a|$. Now we prove the accuracy part. The squared error can be decomposed into two parts, associated with the private estimation error and the non-private estimation error, respectively.

$$
\begin{aligned}
&\left\|\widehat{\beta}^{*} - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \\
\leq{}& 2\cdot\left(\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} + \left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \widehat{\beta}^{*}\right\|_{2}^{2}\right).
\end{aligned}
$$

We deal with the private estimation error first. From Proposition 3.5, we know that the $\widehat{\beta}_i$ satisfy Lemma C.4 with $\tau = \sqrt{s\log n/m}$. Then we have

$$
\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \lesssim \frac{s^{2}\log^{2}n\log n}{nm\varepsilon^{2}}. \tag{24}
$$
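The rate in (24) is a direct substitution. As a worked check, take the bound $s\tau^{2}\log^{2}n/(n\varepsilon^{2})$ from the final display of the preceding proof (assumed here to be the conclusion of Lemma C.4, whose statement is not reproduced in this section) and plug in $\tau^{2} = s\log n/m$:

```latex
\begin{align*}
\frac{s\,\tau^{2}\log^{2}n}{n\varepsilon^{2}}\bigg|_{\tau^{2} = s\log n/m}
  = \frac{s\cdot(s\log n/m)\cdot\log^{2}n}{n\varepsilon^{2}}
  = \frac{s^{2}\log^{3}n}{nm\varepsilon^{2}}.
\end{align*}
```

This matches the right-hand side of (24), since $\log^{2}n\log n = \log^{3}n$.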

If either condition in Proposition 3.2 holds, the parameter becomes $\tau = \sqrt{s^{*}\log n/m}$ by Proposition 3.5, and the same analysis goes through with $s^{*}$ in place of $s$ inside $\tau$.

$$
\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \lesssim \frac{ss^{*}\log^{2}n\log n}{nm\varepsilon^{2}}. \tag{25}
$$

Next, we bound the non-private estimation error. When $\widehat{\beta}_i$ is the OLS estimator, by its sub-Gaussianity, we have

$$
\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \widehat{\beta}^{*}\right\|_{2}^{2} \lesssim \frac{s\log n}{nm}. \tag{26}
$$

If either condition in Proposition 3.2 holds, this becomes

$$
\left\|\frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i - \widehat{\beta}^{*}\right\|_{2}^{2} \lesssim \frac{s^{*}\log n}{nm}. \tag{27}
$$

Together, (24) and (26) lead to

$$
\left\|\widehat{\beta}^{*} - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \lesssim \frac{s^{2}\log^{3}n}{nm\varepsilon^{2}} + \frac{s\log n}{nm}.
$$

The overall failure probability is at most $2/n^{2}$, since we used two high-probability arguments. Similarly, (25) and (27) lead to

$$
\left\|\widehat{\beta}^{*} - \mathtt{ULDPMean}(\{\widehat{\beta}_i\}_{i=1}^{n/2},\{\widehat{\beta}_i\}_{i=n/2+1}^{n},\tau,\varepsilon)\right\|_{2}^{2} \lesssim \frac{ss^{*}\log^{3}n}{nm\varepsilon^{2}} + \frac{s^{*}\log n}{nm}.
$$

C.2.3 Proof of Theorem 3.6

Proof of Theorem 3.6.

By Lemmas B.6 and C.5, both HeavyHitter and ULDPMean are $\varepsilon$-ULDP. Since their associated users do not overlap, Algorithm 1 is also $\varepsilon$-ULDP. As for (ii), by Proposition 3.3, we know that all the non-zero variables of $\beta^{*}$ are included in $\{\widehat{v}_1,\cdots,\widehat{v}_s\}$ with probability $1 - 1/n^{2}$. Thus, we have

$$
\left\|\beta^{*} - \beta\right\|_{2}^{2} = \left\|\widehat{\beta}^{*} - \widehat{\beta}\right\|_{2}^{2}.
$$

Applying Lemma C.5, this leads to

$$
\left\|\beta^{*} - \beta\right\|_{2}^{2} \lesssim \frac{s^{2}\log^{3}n}{nm\varepsilon^{2}} + \frac{s\log n}{nm} \lesssim \frac{s^{*2}\log^{3}n}{nm\varepsilon^{2}\alpha^{2}} + \frac{s^{*}\log n}{nm\alpha},
$$

where in the last step we used Proposition 3.3. Finally, the overall failure probability of Proposition 3.3, Lemma B.6, and Lemma C.5 is at most $4/n^{2}$. ∎

Appendix D Extension to Sparse Estimation

The full statement of Theorem 3.7 is as follows. We utilize Algorithm 1 while modifying the estimators $\widehat{\beta}_i$ and selectors $\mathcal{S}_i$ to accommodate the general problem.

Theorem D.1 (Formal version of Theorem 3.7).

Let data $\{X_i\}_{i=1}^{n}$ be generated by $\mathrm{P}_{\beta^{*}}$ for $\beta^{*}\in\Omega_{s,a}^{d}$. Suppose we have non-private estimators: (i) estimator $\tilde{\beta}_i$ with $\|\tilde{\beta}_i - \beta^{*}\|_{2}\leq\nu_1$ for all $1\leq i\leq n/2$ and (ii) estimator $\widehat{\beta}_i$ on selected variables with $\mathbb{E}\left[\widehat{\beta}_i\right] = \widehat{\beta}^{*}$ and $\|\widehat{\beta}_i - \widehat{\beta}^{*}\|_{2}\leq\nu_2$ for all $n/2+1\leq i\leq n$. Then there exist $\alpha$-good selectors $\{\mathcal{S}_i\}_{i=1}^{n/2}$ with $\alpha\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$, that is, $\mathrm{Pr}\left(v = \mathcal{S}_i(X_i)\right)\geq\alpha/s^{*}$ for $1\leq v\leq s^{*}$ and $1\leq i\leq n/2$. Suppose we let $\alpha/8s^{*}\leq\rho\leq\alpha/4s^{*}$ and $\tau\asymp\sqrt{\nu^{2}\alpha\log n/s^{*}}$. Then, for any $a\gtrsim\nu_1$, Algorithm 1 is $\varepsilon$-ULDP and has an output $\beta$ with

$$
\left\|\beta^{*} - \beta\right\|_{2}^{2} \lesssim \frac{\nu_2^{2}}{n} + \frac{\nu_2^{2}s^{*}\log^{2}n}{n\varepsilon^{2}\alpha} \tag{28}
$$

with probability at least $1 - 3/n^{2}$. Moreover, for the $\ell_1$ norm, there holds

$$
\left\|\beta^{*} - \beta\right\|_{1} \lesssim \sqrt{\frac{\nu_2^{2}s^{*}}{n\alpha}} + \sqrt{\frac{\nu_2^{2}s^{*2}\log^{2}n}{n\varepsilon^{2}\alpha^{2}}} \tag{29}
$$

with probability at least $1 - 3/n^{2}$.

Proof of Theorem D.1.

The privacy guarantee follows from Theorem 3.6. Since $a\gtrsim\nu_1$, we can consistently select all true variables with proxy estimators. This implies that we can have $\alpha$-good selectors with $\alpha\gtrsim 1\gtrsim s^{*}\sqrt{\log n\log d/n\varepsilon^{2}}$. By Proposition 3.3, we know that all the non-zero variables of $\beta^{*}$ are selected with probability $1 - 1/n^{2}$. Thus, we have $\left\|\beta^{*} - \beta\right\|_{2}^{2} = \left\|\widehat{\beta}^{*} - \widehat{\beta}\right\|_{2}^{2}$. Applying Lemma C.4, we have

$$
\left\|\widehat{\beta}^{*} - \widehat{\beta}\right\|_{2}^{2} \lesssim \left\|\widehat{\beta}^{*} - \frac{2}{n}\sum_{i=n/2+1}^{n}\widehat{\beta}_i\right\|_{2}^{2} + \frac{s\nu_2^{2}\log^{2}n}{n\varepsilon^{2}}.
$$

Since the $\widehat{\beta}_i$ are concentrated, they are sub-Gaussian. Thus, there holds

$$
\left\|\widehat{\beta}^{*} - \widehat{\beta}\right\|_{2}^{2} \lesssim \frac{\nu_2^{2}}{n} + \frac{s\nu_2^{2}\log^{2}n}{n\varepsilon^{2}} \lesssim \frac{\nu_2^{2}}{n} + \frac{s^{*}\nu_2^{2}\log^{2}n}{n\varepsilon^{2}\alpha}
$$

where in the last step we used Proposition 3.3. Finally, the overall failure probability of Proposition 3.3, Lemma B.6, and Lemma C.4 is at most $3/n^{2}$. This yields (28). For (29), note that there are only $s\lesssim s^{*}/\alpha$ non-zero elements. Since the $\ell_1$ norm of a vector with $s$ non-zero elements is at most $\sqrt{s}$ times its $\ell_2$ norm, (29) follows. ∎

Appendix E Additional Experiment Results

E.1 Implementation Details

For each model, we report the best result over its parameter grid, where the best result is determined by the average over at least 30 replications. We do not perform any parameter selection (e.g., cross validation or a validation set), since it is prohibitive under the locally private setting (Ma & Yang, 2024; Ma et al., 2024a) or costs too much privacy budget (Papernot & Steinke, 2021). The parameter grid sizes are chosen based on running time so that each method costs an equal amount of computation. Efficient methods receive an exhaustive parameter grid and can be properly tuned. Computation-heavy methods receive a small grid with insensitive parameters set to their defaults.

  • For the candidate variable selector of our methods, we adopt the Lasso estimator and identify its non-zero coefficients as the selected variables. Moreover, we conduct a feature screening (see Appendix B.1 for details) for acceleration. The number of screened variables is set to 64. The number of selected variables $s$ is selected in $\{2,4,8,16\}$.

    • 2-SLR: The two-round sparse linear regression protocol is implemented based on Algorithm 1. We select the range $[-B,B]$ with $B\in\{1,2,3\}$, and the concentration radius is decided by the number of bins, which is selected in $\{2,4,8,16,32\}$.

    • M-SLR: The multi-round sparse linear regression protocol is implemented based on Algorithm 6. We set $B=3$ and select the number of bins in $\{2,4,8,16,32\}$. Moreover, we set the learning rate of the gradient to be $\eta_{t}=0.1\cdot(\frac{1+t}{2})^{0.2}$.

  • LDPPROX: The non-interactive locally differentially private sparse linear regressor based on a proxy estimator is implemented according to Algorithm 1 in Zhu et al. (2023). Due to the heavy computation burden, we set $r=\sqrt{d\cdot\log n}$, $\tau_{1}=4$, $\tau_{2}=8$. In simulations, where we know $\min_{\beta^{*j}\neq 0}|\beta^{*j}|=0.2$, we set $\lambda=0.05$. On real data, we set $\lambda$ to the 10-th lower quantile of the absolute fitted coefficients.

  • LDPIHT: The locally differentially private iterative hard thresholding is implemented according to Algorithm 2 in Zhu et al. (2023). We select $T\in\{2,5,10,20,50\}$, $\eta\in\{0.01,0.1,1\}$, $\tau_{1},\tau_{2}\in\{2,4,8\}$, $k^{\prime}\in\{5,10,20,50\}$.

  • Lasso: The conventional Lasso regressor is fitted using the LassoCV class in the scikit-learn package (Pedregosa et al., 2011). We set n_alphas = 300, max_iter = 3000, and tol = $10^{-4}$.
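The reporting protocol of this subsection (enumerate a parameter grid, average each configuration over repeated runs, report the best average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: run_once is a hypothetical callable standing in for one replication of a method, the dictionary keys are illustrative names, and treating $\tau_{1}$ and $\tau_{2}$ as varying independently is an assumption about how the LDPIHT grid is formed.

```python
import itertools
import random
import statistics

# Grid values for LDPIHT as listed above; the key names are illustrative.
LDPIHT_GRID = {
    "T": [2, 5, 10, 20, 50],
    "eta": [0.01, 0.1, 1],
    "tau1": [2, 4, 8],
    "tau2": [2, 4, 8],  # assumption: tau1 and tau2 vary independently
    "k_prime": [5, 10, 20, 50],
}


def expand_grid(grid):
    """Yield every configuration in the Cartesian product of the grid."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def best_config(grid, run_once, replications=30, seed=0):
    """Return (config, mean_error) with the lowest error averaged over replications.

    run_once(config, rng) is a hypothetical callable that performs one
    replication of a method and returns its error as a float.
    """
    best = None
    for config in expand_grid(grid):
        rng = random.Random(seed)  # reuse the same random stream for every config
        mean_error = statistics.fmean(
            run_once(config, rng) for _ in range(replications)
        )
        if best is None or mean_error < best[1]:
            best = (config, mean_error)
    return best
```

Under the independence assumption, the LDPIHT grid contains 5 × 3 × 3 × 3 × 4 = 540 configurations, which illustrates why computation-heavy methods receive a much smaller grid.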

E.2 Additional Simulation Results

We present the additional results of the experiment on data with correlated marginal distributions, which were omitted from the main text due to the page limit. The correlations among the first 50 dimensions are set to be exponentially decaying, i.e.

$$\mathrm{Cov}\left(X_{i,j}^{k},X_{i,j}^{k^{\prime}}\right)=2^{-|k-k^{\prime}|}.$$

We draw each $\sigma_{i,j}$ correlatedly from a standard Gaussian distribution. For $\beta^{*}$, we randomly select $s^{*}=8$ coordinates in the first 50 dimensions to be $0.2$ and let the others be zero. We typically set $n=400$, $m=100$, $d=256$, and $\varepsilon=4$, while varying one of them to observe how the evaluated metric varies. We use the squared error to evaluate the estimated coefficients and the F1 score to evaluate the selected variables.
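One way to generate data with this covariance structure is sketched below. It is an illustration under assumptions rather than the authors' generator: the first 50 coordinates follow a stationary AR(1) recursion with coefficient 0.5, which gives exactly $\mathrm{Cov}(X^{k},X^{k^{\prime}})=2^{-|k-k^{\prime}|}$ for Gaussian inputs. The remaining coordinates are assumed to be independent standard Gaussians, and the response is assumed to follow a linear model with independent standard Gaussian noise. All function names are hypothetical.

```python
import math
import random


def draw_correlated_row(d, rng, n_corr=50, rho=0.5):
    """Draw one feature vector of length d (d >= 1).

    The first n_corr coordinates follow a stationary AR(1) recursion, so that
    Cov(X^k, X^k') = rho ** |k - k'|, i.e. 2^{-|k-k'|} for rho = 0.5.
    Assumption: the remaining coordinates are independent standard Gaussians.
    """
    row = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)  # keeps every coordinate at unit variance
    for _ in range(1, min(n_corr, d)):
        row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
    row.extend(rng.gauss(0.0, 1.0) for _ in range(len(row), d))
    return row


def draw_beta_star(d, rng, s_star=8, n_corr=50, value=0.2):
    """Sparse coefficients: s_star random coordinates among the first n_corr equal value."""
    beta = [0.0] * d
    for k in rng.sample(range(min(n_corr, d)), s_star):
        beta[k] = value
    return beta


def draw_user_data(m, d, beta, rng):
    """Draw m samples (x, y) for one user.

    Assumption: y = <x, beta> + noise with independent standard Gaussian noise.
    """
    samples = []
    for _ in range(m):
        x = draw_correlated_row(d, rng)
        y = sum(xk * bk for xk, bk in zip(x, beta)) + rng.gauss(0.0, 1.0)
        samples.append((x, y))
    return samples
```

With the default setting of the text, one would call draw_beta_star(256, rng) once and draw_user_data(100, 256, beta, rng) for each of the 400 users.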

Figure 5: Experiments w.r.t. $d$ and $\varepsilon$ for correlated marginal. Panels: (a) $d$ vs. F1 score; (b) $d$ vs. $\ell_{2}$ error with $m=100$; (c) $d$ vs. $\ell_{2}$ error with $m=200$; (d) $\varepsilon$ vs. $\ell_{2}$ error. We plot the quantiles over 30 repetitions with $95\%$ coverage. We exclude LDPPROX in some figures since it is highly unstable and does not fit into our plot scale.

We conduct experiments with respect to $d$. We first analyze the variable selection performance. Due to the high sparsity, we use the F1 score as the evaluation criterion. For $d\in\{16,32,\cdots,1024\}$, we compute the averaged F1 scores of the proposed candidate variable selection (represented by 2-SLR) and other methods. As depicted in Fig. 5(a), the overall performance of all methods deteriorates compared to that under the independent setting, whereas 2-SLR remains stable and maintains its advantage. When $d=16$, the selection performances of Lasso and LDPPROX are slightly superior to those of the other methods. However, as $d$ increases, the variable selection performance of Lasso, LDPIHT and LDPPROX decreases sharply, while the F1 scores of 2-SLR only fluctuate slightly and become higher than those of other competitors when $d\geq 64$.
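For reference, the F1 score of a selected variable set against the true support can be computed as in the sketch below. It uses the standard definition (harmonic mean of precision and recall); the helper names and the convention of returning 0 when there is no overlap are assumptions made for illustration.

```python
def support(beta, tol=0.0):
    """Indices of coefficients whose magnitude exceeds tol."""
    return {k for k, b in enumerate(beta) if abs(b) > tol}


def f1_score(selected, true_support):
    """F1 score of a selected index set against the true support."""
    selected, true_support = set(selected), set(true_support)
    true_positives = len(selected & true_support)
    if true_positives == 0:
        return 0.0  # convention: no overlap gives a score of zero
    precision = true_positives / len(selected)
    recall = true_positives / len(true_support)
    return 2 * precision * recall / (precision + recall)
```

Because the true support is small relative to $d$, accuracy over all coordinates would be dominated by the zero coefficients, whereas the F1 score only rewards correctly recovered non-zero coordinates.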

Then, we analyze the estimation performance. In Fig. 5(b) and 5(c), we plot the curve of the $\ell_{2}$ error with respect to $d$. Whether $m=100$ or $m=200$, the proposed methods are less sensitive to $d$ than LDPIHT. This is also compatible with the rate in (5), which scales with $\log d$. Compared to the independent case, the correlated case requires more local samples to achieve a consistent selection. Thus, the difference in results between large $m$ and small $m$ is less apparent.

We examine the privacy-utility trade-offs by investigating performances under different values of $\varepsilon$. In Fig. 5(d), the error decreases as $\varepsilon$ increases for all private methods, as expected. Moreover, the error of 2-SLR is comparable to that of Lasso, while the error of M-SLR quickly drops below that of Lasso in the medium privacy region $\varepsilon\geq 4$. This again demonstrates the superiority of our methods compared to fitting Lasso using only local information.

Figure 6: Experiments w.r.t. sample sizes for correlated marginal. Panels: (a) $n$ vs. $\ell_{2}$ error; (b) $m$ vs. $\ell_{2}$ error; (c) $n/m$ vs. $\ell_{2}$ error.

Finally, we analyze the impact of sample sizes. In Fig. 6(a) and 6(b), the $\ell_{2}$ error decreases as $n$ and $m$ increase for all $\varepsilon$, which confirms our theoretical claims. The error is generally higher than that in the independent case, and the overall $\ell_{2}$ curve is less sensitive to $n$ and $m$. Moreover, we fix $nm=400\times 100$ and vary the ratio $n/m$. In Fig. 6(c), we observe that, for each $\varepsilon$, the error of 2-SLR remains stable for $n/m\approx 1$, while it increases slightly when either $n$ or $m$ is too small, which is compatible with Theorem 3.6. The performance of M-SLR is still sensitive to $n$ becoming small.

E.3 Real Dataset Description

A summary of key information for these datasets after pre-processing can be found in Table 3. For user-specific sample partitioning, certain datasets come with predefined partitions, while others undergo random partitioning. Categorical features in the datasets are transformed into dummy variables, while each continuous feature is individually scaled to zero mean and unit variance. We also present additional information on the datasets, including the data sources and the pre-processing details.

Table 3: Information of real datasets.

| Dataset | Sample Partition | $d$ | $n$ | $m$ | Area |
|---|---|---|---|---|---|
| Airline | Predefined | 260 | 205 | 200-400 | Social |
| Loan | Random | 735 | 500 | 100 | Business |
| Mip | Predefined | 144 | 218 | 5 | Computer Science |
| Taxi | Predefined | 213 | 1200 | 189-200 | Social |
| Wine | Random | 41 | 60 | 100 | Business |
| Yolanda | Random | 100 | 800 | 200 | Social |

Airline: The Airlines-Departure-Delay dataset originally comes from the United States Department of Transportation and is currently available on OpenML (LeDell, 2020). It consists of 1,048,575 observations, including one target variable and 9 attributes pertaining to flight information. We partition samples into users based on the "Destination" variable, selecting 205 users with sample counts ranging from 200 to 400. Attributes such as "Origin" and "UniqueCarrier" are transformed into dummy variables, contributing to a total of 260 features. Overall, the Airline dataset contains 75,600 samples.

Loan: The Loan-Default-Prediction dataset is obtained from the training set of the Kaggle Loan Default Prediction challenge (DrivenData, 2021a), which aims to reduce the consumption of economic capital and optimize the risk to the financial investor. The original dataset comprises 55,319 instances of 735 attributes. We randomly select 50,000 samples and partition the data into 500 groups, with each group containing 100 samples.

Mip: The MIP-2016-regression dataset, available on OpenML, comprises 1,090 instances featuring 144 attributes and 1 output attribute (Bergdoll, 2019). Within this dataset, there are a total of 218 users, with each user possessing 5 samples.

Taxi: The Taxi dataset is obtained from the Differential Privacy Temporal Map Challenge (DrivenData, 2021b), which aims to develop algorithms that preserve data utility while guaranteeing individual privacy protection. The dataset contains quantitative and categorical information about taxi trips in Chicago, including time, distance, location, payment, and service provider. We partition the samples based on the unique identification number of taxis (taxi_id), resulting in 1200 taxis with sample counts ranging from 189 to 200. Other features include the time of each trip (seconds), the distance of each trip (miles), the time period during which each trip occurs (shift), the index of the zone where the trip starts (pca), the index of the zone where the trip ends (dca), the service provider (company), the method used to pay for the trip (payment_type), and the amounts of tips (tips) and fares (fare). We use the other variables to predict the fare (fare) of each trip. Attributes such as shift, pca, dca, company and payment_type are transformed into dummy variables, resulting in a total of 213 features in the Taxi dataset.

Wine: This dataset originates from the Wine Quality dataset (Cortez et al., 2009) in the UCI Machine Learning Repository, and combines data from both the "red wine" and "white wine" datasets. The original dataset comprises 11 features associated with wine, used to predict the corresponding wine quality. To increase the dimensionality, 30 dimensions of Gaussian random noise are appended. The dataset contains 6000 instances. The samples are randomly partitioned among 60 users, with each user having 100 samples.

Yolanda: The Yolanda dataset (Guyon et al., 2019) contains 400,000 instances of 100 attributes and 1 output attribute. We randomly select 160,000 samples and distribute them into 800 groups, with each group containing 200 samples.
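The two partitioning schemes used for these datasets can be sketched as follows. This is an illustration under assumptions and not the authors' pre-processing code: the function names are hypothetical, and selecting users by filtering on their sample counts is only one plausible reading of how the predefined partitions (e.g. by "Destination" for Airline) were restricted to the stated ranges.

```python
import random
from collections import defaultdict


def predefined_partition(keys, min_count, max_count):
    """Group sample indices by a user key and keep the users whose sample
    count lies in [min_count, max_count].

    Assumption: users outside the range are simply discarded.
    """
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return {
        key: indices
        for key, indices in groups.items()
        if min_count <= len(indices) <= max_count
    }


def random_partition(num_samples, n_users, m, rng):
    """Randomly select n_users * m sample indices without replacement and
    split them into n_users groups of size m."""
    chosen = rng.sample(range(num_samples), n_users * m)
    return [chosen[u * m:(u + 1) * m] for u in range(n_users)]
```

For example, random_partition(55319, 500, 100, random.Random(0)) mirrors the Loan setting of 50,000 randomly selected samples split into 500 users with 100 samples each.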

E.4 Additional Real Datasets Results

Table 4: Running time (seconds) on real datasets.

| Dataset | Lasso | 2-SLR | M-SLR | LDPPROX | LDPIHT |
|---|---|---|---|---|---|
| Airline | 0.7 | 15.8 | 15.3 | 766.5 | 6.1 |
| Loan | 43.1 | 74.7 | 106.9 | 4124.0 | 30.6 |
| Mip | 0.1 | 0.1 | 2.3 | 5.5 | 4.3 |
| Taxi | 0.1 | 0.7 | 11.5 | 1569.1 | 7.0 |
| Wine | 0.2 | 0.7 | 2.9 | 3.5 | 2.6 |
| Yolanda | 0.5 | 0.4 | 12.5 | 365.8 | 4.4 |