Volume 1, Issue 2, June 2024
research-article
Open Access
Identification and Semiparametric Efficiency Theory of Nonignorable Missing Data with a Shadow Variable
Article No.: 5, Pages 1–23. https://doi.org/10.1145/3592389

We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome but independent of the ...

Highlights

Problem statement

Missingness not at random (MNAR) arises in many empirical studies in biomedical, socioeconomic, and epidemiological research. A fundamental challenge of MNAR is the identification problem, that is, the parameter of interest ...
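
The shadow-variable condition can be pictured with a small simulation. The Python sketch below is illustrative only: the distributions of the outcome y, the shadow variable z, and the missingness mechanism are made-up assumptions, not the article's construction. It shows the defining property: z is correlated with y, while the observation indicator r depends only on y (so z is independent of r given y), and the complete-case mean of y is biased.

# Illustrative simulation of a shadow variable under MNAR.
# All data-generating choices are assumptions for exposition.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

y = rng.normal(loc=1.0, scale=1.0, size=n)      # outcome of interest
z = y + rng.normal(scale=1.0, size=n)           # shadow variable, correlated with y

# MNAR: the probability of observing y depends on y itself. Because r
# depends only on y, z is independent of r given y, which is the
# defining shadow-variable property.
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * y)))
r = rng.uniform(size=n) < p_obs

print(f"true mean of y:              {y.mean():.3f}")
print(f"complete-case mean (biased): {y[r].mean():.3f}")

Here the complete-case mean overestimates the truth because large values of y are more likely to be observed; the shadow variable z supplies the extra leverage needed to identify the full-data distribution despite this.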

research-article
Open Access
Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression
Article No.: 6, Pages 1–51. https://doi.org/10.1145/3594234

We study a localized notion of uniform convergence known as an “optimistic rate” [34, 39] for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial ...

Highlights

Problem Statement

Generalization theory proposes to explain the ability of machine learning models to generalize to fresh examples by bounding the gap between the test error (error on new examples) and training error (error on the data they ...
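
For orientation, an optimistic-rate bound has the following generic multiplicative shape; the constant C, the complexity term comp(H), and the confidence parameter below are placeholders for exposition, not the paper's refined statement. With probability at least 1 - \delta, simultaneously for all predictors h,

\[
  L(h) \;\le\; (1 + \epsilon)\,\widehat{L}(h)
  \;+\; C\,\frac{\mathrm{comp}(\mathcal{H}) + \log(1/\delta)}{n},
\]

so a predictor that interpolates the training data (\widehat{L}(h) = 0) inherits only the fast O(1/n) residual term, which is why such bounds are called "optimistic" and why hidden constants and logarithmic factors in that term matter for interpolation learning.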

research-article
Open Access
Language Models in the Loop: Incorporating Prompting into Weak Supervision
Article No.: 7, Pages 1–30. https://doi.org/10.1145/3617130

We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions ...

Highlights

Problem statement

The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them and are less expensive to serve in ...
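
The "labeling functions" framing admits a compact sketch. In the Python below, the prompt templates, the query_lm helper, and the majority-vote aggregation are all assumptions made for illustration; the article's actual prompts, LM interface, and label-model aggregation may differ.

# Sketch: prompted LM calls as weak-supervision labeling functions.
# `query_lm` is a hypothetical placeholder, not a real API, and the
# aggregation is a plain majority vote rather than the paper's method.
from collections import Counter

ABSTAIN = -1

def query_lm(prompt: str) -> str:
    """Placeholder for a call to a large pre-trained language model."""
    raise NotImplementedError("wire up your own LM client here")

def make_labeling_function(template: str, label_map: dict):
    def lf(example: str) -> int:
        answer = query_lm(template.format(text=example)).strip().lower()
        return label_map.get(answer, ABSTAIN)   # abstain on unmapped answers
    return lf

lfs = [
    make_labeling_function(
        "Is the following review positive? {text}\nAnswer yes or no.",
        {"yes": 1, "no": 0}),
    make_labeling_function(
        "Does this text express a complaint? {text}\nAnswer yes or no.",
        {"yes": 0, "no": 1}),
]

def majority_vote(example: str) -> int:
    votes = [v for lf in lfs if (v := lf(example)) != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

The aggregated votes then serve as noisy training labels for a smaller, specialized model, so the expensive LM is queried only while building the training set rather than at serving time.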

research-article
Open Access
Principal Component Networks: Utilizing Low-Rank Activation Structure to Reduce Parameters Early in Training
Article No.: 8, Pages 1–27. https://doi.org/10.1145/3617778

Recent works show that overparameterized neural networks contain small subnetworks that exhibit comparable accuracy to the full model when trained in isolation. These results highlight the potential to reduce the computational costs of deep neural network ...

Highlights

Problem Statement

Many recent results show that large neural networks can lead to improved generalization. Yet, training these large models comes with increased computational costs. In an effort to address this issue, several works have shown ...
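
One way to picture the low-rank-activation idea is the numpy sketch below: if a layer's input activations concentrate near a k-dimensional principal subspace, the layer's weights can act on the k projected coordinates instead of all d input features, shrinking the parameter count from d*d_out to roughly d*k + k*d_out. The shapes, the synthetic low-rank activations, and the use of plain PCA here are illustrative assumptions, not the paper's Principal Component Networks algorithm.

# Sketch: replacing a dense layer's weights with a smaller matrix that
# acts on the top-k principal components of its input activations.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, d_out = 1024, 512, 32, 256

# Synthetic activations lying near a k-dimensional subspace (assumption).
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) \
    + 0.01 * rng.normal(size=(n, d))
W = rng.normal(size=(d, d_out))                 # original dense weights

# Top-k principal directions of the centered activations.
_, _, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
V_k = Vt[:k].T                                  # (d, k) projection basis

# A (k x d_out) weight matrix on projected activations replaces the
# original (d x d_out) matrix, cutting parameters early in training.
W_small = V_k.T @ W                             # (k, d_out)
out_full = A @ W
out_low = (A @ V_k) @ W_small
print("relative error:",
      np.linalg.norm(out_full - out_low) / np.linalg.norm(out_full))

Because the activations are nearly rank-k, the projected layer reproduces the full layer's outputs to within the noise level while using far fewer parameters.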
