Julie Gershunskaya

In this excellent overview of the history of probability and nonprobability sampling from the end of the nineteenth century to the present day, Professor Graham Kalton outlines the essence of past endeavors that helped to define philosophical approaches and stimulate the development of survey sampling methodologies. From the beginning, there was an understanding that a sample should, in some ways, resemble the population under study. In Kiær’s ideas of “representative sampling” and Neyman’s invention of the probability-based approach, the prime concern of survey sampling has been to properly plan for representing characteristics of the finite population. Poststratification and other calibration methods were developed for the same important goal of better representation.
Government statistical agencies compose a population statistic for a given domain using a sample of units nested in that domain. Subsequent modeling of these domain survey estimates is often used to “borrow strength” across a dependence structure among the domains to improve estimation accuracy and efficiency. This paper focuses on models jointly defined for sample-based point estimates along with their sample-based estimates of variances. Bias may be present in the sample-based (observed) variances due to small sample sizes or the estimation procedure. We propose a new formulation that extends existing joint model formulations to allow for a multiplicative bias in observed variances. Our approach capitalizes on the unbiasedness property of point estimates. We utilize a nonparametric mixture construction that allows the data to discover distinct bias regimes. As a consequence of the better variance estimation, domain point estimates are more robustly estimated under a joint model fo...
Different methods have been proposed in the small area estimation literature to deal with outliers in individual observations and in the area-level random effects. In this paper, we propose a new method based on a scale mixture of two normal distributions. Using a simulation study, we compare the performance of a few recently proposed robust small area estimators and our proposed estimator based on a mixture distribution. We then compare the proposed method with the existing methods to estimate monthly employment changes in the metropolitan statistical areas using data from the Current Employment Statistics Survey conducted by the U.S. Bureau of Labor Statistics (BLS).
The sampling weight in the Current Employment Statistics Survey is determined at the time of sample selection. It depends on a unit’s State, industry, and size class. However, the population of businesses is highly dynamic. Establishments constantly grow or contract; sometimes they also change their industrial classification or geographical location. Even the number of population units is not fixed but continuously changes over time. A unit may change its size class at the time of estimation or the content of the original stratum may change. Under such circumstances, application of the original survey weights may increase volatility of survey estimates. In this paper we investigate if the survey estimates can be improved by adjusting the original weights.
Each month, the Bureau of Labor Statistics publishes estimates of employment for industrial supersectors at the metropolitan statistical area (MSA) level. The survey-weighted ratio estimator that is used to produce estimates for large domains is generally less reliable for MSA-level estimation because an adequate sample is often unavailable for a given MSA. We also note that the effect of a few establishments, which are influential in terms of unusual employment numbers or sampling weights or both, could be prominent for the small area estimation. In this paper, we develop an empirical hierarchical Bayes method based on a unit-level model. Our proposed method is found to be less variable and less sensitive to influential establishments when compared to the direct survey-weighted ratio estimator or estimators based on an area-level model. Empirical evaluation of the estimators is performed using the population data from an administrative file.
The work of this paper is prompted by the particular case of the Current Employment Statistics (CES) Survey conducted monthly by the U.S. Bureau of Labor Statistics. Besides estimates at the national level, the survey yields estimates of employment for numerous domains defined by the intersection of industry and geography, providing important information about the current status of the local economy. Variances of the employment estimates are estimated from the sample. However, the sample-based variance estimates can be unstable, especially in smaller domains.
Every month, the Bureau of Labor Statistics publishes estimates of employment from the Current Employment Statistics survey at the state and national total levels, as well as at various detailed levels by industry and geography. For smaller domains, where the direct sample-based estimates are not reliable, estimates are produced using models. We adopt a Bayesian approach and consider the area-level Fay-Herriot model along with several alternatives that: (i) co-model the variances of the direct estimators instead of adhering to the traditional assumption of “fixed and known” variances; (ii) account for possible deviations from the normality assumption of the random effects by assuming a mixture of normal distributions. Models are compared based on the direct estimates and variances from the Current Employment Statistics survey, as well as using a simulation study. We further propose a model-based method of screening that could become a useful tool for analyst’s review of the...
Small domain estimation models, like the Fay-Herriot, often assume a normally distributed latent process centered on a linear mean function. The linearity assumption may be violated for domains that express idiosyncratic phenomena not captured by the predictors. Under a single-component normal distribution prior for the random effects, direct sample estimates for those domains would be viewed as if they were outliers with respect to the model, when in fact they may reflect the underlying true population value. The model interpretation is also confounded by the variances of direct sample estimates because, while typically treated as fixed and known, they are estimates and thus contain noise. In this paper, we construct a joint model for the direct estimates and their variances where we replace the normal distribution for the latent process with a nonparametric mixture of normal distributions with the goal of improving robustness in estimation quality for these idiosyncratic domains. W...
The Current Employment Statistics (CES) Survey uses a weighted link relative estimator to make estimates of employment at various levels of industry and area detail. The estimates are produced monthly, approximately three weeks after the reference date of the survey. Sometimes outliers combined with relatively large probability weights result in influential reporters that cause estimates of smaller domains to be very unstable. An employment figure reported to the survey may be considered typical for a relatively large estimation domain; however, it may be unusual and highly influential for a more detailed industry and area domain. The focus of the current simulation study is to explore the feasibility of using a robust estimation technique in a simple and automated way to detect and treat outliers during the short timeframe allotted for monthly survey processing. Results are evaluated based on the deviation of the estimates from the true population levels.
We propose a joint model for point estimates and their variances when observed variances may contain bias. The bias in variances for groups of domains may be induced by an estimation procedure, such as the weight smoothing procedure of Beaumont (2008) to compute a domain point estimator. While the weight-smoothed point estimator is more efficient than the original weighted survey estimator, its variance estimation procedure requires truncations that induce bias in the domain variance estimator. The proposed formulation generalizes the joint point estimator and variance models to explicitly parameterize a multiplicative bias in observed variances under a nonparametric formulation that allows the data to discover distinct bias regimes. As a consequence of the better variance estimation, domain point estimates are more robustly estimated under a joint model for the domain point estimates and their associated variances. We compare the performances of alternative models in application to e...
The U.S. Bureau of Labor Statistics (BLS) publishes monthly estimates of employment levels, one of the key indicators of the U.S. economy, for many domains. To assess the quality of these estimates, it is important to publish their associated standard error estimates. In our simulation study, the standard design-based variance estimators of the monthly employment growth rate estimators are found to be often unstable even at a statewide industrial level where there is a sample capable of producing good point estimates. In this paper, we develop new direct design-based, synthetic model-based, and empirical linear Bayes (ELB) variance estimators. Using a Monte Carlo simulation from a real finite population, we evaluate the bias, variance, mean squared error (MSE), and coverage properties of the proposed variance estimators with respect to the randomization principle.
We propose a new robust empirical best estimation approach to estimate small area finite population means that are relatively insensitive to a model misspecification or to the presence of outliers. This important robustness property is achieved by replacing the standard normality assumption of the sampling errors in a nested-error regression (NER) model by a scale mixture of two normal distributions with different variances. We present a formal statistical test to identify if a small area is an outlier and provide an efficient new computing algorithm to implement our procedure. We examine the finite sample robustness properties of our proposed method using a Monte Carlo simulation and compare the proposed method with alternative existing methods in a study using data from the Current Employment Statistics (CES) survey conducted by the US Bureau of Labor Statistics (BLS).
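As a rough illustration of the scale mixture idea in the abstract above, the sketch below evaluates the density of an error drawn from two zero-mean normal components with different standard deviations, and the posterior probability that a given error came from the wider component. This is not the paper's test statistic or computing algorithm; the function names and the parameter values (`p_out`, `sd_main`, `sd_out`) are assumptions chosen only for illustration.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation `sd`."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(e, p_out, sd_main, sd_out):
    """Density of a scale mixture of two zero-mean normals.

    With probability 1 - p_out the error has sd `sd_main`;
    with probability p_out it has the larger sd `sd_out`.
    """
    return (1.0 - p_out) * normal_pdf(e, sd_main) + p_out * normal_pdf(e, sd_out)


def outlier_probability(e, p_out, sd_main, sd_out):
    """Posterior probability that error `e` came from the wide component."""
    wide = p_out * normal_pdf(e, sd_out)
    return wide / mixture_pdf(e, p_out, sd_main, sd_out)


# Hypothetical parameter values, for illustration only.
typical = outlier_probability(0.0, p_out=0.1, sd_main=1.0, sd_out=5.0)
extreme = outlier_probability(6.0, p_out=0.1, sd_main=1.0, sd_out=5.0)
```

Under these assumed values, an error near zero is assigned a small probability of belonging to the wide component, while an error several standard deviations out is assigned a probability close to one, which is the mechanism that limits the influence of outlying observations.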
Publisher Summary: The uncertainty associated with a survey estimate is commonly expressed in terms of its standard error estimate or a measure related to the standard error estimate, such as an estimated coefficient of variation or a confidence interval. For a linear survey estimator, the estimation of its design-based standard error for a simple probability sample design is straightforward and involves obtaining an exact expression of the true design-based variance and then estimating the true design-based variance by a design-based unbiased estimator. The methods commonly referred to as random group methods involve either drawing two or more subsamples from the finite population or splitting the original sample into several random subgroups, constructing a separate estimate of the parameter of interest from each subsample and an estimate from the pooled sample, and computing the variance among the several estimates. One possible remedy for the practical difficulty associated with the random group methods is to consider re-sampling methods, which are similar to the random group methods in terms of constructing variance estimates from the variation of the estimates for the subsamples, but differ in that they use subsamples that overlap.
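The random group procedure described in the summary above can be sketched as follows. This is a minimal illustration that assumes a simple random sample, ignores any finite population correction, and splits the original sample into `k` random subgroups; the function name `random_group_variance` and its parameters are hypothetical and are not taken from the source.

```python
import random
import statistics


def random_group_variance(sample, k, estimator=statistics.fmean, seed=0):
    """Return (pooled estimate, random group variance estimate).

    The sample is split at random into k groups, `estimator` is applied
    to each group and to the pooled sample, and the variance is computed
    from the spread of the group estimates around the pooled estimate.
    """
    if k < 2 or len(sample) < k:
        raise ValueError("need at least 2 groups and at least k observations")
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    # Deal observations into k groups of (nearly) equal size.
    groups = [shuffled[i::k] for i in range(k)]
    group_estimates = [estimator(g) for g in groups]
    pooled = estimator(shuffled)
    variance = sum((t - pooled) ** 2 for t in group_estimates) / (k * (k - 1))
    return pooled, variance


# Usage with made-up data: the mean of 1..100 and its estimated variance.
pooled, variance = random_group_variance([float(i) for i in range(1, 101)], k=5)
```

Because the groups here do not overlap, each observation contributes to exactly one group estimate; the re-sampling methods mentioned in the summary differ precisely in allowing the subsamples to overlap.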