A thorough geostatistical analysis of spatial data, observed at given spatial locations, includes exploratory data analysis, spatial-model building, diagnosing the model fit, and inference on unknown model parameters or unobserved values (at known locations). Using results from mathematical analysis, exact and asymptotic distribution theory, and simulation studies, we argue that, when used sensibly, the geostatistical method is reassuringly stable.
Previously published statistical analyses of NCAA Division I Men’s Tournament (“March Madness”) game outcomes have revealed that the relationship between tournament seed and the time-aggregated number of third-round (“Sweet 16”) appearances for the middle half of the seeds exhibits a statistically and practically significant departure from monotonicity. In particular, the 8- and 9-seeds combined appear less often than any one of seeds 10–12. In this article, we show that a similar “middle-seed anomaly” also occurs in the NCAA Division I Women’s Tournament but does not occur in two other major sports tournaments that are similar in structure to March Madness. We offer explanations for the presence of a middle-seed anomaly in the NCAA basketball tournaments, and its absence in the others, that are based on the combined effects of the functional form of the relationship between team strength and seed specific to each tournament, the degree of parity among teams, and certain elements of tournament structure. Although these explanations account for the existence of middle-seed anomalies in the NCAA basketball tournaments, their larger-than-expected magnitudes, which arise mainly from the overperformance of seeds 10–12 in the second round, remain enigmatic.
Before we can begin to tackle the inference problems described near the end of the previous chapter, we must first develop an adequate working knowledge of matrix algebra useful for linear models. That is the objective of this chapter. Admittedly, the topics and results selected for inclusion here are severely abridged, being limited almost exclusively to what will actually be needed in later chapters. Furthermore, for some of the results (particularly those that are used only once or twice in the sequel), little context is provided. For much more thorough treatments of matrix algebra useful for linear models and other areas of statistics, we refer the reader to the books by Harville (J Am Stat Assoc 72:320–338, 1977) and Schott (Matrix analysis for statistics, 3rd ed. Wiley, Hoboken, 2016). In fact, for proofs not given in this chapter, we provide a reference to a proof given in one or both of those books.
Recall from Chap. 7 that the least squares estimators of estimable functions are best linear unbiased estimators (BLUEs) of those functions under the Gauss–Markov model. But it turns out that this is not necessarily so under linear models having a more general variance–covariance structure, such as the Aitken model. In this chapter, we consider estimators that are best linear unbiased under the Aitken model. The first section considers the special case of an Aitken model in which the variance–covariance matrix is positive definite; BLUE in this case is also called generalized least squares estimation. The second section considers the general case. The third section characterizes those Aitken models for which the least squares estimators of estimable functions are BLUEs of those functions. A final section briefly considers an attempt to extend BLUE to the general mixed linear model.
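For orientation, the positive definite case mentioned above can be written out as a short worked formula. The notation here ($\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, $\mathbf{W}$, $\sigma^2$, $\mathbf{c}$) is assumed for illustration and may differ from the chapter's own symbols.

```latex
% Aitken model with known positive definite W (notation assumed)
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad
\mathrm{E}(\mathbf{e}) = \mathbf{0}, \qquad
\mathrm{Var}(\mathbf{e}) = \sigma^{2}\mathbf{W}

% A generalized least squares solution, using any generalized inverse
\tilde{\boldsymbol{\beta}}
  = (\mathbf{X}^{\top}\mathbf{W}^{-1}\mathbf{X})^{-}\,
    \mathbf{X}^{\top}\mathbf{W}^{-1}\mathbf{y}

% BLUE of an estimable function c'beta
\widetilde{\mathbf{c}^{\top}\boldsymbol{\beta}}
  = \mathbf{c}^{\top}\tilde{\boldsymbol{\beta}}
```

When $\mathbf{W} = \mathbf{I}$ this reduces to the ordinary least squares solution, which matches the Gauss–Markov result recalled at the start of the abstract.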
The history of Statistics at the University of Iowa is recounted, from the hiring of Henry Rietz in 1918, to the formation of the Department of Statistics in 1965, to the present day. The key contributions of Rietz, Allen Craig, and Robert Hogg are described. The prominent role of actuarial science in the development of the department is noted.
Previously published statistical analyses of NCAA Division I Men’s Basketball Tournament (“March Madness”) game outcomes since the 64-team format for its main draw began in 1985 have uncovered some apparent anomalies, such as 12-seeds upsetting 5-seeds more often than might be expected, and seeds 10 through 12 advancing to the Sweet Sixteen much more often than 8-seeds and 9-seeds (the so-called middle-seed anomaly). In this article, we address the questions of whether these perceived anomalies truly are anomalous and if so, what is responsible for them. We find that, in contrast to conclusions drawn from previous analyses, the statistical evidence for a 12-5 upset anomaly actually is very weak, while that for the middle-seed anomaly is quite strong. We dispel some (but not all) theories for the former and offer an explanation for the latter that is based primarily on the combined effects of a nonlinear relationship between team strength and seed, the lack of reseeding between rounds, and a strong quasi-home advantage accorded to 1-seeds. We also investigate the effects that hypothetical modifications to the tournament would have on the anomalies and explore whether similar anomalies exist in the NCAA Women’s Basketball Tournament.
This paper describes an algorithm for the optimal selection of sampling locations for semivariogram estimation. We assume that the semivariogram is estimated by fitting a parametric function of separation distance between observation sites to a selected subset of the squared differences of original observations (thereby restricting ourselves to isotropic fields). We apply standard regression design theory to construct an optimal configuration of distances in the lag space, which is then mapped into the site space in such a way that dependence among the observations is minimized.
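As background for the abstract above, the squared differences of observations can be binned by separation distance to give the classical method-of-moments (Matheron) empirical semivariogram. The sketch below illustrates that standard estimator only, not the paper's design algorithm; the function name `empirical_semivariogram` and the `bin_width` parameter are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(sites, values, bin_width):
    """Method-of-moments semivariogram estimate, binned by lag distance.

    Illustrative sketch: assumes an isotropic field, so only the distance
    between sites matters. Returns {bin_index: (bin_midpoint, gamma_hat, n_pairs)}.
    """
    sums = defaultdict(float)   # sum of squared differences per lag bin
    counts = defaultdict(int)   # number of site pairs per lag bin
    n = len(sites)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(sites[i], sites[j])   # separation distance
            k = int(h // bin_width)             # lag bin index
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # gamma_hat(h) = (1 / (2 N(h))) * sum of squared differences in the bin
    return {
        k: ((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
        for k in sorted(counts)
    }


if __name__ == "__main__":
    sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    values = [1.0, 2.0, 4.0]
    for k, row in empirical_semivariogram(sites, values, 1.0).items():
        print(k, row)
```

A parametric semivariogram model would then be fitted to these binned values, or directly to the unbinned squared differences as the abstract describes.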
If small animal practice exposure, including the laboratory animal situations encountered in academic and other research pursuits, is more detrimental to veterinarians than large animal practice exposure for induction of allergic respiratory disease, then preventive measures such as increased ventilation, use of high-efficiency particulate filters, and wearing of masks should be encouraged to reduce allergen exposures. Migration from large animal practice, likewise, should be discouraged. Failure to migrate to low occupational allergy risk situations early enough in a veterinary career can have severe and even fatal results. If the observed respiratory disease in veterinarians is in fact due to exposure, then unfortunately, it may in some cases be progressive and not just chronic. Data that could provide criteria for predicting occupational allergy and possible related respiratory disease outcomes are scant at this time, and career counselling is difficult. If the veterinary occupational animal allergy data should be proven correct, such results can be used to help others.
Combining Temporally Correlated Environmental Data From Two Measurement Systems. Jeffrey Dean Isaacson and Dale L. Zimmerman. We consider the problem of combining temporally correlated environmental data from two measurement systems. ...
Inference for spatial variation in relative risk of disease is an important problem in spatial epidemiologic studies. A standard component of data assimilation in these studies is the assignment of a geocode, i.e., point-level spatial coordinates, to the address of each subject in the study population. Unfortunately, when geocoding is performed by the standard procedure of street-segment matching to a georeferenced road file and subsequent interpolation, it is rarely completely successful. Typically, 10–30% of the addresses in the study population fail to geocode, which can adversely affect relative risk estimation, especially if one of the disease groups (e.g., cases) has a different geocoding success rate than another (e.g., controls). The possibility exists, however, for ameliorating this effect by incorporating geographic information coarser than a point (e.g., a Zip code) that is measured for the observations that fail to geocode. This article develops coarsened-data methods for relative risk estimation from incompletely geocoded data. Nonparametric (kernel smoothing) estimation procedures are featured; parametric (likelihood-based) procedures are described as well, but their applicability is much more limited. We demonstrate, via simulation and a real example of childhood asthma cases in an Iowa county, that substantial improvements in the quality of relative risk estimates are possible using the proposed nonparametric coarsened-data methods.
Spatial confounding, that is, collinearity between fixed effects and random effects in a spatial generalized linear mixed model, can adversely affect estimates of the fixed effects. Restricted spat...
A basic problem in environmental analyses is to generate mapped surfaces from point observations. Effective incorporation of surface generation techniques into GIS-based analyses requires that they be systematically evaluated. In this paper, we evaluate kriging and inverse distance weighting in a computational experiment, using synthetic, realistic datasets that exhibit the type of autocorrelation expected in environmental data. The datasets were generated by sampling points from a mathematical surface, then adding autocorrelated error. Two levels of spatially autocorrelated error were used. Differences between the true surface and estimated values at evaluation points were used to visualize error and calculate summary statistics.

INTRODUCTION

When researchers wish to monitor and evaluate environmental conditions, they often generate mapped surfaces from a collection of observations made at points. A variety of methods for interpolating these surfaces exist. These include kriging, inverse distance weighting, interpolating polynomials, splines, and Fourier and power series (see Lam (1983) for a review of these and other methods). This wealth of choice, however, has created confusion about the conditions under which different interpolation methods should be selected. While some comparative evaluations have been undertaken, they are equivocal as to which interpolation method is most accurate. In some cases, kriging appears to perform best (Rouhani, 1986; Weber and Englund, 1994), while in others inverse distance weighting and splines seem to be superior.
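To make one of the compared interpolators concrete, inverse distance weighting predicts a value at an unsampled point as a weighted average of the observations, with weights proportional to an inverse power of distance. The following is a minimal sketch of the textbook method, not the paper's implementation; the function name `idw_predict` and the default `power=2.0` are assumptions for illustration.

```python
import math


def idw_predict(sites, values, target, power=2.0):
    """Inverse distance weighted prediction at `target` (illustrative sketch).

    sites:  list of (x, y) observation locations
    values: observed values at those locations
    power:  exponent applied to distance (2.0 is a common default, assumed here)
    """
    weighted = []
    for (x, y), z in zip(sites, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Target coincides with an observation site: return that value,
            # which makes the interpolator exact at the data points.
            return z
        weighted.append((1.0 / d ** power, z))
    total = sum(w for w, _ in weighted)
    return sum(w * z for w, z in weighted) / total


if __name__ == "__main__":
    sites = [(0.0, 0.0), (2.0, 0.0)]
    values = [1.0, 3.0]
    print(idw_predict(sites, values, (1.0, 0.0)))  # midpoint, equal weights
```

Unlike kriging, this scheme uses no model of the spatial autocorrelation, so the weights depend on geometry alone.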
The High Volume Small Surface Sampler (HVS3) is a dust-sampling vacuum that allows for set airflow and back pressure during sampling, increasing precision. Total dust collection efficiency of the HVS3 has been evaluated only on new carpets, not worn carpets. We performed a factorial study to assess the impact of carpet wear, dust deposition level, carpet type, and relative humidity during sampling on HVS3 collection efficiency. House dust was aerosolized in a 1-m³ exposure chamber and allowed to settle on test carpets and reference filters. Dust was embedded into the carpets and later extracted with the HVS3 under controlled environmental conditions according to established protocols. Overall collection efficiency was high, 88.3%. Collection efficiency was significantly higher at low relative humidity (30%) than at high relative humidity (75%) (p < 0.0001), though differentially between cut-pile and closed-loop carpets. Collection efficiency of carpets with high wear was significantly lower than that of carpets with midlevel wear (p = 0.01). These results demonstrate that the design of the HVS3 partially corrects for differences in dust load and carpet type. However, collection efficiency of the HVS3 is affected by high levels of carpet wear and ambient humidity during sampling.
In Chaps. 11 and 13, we obtained BLUEs for estimable linear functions under the Aitken model and BLUPs for predictable linear functions under the prediction-extended Aitken model, and we noted that this methodology could be used to estimate estimable linear functions or predict predictable linear functions of β, b, and d in the mixed (and random) effects model, or more generally to estimate estimable and predict predictable linear functions in a general mixed linear model, provided that the variance–covariance parameters ψ (in the case of a mixed effects model) or θ (in the case of a general mixed model) are known. Moreover, it was also noted at the ends of both chapters that the customary procedure for performing these inferences when the variance–covariance parameters are unknown is to first estimate those parameters from the data and then use BLUE/BLUP formulas with the estimates substituted for the unknown true values. It is natural, then, to ask how the variance–covariance parameters should be estimated. Answering this question is the topic of this chapter. We begin with an answer that applies when the model is a components-of-variance model, for which a method known as quadratic unbiased estimation can be used to estimate the variance–covariance parameters, which are variance components in that case. Then we give an answer, based on likelihood-based estimation, that applies to any general mixed linear model.
Spatial data on a network, like spatial data on a Euclidean domain, may exhibit nonstationarity. This article develops two classes of nonstationary models for continuously indexed data on directed tree networks, such as stream networks, that are adaptations of models used previously for nonstationary temporal or spatial data on Euclidean domains. These classes, called elastic models and spatially varying moving average models, allow the spatial dependence between observations at sites any fixed distance apart to grow monotonically as one moves either up or down the network. The process variance, or components thereof, may also be allowed to grow monotonically. An example of trout density data from a stream network in Wyoming, USA indicates that the proposed nonstationary models fit those data much better than their existing stationary or quasi-stationary counterparts.
Geocoding accuracy and the recovery of relationships between environmental exposures and health
The effects of local street network characteristics on the positional accuracy of automated geocoding for geographic health studies
Alpine plant communities vary, and their environmental covariates could influence their response to climate change. A single multilevel model of how alpine plant community composition is determined by hierarchical relations is compared to a separate examination of those relations at different scales. Nonmetric multidimensional scaling of species cover for plots in four regions across the Rocky Mountains created dependent variables. Climate variables are derived for the four regions from interpolated data. Plot environmental variables are measured directly and the presence of thirty-seven site characteristics is recorded and used to create additional independent variables. Multilevel and best subsets regressions are used to determine the strength of the hypothesized relations. The ordinations indicate structure in the assembly of plant communities. The multilevel analyses, although revealing significant relations, provide little explanation; of the site variables, those related to site microclimate are most important. In multiscale analyses (whole and separate regions), different variables are better explanations within the different regions. This result indicates weak environmental niche control of community composition. The weak relations of the structure in the patterns of species association to the environment indicate either that alpine vegetation represents a case of the neutral theory of biogeography being a valid explanation or that it represents disequilibrium conditions. The implications of neutral theory and disequilibrium explanations are similar: Response to climate change will be difficult to quantify above equilibrium background turnover.
The estimation of spatial intensity is an important inference problem in spatial epidemiologic studies. A standard data assimilation component of these studies is the assignment of a geocode, that is, point-level spatial coordinates, to the address of each subject in the study population. Unfortunately, when geocoding is performed by the standard automated method of street-segment matching to a georeferenced road file and subsequent interpolation, it is rarely completely successful. Typically, 10-30% of the addresses in the study population, and even higher percentages in particular subgroups, fail to geocode, potentially leading to a selection bias, called geographic bias, and an inefficient analysis. Missing-data methods could be considered for analyzing such data; however, because there is almost always some geographic information coarser than a point (e.g., a Zip code) observed for the addresses that fail to geocode, a coarsened-data analysis is more appropriate. This article develops methodology for estimating spatial intensity from coarsened geocoded data. Both nonparametric (kernel smoothing) and likelihood-based estimation procedures are considered. Substantial improvements in the estimation quality of coarsened-data analyses relative to analyses of only the observations that geocode are demonstrated via simulation and an example from a rural health study in Iowa.