This communication addresses the problem of difference estimation of the population total when the observations for some units selected in the sample are unavailable, such that data are missing for both characteristics, but not for both simultaneously. We construct an estimator based on the Horvitz-Thompson estimators of the population total that incorporates all available observations. Moreover, the generality with which the estimators are formulated, together with the good properties of the Horvitz-Thompson estimator of a parameter, allows the conclusions to be applied to any sampling design. Finally, we show the application of this estimator to the particular case of simple random sampling.
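As a point of reference, the complete-data form of a difference estimator built from Horvitz-Thompson totals can be sketched as follows. This is a minimal illustration under simple random sampling without replacement and with no missing observations. It is not the estimator constructed in the communication, and the function names and toy data are hypothetical.

```python
def ht_total(values, inclusion_probs):
    """Horvitz-Thompson estimator of a population total: sum of y_k / pi_k."""
    return sum(y / pi for y, pi in zip(values, inclusion_probs))


def difference_estimator(y_sample, x_sample, inclusion_probs, x_population_total):
    """HT total of y, corrected by (known total of x - HT total of x)."""
    y_ht = ht_total(y_sample, inclusion_probs)
    x_ht = ht_total(x_sample, inclusion_probs)
    return y_ht + (x_population_total - x_ht)


# Hypothetical toy data: SRS without replacement of n = 4 units from N = 10,
# so every unit has first-order inclusion probability n / N.
N, n = 10, 4
pi = [n / N] * n
y = [12.0, 15.0, 9.0, 14.0]   # study variable observed in the sample
x = [10.0, 14.0, 8.0, 12.0]   # auxiliary variable observed in the sample
X_TOTAL = 115.0               # auxiliary population total, assumed known

estimate = difference_estimator(y, x, pi, X_TOTAL)
```

Because only `inclusion_probs` changes from one design to another, the same two functions apply to any design with known first-order inclusion probabilities.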
Our objective was to verify the effectiveness of a program based on the Life Skills Training approach that was longer than usual, was not delivered by teachers, and had a very high degree of reliability in the implementation of the expected content. Twenty-eight secondary schools in Granada (Spain) were randomly assigned to the intervention or control group. The students in the intervention group received 21 one-hour sessions in the first year and 12 one-hour sessions in the second year, whereas those in the control group received no health education or preventive sessions. Students completed questionnaires before and after the first year of sessions, before and after the second year, and 1 year after the program. All five questionnaires were completed by 77% of the 1048 students initially enrolled in the study. The results suggest that the program had no preventive effects either immediately or 1 year after its application. Application of the Life Skills Training approach does not appear to prevent the onset of smoking but may prove effective for avoiding escalation in the consumption of tobacco or other problematic drugs.
This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be fully reproduced.
The estimation of a finite population distribution function is considered when there are missing data. Calibration adjustment is used to deal with nonresponse at the estimation stage. Several procedures are proposed and compared. A numerical study is carried out to evaluate the performance of the estimators. Computational problems in implementing the proposed calibration estimators are also considered.
Two letters published in The Lancet and The Lancet Public Health in recent months argue for the need to evaluate the Spanish response to COVID-19 independently. We agree, but we would like to complement them with three points that bring us closer to open science: terminological confusion, the quality of the data, and their availability. According to the letters, one of the reasons why Spain has been more affected by the pandemic is the low level of trust in scientific advice. We believe that much of this distrust is caused by significant terminological confusion. It occurs in practically all media outlets (with notable exceptions) and leads to a misinterpretation of the data and, as a consequence, to a loss of trust in health information systems, in research, and in epidemiology. New cases or positives? Suspected or confirmed? Covid-19: misinterpretation of pandemic data damages public trust
In recent years, web surveys have established themselves as one of the main methods in empirical research. However, the effect of coverage and selection bias in such surveys has undercut their utility for statistical inference in finite populations. To compensate for these biases, researchers have employed a variety of statistical techniques to adjust nonprobability samples so that they more closely match the population. In this study, we test the potential of the XGBoost algorithm in the most important estimation methods that integrate data from a probability survey and a nonprobability survey. We also compare the effectiveness of these methods in eliminating bias. The results show that the four proposed estimators based on gradient boosting frameworks can improve survey representativity with respect to other classic prediction methods. The proposed methodology is also used to analyze a real nonprobability survey sample on the social effect...
The problem of estimation of a finite population mean for the current occasion based on the samples selected over two occasions has been considered. For the case when the auxiliary variables are negatively correlated, a double-sampling product estimate from the matched portion of the sample is presented. Expressions for optimum estimator and its variance have been derived. The gain in efficiency of the combined estimate over the direct estimate using no information gathered on the first occasion is computed.
The complete shape of an active fold in the western margin of the South Caspian Basin has been established using a selected seismic section from a post-stack seismic cube migrated in depth. The structure is an open anticline that deforms a thick sequence (∼7 km) of Late Miocene to Pliocene sediments: the Productive Series (PS; 5.9 to ∼3.4-3.1 Ma). A major erosive unconformity separates the most recent sediments, which show onlap and draping geometries towards the anticline culmination. Deformation is reconstructed by fitting numerous seismic reflections with a nonparametric regression method. This has been implemented in the programming language R to estimate, for example, the dip of the flanks, the folded area of every deformed horizon, and the length of each horizon in both the deformed and the pre-fold states. It is inferred that this fold has a detachment surface located at 9.6 km depth. The fold geometry resembles a detachment fold, although a long-lived history of basinward tilting accompanying sedimentation and folding is reconstructed, which accelerated from 0.15°/Ma to 0.31°/Ma during deposition of the PS. Fold growth started at 3.5-3.4 Ma within the upper PS with a shortening rate of 0.2 mm/yr, coinciding with maximum sedimentation rates (3.24 mm/yr). Folding continued up to the present under lower sedimentation rates (av. 0.66 ± 0.2 mm/yr) and a shortening rate that increased slightly from 0.17 mm/yr
Online surveys, despite their cost and effort advantages, are particularly prone to selection bias due to the differences between the target population and the potentially covered population (the online population). This makes estimates from online samples unreliable unless further adjustments are applied. Several techniques addressing this issue have emerged in recent years, among which superpopulation modeling can be useful in Big Data contexts where censuses are accessible. This technique uses the sample to train a model that captures the behaviour of the target variable to be estimated, and applies it to the nonsampled individuals to obtain population-level estimates. The modeling step has usually been done with linear regression or LASSO models, but machine learning (ML) algorithms have been pointed out as promising alternatives. In this study we examine the use of these algorithms in the online survey context in order to evaluate and compare their performance and their adequacy to the problem. A simulation study shows that ML algorithms can effectively reduce volunteering bias to a greater extent than traditional methods in several scenarios.
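The train-then-predict logic described above can be sketched as follows. This is a minimal illustration that uses a one-covariate linear regression, the traditional baseline rather than any of the ML algorithms examined in the study. It assumes the covariate is known for every nonsampled unit, and all names and data are hypothetical.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def model_based_total(x_sample, y_sample, x_nonsampled):
    """Observed y for sampled units plus model predictions for the rest."""
    intercept, slope = fit_simple_linear(x_sample, y_sample)
    predicted = sum(intercept + slope * xi for xi in x_nonsampled)
    return sum(y_sample) + predicted


# Hypothetical toy data in which y = 1 + 2x holds exactly.
x_s = [1.0, 2.0, 3.0, 4.0]   # covariate for sampled (online) units
y_s = [3.0, 5.0, 7.0, 9.0]   # target variable, observed only in the sample
x_rest = [5.0, 6.0]          # covariate for nonsampled units (census data)

total = model_based_total(x_s, y_s, x_rest)
```

Replacing `fit_simple_linear` with another predictive model leaves the rest of the procedure unchanged, which is what makes the comparison across algorithms possible.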
Background: This manuscript describes the rationale and protocol of a real-world data (RWD) study entitled Health Care and Social Survey (ESSOC, Encuesta Sanitaria y Social). The study’s objective is to determine the magnitude, characteristics, and evolution of the COVID-19 impact on overall health as well as the socioeconomic, psychosocial, behavioural, occupational, environmental, and clinical determinants of both the general and more vulnerable population. Methods: The study integrates observational data collected through a survey using a probabilistic, overlapping panel design, and data from clinical, epidemiological, demographic, and environmental registries. The data will be analysed using advanced statistical, sampling, and machine learning techniques. The study is based on several measurements obtained from three random samples of the Andalusian (Spain) population: general population aged 16 years and over, residents of disadvantaged areas, and people over the age of 55. Dis...
New calibrated estimators of quantiles and poverty measures are proposed. These estimators use calibration techniques to incorporate information from auxiliary variables related to the variable of interest, and combine this with the selection of optimal calibration points under simple random sampling without replacement. The problem of selecting calibration points that minimize the asymptotic variance of the quantile estimator is addressed. Once this problem is solved, defining the new quantile estimator requires that the optimal estimator of the distribution function on which it is based satisfy the properties of a distribution function. A theorem establishes that the optimal estimator of the distribution function is monotone nondecreasing, so that the corresponding optimal quantile estimator can be defined. This optimal quantile estimator is also used to define new estimators of poverty measures. Simulation studies with real data from the Spanish living conditions survey compare the performance of the new estimators with that of various previously proposed methods, with resampling techniques used for variance estimation. Based on the results of the simulation study, the proposed estimators perform well and are a reasonable alternative to other estimators.
Modern survey methods may be subject to non-observable bias, from various sources. Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target population. Several techniques have been proposed to tackle this issue. One such is Propensity Score Adjustment (PSA), which is widely used and has been analysed in various studies. The usual method of estimating the propensity score is logistic regression, which requires a reference probability sample in addition to the online nonprobability sample. The predicted propensities can be used for reweighting using various estimators. However, in the online survey context, there are alternatives that might outperform logistic regression regarding propensity estimation. The aim of the present study is to determine the efficiency of some of these alternatives, involving Machine Learning (ML) classification algorithms. PSA is applied in two simulation scenarios, representing situations commonly found in online surveys, using logistic regression and ML models for propensity estimation. The results obtained show that ML algorithms remove selection bias more effectively than logistic regression when used for PSA, but that their efficacy depends largely on the selection mechanism employed and the dimensionality of the data.
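The reweighting step of PSA can be sketched as follows. To keep the example self-contained, the propensity is estimated from the proportion of volunteers within each cell of a single categorical covariate, which stands in for the logistic regression or ML classifiers discussed above. The sketch also assumes a self-weighting reference sample and uses the (1 - p) / p weight, one of several possible choices. All names and data are hypothetical.

```python
from collections import Counter


def cell_propensity_weights(volunteer_cells, reference_cells):
    """Return one weight (1 - p) / p per volunteer, where p is the share of
    volunteers among the pooled (volunteer + reference) units in the cell."""
    n_vol = Counter(volunteer_cells)
    n_ref = Counter(reference_cells)
    weights = []
    for cell in volunteer_cells:
        p = n_vol[cell] / (n_vol[cell] + n_ref[cell])
        weights.append((1 - p) / p)
    return weights


def weighted_mean(values, weights):
    """Hajek-type weighted mean of the target variable."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical toy data: young people are over-represented among volunteers.
vol_cells = ["young"] * 8 + ["old"] * 2   # online nonprobability sample
vol_y = [1.0] * 8 + [0.0] * 2             # target variable (volunteers only)
ref_cells = ["young"] * 5 + ["old"] * 5   # reference probability sample

naive_mean = sum(vol_y) / len(vol_y)
adjusted_mean = weighted_mean(vol_y, cell_propensity_weights(vol_cells, ref_cells))
```

In this toy case the adjustment pulls the estimate from the over-represented group's level towards the age mix of the reference sample.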
Data from complex survey designs require special consideration with regard to estimation of finite population parameters and corresponding variance estimation procedures, as a consequence of significant departures from the simple random sampling assumption. In the past decade a number of statistical software packages have been developed to facilitate the analysis of complex survey data. All these packages can handle samples selected from a single sampling frame containing all population units. Dual frame surveys are very useful when complete coverage of the target population cannot be guaranteed, and may result in considerable cost savings over a single frame design with comparable precision. Several estimators are available in the statistical literature, but no existing software covers dual frame estimation procedures. This gap is now filled by the package Frames2. In this paper we highlight the main features of the package. The package includes the main estimators for dual frame surveys and also provides confidence interval estimation.
Female genital cutting (FGC) has major implications for women's physical, sexual and psychological health, and eliminating the practice is a key target for public health policy-makers. To date, one of the main barriers to achieving this has been an inability to infer privately-held views on FGC within communities where it is prevalent. Because the topic is sensitive (and the practice often illegal), people are expected to hide their true support for the practice when questioned directly. Here we use an indirect questioning method (unmatched count technique) to identify hidden support for FGC in a rural South Central Ethiopian community where the practice is common, but thought to be in decline. Employing a socio-demographic household survey of 1620 Arsi Oromo adults, which incorporated both direct and indirect response (unmatched count) techniques, we compare directly-stated versus privately-held views in support of FGC, and individual variation in responses by age, gender and education and ta...
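The basic unmatched count estimator can be sketched as follows. This is a minimal illustration of the standard difference-in-means form, in which a control group reports how many of J non-sensitive items apply and a treatment group reports a count over the same items plus the sensitive one. It is not the analysis used in the study, and the data and names are hypothetical.

```python
from statistics import mean, variance


def uct_estimate(treatment_counts, control_counts):
    """Estimated prevalence of the sensitive item and its standard error.

    The prevalence is the difference between the mean counts of the two
    groups; the standard error treats the groups as independent samples.
    """
    prevalence = mean(treatment_counts) - mean(control_counts)
    std_error = (
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    ) ** 0.5
    return prevalence, std_error


# Hypothetical item counts reported by each respondent.
treatment = [2, 3, 1, 3, 2, 3]   # list with J non-sensitive items + sensitive item
control = [2, 2, 1, 3, 1, 2]     # list with the J non-sensitive items only

prevalence, std_error = uct_estimate(treatment, control)
```

Because respondents report only a total count, no individual answer to the sensitive item is revealed, which is the property that motivates the technique.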
The aim of this study was to evaluate the effect of assertiveness and of the severity of drug use on the risk of relapse (at six months) in two groups (abstinence vs. relapse) that had received treatment. The participants were 90 drug-dependent individuals who completed the "Entrevista de investigación acerca del comportamiento adictivo" (research interview on addictive behaviour) and the "Inventario de asertividad de Rathus" (Rathus Assertiveness Schedule, RAS). We found that overall assertiveness and the Confrontation dimension of the RAS were directly related to the risk of relapse: the higher the scores on assertiveness and confrontation, the greater the risk of relapse. Other dimensions of the RAS (Defence of personal rights and interests, Avoidance of personal confrontations, and Spontaneity), as well as the chronicity of drug use, showed an inverse relationship with the risk of relapse: the higher the scores on these assertiveness dimensions and the greater the chronicity of use, the lower the risk of relapse. These results underline the importance of analysing which kind of assertiveness training is used to prevent relapse in drug dependence, and of the choice of instruments for assessing assertiveness.
Measurements of the surface molar fraction of CO₂ (atmosphere and sea water) and of water column pH_T, total alkalinity (A_T), nutrients and oxygen were carried out in spring 2000 at the European Station for Time Series in the Ocean at the Canary Islands (ESTOC) and in the area located south of the Canary Islands. The significant eddy field, which strongly affects the pattern of the chemical and carbonate system variables, is presented and discussed. A mixing model based on the thermohaline properties of the water masses was established. The model explained over 97% of the variability found in the distribution of the chemical variables. Intermediate waters to the south of the Canary Islands show a high contribution of Antarctic waters, with about 5% of pure Antarctic Intermediate Water. Moreover, the surface structure affected the atmosphere-ocean carbon dioxide exchange, making the area act as a CO₂ sink taking up 9.1 mmol m⁻² week⁻¹, which corresponds to 0.03 Mt of CO₂ taken up by the area in one week at the end of March 2000.
