Artificial intelligence has evolved enormously over the last two decades, becoming mainstream in different scientific domains, including education, where so far it has mainly been utilized to enhance administrative and intelligent tutoring systems' services and academic support. ChatGPT, an artificial intelligence-based chatbot developed by OpenAI and released in November 2022, has rapidly gained attention from the international community for its impressive performance in generating comprehensive, systematic, and informative human-like responses to user input through natural language processing. Inevitably, its use has also rapidly raised several challenges, opportunities, and potential issues and concerns across various scientific disciplines. This paper aims to discuss the legal and ethical implications arising from this new technology, identify potential use cases, and enrich our understanding of Generative AI, such as ChatGPT, and its capabilities in education...
In this paper, a Markov Regime Switching Model of Conditional Mean with covariates is proposed and investigated for the analysis of incidence rate data. The components of the model are selected by penalized likelihood techniques in conjunction with the Expectation-Maximization algorithm, with the goal of achieving a high level of robustness in modeling the dynamic behavior of epidemiological data. In addition to statistical inference, Changepoint Detection Analysis is performed to select the number of regimes, which reduces the complexity associated with Likelihood Ratio Tests. Within this framework, a three-phase procedure for modeling incidence data is proposed and tested on real and simulated data.
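As a rough illustration of the kind of regime-switching mean model described above, the sketch below fits a two-regime Markov switching regression with one covariate using statsmodels. It does not reproduce the paper's penalized-likelihood-within-EM estimator or changepoint step; the simulated series, the covariate, and all variable names are hypothetical.

```python
# Hedged sketch: two-regime Markov switching model of the conditional mean
# with a covariate, on simulated incidence-like data (not the paper's method).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
covariate = rng.normal(size=n)                       # hypothetical exogenous predictor
states = (np.arange(n) // 75) % 2                    # alternating latent regimes
incidence = np.where(states == 0, 1.0, 4.0) + 0.5 * covariate + rng.normal(scale=0.3, size=n)

# k_regimes=2 lets the intercept (and, here, the variance) switch between regimes.
model = sm.tsa.MarkovRegression(
    incidence, k_regimes=2, exog=covariate.reshape(-1, 1), switching_variance=True
)
result = model.fit()
print(result.summary())
probs = result.smoothed_marginal_probabilities       # smoothed regime probabilities over time
```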
OBJECTIVES To assess the data quality, reliability, and construct validity of the Greek EUROPEP and to examine the instrument's robustness in terms of its psychometric properties in a pre- and post-economic crisis period. METHODS Taking into account the two key factors that affect the accuracy and quality of survey data, namely the representativeness of the sample selected from the population and the response rate, 492 and 532 patients (pre- and post-crisis, respectively) consulting GPs at 16 Primary Health Care Centers (PHCCs) in Greece were invited to complete the Greek EUROPEP. We assessed item missingness and ceiling and floor effects, and used factor analysis to assess the structure of the 23 items of the EUROPEP. Scales were tested for reliability and construct validity. We further examined whether the scales of the EUROPEP need to be refined, taking into account external validity across economic crises. RESULTS Factor analysis identified three groups of questions that formed scales with satisfactory internal consistency reliability and validity. The clinical behavior scale, the support and services scale, and the organization of care scale all met the criterion of 0.7 for Cronbach's alpha. All scales were found to have a significant correlation with the majority of the examined variables. Moreover, the EUROPEP was found to be robust in effectively detecting differences in patients' views over time in different economic contexts. CONCLUSIONS The study identified three scales in the Greek EUROPEP questionnaire with satisfactory psychometric properties, and its Greek version could be used in the recent primary health care (PHC) reform in this country.
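The 0.7 criterion mentioned above refers to Cronbach's alpha, the standard internal-consistency reliability coefficient. A minimal sketch of that check is shown below; the item matrix is randomly generated and purely illustrative, not EUROPEP data.

```python
# Cronbach's alpha for a block of questionnaire items, checked against the 0.7 criterion.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)            # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
scores = latent + rng.normal(scale=0.8, size=(200, 6))   # 6 items loading on one common factor
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}, meets 0.7 criterion: {alpha >= 0.7}")
```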
Worldwide, the detection of epidemics has been recognized as a continuing problem of crucial importance to public health surveillance. Various approaches for detecting and quantifying epidemics of infectious diseases in the recent literature are directly influenced by methods of Statistical Process Control (SPC). However, applying SPC quality tools directly to the general health care monitoring problem, in the same manner as in industrial quality control, is not feasible, since many assumptions, such as stationarity and a known asymptotic distribution, are not met. Toward this end, this paper discusses some of the open statistical research issues in this field, and a distribution-free control charting technique based on change-point analysis is applied and evaluated for the detection of epidemics. The main tool in this methodology is the detection of unusual trends, in the sense that the beginning of an unusual trend marks a switch from a control state to an epidemic ...
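To make the change-point idea concrete, the sketch below detects a shift from an in-control state to an elevated (epidemic-like) state in a simulated weekly count series, using the PELT algorithm from the `ruptures` package as a stand-in for the charting technique described above. The data, penalty value, and kernel choice are illustrative assumptions.

```python
# Hedged sketch: nonparametric change-point detection on a simulated incidence series.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(2)
baseline = rng.poisson(lam=5, size=80)               # in-control weeks
outbreak = rng.poisson(lam=15, size=20)              # epidemic weeks
series = np.concatenate([baseline, outbreak]).astype(float)

algo = rpt.Pelt(model="rbf").fit(series.reshape(-1, 1))
change_points = algo.predict(pen=5)                  # indices marking the end of each detected segment
print("Detected change points:", change_points)      # a break near week 80 signals the regime switch
```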
When it comes to incidence data, most of the work in this field focuses on the modeling of nonextreme periods. Several attempts have been made and a variety of techniques are available for this purpose. In this work, in order to model not only the nonextreme periods but also capture the behavior of the whole time series, we make use of a dataset on the influenza-like illness rate for Greece for the period 2014–2016. The identification of extreme periods is made possible via changepoint detection analysis, and model selection techniques are developed to identify the optimal periodic-type autoregressive moving average model with covariates that best describes the pattern of the time series. In addition, in the context of incidence data modeling, an advanced algorithm was developed to improve the accuracy of the selected model. The derived results are satisfactory, since the changepoint method seems to identify the extreme periods correctly, and the selected model: (1) estim...
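An illustration of the model-selection step described above is sketched below: candidate ARMA(p, q) orders with an exogenous seasonal covariate are compared by AIC using statsmodels SARIMAX. The simulated weekly series, the single harmonic covariate, and the candidate order grid are placeholders, not the paper's ILI data or final model.

```python
# Hedged sketch: AIC-based order selection for an ARMA-with-covariates model.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 156                                              # three years of weekly data
week = np.arange(n)
seasonal = np.sin(2 * np.pi * week / 52.0)           # crude annual periodicity covariate
ili_rate = 2.0 + 1.5 * seasonal + rng.normal(scale=0.4, size=n)
exog = seasonal.reshape(-1, 1)

best = None
for p, q in itertools.product(range(3), range(3)):   # candidate ARMA(p, q) orders
    res = sm.tsa.SARIMAX(ili_rate, exog=exog, order=(p, 0, q)).fit(disp=False)
    if best is None or res.aic < best[0]:
        best = (res.aic, (p, 0, q))
print("Selected order by AIC:", best[1], "AIC =", round(best[0], 1))
```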
Journal of Statistical Computation and Simulation, 2015
Supersaturated designs (SSDs) are defined as fractional factorial designs whose experimental run size is smaller than the number of main effects to be estimated. While most of the literature on SSDs has focused only on main-effects designs, the construction and analysis of such designs involving interactions has not been developed to a great extent. In this paper, we propose a backward elimination design-driven optimization (BEDDO) method with one main goal in mind: to eliminate the factors identified as fully aliased or highly partially aliased with each other in the design. Under the proposed BEDDO method, we implement and combine correlation-based statistical measures taken from classical test theory and the design of experiments field, and we also present an optimality criterion which is a modified form of Cronbach's alpha coefficient. In this way, we provide a new class of computer-aided unbalanced SSDs involving interactions that derives directly from BEDDO optimization.
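The general principle behind backward elimination of aliased factors can be sketched as follows: repeatedly drop the design column most strongly correlated with another column until no pair exceeds a threshold. This is an illustration of the idea only, not the BEDDO criterion or its modified Cronbach's alpha measure; the toy design and the 0.9 threshold are assumptions.

```python
# Hedged sketch: drop near-aliased (highly correlated) columns from a design matrix.
import numpy as np

def drop_aliased_factors(design: np.ndarray, threshold: float = 0.9):
    cols = list(range(design.shape[1]))
    while len(cols) > 1:
        corr = np.abs(np.corrcoef(design[:, cols], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        if corr.max() < threshold:
            break
        i, _ = np.unravel_index(corr.argmax(), corr.shape)   # most highly correlated pair
        cols.pop(i)                                          # drop one member of that pair
    return cols

rng = np.random.default_rng(4)
X = rng.choice([-1.0, 1.0], size=(12, 6))            # toy two-level design, 12 runs, 6 factors
X[:, 5] = X[:, 0]                                    # factor 6 fully aliased with factor 1
print("Retained factor indices:", drop_aliased_factors(X))
```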
Journal of Statistics Applications & Probability, 2013
The problem of statistical modelling and identifying the significant variables in large data sets is common nowadays. This paper deals with the statistical analysis of two large-dimensional data sets: we first conduct a seismic hazard sensitivity analysis using seismic data from Greece acquired during the years 1962–2003, and then analyze trauma data collected in an annual registry conducted during the year 2005 by the Hellenic Trauma and Emergency Surgery Society involving 30 General Hospitals in Greece. The main purpose of both analyses is to extract high-level knowledge for the domain user or decision-maker. Eight nonparametric classifiers derived from data mining methods (Multilayer Perceptron (MLP) Neural Networks, Radial Basis Function Neural Networks (RBFN), Bayesian Networks, Support Vector Machines (SVMs), Classification and Regression Trees (C&RT), among others) assess the importance of several input variables in order to detect possible risk factors of large earthquakes or to prevent trauma deaths, and we examine which classifiers are best suited for large-dimensional data analysis, effectively detecting complex nonlinear relationships and potentially leading to more accurate predictions.
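One common way to have a nonparametric classifier rank input variables, in the spirit of the risk-factor assessment described above, is permutation importance. The sketch below uses a random forest on synthetic data; the dataset, feature indices, and classifier choice are placeholders, not the seismic or trauma registries analyzed in the paper.

```python
# Hedged sketch: ranking candidate risk factors by permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
importance = permutation_importance(clf, X_test, y_test, n_repeats=20, random_state=0)

# Report the five variables whose shuffling hurts held-out accuracy the most.
for rank, idx in enumerate(importance.importances_mean.argsort()[::-1][:5], start=1):
    print(f"{rank}. feature {idx}: mean importance {importance.importances_mean[idx]:.3f}")
```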
Proceedings of the 10th IEEE International Conference on Information Technology and Applications in Biomedicine, 2010
In recent decades, predictive models that assess the probability of survival for trauma victims have been developed. Some of the most commonly used are the TRISS methodology, the logistic regression modelling technique, and the Revised Trauma Score, which derive from specific input variables. Recently, however, neural network models have shown encouraging results, as they often outperform the traditional approaches. In this paper, we compare the models' predictive ability by means of the Area Under the Curve and discuss the results.
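A minimal sketch of that kind of AUC comparison is given below: a logistic regression stands in for TRISS-style modelling and a small MLP for the neural network, both scored by the Area Under the ROC Curve on held-out data. The synthetic dataset and model settings are illustrative assumptions, not the paper's trauma data or fitted models.

```python
# Hedged sketch: comparing two survival-prediction models by ROC AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1).fit(X_tr, y_tr)

for name, model in [("logistic regression", logit), ("neural network", mlp)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```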
International Journal of Information and Decision Sciences, 2013
Variable selection is fundamental to statistical modelling in diverse fields of science. This paper deals with the problem of high-dimensional statistical modelling through the analysis of seismological data in Greece acquired during the years 1962–2003. The dataset consists of 10,333 observations and 11 factors, used to detect possible risk factors of large earthquakes. In our study, different statistical variable selection techniques are applied, while data mining techniques enable us to discover associations, meaningful patterns, and rules. The statistical methods employed in this work were the non-concave penalised likelihood methods SCAD, LASSO, and Hard, generalised linear logistic regression, and best subset variable selection. The applied data mining methods were three decision tree algorithms: classification and regression trees (C&RT), chi-square automatic interaction detection (CHAID), and the C5.0 algorithm. The identification of the significant variables in large datasets, along with the performance of the techniques used, is also discussed.
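As a hedged illustration of the penalized-likelihood route mentioned above, the sketch below fits an L1-penalized (LASSO-type) logistic regression and reads off the variables whose coefficients are shrunk exactly to zero. SCAD and Hard thresholding have no scikit-learn implementation and are not shown; the synthetic data and penalty strength stand in for the seismological dataset and the tuning actually used.

```python
# Hedged sketch: LASSO-type variable selection via L1-penalized logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=11, n_informative=3, random_state=2)
X = StandardScaler().fit_transform(X)                # penalization assumes comparable scales

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
selected = np.flatnonzero(lasso_logit.coef_.ravel())  # nonzero coefficients = retained variables
print("Variables retained by the L1 penalty:", selected)
```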
2013 International Conference on Availability, Reliability and Security, 2013
Nowadays, variable selection is fundamental to large-dimensional statistical modelling problems, since large databases exist in diverse fields of science. In this paper, we benefit from the use of data mining tools and experimental designs in databases in order to select the most relevant variables for classification in regression problems in cases where the observations and labels of a real-world dataset are available. Specifically, this study uses health data to identify the most significant variables containing all the necessary information for the classification and prediction of new data with respect to a certain outcome (survival or death). The main goal is to determine the most important variables using methods that arise from the field of design of experiments, combined with algorithmic concepts derived from data mining and metaheuristics. Our approach seems promising, since we are able to retrieve an optimal plan using only 6 of the available 8,862 runs.
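One simple way to picture the idea of extracting a small, informative plan from a large database of runs is a greedy D-optimality heuristic: add, one at a time, the candidate run that most increases the determinant of the information matrix. The sketch below is an illustrative stand-in under that assumption, not the metaheuristic actually used in the paper; the candidate database and run count are toy values.

```python
# Hedged sketch: greedy selection of runs maximizing log det(X'X) (D-optimality heuristic).
import numpy as np

def greedy_d_optimal(X: np.ndarray, n_runs: int) -> list:
    chosen = []
    for _ in range(n_runs):
        best_idx, best_logdet = None, -np.inf
        for i in range(X.shape[0]):
            if i in chosen:
                continue
            trial = X[chosen + [i], :]
            # small ridge term keeps early (singular) information matrices comparable
            _, logdet = np.linalg.slogdet(trial.T @ trial + 1e-8 * np.eye(X.shape[1]))
            if logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        chosen.append(best_idx)
    return chosen

rng = np.random.default_rng(5)
database = rng.choice([-1.0, 1.0], size=(500, 5))    # toy database of candidate runs, 5 factors
print("Selected runs:", greedy_d_optimal(database, n_runs=6))
```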
The problem of variable selection is fundamental to statistical modelling in diverse fields of science. In this paper, we study in particular the problem of selecting important variables in regression problems in the case where the observations and labels of a real-world dataset are available. At first, we examine the performance of several existing statistical methods for analyzing a large real trauma dataset, which consists of 7,000 observations and 70 factors that include demographic, transport, and intrahospital data. The statistical methods employed in this work are the nonconcave penalized likelihood methods (SCAD, LASSO, and Hard), generalized linear logistic regression, and best subset variable selection (with AIC and BIC), used to detect possible risk factors of death. Supersaturated designs (SSDs) are a large class of factorial designs which can be used for screening out the important factors from a large set of potentially active variables. This paper presents...
Over the last two decades, the emergence of new infectious diseases, the occasional rapid increase of their cases worldwide, the intense concern about bioterrorism, pandemic influenza, and other public health threats, and the increasing volumes of epidemiological data have all made the development of advanced biosurveillance systems necessary. Additionally, these factors have prompted the scientific community to introduce new and more efficient epidemic outbreak detection methods. Biosurveillance is thus a dynamic scientific activity which progresses and requires systematic monitoring of developments in the fields of health sciences and biostatistics. This paper deals with the development of statistical regression modelling techniques in order to provide guidelines for the selection of the optimal periodic regression model for early and accurate outbreak detection in an epidemiological surveillance system, as well as for its ...
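Periodic regression for outbreak detection is classically done in the Serfling style: fit harmonic terms to the weekly series and flag weeks whose observed rate exceeds an upper prediction bound. The sketch below illustrates that general approach on simulated data; the 52-week period, the 95% bound, and the injected outbreak are assumptions, and it is not the specific model-selection guideline developed in the paper (a full Serfling analysis would also exclude known epidemic weeks from the baseline fit).

```python
# Hedged sketch: Serfling-style harmonic regression baseline with an alarm threshold.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
weeks = np.arange(3 * 52)
rate = 2.0 + 1.2 * np.sin(2 * np.pi * weeks / 52.0) + rng.normal(scale=0.3, size=weeks.size)
rate[130:140] += 3.0                                 # injected outbreak period

X = sm.add_constant(np.column_stack([
    np.sin(2 * np.pi * weeks / 52.0),
    np.cos(2 * np.pi * weeks / 52.0),
]))
fit = sm.OLS(rate, X).fit()
pred = fit.get_prediction(X).summary_frame(alpha=0.05)
alarms = np.flatnonzero(rate > pred["obs_ci_upper"].to_numpy())
print("Weeks flagged as potential outbreak:", alarms)
```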