Jinseog Kim

    Research Interests:
    Yuan and Lin (2004) proposed the grouped LASSO, which achieves shrinkage and selection simultaneously, as LASSO does, but works on blocks of covariates. That is, the grouped LASSO provides a model where some blocks of regression coefficients are exactly zero. The grouped LASSO is useful when there are meaningful blocks of covariates, such as polynomial regression terms and dummy variables derived from categorical variables. In this paper, we propose an extension of the grouped LASSO, called 'Blockwise Sparse Regression' (BSR). The BSR achieves shrinkage and selection simultaneously on blocks of covariates, similarly to the grouped LASSO, but it works for general loss functions, including generalized linear models. An efficient computational algorithm is developed and a blockwise standardization method is proposed. Simulation results show that the BSR provides a compromise between the ridge and the LASSO for logistic regression. The proposed method is illustrated with two datasets.
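    The blockwise selection mechanism can be made concrete with a small sketch. Below is a minimal numpy illustration of a proximal-gradient update with blockwise soft-thresholding for the logistic loss; this is a generic group-lasso-type scheme, not the authors' BSR algorithm, and the function names and the unweighted (group-size-free) penalty are illustrative assumptions.

        import numpy as np

        def logistic_grad(beta, X, y):
            # Gradient of the average logistic loss (y in {0, 1}).
            p = 1.0 / (1.0 + np.exp(-X @ beta))
            return X.T @ (p - y) / len(y)

        def group_soft_threshold(v, t):
            # Shrink the whole block v toward zero; the block becomes
            # exactly zero once its norm falls below the threshold t.
            norm = np.linalg.norm(v)
            return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

        def blockwise_sparse_logistic(X, y, groups, lam, step=0.1, iters=500):
            # Proximal gradient descent: a gradient step on the logistic
            # loss followed by blockwise soft-thresholding per group.
            # 'groups' is a list of index lists, one per covariate block.
            beta = np.zeros(X.shape[1])
            for _ in range(iters):
                z = beta - step * logistic_grad(beta, X, y)
                for g in groups:
                    beta[g] = group_soft_threshold(z[g], step * lam)
            return beta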
    Glucose metabolism and serum lipids are assumed to be possible intermediary mechanisms linking breast cancer (BC) and obesity. The current report examines the associations between diabetes mellitus (DM) markers (glucose and insulin) and BC markers (monocyte chemoattractant protein-1 (MCP-1), resistin, adiponectin, leptin). The glucose model shows that mean glucose levels are higher for breast cancer women (P=0.0222) than for normal women. Mean glucose levels are positively associated with leptin (P<0.0001) and the homeostasis model assessment score of insulin resistance (HOMA-IR) (P<0.0001), while they are negatively associated with the interaction effects HOMA-IR*leptin (P<0.0001) and leptin*adiponectin (P=0.0883). On the other hand, the variance of glucose levels is positively associated with HOMA-IR (P<0.0001) and resistin (P=0.0218), while it is negatively associated with leptin (P<0.0001) and MCP-1 (P=0.0115). The insulin model shows that mean insulin levels are positively associated wi...
    The Bayesian bootstrap for doubly censored data is constructed from the empirical likelihood perspective, and a Gibbs sampler algorithm is proposed for evaluating the Bayesian bootstrap posterior. The proposed Bayesian bootstrap posterior is shown to be the limit of the nonparametric posteriors with Dirichlet process priors as the prior information vanishes, and to be equivalent to the weighted bootstrap on the observables. A small simulation study shows that the proposed Bayesian bootstrap estimator compares favorably with the nonparametric maximum likelihood estimator; furthermore, its asymptotic properties are studied.
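    In the uncensored special case, the equivalence with the weighted bootstrap on the observables is easy to demonstrate: a Bayesian bootstrap draw simply places Dirichlet(1, ..., 1) weights on the observations. A minimal sketch, assuming a toy exponential sample (the doubly censored case requires the paper's Gibbs sampler and is not reproduced here):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.exponential(scale=2.0, size=100)   # toy uncensored sample

        # One Bayesian bootstrap draw: Dirichlet(1,...,1) weights on the
        # observations, i.e. the weighted bootstrap on the observables.
        draws = []
        for _ in range(2000):
            w = rng.dirichlet(np.ones(len(x)))
            draws.append(np.sum(w * x))            # posterior draw of the mean

        print(np.mean(draws), np.percentile(draws, [2.5, 97.5]))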
    Aged populations with comorbidities have demonstrated high mortality rates and severe clinical outcomes in patients with coronavirus disease 2019 (COVID-19). However, whether the age-adjusted Charlson comorbidity index score (CCIS) predicts fatal outcomes remains uncertain. This retrospective, nationwide cohort study was performed to evaluate patient mortality and clinical outcomes according to CCIS among hospitalized patients with COVID-19 infection. We included 5621 patients who had been discharged from isolation or had died from COVID-19 by April 30, 2020. The primary outcome was a composite of death, admission to the intensive care unit, and use of mechanical ventilation or extracorporeal membrane oxygenation. The secondary outcome was mortality. A multivariate Cox proportional hazards model was used to evaluate CCIS as an independent risk factor for death. Among the 5621 patients, the high CCIS (≥3) group showed a higher proportion of elderly patients and lower plasma hemoglobin and lower lymphocyte and platelet counts. High CCIS was an independent risk factor for the composite outcome (hazard ratio [HR] 3.63, 95% confidence interval [CI] 2.45–5.37, P < .001) and patient mortality (HR 22.96, 95% CI 7.20–73.24, P < .001). The nomogram showed that CCIS was the most important factor contributing to prognosis, followed by the presence of dyspnea (HR 2.88, CI 2.16–3.83), low body mass index (<18.5 kg/m2; HR 2.36, CI 1.49–3.75), lymphopenia (<0.8 × 10^9/L; HR 2.15, CI 1.59–2.91), thrombocytopenia (<150.0 × 10^9/L; HR 1.29, CI 0.94–1.78), anemia (<12.0 g/dL; HR 1.80, CI 1.33–2.43), and male sex (HR 1.76, CI 1.32–2.34). The nomogram demonstrated that the CCIS was the most potent predictor of patient mortality. A predictive nomogram using CCIS for hospitalized patients with COVID-19 may help clinicians triage high-risk patients and concentrate limited resources on managing them.
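    As a rough illustration of the modeling step, a Cox proportional hazards fit of this kind can be set up with the lifelines library; the file name and column names below are hypothetical stand-ins for the cohort data, not the study's actual variables:

        import pandas as pd
        from lifelines import CoxPHFitter

        # Hypothetical columns: follow-up time, death indicator, high-CCIS
        # group, and the covariates reported in the abstract.
        df = pd.read_csv("covid_cohort.csv")  # hypothetical file
        cph = CoxPHFitter()
        cph.fit(df[["time", "death", "high_ccis", "dyspnea", "low_bmi",
                    "lymphopenia", "anemia", "male"]],
                duration_col="time", event_col="death")
        cph.print_summary()  # hazard ratios with 95% CIs, as in the abstract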
    We investigated the clinical outcomes of asymptomatic coronavirus disease 2019 (COVID-19) and identified risk factors associated with patient mortality using a Korean nationwide public database of 5,621 hospitalized patients. The mortality rate and the rate of admission to the intensive care unit (ICU) were compared between asymptomatic and symptomatic patients. A prediction model for patient mortality was developed through risk factor analysis among asymptomatic patients. The prevalence of asymptomatic COVID-19 infection was 25.8%. The mortality rates did not differ between groups (3.3% vs. 4.5%, p=0.17). However, symptomatic patients were more likely to receive ICU care than asymptomatic patients (4.1% vs. 1.0%, p<0.0001). The age-adjusted Charlson comorbidity index score (CCIS) was the most potent predictor of mortality in asymptomatic patients. Clinicians should assess the risk of death by evaluating age and comorbidities rather than the presence of symptoms.
    Objectives: The role of cholesterol and its relationship with some cardiac risk factors in heart patients are examined in the current report using models of both cholesterol level and two cardiac factors. Materials and methods: A real data set of 303 heart patients with 14 study characteristics is considered. Statistical joint generalized linear models (JGLMs) are fitted using both gamma and log-normal distributions. Results: The cholesterol level model shows that cholesterol level is higher for female heart patients (P=0.0013) than for male patients, and at older ages (P=0.0012) than at younger ages. It is higher for patients with a high maximum heart rate (P=0.0877), with normal resting electrocardiographic results (P=0.0107), or with thalassemia showing a reversible defect (P=0.0466) or a fixed defect (P=0.0940) rather than normal. It is also higher for patients with a heart disease diagnosis (angiographic disease status) value of 0 (meaning less than 50% diameter narrowing) (P=0.0515) than for others. The variance of cholesterol level is higher for female patients (P=0.0265) than for male patients, and it increases as ST depression induced by exercise relative to rest (Oldpeak) increases (P=0.0095). The maximum heart rate model shows that maximum heart rate increases as cholesterol level increases (P=0.0325), while its variance decreases as cholesterol level increases (P=0.0058). The resting blood pressure model shows that mean resting blood pressure increases as cholesterol level increases, where cholesterol is a confounder in the model. Conclusions: Cholesterol levels should be examined regularly at older ages, along with the maximum heart rate achieved, thalassemia status, and resting blood pressure, for both male and female heart patients.
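    A JGLM pairs a mean model with a dispersion model. The sketch below shows one simplified iteration in statsmodels, assuming a gamma mean model with log link and a gamma dispersion model fitted to squared deviance residuals; the matrices Xm and Xd and the function name are hypothetical, and this is not the report's exact fitting procedure:

        import statsmodels.api as sm

        def jglm_step(y, Xm, Xd, weights=None):
            # Mean model: gamma GLM with log link for positive responses
            # such as cholesterol level; 'weights' would carry the inverse
            # fitted dispersions on later iterations.
            mean_fit = sm.GLM(y, Xm, family=sm.families.Gamma(
                link=sm.families.links.Log()), var_weights=weights).fit()
            # Dispersion model: squared deviance residuals regressed on Xd,
            # again via a gamma GLM with log link (Lee-Nelder-style scheme).
            d = mean_fit.resid_deviance ** 2
            disp_fit = sm.GLM(d, Xd, family=sm.families.Gamma(
                link=sm.families.links.Log())).fit()
            return mean_fit, disp_fit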
    This paper suggests a novel way of dramatically improving the Naive Bayes text classifier with our semantic tensor space model for document representation. We aim at near-perfect text classification with semantic Naive Bayes learning that incorporates semantic concept features into term feature statistics. To this end, Naive Bayes learning is semantically augmented under the tensor space model, where the 'concept' space is regarded as an independent space on par with the 'term' and 'document' spaces and is produced from concept-level informative Wikipedia pages associated with a given document corpus. Through extensive experiments on three popular document corpora, including Reuters-21578, 20Newsgroups, and OHSUMED, we show that the proposed method not only outperforms recent deep learning-based classification methods but also achieves nearly perfect classification performance.
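    The full tensor space model is beyond a short example, but the core idea of augmenting term statistics with concept features can be approximated by concatenating a concept vocabulary to the term vocabulary before Naive Bayes training. A minimal sklearn sketch with hypothetical toy documents and concept annotations:

        from scipy.sparse import hstack
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        docs = ["grain prices rise", "new cpu architecture shipped"]
        labels = ["commodity", "hardware"]
        # Hypothetical concept annotations, e.g. from a Wikipedia mapping.
        concepts = ["agriculture economics", "computing"]

        term_vec, concept_vec = CountVectorizer(), CountVectorizer()
        # Term counts and concept counts side by side in one feature matrix.
        X = hstack([term_vec.fit_transform(docs),
                    concept_vec.fit_transform(concepts)])
        clf = MultinomialNB().fit(X, labels)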
    Leaks of personal information are mostly caused by internal users. Confidential information such as credit card numbers can easily be disclosed or modified by a system manager. A secure storage and management scheme for sensitive individual and enterprise data is required for distributed data management. A manager who owns private data needs a weight, which is the right to disclose that data. To decide a weight, the system must be able to designate the level of a user's rights. In this paper, we propose a new algorithm, called the digit-independent algorithm, and a new data management scheme that gathers and processes data based on it. Our sharing and recovery scheme computes efficiently when managing large quantities of data using a weight table. The proposed scheme can be used for secure e-business data management and storage in a ubiquitous computing environment.
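    The abstract does not specify the digit-independent algorithm itself, so the sketch below illustrates only the generic share-and-recover pattern, using plain additive secret sharing over a public prime modulus; all names and parameters are illustrative assumptions:

        import secrets

        P = 2**127 - 1  # public prime modulus (secret must be < P)

        def share(secret: int, n: int):
            # Split the secret into n additive shares; no subset smaller
            # than all n shares reveals anything about the secret.
            parts = [secrets.randbelow(P) for _ in range(n - 1)]
            parts.append((secret - sum(parts)) % P)
            return parts

        def recover(parts):
            # Recovery is just summation modulo P.
            return sum(parts) % P

        shares = share(41421356, 5)
        assert recover(shares) == 41421356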
    In quality engineering, the most commonly used lifetime distributions are log-normal, exponential, gamma and Weibull. Experimental designs are useful for predicting the optimal operating conditions of the process in lifetime improvement experiments. In the present article, invariant robust first-order D-optimal designs are derived for correlated lifetime responses having the above four distributions. Robust designs are developed for some correlated error structures. It is shown that robust first-order D-optimal designs for these lifetime distributions are always robust rotatable but the converse is not true. Moreover, it is observed that these designs depend on the respective error covariance structure but are invariant to the above four lifetime distributions. This article generalizes the results of Das and Lin [7] for the above four lifetime distributions with general (intra-class, inter-class, compound symmetry, and tri-diagonal) correlated error structures.
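    For reference, the D-optimality criterion under correlated errors takes the following standard form (standard notation, not necessarily the article's exact formulation):

        % First-order model with correlated lifetime responses:
        %   y = X\beta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 W
        % A design is D-optimal when it maximizes the information determinant:
        \max_{X} \; \det\!\left( X^{\top} W^{-1} X \right)
        % where W encodes the intra-class, inter-class, compound-symmetry,
        % or tri-diagonal correlation structure under consideration.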
    Experimental designs are widely used in predicting the optimal operating conditions of the process parameters in lifetime improvement experiments. The most commonly observed lifetime distributions are log-normal, exponential, gamma and Weibull. In the present article, invariant robust first-order rotatable designs are derived for autocorrelated lifetime responses having log-normal, exponential, gamma and Weibull distributions. In the process, robust first-order D-optimal and rotatable conditions have been derived under these situations. For these lifetime distributions with correlated errors, it is shown that robust first-order D-optimal designs are always robust rotatable but the converse is not true. Moreover, it is observed that robust first-order D-optimal and rotatable designs depend on the respective error variance-covariance structure but are independent of these considered lifetime response distributions.
    A logistic regression method can be applied to regress the t-year survival probability on covariates if there are no censored observations before time t. But if some observations are incomplete due to censoring before time t, then logistic regression cannot be applied directly. Jung (1996) proposed modifying the score function of logistic regression to accommodate right-censored observations. His modified score function, motivated by consistent estimation of the regression parameters, reduces to the regular logistic score function if no observations are censored before time t. In this paper, we propose a modification of Jung's estimating function that is optimal in addition to being consistent. We prove that the optimal estimator is more efficient than Jung's estimator. This theoretical comparison is illustrated with a real example data analysis and simulations.
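    One standard way to accommodate right censoring in this setting, shown below for orientation rather than as Jung's exact construction, is to weight complete observations by the inverse of the estimated censoring survival probability:

        % Logistic model for the t-year survival probability:
        %   \pi(Z;\beta) = \Pr(T > t \mid Z) = \left(1 + e^{-\beta^{\top} Z}\right)^{-1}
        % IPCW-type estimating equation (an assumption-level sketch):
        \sum_{i=1}^{n} \frac{\Delta_i(t)}{\hat{G}(\tilde{T}_i \wedge t)}
          \, Z_i \left\{ I(\tilde{T}_i > t) - \pi(Z_i;\beta) \right\} = 0
        % \hat{G}: Kaplan--Meier estimate of the censoring survival function;
        % \Delta_i(t): indicator that subject i's t-year status is observed.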
    Herbicide-resistant creeping bentgrass plants (Agrostis stolonifera L.) without antibiotic-resistant markers were produced by Agrobacterium-mediated transformation. Embryogenic callus tissues were infected with Agrobacterium tumefaciens EHA105, harboring the bar and the CP4-EPSPS genes for bialaphos and glyphosate resistance. Phosphinothricin-resistant calli and plants were selected. Soil-grown plants were obtained at 14-16 weeks after transformation. Genetic transformation of the selected, regenerated plants was validated by PCR. Southern blot analysis revealed that at least one copy of the transgene was integrated into the genome of the transgenic plants. Transgene expression was confirmed by Northern blot. CP4-EPSPS protein was detected by ELISA. Transgenic plants remained green and healthy when sprayed with Basta, containing 0.5% glufosinate ammonium or glyphosate. The optimized Agrobacterium-mediated transformation method resulted in an average of 9.4% transgenic plants. The re...
    Ensemble methods have received much attention recently for their significant improvements in classification accuracy. However, ensemble algorithms do not provide any information about how the final decision is made. That is, ensemble methods improve classification accuracy at the expense of interpretability. In this chapter, we investigate possibilities of using ensemble methods for generating useful rules, which help in understanding the data set as well as the decision. An extensive review of three ensemble algorithms — bagging, boosting, and CHEM — is presented, and an algorithm for rule generation with CHEM is proposed. The proposed rule generation algorithm is illustrated with a real data set.
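    CHEM is not publicly available, so as a stand-in, the sketch below shows where rule generation can start in any tree ensemble: each member tree already encodes human-readable if-then rules that a rule generator could aggregate and prune. A minimal sklearn example with bagged trees:

        from sklearn.datasets import load_iris
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier, export_text

        X, y = load_iris(return_X_y=True)
        ens = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                                n_estimators=25, random_state=0).fit(X, y)
        # Each member tree yields readable if-then rules; an ensemble
        # rule generator would aggregate, score, and prune these.
        print(export_text(ens.estimators_[0]))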
    The Kalman filter has been widely used in estimating the state of a process, and it is well known that no other algorithm can outperform it if the assumptions of the Kalman filter hold. For non-Gaussian estimation problems, both the extended Kalman filter and the particle filter have been widely used; however, they have rarely been compared directly, so practitioners often choose one of them arbitrarily and apply it to their estimation process. Therefore, we have compared the performance of the Kalman filter against that of the particle filter. One practical field in which these filters are applied is indoor positioning. As mobile terminal manufacturing techniques have made great progress, the demand for LBS (location-based services) has grown rapidly. One of the key techniques for LBS is positioning, or determining the location of the mobile terminal. Outdoor positioning is not a great burden to system developers, because GPS (Global Positioning System) provides fairly accurate location information for a mobile terminal if the line of sight is not blocked. In contrast, there is no practical solution for the indoor positioning problem. We could obtain the exact location of a mobile terminal by investing a large amount of money, but this is not economically practical. One of the most practical candidate solutions for indoor positioning is WLAN (Wireless Local Area Network) based positioning, because it does not require any special devices dedicated to indoor positioning. One of its most significant shortcomings is inaccuracy due to noise in the measured data. To improve the accuracy of WLAN-based indoor positioning, both Kalman filter and particle filter processes have been applied to the measurements. This paper presents our experimental results comparing the Kalman filter and the particle filter in improving the accuracy of WLAN-based indoor positioning, so that indoor LBS developers can choose the appropriate one for their applications.
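    For intuition, smoothing a one-dimensional RSSI signal with a Kalman filter requires only a few lines; the noise parameters q and r below are illustrative assumptions, not values from the experiments:

        def kalman_1d(zs, q=1e-3, r=4.0):
            # Scalar Kalman filter smoothing noisy RSSI readings (dBm).
            # q: process noise, r: measurement noise; both are assumptions.
            x, p = zs[0], 1.0
            out = []
            for z in zs:
                p = p + q                  # predict
                k = p / (p + r)            # Kalman gain
                x = x + k * (z - x)        # update with measurement z
                p = (1 - k) * p
                out.append(x)
            return out

        print(kalman_1d([-62, -65, -61, -70, -64, -63]))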
    In regression models with positive observations, estimation is often based on either the log-normal or the gamma model. Generalised linear models and joint generalised linear models are appropriate for analysing positive data with constant and non-constant variance, respectively. This article focuses on the use of these two techniques in hydrogeology. As an illustration, groundwater quality factors are analysed. Softness, non-alkalinity, dissolved oxygen content, chemical oxygen demand, chloride content and electrical conductivity are all basic positive characteristics (i.e., their values are positive in nature) of good drinking water. This article identifies the causal factors of these basic quality characteristics of groundwater at Muzaffarpur Town, Bihar, India, using the above techniques. Many statistically significant factors for these six basic quality characteristics of groundwater are detected. In the process, a probabilistic model for each characteristic is developed, and the effects of different factors on each characteristic are examined.
    We consider a Bayesian analysis method of paired survival data using a bivariate exponential model proposed by Moran (1967, Biometrika 54:385-394). Important features of Moran's model include that the marginal distributions are exponential and that the range of the correlation coefficient is between 0 and 1. These contrast with the popular exponential model with gamma frailty. Despite these nice properties, statistical analysis with Moran's model has been hampered by the lack of a closed-form likelihood function. In this paper, we introduce a latent variable to circumvent the difficulty in the Bayesian computation. We also consider a model checking procedure using the predictive Bayesian P-value.
    In this paper, we propose a sparse semi-supervised learning method which combines the large-margin approach and an L1 constraint. The major difficulty of the proposed method is computation, since the objective function to be minimized is non-convex and non-differentiable. To resolve this obstacle, we develop an efficient computational algorithm, which is a hybrid of the CCCP and the gradient Lasso algorithm. The advantage of the proposed method over existing semi-supervised learning methods is that it can identify a small number of relevant input variables while keeping prediction accuracy high. To confirm these advantages, we compare the proposed method with a standard semi-supervised method through simulations as well as analyses of real data sets.
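    For orientation, objectives in this class typically combine a hinge loss on labeled points, a symmetric hinge on unlabeled points, and an L1 constraint; the display below is a generic form of such a large-margin semi-supervised objective, not the paper's exact formulation:

        % Hinge loss on labeled points, symmetric hinge on unlabeled points,
        % and an L1 constraint inducing variable sparsity:
        \min_{\beta} \;
          \sum_{i \in \mathcal{L}} \bigl(1 - y_i f(x_i;\beta)\bigr)_{+}
          + \lambda \sum_{j \in \mathcal{U}} \bigl(1 - |f(x_j;\beta)|\bigr)_{+}
        \quad \text{subject to } \|\beta\|_{1} \le s
        % The unlabeled term is non-convex (motivating a CCCP outer loop);
        % the L1 constraint motivates a gradient-Lasso-type inner solver.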
    Cost-complexity pruning generates nested subtrees and selects the best one. However, its computational cost is large, since it uses a holdout sample or cross-validation. On the other hand, pruning algorithms based on posterior calculations, such as BIC (MDL) and MEP, are faster, but they sometimes produce trees that are too big or too small, yielding poor generalization errors. In ...
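    The nested-subtree selection step can be reproduced with sklearn's cost-complexity pruning path; the dataset and the 5-fold cross-validation choice below are illustrative assumptions:

        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)
        # The pruning path yields the nested subtrees mentioned above;
        # cross-validation picks the best one, at extra computational cost.
        path = DecisionTreeClassifier(random_state=0) \
            .cost_complexity_pruning_path(X, y)
        best = max(path.ccp_alphas,
                   key=lambda a: cross_val_score(
                       DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                       X, y, cv=5).mean())
        print("selected ccp_alpha:", best)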
    One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs), which can be used for diagnostic and prognostic purposes and for a better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, the final model suffers a loss of efficiency in predicting the clinical outcome. To overcome this problem, we propose a two-stage method in which the genetic model of each SNP is identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than the...
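    A simplified sketch of the two stages follows: each SNP is recoded under the genetic model (dominant, recessive, or additive) that maximizes a two-sample statistic, standing in for the MAX trend test, and the recoded matrix is passed to an L1-penalized linear SVM. Function names and the test statistic are illustrative simplifications:

        import numpy as np
        from sklearn.svm import LinearSVC

        CODINGS = {"dominant":  lambda g: (g >= 1).astype(float),
                   "recessive": lambda g: (g == 2).astype(float),
                   "additive":  lambda g: g.astype(float)}

        def max_test_coding(g, y):
            # Pick the coding with the largest absolute two-sample
            # statistic; a simplified stand-in for the MAX trend test.
            def stat(x):
                a, b = x[y == 1], x[y == 0]
                s = np.sqrt(a.var() / len(a) + b.var() / len(b)) + 1e-12
                return abs(a.mean() - b.mean()) / s
            return max(CODINGS, key=lambda c: stat(CODINGS[c](g)))

        def two_stage_fit(G, y, C=0.1):
            # Stage 1: recode each SNP; Stage 2: L1-penalized linear SVM.
            X = np.column_stack([CODINGS[max_test_coding(G[:, j], y)](G[:, j])
                                 for j in range(G.shape[1])])
            return LinearSVC(penalty="l1", dual=False, C=C).fit(X, y)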
    Identification of influential genes and clinical covariates on the survival of patients is crucial because it can lead to a better understanding of the underlying mechanisms of diseases and better prediction models. Most variable selection methods in penalized Cox models cannot deal properly with categorical variables such as gender and family history. The group lasso penalty can combine clinical and genomic covariates effectively. In this article, we introduce an optimization algorithm for Cox regression with the group lasso penalty. We compare our method with other methods on simulated and real microarray data sets.
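    The key data preparation step is forming the groups: all dummy columns of a categorical clinical covariate must enter or leave the model together, while each gene can form its own singleton group. A small pandas sketch with hypothetical columns; these groups would then feed a group-lasso-penalized Cox fit, e.g. via a blockwise soft-thresholding update like the one sketched earlier for BSR:

        import numpy as np
        import pandas as pd

        # Hypothetical clinical frame: 'sex' and 'family_history' are
        # categorical; gene columns hold continuous expression values.
        clin = pd.DataFrame({"sex": ["M", "F", "F"],
                             "family_history": ["yes", "no", "yes"]})
        genes = pd.DataFrame(np.random.default_rng(0).normal(size=(3, 4)),
                             columns=[f"gene{i}" for i in range(4)])

        dummies = pd.get_dummies(clin)          # one block per factor
        X = pd.concat([dummies, genes], axis=1)
        # Groups: all dummy columns of a factor form one block; each gene
        # forms its own singleton group.
        groups = ([[i for i, c in enumerate(X.columns) if c.startswith(f)]
                   for f in ("sex_", "family_history_")]
                  + [[X.columns.get_loc(g)] for g in genes.columns])
        print(groups)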