Sujit Ghosh
  • 2311 K. Stinson Drive (5116 SAS Hall)
    Department of Statistics,
    NC State University
    Raleigh, NC 27695-8203, USA
  • (919) 515-2570


  • Professor Sujit Kumar Ghosh, a tenured Professor in the Department of Statistics at North Carolina State University (…
  • Alan E. Gelfand, Duke University
Modeling the correlation structure of returns is essential in many financial applications. Considerable evidence from empirical studies has shown that the correlation among asset returns is not stable over time. A recent development in the multivariate stochastic volatility literature is the application of inverse Wishart processes to characterize the evolution of return correlation matrices. Within the inverse Wishart multivariate stochastic volatility framework, we propose a flexible correlated latent factor model to achieve dimension reduction and capture the stylized fact of ‘correlation breakdown’ simultaneously. The parameter estimation is based on existing Markov chain Monte Carlo methods. We illustrate the proposed model with several empirical studies. In particular, we use high‐dimensional stock return data to compare our model with competing models based on multiple performance metrics and tests. The results show that the proposed model not only describes historic stylized...
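As a hedged illustration of the core building block, the sketch below simulates a sequence of time-varying correlation matrices from an inverse Wishart process: each covariance draw is centred on the previous one and then rescaled to a correlation matrix. The dimension, horizon, and degrees of freedom are assumed for illustration; this is not the paper's correlated latent factor model.

```python
# Minimal sketch (not the paper's exact model): an inverse Wishart process
# for correlation matrices. Each covariance matrix is drawn from an inverse
# Wishart centred on the previous draw, then normalised to correlations.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
p, T, nu = 3, 50, 10          # assets, time steps, degrees of freedom (assumed)
Sigma = np.eye(p)             # initial covariance
corrs = []
for t in range(T):
    # Scale chosen so E[Sigma_t | Sigma_{t-1}] stays near Sigma_{t-1}
    Sigma = invwishart.rvs(df=nu, scale=(nu - p - 1) * Sigma, random_state=rng)
    d = np.sqrt(np.diag(Sigma))
    corrs.append(Sigma / np.outer(d, d))   # covariance -> correlation

print(np.round(corrs[-1], 2))
```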
In this paper, we investigate a validation process in order to assess the predictive capabilities of a single degree-of-freedom oscillator. Model validation is understood here as the process of determining the accuracy with which a model can predict observed physical events or important features of the physical system. Therefore, assessment of the model needs to be performed with respect to the conditions under which the model is used in actual simulations of the system and to the specific quantities of interest used for decision-making. Model validation also requires that the model be trained and tested against experimental data. In this work, virtual data are produced from a non-linear single degree-of-freedom oscillator, the so-called oracle model, which is assumed to provide an accurate representation of reality. The mathematical model to be validated is derived from the oracle model by simply neglecting the non-linear term. The model parameters are identified via Bayesian updating...
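To make the setup concrete, here is a minimal sketch of the virtual-data idea: a nonlinear (Duffing-type) "oracle" oscillator generates the reference response, the candidate model neglects the cubic term, and the two are compared on a quantity of interest. All coefficients are illustrative assumptions, not the paper's values.

```python
# Minimal sketch of the validation setup: virtual data from a nonlinear
# "oracle" oscillator vs. a candidate model that drops the nonlinear term.
import numpy as np
from scipy.integrate import solve_ivp

def oracle(t, y, c=0.1, k=1.0, eps=0.5):      # nonlinear Duffing-type term
    x, v = y
    return [v, -c * v - k * x - eps * x**3]

def candidate(t, y, c=0.1, k=1.0):            # nonlinear term neglected
    x, v = y
    return [v, -c * v - k * x]

t = np.linspace(0, 20, 400)
truth = solve_ivp(oracle, (0, 20), [1.0, 0.0], t_eval=t).y[0]
model = solve_ivp(candidate, (0, 20), [1.0, 0.0], t_eval=t).y[0]

# Validation metric: error in a quantity of interest (here, peak response)
print("relative error in peak response:",
      abs(model.max() - truth.max()) / abs(truth.max()))
```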
The theory of the natural hedge states that agricultural yields and prices are inversely related. Actuarial rules for U.S. crop revenue insurance assume that dependence between yield and price is constant across all counties within a state and that dependence can be adequately described by the Gaussian copula. We use nonlinear measures of association and a selection of bivariate copulas to empirically characterize spatially-varying dependence between prices and yields and examine premium rate sensitivity for all corn producing counties in the United States. A simulation analysis across copula types and parameter values exposes hypothetical impacts of actuarial changes.
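The following is a minimal sketch of the kind of copula simulation involved: price-yield pairs are generated under a Gaussian copula with varying negative correlation (the "natural hedge"), and the expected indemnity under a stylised revenue guarantee is compared across dependence levels. The marginals, guarantee level, and correlation values are assumptions for illustration.

```python
# Minimal sketch: revenue-guarantee indemnity under a Gaussian copula with
# varying price-yield dependence. Marginals and guarantee level are assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, guarantee = 100_000, 0.85   # simulations, revenue guarantee fraction

for rho in (-0.5, -0.2, 0.0):  # spatially varying dependence, stylised
    z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    u = stats.norm.cdf(z)                       # Gaussian copula uniforms
    price = stats.lognorm.ppf(u[:, 0], s=0.2)   # assumed price marginal
    yld = stats.beta.ppf(u[:, 1], a=8, b=2)     # assumed yield marginal
    revenue = price * yld
    base = np.mean(price) * np.mean(yld)
    shortfall = np.mean(np.maximum(guarantee * base - revenue, 0))
    print(f"rho={rho:+.1f}  expected indemnity ~ {shortfall:.4f}")
```

Stronger negative dependence dampens revenue variability, which is exactly why assuming one fixed copula and correlation for all counties can distort premium rates.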
The relationship between mass and radius (M–R relation) is key for inferring planetary compositions and is thus valuable for studies of formation and migration models. However, the M–R relation alone is not enough for planetary characterization because it depends on other confounding variables. This paper provides a non-trivial extension of the M–R relation by including the incident flux as an additional variable. Using Bayesian hierarchical modelling (BHM) that leverages the flexibility of finite mixture models, a probabilistic mass–radius–flux relationship (M–R–F relation) is obtained based on a sample of 319 exoplanets. We find that the flux has a non-negligible impact on the M–R relation, and that this impact is strongest for hot Jupiters. On the population level, planets with higher levels of flux tend to be denser, and high flux could trigger significant mass loss for planets with radii larger than 13R⊕. As a result, failing to account for the flux in mass pr...
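A hedged sketch of the underlying idea: extend a power-law mass-radius relation with incident flux as an extra covariate, fit here by ordinary least squares on synthetic data. The paper's hierarchical mixture model is far richer; all coefficients below are placeholders.

```python
# Minimal sketch (not the paper's BHM): a flux-augmented power law,
# log M = a + b*log R + c*log F, fit by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n = 319
logR = rng.normal(0.5, 0.4, n)               # log radius, assumed spread
logF = rng.normal(2.0, 1.0, n)               # log incident flux, assumed
logM = 0.3 + 1.8 * logR - 0.1 * logF + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), logR, logF])
coef, *_ = np.linalg.lstsq(X, logM, rcond=None)
print("intercept, radius slope, flux slope:", np.round(coef, 3))
```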
Multivariate density estimation is a popular technique in statistics with wide applications, including regression models allowing for heteroskedasticity in conditional variances. The estimation problem becomes more challenging when observations are missing in one or more variables of the multivariate vector. A flexible class of mixtures of tensor products of kernel densities is proposed that allows for easy implementation of imputation methods using Gibbs sampling and is shown to have superior performance compared to some of the existing imputation methods currently available in the literature. Numerical illustrations are provided using several simulated data scenarios, and applications to a couple of case studies are also presented.
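A minimal sketch of imputation from a mixture of tensor-product Gaussian kernels: each complete case contributes one product kernel, and a missing coordinate is drawn from the mixture conditional on the observed coordinate. This is a stylised stand-in for the paper's Gibbs sampler; bandwidths and data are assumptions.

```python
# Minimal sketch: draw a missing x2 from a product-kernel mixture,
# conditional on the observed x1 (one step of a Gibbs-style imputation).
import numpy as np

rng = np.random.default_rng(3)
complete = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
h1 = h2 = 0.3                                  # assumed bandwidths

def impute_x2(x1_obs):
    # Mixture weights: kernel evaluated in the observed coordinate
    w = np.exp(-0.5 * ((x1_obs - complete[:, 0]) / h1) ** 2)
    w /= w.sum()
    j = rng.choice(len(complete), p=w)          # pick a mixture component
    return rng.normal(complete[j, 1], h2)       # draw from its x2 kernel

print([round(impute_x2(1.0), 2) for _ in range(5)])
```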
... The chapter by Basu and Mukhopadhyay presents a semiparametric method to model link functions for binary response data. ... for considering our proposal. Our special thanks go to Debosri, Swagata and Mou for their encouragement in this project. ...
In fields such as finance, insurance, and system reliability, it is often of interest to measure the dependence among variables by modeling a multivariate distribution using a copula. Copula models with parametric assumptions are easy to estimate but can be highly biased when such assumptions are false, while empirical copulas are non-smooth and often not genuine copulas, making inference about dependence challenging in practice. As a compromise, the empirical Bernstein copula provides a smooth estimator, but the estimation of its tuning parameters remains elusive. In this paper, using the so-called empirical checkerboard copula, we build a hierarchical empirical Bayes model that enables the estimation of a smooth copula function for arbitrary dimensions. The proposed estimator, based on multivariate Bernstein polynomials, is itself a genuine copula, and the selection of its dimension-varying degrees is data-dependent. We also show that the proposed copula estimator prov...
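The sketch below shows the bivariate version of the smoothing step: the empirical copula is smoothed with Bernstein polynomials of a fixed degree m. Fixing m is an assumption made here for illustration; the paper's contribution is choosing the (dimension-varying) degrees in a data-dependent, Bayesian way.

```python
# Minimal sketch: a bivariate empirical Bernstein copula with fixed degree m.
import numpy as np
from scipy.stats import binom, rankdata

rng = np.random.default_rng(4)
x = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=300)
n = len(x)
u = rankdata(x[:, 0]) / (n + 1)       # pseudo-observations
v = rankdata(x[:, 1]) / (n + 1)

def C_emp(s, t):                      # empirical copula
    return np.mean((u <= s) & (v <= t))

def C_bern(s, t, m=15):               # Bernstein smoothing of C_emp
    j = np.arange(m + 1)
    ps, pt = binom.pmf(j, m, s), binom.pmf(j, m, t)
    grid = np.array([[C_emp(a / m, b / m) for b in j] for a in j])
    return ps @ grid @ pt

print(round(C_bern(0.5, 0.5), 3))     # ~ Pr(U<=.5, V<=.5) under the copula
```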
The Duckworth-Lewis (D/L) method is the incumbent rain rule used to decide the result of a limited-overs cricket match that cannot reach its natural conclusion. Duckworth and Lewis (1998) devised a two-factor relationship between the number of overs a team has remaining and the number of wickets it has lost in order to quantify the percentage of resources a team has at any stage of the match. As the number of remaining overs decreases and the number of lost wickets increases, the resources are expected to decrease. The resource table still used by the ICC (International Cricket Council) for 50-over cricket matches suffers from a lack of monotonicity in both the number of overs left and the number of wickets lost. We apply Bayesian inference to build a resource table that overcomes the non-monotonicity problem of the current D/L resource table, and we show that it gives better predictions of first-innings scores and hence is more suitable for use in rain-affected matches.
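For context, here is a minimal sketch of a D/L-style two-factor resource function, Z(u, w) = Z0(w) * (1 - exp(-b*u / Z0(w))), expressed as a percentage of the full 50-over, 0-wicket resource. The Z0 asymptotes and decay rate b are illustrative assumptions, not the published D/L parameters; by construction this functional form is monotone in overs left u and wickets lost w, which is the property the paper's Bayesian table enforces.

```python
# Minimal sketch of a monotone D/L-style resource table (assumed parameters).
import numpy as np

Z0 = np.array([280, 250, 220, 190, 160, 130, 100, 70, 45, 25])  # asymptote per wickets lost
b = 4.0                                                          # assumed decay rate

def resource(u, w):
    z = Z0[w] * (1 - np.exp(-b * u / Z0[w]))
    z_full = Z0[0] * (1 - np.exp(-b * 50 / Z0[0]))
    return 100 * z / z_full          # percentage of full resources

for w in (0, 3, 7):
    print(f"wickets lost {w}:",
          [round(resource(u, w), 1) for u in (50, 30, 10)])
```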
Malaria and dengue fever are among the most important vector-borne diseases in the tropics and subtropics. Average weekly meteorological parameters—specifically, minimum temperature, maximum temperature, humidity, and rainfall—were collected using data from 100 automated weather stations of the Indian Space Research Organization. We obtained district-level weekly reported malaria cases from the Integrated Disease Surveillance Program (IDSP), Department of Health and Family Welfare, Andhra Pradesh, India, for the three years 2014–16. We used a generalized linear model with a Poisson distribution and the default log link to estimate model parameters, as well as a quasi-Poisson method with a generalized additive model based on nonparametric regression with smoothing splines. It appears that higher minimum temperatures (e.g., >24°C) tend to lead to higher malaria counts, but lower values do not seem to have an impact on the malaria counts. On the other hand, higher values of maximum t...
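A minimal sketch of the modelling step described, a Poisson GLM with log link for weekly counts on meteorological covariates. The data here are simulated placeholders, not the IDSP records, and the covariate set is truncated for brevity.

```python
# Minimal sketch: weekly case counts regressed on weather covariates with a
# Poisson GLM and log link (simulated placeholder data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 156                                      # ~3 years of weekly data
tmin = rng.uniform(18, 28, n)                # weekly minimum temperature (C)
rain = rng.gamma(2, 10, n)                   # weekly rainfall (mm)
mu = np.exp(0.5 + 0.08 * tmin + 0.004 * rain)
cases = rng.poisson(mu)

X = sm.add_constant(np.column_stack([tmin, rain]))
fit = sm.GLM(cases, X, family=sm.families.Poisson()).fit()
print(fit.summary().tables[1])
```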
A series of returns is often modeled using stochastic volatility models. Many observed financial series exhibit unit-root non-stationary behavior in the latent AR(1) volatility process, and tests for a unit root become necessary, especially when the error process of the returns is correlated with the error terms of the AR(1) process. In this paper, we develop a class of priors that assigns positive prior probability to the non-stationary region, employ a credible interval for the test, and show that Markov chain Monte Carlo methods can be implemented using standard software. Several practical scenarios and real examples are explored to investigate the performance of our method.
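To illustrate the testing idea in miniature: fit an AR(1) to a rough log-volatility proxy under a prior that does not exclude the non-stationary region (a flat prior over phi here, an assumption), and check whether the credible interval for phi contains 1. The paper works with the full stochastic volatility model via MCMC; this sketch uses log squared returns as a crude stand-in for the latent process.

```python
# Minimal sketch: does the credible interval for the AR(1) coefficient of a
# log-volatility proxy cover the unit root? (Flat prior assumed.)
import numpy as np

rng = np.random.default_rng(6)
T, phi_true = 500, 0.98
h = np.zeros(T)
for t in range(1, T):                       # latent AR(1) log-volatility
    h[t] = phi_true * h[t - 1] + 0.2 * rng.normal()
r = np.exp(h / 2) * rng.normal(size=T)      # returns
y = np.log(r**2 + 1e-8)                     # crude volatility proxy

x, z = y[:-1], y[1:]
phi_hat = (x @ z) / (x @ x)                 # posterior mean under flat prior
s = np.sqrt(np.mean((z - phi_hat * x) ** 2) / (x @ x))
lo, hi = phi_hat - 1.96 * s, phi_hat + 1.96 * s
print(f"95% credible interval for phi: ({lo:.3f}, {hi:.3f}); unit root "
      + ("not rejected" if lo <= 1 <= hi else "rejected"))
```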
The ICH E14 guidelines recommend performing a ‘thorough QT/QTc study’ to support the safety profile of a drug. The standard way of analyzing a ‘thorough QT/QTc study’ to assess a drug for its potential for QT prolongation is to construct a 90% two-sided (or a 95% one-sided) confidence interval (CI) for the difference in baseline-corrected mean QTc (the heart-rate-corrected version of QT) between drug and placebo at each time point, and to conclude non-inferiority if the upper limit of each CI is less than 10 ms. The intent of the ICH E14 guidelines is to establish that the mean effect of the drug is less than 5 ms, and the standard approach may not be well suited to achieving this goal. In this paper, we propose a novel Bayesian approach to address this problem directly, in keeping with the intent of the ICH E14 guidelines. We assess the performance of our proposed approach using simulated data, discuss its advantages over the standard approach, and illustrate the method by applying ...
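A minimal sketch of the standard (non-Bayesian) analysis the abstract contrasts with: a 90% two-sided CI for the placebo-corrected QTc change at each time point, concluding non-inferiority only if every upper limit is below 10 ms. Sample sizes and data are simulated placeholders.

```python
# Minimal sketch: the standard per-timepoint 90% CI rule for a thorough
# QT/QTc study (simulated placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, timepoints = 40, 6
drug = rng.normal(3, 8, (timepoints, n))     # baseline-corrected QTc, drug
plac = rng.normal(0, 8, (timepoints, n))     # baseline-corrected QTc, placebo

ok = True
for t in range(timepoints):
    diff = drug[t].mean() - plac[t].mean()
    se = np.sqrt(drug[t].var(ddof=1) / n + plac[t].var(ddof=1) / n)
    upper = diff + stats.t.ppf(0.95, 2 * n - 2) * se   # upper end of 90% CI
    ok &= upper < 10
    print(f"t={t}: diff={diff:+.2f} ms, 90% CI upper={upper:.2f} ms")
print("non-inferiority concluded:", ok)
```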
In the modern era of advanced medicine, a fraction of patients might be cured of a disease, so the survival probability may plateau at a non-zero value, and a cure rate model is needed to capture such survival fractions. A semiparametric accelerated failure time (AFT) cure model is developed for time-to-event data with a positive surviving fraction. The error distribution of the AFT model for susceptible subjects is expressed as a nonparametric mixture of normal densities, which can approximate an arbitrary distribution satisfying mild regularity conditions. A Bayesian inferential framework leads to efficient estimation of the posterior distribution of parameters. Posterior consistency of the proposed estimator is established under some regularity conditions, providing large-sample justification for the proposed model. Markov chain Monte Carlo methods are used to generate samples from the posterior distribution of the regression coefficients to aid statistical inference. S...
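The mixture cure structure underlying such models can be shown in a few lines: overall survival plateaus at the cure fraction pi because only susceptible subjects can fail, S(t) = pi + (1 - pi) * S_u(t). The susceptible survival below is a simple lognormal AFT placeholder, not the paper's normal-mixture error distribution; the cure fraction and covariate effect are assumed.

```python
# Minimal sketch: a mixture cure model, S(t) = pi + (1 - pi) * S_u(t),
# with a lognormal AFT placeholder for the susceptible survival.
import numpy as np
from scipy import stats

pi_cure, beta = 0.3, 0.5                 # assumed cure fraction, covariate effect

def surv(t, x):
    # AFT: covariate shifts log time; susceptible part is lognormal
    s_u = stats.norm.sf((np.log(t) - beta * x) / 0.8)
    return pi_cure + (1 - pi_cure) * s_u

for t in (1, 5, 20, 100):
    print(f"t={t:>3}: S(t|x=1) = {surv(t, 1.0):.3f}")  # plateaus near 0.3
```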
A novel semiparametric regression model for censored data is proposed as an alternative to the widely used proportional hazards survival model. The proposed regression model for censored data turns out to be flexible and practically meaningful. Features include physical interpretation of the regression coefficients through the mean response time instead of the hazard functions, and a rigorous proof of consistency of the posterior distribution. It is shown that the regression model, obtained by a mixture of parametric families, has a proportional mean structure (as in accelerated failure time models). The statistical inference is based on a nonparametric Bayesian approach that uses a Dirichlet process prior for the mixing distribution. Consistency of the posterior distribution of the regression parameters in the Euclidean metric is established. Finite-sample parameter estimates along with associated measures of uncertainty can be computed by an MCMC method. Simulation studies are p...
The proportional hazards (PH), proportional odds (PO) and accelerated failure time (AFT) models have been widely used in different applications of survival analysis. Despite their popularity, these models are not suitable for handling lifetime data with crossing survival curves. In 2005, Yang and Prentice proposed a semiparametric two-sample strategy (YP model), including the PH and PO frameworks as particular cases, to deal with this type of data. Assuming a general regression setting, the present paper proposes a unified approach to fitting the YP model by employing Bernstein polynomials to manage the baseline hazard and odds under both the frequentist and Bayesian frameworks. The use of Bernstein polynomials has several advantages: it allows for uniform approximation of the baseline distribution, it leads to closed-form expressions for all baseline functions, it simplifies the inference procedure, and the presence of a continuous survival function allows a more accurate estimation of ...
The order of smoothness chosen in nonparametric estimation problems is critical. This choice balances the tradeoff between model parsimony and data overfitting. The most common approach used in this context is cross-validation. However, cross-validation is computationally time-consuming and often precludes valid post-selection inference without further considerations. With this in mind, borrowing elements from the objective Bayesian variable selection literature, we propose an approach to select the degree of a polynomial basis. Although the method can be extended to most series-based smoothers, we focus on estimates arising from Bernstein polynomials for the regression function, using mixtures of g-priors on the model parameter space and a hierarchical specification for the priors on the order of smoothness. We prove the asymptotic predictive optimality of the method and, through simulation experiments, demonstrate that, compared to cross-validation, our approach is one or two ord...
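To make the degree-selection problem concrete, the sketch below fits Bernstein-polynomial regressions of increasing degree and scores them. A BIC score is used here as a cheap, hedged stand-in for the paper's g-prior model-space search (and for the cross-validation baseline); data and the degree range are assumptions.

```python
# Minimal sketch: choose the degree of a Bernstein polynomial basis by
# scoring fits across degrees (BIC as a stand-in for the Bayesian search).
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(9)
x = rng.uniform(0, 1, 150)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 150)

def bernstein_design(x, m):
    j = np.arange(m + 1)
    return binom.pmf(j[None, :], m, x[:, None])   # basis P_{m,j}(x)

scores = {}
for m in range(1, 15):
    B = bernstein_design(x, m)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ coef) ** 2)
    scores[m] = len(x) * np.log(rss / len(x)) + (m + 1) * np.log(len(x))
print("selected degree:", min(scores, key=scores.get))
```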
Catch curves have been used for many years to estimate survival and instantaneous mortality for fish and wildlife populations. In order to better analyze catch curve data from the Apostle Islands population of lake trout Salvelinus namaycush in Lake Superior, we develop a Bayesian approach to catch curve analysis. First, the proposed Bayesian approach is illustrated for a single catch curve and then extended to multiple years of data. We also relax the model assumption of a stable age distribution to allow random effects across years. The proposed models are compared with the traditional methods using the focused DIC. There are many potential advantages to the Bayesian approach over traditional methods such as least squares and maximum likelihood, which rely on large-sample theory. Bayesian estimates are valid for finite samples, and efficient numerical methods can be used to obtain estimates of instantaneous mortality. We conclude that many benefits can be obtained from the Bayesian ap...
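For reference, here is a minimal sketch of a classical catch curve, the least-squares method the Bayesian approach generalises: regress log catch-at-age on age over the descending limb; the slope estimates -Z (instantaneous total mortality) and exp(-Z) estimates annual survival. The counts, age range, and mortality rate are simulated placeholders.

```python
# Minimal sketch: classical catch curve by log-linear regression
# (simulated placeholder counts).
import numpy as np

rng = np.random.default_rng(8)
ages = np.arange(4, 13)                      # fully recruited ages (assumed)
Z_true = 0.45
catch = rng.poisson(2000 * np.exp(-Z_true * ages))

slope, intercept = np.polyfit(ages, np.log(catch), 1)
print(f"Z estimate: {-slope:.3f}, annual survival: {np.exp(slope):.3f}")
```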
Abstract: Review of research on superintendent turnover finds that median tenure between 1990 and 1994 was 6.5 years; turnover has not increased significantly since 1975; turnover is related to the school-district enrollment size; turnover is not related to a district's location ...
Few studies have empirically examined climate change impacts on managed forests in the southern United States. In this paper, we use the U.S. Forest Service's Forest Inventory and Analysis Database to fit two growth models across the South and apply the four Hadley III climate scenarios developed for the Intergovernmental Panel on Climate Change Fourth Assessment Report to project future growth and site productivity on loblolly pine plantations. The static growth model provides a direct test of whether a significant climate influence on forest growth can be statistically derived, while the dynamic growth model estimates climate effects through site productivity. Results indicate considerable spatial variation in potential future growth and productivity change on loblolly pine plantations due to climate change in the southern United States, while overall regional effects are projected to be marginal. The pattern of climate change impacts is consistent across the growth models and climate scenarios. These findings have several implications for climate change adaptation policies.
In this article, we consider imputation in the USDA’s Agricultural Resource Management Survey (ARMS) data, which is a complex, high-dimensional economic dataset. We develop a robust joint model for ARMS data, which requires that variables are transformed using a suitable class of marginal densities (e.g., the skew normal family). We assume that the transformed variables may be linked through a Gaussian copula, which enables construction of the joint model via a sequence of conditional linear models. We also discuss the criteria used to select the predictors for each conditional model. For the purpose of developing an imputation method that is conducive to these model assumptions, we propose a regression-based technique that allows for flexibility in the selection of conditional models while providing a valid joint distribution. In this procedure, labeled iterative sequential regression (ISR), parameter estimates and imputations are obtained using a Markov chain Monte Carlo sampling method. Finally, we apply the proposed method to the full ARMS data, and we present a thorough data analysis that serves to gauge the appropriateness of the resulting imputations. Our results demonstrate the effectiveness of the proposed algorithm and illustrate the specific deficiencies of existing methods. Supplementary materials for this article are available online.

And 148 more