Sklar’s Omega: A Gaussian copula-based framework for assessing agreement

Hughes, John

doi:10.1007/s11222-022-10105-2


Published: 02 June 2022

Volume 32, article number 46, (2022)

Statistics and Computing

John Hughes ORCID: orcid.org/0000-0003-1538-2569¹


Abstract

The statistical measurement of agreement—the most commonly used form of which is inter-coder agreement (also called inter-rater reliability), i.e., consistency of scoring among two or more coders for the same units of analysis—is important in a number of fields, e.g., content analysis, education, computational linguistics, sports. We propose Sklar’s Omega, a Gaussian copula-based framework for measuring not only inter-coder agreement but also intra-coder agreement, inter-method agreement, and agreement relative to a gold standard. We demonstrate the efficacy and advantages of our approach by applying both Sklar’s Omega and Krippendorff’s Alpha (a well-established nonparametric agreement coefficient) to simulated data, to nominal data previously analyzed by Krippendorff, and to continuous data from an imaging study of hip cartilage in femoroacetabular impingement. Application of our proposed methodology is supported by our open-source R package, sklarsomega, which is available for download from the Comprehensive R Archive Network. The package permits users to apply the Omega methodology to nominal scores, ordinal scores, percentages, counts, amounts (i.e., non-negative real numbers), and balances (i.e., any real number); and can accommodate any number of units, any number of coders, and missingness. Classical inference is available for all levels of measurement while Bayesian inference is available for continuous outcomes only.


References

Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
Altman, D.G., Bland, J.M.: Measurement in medicine: The analysis of method comparison studies. The Statistician 32(3), 307–317 (1983)
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D.: Beyond kappa: A review of interrater agreement measures. Canadian Journal of Statistics 27(1), 3–23 (1999)
Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications Through Limited-Response Questioning. Public Opin. Q. 18(3), 303–308 (1954)
Burgert, C., Rüschendorf, L.: On the optimal risk allocation problem. Statistics & Decisions 24(1/2006), 153–171 (2006)
Burnham, K.P., Anderson, D.R., Huyvaert, K.P.: AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behav. Ecol. Sociobiol. 65(1), 23–35 (2011)
Byrd, R., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16(5), 1190–1208 (1995)
Chen, X., Fan, Y., Tsyrennikov, V.: Efficient estimation of semiparametric multivariate copula models. Technical Report 04-W20. Vanderbilt University, Nashville, TN (2004)
Chrisman, N.R.: Rethinking levels of measurement for cartography. Cartography and Geographic Information Systems 25(4), 231–242 (1998)
Cicchetti, D.V., Feinstein, A.R.: High agreement but low kappa: II. resolving the paradoxes. J. Clin. Epidemiol. 43(6), 551–558 (1990)
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Measur. 20(1), 37–46 (1960)
Cohen, J.: Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70(4), 213–220 (1968)
Conger, A.J.: Integration and generalization of kappas for multiple raters. Psychol. Bull. 88(2), 322 (1980)
Conway, R.W., Maxwell, W.L.: Network dispatching by the shortest-operation discipline. Oper. Res. 10(1), 51–73 (1962)
Davies, M., Fleiss, J.L.: Measuring agreement for multinomial data. Biometrics, pp. 1047–1051 (1982)
Davison, A.C., Hinkley, D.V.: Bootstrap Methods and their Application, vol. 1. Cambridge University Press, Cambridge (1997)
Eddelbuettel, D., Francois, R.: Rcpp: Seamless R and C++ integration. J. Stat. Softw. 40(8), 1–18 (2011)
Feinstein, A.R., Cicchetti, D.V.: High agreement but low kappa: I. the problems of two paradoxes. J. Clin. Epidemiol. 43(6), 543–549 (1990)
Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York (1967)
Fernholz, L.T.: Almost sure convergence of smoothed empirical distribution functions. Scand. J. Stat. 18(3), 255–262 (1991)
Flegal, J.M., Haran, M., Jones, G.L.: Markov chain Monte Carlo: Can we trust the third significant figure? Stat. Sci. 23(2), 250–260 (2008)
Flegal, J.M., Hughes, J., Vats, D., Dai, N., Gupta, K., Maji, U.: mcmcse: Monte Carlo Standard Errors for MCMC. Riverside, CA, Kanpur, India (2021). (R package version 1.5-0)
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)
Furrer, R., Sain, S.R.: spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields. J. Stat. Softw. 36(10), 1–25 (2010)
Genest, C., Neslehova, J.: A primer on copulas for count data. Astin Bulletin 37(2), 475 (2007)
Genz, A.: Numerical computation of multivariate normal probabilities. J. Comput. Graph. Stat. 1(2), 141–149 (1992)
Geyer, C.J.: Le Cam made simple: Asymptotics of maximum likelihood without the LLN or CLT or sample size going to infinity. In: Jones, G.L., Shen, X. (eds.) Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton, Institute of Mathematical Statistics, Beachwood, Ohio, USA (2013)
Gilbert, P., Varadhan, R.: numDeriv: Accurate Numerical Derivatives. R package version 2016.8-1.1 (2019)
Godambe, V.: An optimum property of regular maximum likelihood estimation. Ann. Math. Stat. 31(4), 1208–1211 (1960)
Gwet, K.L.: Computing inter-rater reliability and its variance in the presence of high agreement. Br. J. Math. Stat. Psychol. 61(1), 29–48 (2008)
Gwet, K.L.: Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 4th edn. Advanced Analytics, LLC, Gaithersburg, MD (2014)
Gwet, K.L.: Testing the difference of correlated agreement coefficients for statistical significance. Educ. Psychol. Measur. 76(4), 609–637 (2016)
Han, Z., De Oliveira, V.: On the correlation structure of Gaussian copula models for geostatistical count data. Australian & New Zealand Journal of Statistics 58(1), 47–69 (2016)
Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)
Henn, L.L.: Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data. Computational Statistics, pp. 1–38 (2021)
Henn, L.L., Hughes, J., Iisakka, E., Ellermann, J., Mortazavi, S., Ziegler, C., Nissi, M.J., Morgan, P.: Disease severity classification using quantitative magnetic resonance imaging data of cartilage in femoroacetabular impingement. Stat. Med. 36(9), 1491–1505 (2017)
Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)
Hooke, R., Jeeves, T.A.: Direct search solution of numerical and statistical problems. J. ACM 8(2), 212–229 (1961)
Huang, A.: Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts. Stat. Model. 17(6), 359–380 (2017)
Hughes, J.: krippendorffsalpha: An R package for measuring agreement using Krippendorff’s Alpha coefficient. The R Journal 13(1), 413–425 (2021)
Hughes, J.: On the occasional exactness of the distributional transform approximation for direct Gaussian copula models with discrete margins. Statistics & Probability Letters 177, 109159 (2021)
Ihaka, R., Gentleman, R.: R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996)
Kazianka, H.: Approximate copula-based estimation and prediction of discrete spatial data. Stoch. Env. Res. Risk Assess. 27(8), 2015–2026 (2013)
Kazianka, H., Pilz, J.: Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch. Env. Res. Risk Assess. 24(5), 661–673 (2010)
Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)
Klaassen, C.A., Wellner, J.A., et al.: Efficient estimation in the bivariate normal copula model: Normal margins are least favourable. Bernoulli 3(1), 55–77 (1997)
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. Sage, Los Angeles (2012)
Krippendorff, K.: Computing Krippendorff’s alpha-reliability. Technical report, University of Pennsylvania (2013)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, pp. 159–174 (1977)
Lindsay, B.: Composite likelihood methods. Contemp. Math. 80(1), 221–239 (1988)
Liu, H., Lafferty, J., Wasserman, L.: The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10(Oct), 2295–2328 (2009)
Morgan, P., Nissi, M.J., Hughes, J., Mortazavi, S., Ellermann, J.: T2* mapping provides information that is statistically comparable to an arthroscopic evaluation of acetabular cartilage. Cartilage 9(3), 237–240 (2018)
Mosteller, F., Tukey, J.: Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley series in behavioral science, Addison-Wesley Publishing Company (1977)
Musgrove, D., Hughes, J., Eberly, L.: Hierarchical copula regression models for areal data. Spatial Statistics 17, 38–49 (2016)
Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)
Nissi, M.J., Mortazavi, S., Hughes, J., Morgan, P., Ellermann, J.: T2* relaxation time of acetabular and femoral cartilage with and without intra-articular Gd-DTPA2 in patients with femoroacetabular impingement. Am. J. Roentgenol. 204(6), W695 (2015)
Prentice, R.L.: Correlated binary regression with covariates specific to each binary observation. Biometrics, pp. 1033–1048 (1988)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2021)
Ribatet, M., Cooley, D., Davison, A.C.: Bayesian inference from composite likelihoods, with an application to spatial extremes. Statistica Sinica, pp. 813–845 (2012)
Rüschendorf, L.: Stochastically ordered distributions and monotonicity of the OC-function of sequential probability ratio tests. Statistics 12(3), 327–338 (1981)
Rüschendorf, L.: On the distributional transform, Sklar’s theorem, and the empirical copula process. J. Stat. Planning Inference 139(11), 3921–3927 (2009)
Scott, W.A.: Reliability of content analysis: The case of nominal scale coding. Public Opin. Q. 19, 321–325 (1955)
Sellers, K.F., Borle, S., Shmueli, G.: The COM-Poisson model for count data: a survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 28(2), 104–116 (2012)
Serfling, R., Mazumder, S.: Exponential probability inequality and convergence results for the median absolute deviation and its modifications. Statistics & Probability Letters 79(16), 1767–1773 (2009)
Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. J. Roy. Stat. Soc.: Ser. C (Appl. Stat.) 54(1), 127–142 (2005)
Singh, S., Póczos, B.: Nonparanormal information estimation. In: Precup, D., Teh, Y.W., (eds), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3210–3219. PMLR (2017)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229–231 (1959)
Smeeton, N.C.: Early history of the kappa statistic. Biometrics 41(3), 795–795 (1985)
Spearman, C.E.: The proof and measurement of association between two things. Am. J. Psychol. 15, 72–101 (1904)
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A.: Bayesian measures of model complexity and fit. J. Royal Stat. Society: Series B (Statistical Methodology) 64(4), 583–639 (2002)
Stevens, S.S.: On the theory of scales of measurement. Science 103(2684), 677–680 (1946)
Szabó, Z., Póczos, B., Szirtes, G., Lőrincz, A.: Post nonlinear independent subspace analysis. In: International Conference on Artificial Neural Networks, pp. 677–686. Springer (2007)
Tierney, L., Rossini, A.J., Li, N., Sevcikova, H.: snow: Simple Network of Workstations. R package version 0.4-3 (2018)
Varadhan, R., Borchers, H.W., Bechard, V.: dfoptim: Derivative-Free Optimization. R package version 2020.10-1 (2020)
Varin, C.: On composite marginal likelihoods. AStA Advances in Statistical Analysis 92(1), 1–28 (2008)
Xue-Kun Song, P.: Multivariate dispersion models generated from Gaussian copula. Scand. J. Stat. 27(2), 305–320 (2000)


Author information

Authors and Affiliations

College of Health, Lehigh University, Bethlehem, Pennsylvania, USA
John Hughes


Corresponding author

Correspondence to John Hughes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Here we briefly introduce our R package, sklarsomega, version 3.0 of which is available for download from the Comprehensive R Archive Network.

R package sklarsomega

We introduce our R package by way of a brief usage example. Additional examples are provided in the package documentation.

We apply our Bayesian methodology to a subset of the cartilage data, assuming first a $\textsc {Laplace}(\mu ,\sigma )$ and then a $\textsc {T}(\nu ,\mu )$ marginal distribution. First we load the cartilage data, which are included in the package.

We see that sampling terminated when 4,000 samples had been drawn, since that sample size yielded $\widehat{\text {cv}}_j<0.01$ for $j\in \{1,2,3\}$. As a second check we examine the plot given in Fig. 4, which shows the estimated posterior mean for $\omega $ as a function of sample size. The estimate evidently stabilized after approximately 2,500 samples had been drawn.
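The stopping rule described above can be illustrated with a minimal Python sketch. It assumes that $\widehat{\text{cv}}_j$ is the Monte Carlo standard error of the $j$th posterior mean divided by the absolute value of that mean, and it uses a simple batch-means estimator of the standard error with batch size $\lfloor \sqrt{n}\rfloor$. The function names (`batch_means_mcse`, `cv_converged`) and the simulated chains are hypothetical; they are not taken from sklarsomega or mcmcse.

```python
import math
import random
from statistics import mean, variance

def batch_means_mcse(chain):
    """Batch-means estimate of the Monte Carlo standard error of the chain mean."""
    n = len(chain)
    b = int(math.floor(math.sqrt(n)))   # batch size (assumption: floor(sqrt(n)))
    a = n // b                          # number of full batches
    batch_avgs = [mean(chain[k * b:(k + 1) * b]) for k in range(a)]
    # b times the sample variance of the batch averages estimates the
    # asymptotic variance of the chain; divide by the samples used.
    sigma2_hat = b * variance(batch_avgs)
    return math.sqrt(sigma2_hat / (a * b))

def cv_converged(chains, threshold=0.01):
    """Return (all cvs below threshold?, list of cvs), one chain per parameter."""
    cvs = [batch_means_mcse(c) / abs(mean(c)) for c in chains]
    return all(cv < threshold for cv in cvs), cvs

# Toy usage: three independent "chains" standing in for mu, sigma, and omega.
random.seed(1)
toy_chains = [[random.gauss(5.0, 1.0) for _ in range(4000)] for _ in range(3)]
done, cvs = cv_converged(toy_chains)
print(done, [round(cv, 4) for cv in cvs])
```

In a real sampler the check would be repeated as samples accumulate, and sampling would stop the first time every coefficient of variation falls below the threshold.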

The proposal standard deviations (1 for $\mu $, 0.1 for $\sigma $, and 0.2 for $\omega $) led to sensible acceptance rates of 40%, 60%, and 67%.
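To illustrate how a proposal standard deviation influences the acceptance rate, here is a minimal Python sketch of a random-walk Metropolis sampler with a normal proposal. It targets a standard normal density, a toy stand-in for the actual posterior, and it assumes (the text does not say so explicitly) that proposals of this kind are used. All names are hypothetical.

```python
import math
import random

def log_target(x):
    """Log density, up to a constant, of a standard normal toy target."""
    return -0.5 * x * x

def rw_metropolis(n_samples, proposal_sd, x0=0.0, seed=0):
    """Random-walk Metropolis; returns (samples, acceptance_rate)."""
    rng = random.Random(seed)
    x, accepted, samples = x0, 0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, proposal_sd)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples

# Larger proposal standard deviations typically lower the acceptance rate.
for sd in (0.5, 2.5, 10.0):
    _, rate = rw_metropolis(5000, sd)
    print(f"proposal sd = {sd}: acceptance rate = {rate:.2f}")
```

In practice the proposal standard deviations are tuned, one per parameter, until the acceptance rates are neither very close to 0 nor very close to 1.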

For a t marginal distribution only 3,000 samples were required.

Note that the Laplace model yielded a much smaller value of DIC, and hence a very small relative likelihood for the t model.
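The link between a DIC difference and a relative likelihood can be made concrete with a small Python sketch. It assumes that relative likelihoods are computed from DIC differences in the same way as is commonly done for AIC, namely $\exp\{(\text{DIC}_{\min}-\text{DIC}_i)/2\}$. The DIC values below are made up for illustration and are not the values from the cartilage analysis; the function name is hypothetical.

```python
import math

def relative_likelihoods(dic):
    """Map each model name to exp((DIC_min - DIC_i) / 2)."""
    dic_min = min(dic.values())
    return {name: math.exp((dic_min - value) / 2.0) for name, value in dic.items()}

# Hypothetical DIC values: the model with the smaller DIC gets relative likelihood 1.
example_dic = {"laplace": 1000.0, "t": 1020.0}
rel = relative_likelihoods(example_dic)
for name, value in rel.items():
    print(name, value)
```

Under this convention a DIC gap of 20 already corresponds to a relative likelihood of $e^{-10}$, which is why a much smaller DIC for one model implies a very small relative likelihood for the other.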

Package sklarsomega supports much additional functionality, e.g., plotting, simulation, and influence statistics. Computational efficiency is aided by our use of sparse-matrix routines (Furrer and Sain 2010) and, for the CML method, a clever bit of Fortran code (Genz 1992). Future versions of the package will employ C++ (Eddelbuettel and Francois 2011).


About this article

Cite this article

Hughes, J. Sklar’s Omega: A Gaussian copula-based framework for assessing agreement. Stat Comput 32, 46 (2022). https://doi.org/10.1007/s11222-022-10105-2


Received: 25 March 2021
Accepted: 10 May 2022
Published: 02 June 2022
DOI: https://doi.org/10.1007/s11222-022-10105-2
