Method for Data Quality Assessment of Synthetic Industrial Data
Abstract
1. Introduction
2. State-of-the-Art Data Quality Assessment
2.1. Mathematical Modeling of Data Quality Assessment
2.2. State-of-the-Art Applications of Binary Logistic Regression
3. The Method Proposed for Data Quality Assessment
3.1. Basic Assumptions
3.2. Assumptions That Must Be Passed by the BLR
3.2.1. Test for Variables in the Equation Considering the Intercept Only Model
3.2.2. Omnibus Test of Model Coefficients
3.2.3. Hosmer–Lemeshow Test
3.2.4. The Variance Explained in the Dependent Variable
3.2.5. Evaluation of Obtained Classification
- The accuracy indicator alone;
- All of the indicators: accuracy, sensitivity, and specificity.
3.2.6. Variables in the Equation with Predictor Variables Included
3.3. Interpretation of the Binary Logistic Regression Results
4. Experimental Data-Quality-Assessment Evaluation
4.1. The Synthetic Dataset Used in the Evaluation
4.2. Experimental Evaluation Results
- Step 1. Verification of passing all the basic assumptions.
- Step 2. Performing the test for the variables in the equation considering the intercept-only model.
- Step 3. Performing the Omnibus Test of Model Coefficients.
- Step 4. Performing the Hosmer–Lemeshow Test.
- Step 5. Evaluating the variance explained in the dependent variable.
- Step 6. Performing the classification by using the binary logistic regression.
- Step 7. Performing Wald’s test for the variables in the equation with the predictor variables included.
- Step 8. Improving the classification by adjusting the cut value.
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| | β | SE | WaldStat | Sigve | Exp(β) |
|---|---|---|---|---|---|
| Constant | Valβ | ValSe | ValWald | ValSigVE | ValExpB |
Name of the Class | Interval |
---|---|
“unsatisfactory” prediction power | <10% |
“weak” prediction power | (10%, 20%) |
“appropriate” prediction power | (20%, 30%) |
“good” prediction power | (30%, 40%) |
“very good” prediction power | ≥40% |
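
The class table above can be read as a simple lookup on the percentage of explained variance. The following is a minimal sketch of that lookup, not the authors' implementation. The function name `prediction_power_class` is hypothetical, and because the table lists open intervals, the sketch assumes that each boundary value (10%, 20%, 30%) belongs to the higher class.

```python
def prediction_power_class(r2_percent: float) -> str:
    """Map an explained-variance value, given in percent, to a class name.

    Assumption: boundary values (10, 20, 30) are assigned to the higher class,
    because the table's open intervals leave them unassigned.
    """
    if r2_percent < 10:
        return "unsatisfactory"
    if r2_percent < 20:
        return "weak"
    if r2_percent < 30:
        return "appropriate"
    if r2_percent < 40:
        return "good"
    return "very good"


# Nagelkerke R2 of 0.385 (38.5%), the value reported in the evaluation tables.
print(prediction_power_class(38.5))  # prints "good"
```
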
Known Machine Failure | Predicted Machine Failure: NO | Predicted Machine Failure: YES | Percentage Correct |
---|---|---|---|
NO | Val1(TN) | Val2(FP) | ValSPEC% |
YES | Val3(FN) | Val4(TP) | ValSENS% |
Accuracy | | | ValTot% |
| | β | SE | WaldStat | Sigve | Exp(β) | CI of Exp(β): LowExpβ | CI of Exp(β): UppExpβ |
|---|---|---|---|---|---|---|---|
| Var1(CatA) | 0.78 | 0.072 | 114.079 | 0.01 | 22.65 | 11.879 | 22.495 |
| Var2 | −0.75 | 0.096 | 59.442 | 0.03 | 0.476 | 0.394 | 0.575 |
| Var3 | 0.02 | 0.001 | 473.247 | 0.83 | 1.012 | 1.011 | 1.013 |
| Constant | −37.7 | 14.641 | 6.278 | 0.012 | 0 | | |
UDI | Product ID | V1 | V2 | V3 | V4 | V5 | V6 | VF |
---|---|---|---|---|---|---|---|---|
1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 |
2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 |
… | … | … | … | … | … | … | … | … |
78 | L47257 | L | 298.8 | 308.9 | 1455 | 41.3 | 208 | 1 |
… | … | … | … | … | … | … | … | … |
| | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|
| Statistic | 0.067 | 0.49 | 0.104 | 0.009 | 0.06 |
| p-value | 0 | 0 | 0 | 0.64 | 0 |
| QQ plot | Figure 1 | Figure 2 | Figure 3 | Figure 4 | Figure 5 |
| Normality assumption passing (p-value > αnorm) | No | No | No | Yes | No |
| | β | SE | Wald | Sigve | Exp(β) |
|---|---|---|---|---|---|
| Constant | −3.35 | 0.055 | 3675.133 | ~0 | 0.035 |
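
The intercept-only row above can be checked by hand, because for a model with no predictors the maximum-likelihood constant is the log-odds of the event. The sketch below assumes 339 failures and 9661 non-failures, which are the row totals of the classification tables in this document. The function name `intercept_only_model` is hypothetical, and the standard-error formula is the usual one for a log-odds estimated from two counts.

```python
import math


def intercept_only_model(n_events: int, n_non_events: int) -> dict:
    """Closed-form estimates for a logistic model that contains only a constant."""
    odds = n_events / n_non_events                        # Exp(beta)
    beta0 = math.log(odds)                                # constant = log-odds of the event
    se = math.sqrt(1 / n_events + 1 / n_non_events)       # standard error of the log-odds
    wald = (beta0 / se) ** 2                              # Wald chi-square statistic, 1 df
    return {"beta": beta0, "se": se, "wald": wald, "exp_beta": odds}


# Counts assumed from the classification tables: 339 failures, 9661 non-failures.
result = intercept_only_model(339, 9661)
print({name: round(value, 3) for name, value in result.items()})
```

Under these assumptions the printed values should agree with the table row up to rounding.
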
| | Chi-Square | Sigot |
|---|---|---|
| Step | 1038.064 | ~0 |
| Block | 1038.064 | ~0 |
| Model | 1038.064 | ~0 |
Chi-Square | SigHL |
---|---|
13.39 | 0.099 |
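
The significance value in the Hosmer–Lemeshow table can be recovered from the chi-square statistic. The sketch below is an illustration under stated assumptions: the degrees of freedom are not given in the table, so it assumes df = 8 (the common default of ten groups minus two), and the 0.05 threshold in the final comparison is likewise an assumption. The helper `chi2_sf_even_df` is a hypothetical name; it uses the closed-form upper tail of the chi-square distribution, which is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(13.39, 8)   # df = 8 is an assumption
print(round(p_value, 3))              # close to the tabulated SigHL
print(p_value > 0.05)                 # a non-significant result gives no evidence of poor fit
```
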
−2 Log-Likelihood | Cox and Snell R² | Nagelkerke R² |
---|---|---|
1922.894 | 0.099 | 0.385 |
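
The two pseudo-R² values above follow from the −2 log-likelihood and the model chi-square of the omnibus test. The sketch below uses the standard Cox and Snell and Nagelkerke definitions. It assumes a sample size of 10,000 records (the total of the classification tables in this document); the function name `pseudo_r2` is hypothetical.

```python
import math


def pseudo_r2(neg2ll_model: float, model_chi_square: float, n: int) -> tuple:
    """Return (Cox and Snell R2, Nagelkerke R2) from standard likelihood quantities."""
    # The -2LL of the intercept-only model is the fitted -2LL plus the model chi-square.
    neg2ll_null = neg2ll_model + model_chi_square
    cox_snell = 1.0 - math.exp(-model_chi_square / n)
    # Largest value Cox and Snell R2 can reach for this dataset.
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# n = 10000 is an assumption taken from the classification table totals.
cox_snell, nagelkerke = pseudo_r2(1922.894, 1038.064, 10000)
print(round(cox_snell, 3), round(nagelkerke, 3))
```

Because Cox and Snell R² cannot reach 1 for a binary outcome, the Nagelkerke rescaling is the value that is compared against the prediction-power classes.
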
Known Machine Failure | Predicted Machine Failure: NO Failure (0) | Predicted Machine Failure: Failure (1) | Percentage Correct |
---|---|---|---|
NO Failure (0) | 9635 | 26 | 99.7% |
Failure (1) | 271 | 68 | 20.1% |
Accuracy | | | 97.0% |
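
The percentages in the classification table above are the specificity, the sensitivity, and the accuracy computed from the four cell counts. A minimal sketch of that arithmetic follows; the function name `classification_indicators` is hypothetical.

```python
def classification_indicators(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Return specificity, sensitivity and accuracy (in percent) from a 2x2 table."""
    total = tn + fp + fn + tp
    return {
        "specificity": 100.0 * tn / (tn + fp),   # correct share of known non-failures
        "sensitivity": 100.0 * tp / (tp + fn),   # correct share of known failures
        "accuracy": 100.0 * (tn + tp) / total,   # correct share of all cases
    }


indicators = classification_indicators(tn=9635, fp=26, fn=271, tp=68)
print({name: round(value, 1) for name, value in indicators.items()})
# {'specificity': 99.7, 'sensitivity': 20.1, 'accuracy': 97.0}
```

The high accuracy is driven by the large non-failure class, which is why sensitivity and specificity are reported alongside it.
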
| | β | SE | WaldStat | Sigve | Exp(β) | CI of Exp(β): LowExpβ | CI of Exp(β): UppExpβ |
|---|---|---|---|---|---|---|---|
| V2 | 0.772 | 0.072 | 114.079 | ~0 | 2.165 | 1.879 | 2.495 |
| V3 | −0.743 | 0.096 | 59.442 | ~0 | 0.476 | 0.394 | 0.575 |
| V4 | 0.012 | 0.001 | 473.247 | ~0 | 1.012 | 1.011 | 1.013 |
| V5 | 0.281 | 0.011 | 599.492 | ~0 | 1.324 | 1.295 | 1.354 |
| V6 | 0.013 | 0.001 | 138.315 | ~0 | 1.013 | 1.011 | 1.016 |
| Constant | −36.69 | 14.641 | 6.278 | 0.012 | 0 | | |
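
The Wald statistic, Exp(β), and its confidence interval in the table above are all functions of β and SE. The sketch below shows the usual relationships. It assumes a 95% Wald interval (z = 1.96), since the confidence level is not stated in the table, and the function name `odds_ratio_summary` is hypothetical. Because the tabulated β and SE are rounded to three decimals, the recomputed values can differ slightly from the tabulated ones.

```python
import math


def odds_ratio_summary(beta: float, se: float, z: float = 1.96) -> dict:
    """Wald statistic, odds ratio and Wald confidence interval for one coefficient."""
    return {
        "wald": (beta / se) ** 2,            # Wald chi-square statistic, 1 df
        "exp_beta": math.exp(beta),          # odds ratio per unit increase
        "low": math.exp(beta - z * se),      # lower confidence limit of the odds ratio
        "upp": math.exp(beta + z * se),      # upper confidence limit of the odds ratio
    }


# Row V2 of the table: beta = 0.772, SE = 0.072.
summary = odds_ratio_summary(0.772, 0.072)
print({name: round(value, 3) for name, value in summary.items()})
```

An odds ratio whose interval excludes 1, as in the V2 row, indicates that the predictor changes the odds of machine failure.
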
Known Machine Failure | Predicted Machine Failure: 0 | Predicted Machine Failure: 1 | Percentage Correct |
---|---|---|---|
NO Failure (0) | 9609 | 52 | 99.5% |
Failure (1) | 243 | 96 | 28.3% |
Accuracy | | | 97.1% |
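
Adjusting the cut value changes which predicted probabilities are labelled as failures. In the tables of this evaluation, the adjustment raises the correctly predicted failures from 68 to 96 while the false alarms rise from 26 to 52. The sketch below illustrates the mechanism only: the probabilities, the known labels, and the cut values 0.5 and 0.3 are hypothetical, and the function names are not from the source.

```python
def classify(probabilities, cut_value=0.5):
    """Label a case 1 (failure) when its predicted probability reaches the cut value."""
    return [1 if p >= cut_value else 0 for p in probabilities]


def sensitivity(known, predicted):
    """Share of known failures (label 1) that were predicted as failures."""
    tp = sum(1 for k, p in zip(known, predicted) if k == 1 and p == 1)
    fn = sum(1 for k, p in zip(known, predicted) if k == 1 and p == 0)
    return tp / (tp + fn)


# Hypothetical known labels and predicted probabilities, for illustration only.
known = [0, 0, 1, 1, 1, 0]
probs = [0.05, 0.35, 0.32, 0.47, 0.81, 0.10]

default_labels = classify(probs)        # default cut value of 0.5
lowered_labels = classify(probs, 0.3)   # lowered cut value

print(default_labels, sensitivity(known, default_labels))
print(lowered_labels, sensitivity(known, lowered_labels))
```

In this toy example the lower cut value finds every known failure but also flags one non-failure, which mirrors the trade-off between sensitivity and specificity seen in the two classification tables.
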
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Iantovics, L.B.; Enăchescu, C. Method for Data Quality Assessment of Synthetic Industrial Data. Sensors 2022, 22, 1608. https://doi.org/10.3390/s22041608