
A naïve Bayes regularized logistic regression estimator for low-dimensional classification

Published: 01 September 2024

Abstract

To reduce estimator variance and prevent overfitting, regularization techniques have attracted great interest from the statistics and machine learning communities. Most existing regularized methods rely on the sparsity assumption that a model with fewer parameters predicts better than one with many parameters. This assumption works particularly well in high-dimensional problems. However, the sparsity assumption may not be necessary when the number of predictors is relatively small compared to the number of training instances. This paper argues that shrinking the coefficients towards a low-variance data-driven estimate could be a better regularization strategy for such situations. For low-dimensional classification problems, we propose a naïve Bayes regularized logistic regression (NBRLR) that shrinks the logistic regression coefficients toward the naïve Bayes estimate to reduce variance. Our approach is primarily motivated by the fact that naïve Bayes is functionally equivalent to logistic regression if naïve Bayes' conditional independence assumption holds. Under standard conditions, we prove the consistency of the NBRLR estimator. Extensive simulation and empirical experimental results show that NBRLR is a competitive alternative to various state-of-the-art classifiers, especially on low-dimensional datasets.
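To make the shrinkage idea concrete, the sketch below (not the authors' implementation) illustrates an NBRLR-style estimator for binary classification with continuous features. It assumes a Gaussian naïve Bayes model with pooled per-feature variances, whose implied log-odds are linear in x, and a ridge-style penalty lam * ||beta - beta_NB||^2 that shrinks the logistic regression coefficients toward the naïve Bayes estimate rather than toward zero; the paper's exact penalty form and estimation details may differ. The function names (`gaussian_nb_coefficients`, `fit_nbrlr`) are hypothetical.

```python
# Illustrative sketch of an NBRLR-style estimator (not the paper's code).
# Assumptions: y is a numpy array of 0/1 labels, features are continuous,
# naive Bayes is Gaussian with pooled per-feature variances, and the penalty
# is lam * ||beta - beta_NB||^2, shrinking toward the naive Bayes estimate.
import numpy as np
from scipy.optimize import minimize

def gaussian_nb_coefficients(X, y):
    """Logistic-regression weights implied by Gaussian naive Bayes.

    With a shared per-feature variance, log P(y=1|x) - log P(y=0|x)
    is linear in x, so naive Bayes defines a point in coefficient space.
    """
    X1, X0 = X[y == 1], X[y == 0]
    mu1, mu0 = X1.mean(axis=0), X0.mean(axis=0)
    n1, n0 = len(X1), len(X0)
    # Pooled within-class variance; small constant avoids division by zero.
    var = (n1 * X1.var(axis=0) + n0 * X0.var(axis=0)) / (n1 + n0) + 1e-9
    beta = (mu1 - mu0) / var
    intercept = np.log(n1 / n0) + ((mu0**2 - mu1**2) / (2.0 * var)).sum()
    return intercept, beta

def fit_nbrlr(X, y, lam=1.0):
    """Minimize mean logistic loss + lam * ||beta - beta_NB||^2."""
    b0_nb, beta_nb = gaussian_nb_coefficients(X, y)

    def objective(w):
        b0, beta = w[0], w[1:]
        z = b0 + X @ beta
        # Stable negative log-likelihood: log(1+e^{-z}) + (1-y)*z.
        nll = np.mean(np.logaddexp(0.0, -z) + (1.0 - y) * z)
        return nll + lam * np.sum((beta - beta_nb) ** 2)

    w0 = np.concatenate([[b0_nb], beta_nb])  # warm start at naive Bayes
    res = minimize(objective, w0, method="L-BFGS-B")
    return res.x[0], res.x[1:]
```

As lam tends to 0 this recovers the unregularized logistic regression MLE, and as lam tends to infinity it recovers the naïve Bayes solution, the low-variance data-driven anchor the abstract describes; intermediate values of lam trade bias against variance between the two.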

Highlights

Propose a novel regularization method for classification problems.
Provide theoretical results, including consistency of the proposed estimator.
Provide extensive simulation and empirical experimental results that support the competitiveness of the proposed estimator.

Published In

International Journal of Approximate Reasoning, Volume 172, Issue C, September 2024, 314 pages

Publisher

Elsevier Science Inc., United States

Author Tags

  1. Classification
  2. Regularization method
  3. Logistic regression
  4. Naïve Bayes
  5. Data-driven shrinkage

Qualifiers

  • Research-article
