DOI: 10.1145/3583131.3590380
Research article | Open access

Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models

Published: 12 July 2023

Abstract

We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-objective optimization problem, our framework allows for generating diverse models that trade off high performance and ease of interpretability in a single optimization run. Efficient optimization is achieved by augmenting the hyperparameter search space of the learning algorithm with feature selection, interaction constraints, and monotonicity constraints. We demonstrate that the optimization problem effectively translates to finding the Pareto optimal set of groups of selected features that are allowed to interact in a model, along with their optimal monotonicity constraints and the optimal hyperparameters of the learning algorithm itself. We then introduce a novel evolutionary algorithm that can operate efficiently on this augmented search space. In benchmark experiments, we show that our framework is capable of finding diverse models that are highly competitive with, or outperform, state-of-the-art XGBoost or Explainable Boosting Machine models, with respect to both performance and interpretability.
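For concreteness, the following is a minimal sketch, not the authors' implementation, of how a single candidate in such an augmented search space could be evaluated. It assumes XGBoost as the learning algorithm, which natively supports interaction and monotonicity constraints via its `interaction_constraints` and `monotone_constraints` parameters, and uses a toy dataset; the candidate encoding (`groups`, `monotone`), the helper `evaluate_candidate`, and the sparsity proxies for the three interpretability measures are illustrative assumptions, and the evolutionary search over candidates is omitted.

```python
# Minimal sketch (illustrative, not the paper's implementation): evaluate one
# candidate from the augmented search space on four objectives: predictive
# performance plus the three sparsity-based interpretability measures.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # toy stand-in for a tabular task


def evaluate_candidate(groups, monotone, params, X, y):
    """groups:   list of lists of feature indices; features may only interact
                 within their group (features absent from all groups are dropped).
       monotone: dict mapping feature index -> -1/+1 (unlisted features are
                 unconstrained, i.e. 0).
       params:   hyperparameters of the learning algorithm itself."""
    selected = sorted({i for g in groups for i in g})
    pos = {f: j for j, f in enumerate(selected)}
    # Re-index both constraint types to the submatrix of selected features.
    inter = [[pos[f] for f in g] for g in groups]
    mono = "(" + ",".join(str(monotone.get(f, 0)) for f in selected) + ")"
    model = xgb.XGBClassifier(
        interaction_constraints=inter,  # permitted interaction groups
        monotone_constraints=mono,      # per-feature -1/0/+1 constraints
        **params,
    )
    perf = cross_val_score(model, X[:, selected], y, cv=5).mean()
    n_features = len(selected)                                  # feature sparsity
    n_pairs = sum(len(g) * (len(g) - 1) // 2 for g in groups)   # interaction sparsity
    n_nonmono = sum(monotone.get(f, 0) == 0 for f in selected)  # non-monotone effects
    return perf, n_features, n_pairs, n_nonmono


# Example candidate: two interaction groups, one monotone-increasing feature.
groups = [[0, 1], [7]]
monotone = {7: 1}
params = {"n_estimators": 100, "max_depth": 3, "learning_rate": 0.1}
print(evaluate_candidate(groups, monotone, params, X, y))
```

A multi-objective optimizer such as NSGA-II would then search over such candidates, evolving the group structure, monotonicity flags, and hyperparameters jointly to approximate the Pareto front of the four objectives.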

Cited By

  • (2024) Position. Proceedings of the 41st International Conference on Machine Learning, 30566-30584. DOI: 10.5555/3692070.3693301. Online publication date: 21-Jul-2024.
  • (2024) The Case for Hybrid Multi-Objective Optimisation in High-Stakes Machine Learning Applications. ACM SIGKDD Explorations Newsletter 26(1), 24-33. DOI: 10.1145/3682112.3682116. Online publication date: 25-Jul-2024.
  • (2024) Concise rule induction algorithm based on one-sided maximum decision tree approach. Expert Systems with Applications 237(PA). DOI: 10.1016/j.eswa.2023.121365. Online publication date: 27-Feb-2024.

      Information

      Published In

      GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
      July 2023, 1667 pages
      ISBN: 9798400701191
      DOI: 10.1145/3583131
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. supervised learning
      2. performance
      3. interpretability
      4. tabular data
      5. multi-objective
      6. evolutionary computation
      7. group structure

      Qualifiers

      • Research-article

      Funding Sources

      • Bavarian Ministry of Economic Affairs, Regional Development and Energy through the Center for Analytics - Data - Applications (ADACenter) within the framework of BAYERN DIGITAL II

      Conference

      GECCO '23

      Acceptance Rates

      Overall acceptance rate: 1,669 of 4,410 submissions (38%)

      Article Metrics

      • Downloads (last 12 months): 410
      • Downloads (last 6 weeks): 69
      Reflects downloads up to 06 Jan 2025
