DOI: 10.1145/3583131.3590380
Research article | Open access

Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models

Published: 12 July 2023

Abstract

We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-objective optimization problem, our framework allows for generating diverse models that trade off high performance and ease of interpretability in a single optimization run. Efficient optimization is achieved by augmenting the hyperparameter search space of the learning algorithm with feature selection, interaction constraints, and monotonicity constraints. We demonstrate that the optimization problem effectively translates to finding the Pareto optimal set of groups of selected features that are allowed to interact in a model, along with their optimal monotonicity constraints and the optimal hyperparameters of the learning algorithm itself. We then introduce a novel evolutionary algorithm that can operate efficiently on this augmented search space. In benchmark experiments, we show that our framework is capable of finding diverse models that are highly competitive with, or outperform, state-of-the-art XGBoost or Explainable Boosting Machine models, with respect to both performance and interpretability.
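For concreteness, the following is a minimal sketch, not the authors' implementation, of how a single candidate in such an augmented search space could be evaluated. It assumes XGBoost as the learning algorithm, which natively supports interaction and monotonicity constraints via its `interaction_constraints` and `monotone_constraints` parameters, and uses a toy dataset; the candidate encoding (`groups`, `monotone`), the helper `evaluate_candidate`, and the sparsity proxies for the three interpretability measures are illustrative assumptions, and the evolutionary search over candidates is omitted.

```python
# Minimal sketch (illustrative, not the paper's implementation): evaluate one
# candidate from the augmented search space on four objectives: predictive
# performance plus the three sparsity-based interpretability measures.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # toy stand-in for a tabular task


def evaluate_candidate(groups, monotone, params, X, y):
    """groups:   list of lists of feature indices; features may only interact
                 within their group (features absent from all groups are dropped).
       monotone: dict mapping feature index -> -1/+1 (unlisted features are
                 unconstrained, i.e. 0).
       params:   hyperparameters of the learning algorithm itself."""
    selected = sorted({i for g in groups for i in g})
    pos = {f: j for j, f in enumerate(selected)}
    # Re-index both constraint types to the submatrix of selected features.
    inter = [[pos[f] for f in g] for g in groups]
    mono = "(" + ",".join(str(monotone.get(f, 0)) for f in selected) + ")"
    model = xgb.XGBClassifier(
        interaction_constraints=inter,  # permitted interaction groups
        monotone_constraints=mono,      # per-feature -1/0/+1 constraints
        **params,
    )
    perf = cross_val_score(model, X[:, selected], y, cv=5).mean()
    n_features = len(selected)                                  # feature sparsity
    n_pairs = sum(len(g) * (len(g) - 1) // 2 for g in groups)   # interaction sparsity
    n_nonmono = sum(monotone.get(f, 0) == 0 for f in selected)  # non-monotone effects
    return perf, n_features, n_pairs, n_nonmono


# Example candidate: two interaction groups, one monotone-increasing feature.
groups = [[0, 1], [7]]
monotone = {7: 1}
params = {"n_estimators": 100, "max_depth": 3, "learning_rate": 0.1}
print(evaluate_candidate(groups, monotone, params, X, y))
```

A multi-objective optimizer such as NSGA-II would then search over such candidates, evolving the group structure, monotonicity flags, and hyperparameters jointly to approximate the Pareto front of the four objectives.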

Cited By

  • (2024) Position. Proceedings of the 41st International Conference on Machine Learning, 30566-30584. DOI: 10.5555/3692070.3693301. Online publication date: 21-Jul-2024.
  • (2024) The Case for Hybrid Multi-Objective Optimisation in High-Stakes Machine Learning Applications. ACM SIGKDD Explorations Newsletter 26(1), 24-33. DOI: 10.1145/3682112.3682116. Online publication date: 25-Jul-2024.
  • (2024) Concise rule induction algorithm based on one-sided maximum decision tree approach. Expert Systems with Applications 237(PA). DOI: 10.1016/j.eswa.2023.121365. Online publication date: 27-Feb-2024.

      Information

      Published In

      GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
      July 2023, 1667 pages
      ISBN: 9798400701191
      DOI: 10.1145/3583131
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. supervised learning
      2. performance
      3. interpretability
      4. tabular data
      5. multi-objective
      6. evolutionary computation
      7. group structure

      Qualifiers

      • Research-article

      Funding Sources

      • Bavarian Ministry of Economic Affairs, Regional Development and Energy through the Center for Analytics - Data - Applications (ADACenter) within the framework of BAYERN DIGITAL II

      Conference

      GECCO '23

      Acceptance Rates

      Overall acceptance rate: 1,669 of 4,410 submissions (38%)

      Article Metrics

      • Downloads (last 12 months): 410
      • Downloads (last 6 weeks): 69
      Reflects downloads up to 06 Jan 2025
