AutoML and XAI frameworks
Summary
As the name suggests, AutoML is the automation of machine learning tasks. It serves as a
bridge between varying levels of expertise when designing a machine learning system,
with the goal of democratizing AI by making it accessible to a wider audience. There are
various approaches to this objective and many frameworks that put it to practical use.
Explainable AI (XAI) aims to make machine learning models more interpretable and
transparent, shedding light on so-called "black box" models. This report is a
quantitative comparison of popular open-source frameworks for AutoML and Explainable
AI. It consists of two parts: Part I compares AutoML frameworks and Part II reviews
Explainable AI frameworks.
1. Introduction
There are numerous approaches to AutoML, each with its own theoretical foundations.
Since these theoretical methods cannot be compared fairly on their own terms, the
frameworks are instead compared on their performance across various datasets and
machine learning tasks.
2. Selected Frameworks
2.1 Auto-Sklearn
Salient features:
1. It includes various feature engineering methods such as one-hot encoding,
numeric feature standardization, PCA and more.
2. It handles missing values and comes with 15 feature preprocessing algorithms out
of the box.
3. Its models use sklearn estimators for regression and classification, which allows
easy integration into existing sklearn environments.
4. It computes 38 statistics (meta-features) for a dataset and initializes the
hyperparameters to the optimised parameters of datasets with similar statistics
(similarity is calculated using the L1 norm).
5. It uses the optimisation framework SMAC3, which implements Bayesian optimisation
over the hyperparameter space.
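Feature 4 describes meta-learning used to warm-start the search. The following is a minimal sketch of that idea under stated assumptions, not Auto-Sklearn's implementation: the three meta-features, the stored configurations and the min-max scaling step are hypothetical, and only the use of L1 distance comes from the text.

```python
import numpy as np

# Hypothetical meta-features of previously seen datasets (Auto-Sklearn uses 38;
# three are used here): [n_instances, n_features, minority_class_fraction].
KNOWN_META = np.array([
    [1000.0, 10.0, 0.50],
    [50000.0, 200.0, 0.10],
    [800.0, 12.0, 0.45],
])

# Hypothetical best configurations previously found for each known dataset.
BEST_CONFIGS = [
    {"model": "random_forest", "max_depth": 8},
    {"model": "gradient_boosting", "learning_rate": 0.05},
    {"model": "random_forest", "max_depth": 6},
]


def warm_start_configs(new_meta, known_meta, best_configs, k=2):
    """Return the best configs of the k known datasets nearest in L1 distance."""
    # Min-max scale each meta-feature (an assumption of this sketch) so that
    # large-valued statistics such as n_instances do not dominate the distance.
    low = known_meta.min(axis=0)
    span = known_meta.max(axis=0) - low
    span[span == 0] = 1.0
    scaled_known = (known_meta - low) / span
    scaled_new = (np.asarray(new_meta, dtype=float) - low) / span
    # L1 (Manhattan) distance from the new dataset to every known dataset.
    distances = np.abs(scaled_known - scaled_new).sum(axis=1)
    nearest = np.argsort(distances)[:k]
    return [best_configs[i] for i in nearest]


# A new dataset that resembles the first and third known datasets.
initial = warm_start_configs([900.0, 11.0, 0.48], KNOWN_META, BEST_CONFIGS)
print(initial)
```

As we understand it, in Auto-Sklearn the retrieved configurations seed the Bayesian optimisation run by SMAC3 rather than being used as final answers.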
Drawbacks:
1. It cannot process natural-language inputs, nor can it distinguish numeric from
categorical inputs on its own; the feature types have to be specified before the data
is fed to the model.
2. It also cannot handle string inputs, so categorical strings must be integer-encoded.
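The integer encoding required by drawback 2 takes only a few lines before the data reaches Auto-Sklearn. This is a minimal pure-Python sketch; the helper name `integer_encode` and the sample column are hypothetical, and scikit-learn's `OrdinalEncoder` offers a library alternative.

```python
def integer_encode(column, mapping=None):
    """Replace each category string with an integer code.

    Codes are assigned in order of first appearance. Pass the mapping returned
    for the training data to encode test data consistently; categories unseen
    in training are mapped to -1 (an arbitrary choice for this sketch).
    """
    if mapping is None:
        mapping = {}
        for value in column:
            if value not in mapping:
                mapping[value] = len(mapping)
    return [mapping.get(value, -1) for value in column], mapping


train_codes, colour_map = integer_encode(["red", "green", "red", "blue"])
test_codes, _ = integer_encode(["blue", "purple"], colour_map)
print(train_codes, test_codes)  # [0, 1, 0, 2] [2, -1]
```

Reusing the training mapping matters: encoding the test column independently would assign "blue" a different code than the model saw during training.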
2.2 H2O
H2O is an open-source machine learning framework with its own algorithms, which execute
on a server cluster accessible through a variety of interfaces and programming languages.
It includes an automatic machine learning module that uses these algorithms to
generate a pipeline. H2O is developed in Java and provides Python, JavaScript, Tableau, R
and Flow (web UI) bindings. Note: version 3.28.0.1 was used for this comparison.
Salient Features:
1. It has a high level of abstraction, aimed at making it accessible to everyone
regardless of expertise.
2. It performs an exhaustive search over feature engineering methods and model
hyperparameters to optimize its pipelines.
3. It supports imputation, one-hot encoding and standardisation for feature engineering,
and deals with categorical features automatically.
4. It supports two methods of hyperparameter optimisation: Cartesian grid search and
random grid search.
5. Supported models include generalized linear models, basic deep learning models,
gradient boosting machines and distributed random forests.
6. The AutoML pipeline's user-facing settings are limited to algorithm choice, stopping
time and number of validation folds.
7. It uses stacking, in which a meta-learner is trained on the predictions of the base
models, to create different ensembles of trained models, and finally produces a
leaderboard of the best-performing models.
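The difference between the two search methods in feature 4 can be shown with the standard library alone. This is a minimal sketch over a hypothetical hyperparameter grid, not H2O code; in H2O's own grid-search API the choice is made through the `strategy` entry of `search_criteria` (`"Cartesian"` or `"RandomDiscrete"`).

```python
import itertools
import random

# Hypothetical search space for a gradient boosting model.
GRID = {
    "max_depth": [3, 5, 7],
    "learn_rate": [0.01, 0.1],
    "ntrees": [50, 100],
}


def cartesian_grid(grid):
    """Every combination of the listed values (the full Cartesian product)."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]


def random_grid(grid, max_models, seed=0):
    """A random subset of the Cartesian grid, sampled without replacement."""
    combos = cartesian_grid(grid)
    rng = random.Random(seed)
    return rng.sample(combos, min(max_models, len(combos)))


full = cartesian_grid(GRID)
subset = random_grid(GRID, max_models=5)
print(len(full), len(subset))  # 12 5
```

The Cartesian grid grows multiplicatively with each added hyperparameter (3 × 2 × 2 = 12 models here), which is why a random subset with a model budget is the practical choice for larger spaces.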
Drawbacks:
1. It has heavy storage and resource requirements.
2.3 TPOT
TPOT (Tree-based Pipeline Optimization Tool) is a genetic programming-based pipeline
optimiser that automatically creates machine learning pipelines. It automates certain
processes of the machine learning system design cycle, as shown in the figure above.
Note: version 0.11.0 was used for this comparison.
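The evolutionary idea behind TPOT can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions: it runs a plain genetic algorithm over a fixed three-slot pipeline with a made-up fitness function, whereas TPOT itself evolves tree-shaped pipelines with genetic programming (built on the DEAP library) and scores them by cross-validation. All names and values below are hypothetical.

```python
import random

# Hypothetical search space: a pipeline is a (scaler, model, max_depth) triple.
SCALERS = ["none", "standard", "minmax"]
MODELS = ["tree", "forest", "boosting"]
DEPTHS = [2, 4, 6, 8]


def fitness(pipeline):
    """Made-up stand-in for cross-validated accuracy (real runs train a model)."""
    scaler, model, depth = pipeline
    score = {"tree": 0.60, "forest": 0.75, "boosting": 0.80}[model]
    if scaler == "standard":
        score += 0.05
    return score - 0.01 * abs(depth - 6)


def random_pipeline(rng):
    return (rng.choice(SCALERS), rng.choice(MODELS), rng.choice(DEPTHS))


def crossover(a, b, rng):
    """Child takes each component from one of its two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(pipeline, rng):
    """Replace one randomly chosen component with a random alternative."""
    parts = list(pipeline)
    slot = rng.randrange(len(parts))
    parts[slot] = random_pipeline(rng)[slot]
    return tuple(parts)


def evolve(generations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged and acts as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Variation: fill the rest with mutated crossovers of random parent pairs.
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half always survives, the best pipeline never gets worse from one generation to the next. In a real run every fitness call means training and validating a pipeline, which is why the number of generations and the population size dominate running time.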
Salient Features:
1. Like Auto-Sklearn, TPOT sources its data transformers and algorithms from
sklearn.
2. Training time can be restricted by setting a time limit or the population size, and
the search space can be restricted with a configuration file.
3. The optimisation process can be paused and resumed.
4. Its biggest feature is that it can export the optimised pipeline as code, which can
then be modified manually.
Drawbacks:
1. TPOT cannot automatically process natural-language inputs or categorical inputs;
categorical inputs have to be integer-encoded before the data is fed in.
2. Since it uses genetic programming, it can take a long time to reach high accuracy,
although given enough time it tends to find well-optimised pipelines.
3.1 Framework parameters:
R2 score: $R^2 \equiv 1 - \frac{SS_{res}}{SS_{tot}}$
A higher value indicates better performance. $SS_{res}$ is the residual sum of squares
and $SS_{tot}$ is the total sum of squares.
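The R2 definition translates directly into code. A minimal NumPy sketch follows; the function name and sample values are illustrative, and `sklearn.metrics.r2_score` computes the same quantity.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    # Undefined when y_true is constant (ss_tot == 0); not handled here.
    return 1.0 - ss_res / ss_tot


print(round(r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]), 4))  # 0.9486
```

A model that always predicts the mean of the targets scores 0, a perfect model scores 1, and a model worse than the mean scores below 0, which is why a higher value means better performance.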
Figures 3 and 4 show the performance of the frameworks on classification and
regression tasks. The dataset IDs are ordered by increasing dataset size.
Figures 7 and 8 show prediction-time performance. H2O performs worst on prediction
time, being the slowest on seven of the eight datasets, while TPOT gives the fastest
predictions on seven of the eight.
Time to make predictions:
Data ID   Auto-Sklearn   H2O      TPOT
1464      0.1090         0.0092   0.0058
40701     0.1400         0.8153   0.0114
1046      0.3087         0.4126   0.0280
1461      0.0145         0.8221   0.0154
196       0.0171         0.2113   0.0045
308       0.0788         0.6163   0.0021
537       0.1463         0.6187   0.0072
344       0.3751         0.4140   0.0038
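The prediction-time claims can be checked directly against the table. This short sketch re-enters the table values (units as reported) and counts, per dataset, which framework is fastest and slowest; the variable names are illustrative.

```python
from collections import Counter

# Prediction times from the table above (units as reported), keyed by data ID.
PREDICTION_TIMES = {
    1464: {"Auto-Sklearn": 0.1090, "H2O": 0.0092, "TPOT": 0.0058},
    40701: {"Auto-Sklearn": 0.1400, "H2O": 0.8153, "TPOT": 0.0114},
    1046: {"Auto-Sklearn": 0.3087, "H2O": 0.4126, "TPOT": 0.0280},
    1461: {"Auto-Sklearn": 0.0145, "H2O": 0.8221, "TPOT": 0.0154},
    196: {"Auto-Sklearn": 0.0171, "H2O": 0.2113, "TPOT": 0.0045},
    308: {"Auto-Sklearn": 0.0788, "H2O": 0.6163, "TPOT": 0.0021},
    537: {"Auto-Sklearn": 0.1463, "H2O": 0.6187, "TPOT": 0.0072},
    344: {"Auto-Sklearn": 0.3751, "H2O": 0.4140, "TPOT": 0.0038},
}

# Per dataset, the framework with the smallest / largest prediction time.
fastest = Counter(min(row, key=row.get) for row in PREDICTION_TIMES.values())
slowest = Counter(max(row, key=row.get) for row in PREDICTION_TIMES.values())
mean_time = {
    name: sum(row[name] for row in PREDICTION_TIMES.values()) / len(PREDICTION_TIMES)
    for name in ("Auto-Sklearn", "H2O", "TPOT")
}

print(fastest)  # Counter({'TPOT': 7, 'Auto-Sklearn': 1})
print(slowest)  # Counter({'H2O': 7, 'Auto-Sklearn': 1})
print(mean_time)
```

On these figures TPOT is the fastest on every dataset except 1461, where Auto-Sklearn is marginally quicker, and H2O is the slowest on every dataset except 1464, where Auto-Sklearn is slowest.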
4. Results:
Note: TPOT generally performs better given a long training time. With its default
parameters TPOT takes approximately an hour to run, so a time limit of 2 minutes was set
to reduce compute time.
4.1 Classification:
In classification tasks, H2O outperforms the other two frameworks by a wide margin.
Figure 3 shows that Auto-Sklearn and TPOT perform poorly when the dataset is either
small (~500 instances) or large (~40k instances).
4.2 Regression:
In regression tasks, TPOT and H2O perform the best, with TPOT slightly better on
average. According to Figure 4, Auto-Sklearn performs on par with the other two on all
datasets except the one with ID 537, where it performs poorly; there is no conclusive
reason for this behaviour.
5. Overall Conclusion:
Based on the collected data, H2O performs the best on classification datasets and TPOT
performs the best on regression datasets. TPOT gives the fastest predictions, while H2O
is the slowest to generate them. To give an overall rating of the frameworks, we take a
weighted sum of their performance across several criteria.
The criteria considered are regression score, classification score, prediction time and
ease of use, weighted as follows: 70% accuracy (regression and classification scores),
20% prediction time and 10% ease of use. Ease-of-use scores: H2O 8/10, Auto-Sklearn
7/10, TPOT 7/10.
Note: this comparison serves only to produce an overall rating; it is neither purely
quantitative nor rigorous.
Overall Rating (just for comparison):
H2O - 7.8/10
Auto-Sklearn - 7.1/10
TPOT - 7.4/10
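The overall rating is a weighted sum, sketched minimally below. The report does not state how the classification and regression scores were combined into one accuracy score, or how prediction times were mapped to a 0-10 scale, so the example inputs are placeholders for a hypothetical framework, not figures from this comparison; only the weights come from the text.

```python
# Weights stated in the report: 70% accuracy, 20% prediction time, 10% ease of use.
WEIGHTS = {"accuracy": 0.7, "prediction_time": 0.2, "ease_of_use": 0.1}


def overall_rating(scores):
    """Weighted sum of per-criterion scores, each assumed to be on a 0-10 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Placeholder scores for a hypothetical framework (NOT values from this report).
example = {"accuracy": 7.0, "prediction_time": 5.0, "ease_of_use": 7.0}
print(round(overall_rating(example), 2))  # 6.6
```

Because the weights sum to 1, the rating stays on the same 0-10 scale as its inputs.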
Dataset Information:
Classification datasets:
Regression datasets:
Part II: Review of Explainable AI frameworks
1. Introduction
In recent years, as machine learning algorithms and techniques have entered the
mainstream, many businesses have been looking for ways to generate value from this new
paradigm of predictive modelling. Also, with computing power growing stronger every day,
the complexity of models that can be trained is