Enabling Automated Machine Learning For Model-Driven AI Engineering
Abstract—Developing smart software services requires both Software Engineering and Artificial Intelligence (AI) skills. AI practitioners, such as data scientists, often focus on the AI side, for example, creating and training Machine Learning (ML) models given a specific use case and data. They are typically not concerned with the entire software development life-cycle, architectural decisions for the system, and performance issues beyond the predictive ML models (e.g., regarding security, privacy, throughput, scalability, availability, as well as ethical, legal and regulatory compliance). In this manuscript, we propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering. In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems by choosing the most appropriate ML model, algorithm and techniques with suitable hyper-parameters for the task at hand. To validate our work, we carry out a case study in the smart energy domain.
BUILDING SMART SOFTWARE SYSTEMS relies on the deployment of AI methods and techniques in software and systems engineering. In particular, a sub-discipline of AI, namely Machine Learning (ML), is currently the cutting edge. However, the state of practice in ML engineering is similar to software engineering in the sense that both are at the border of engineering and art. For instance, by observing the workflows in the daily routine of a data scientist, one does not perceive that they continuously follow a prescribed process with clearly defined methods and principles. Although there exist a number of heuristics and best practices based on the current state of research, for example, regarding how to initialize parameters, we still see that many practitioners have no choice but to conduct trial-and-error [1] and follow their intuitions. This is not only time-consuming, but also makes the performance and efficiency of their ML models highly dependent on their specific choices, for many of which no explanation can be provided. Moreover, they tend to either create over-complex ML models that might perform well, but might not be efficient enough (e.g., since their training is costly), or create ML models that are under-performing. Yet, the real challenge is when developers themselves need to use ML methods and techniques, or maintain the ML components of the system. This is a common scenario, due to the shortage of data scientists around the globe.
To enable software engineers without extensive knowledge in ML to carry out the development, maintenance and customization of AI-intensive software systems, we propose a novel approach that integrates Automated Machine Learning (AutoML) with automated source code and ML model generation. Our work has two pillars: domain-specific Model-Driven Software Engineering (MDSE) and Model-Driven AI Engineering (MDAE). We advocate the Model-Driven Engineering (MDE) paradigm for software and AI engineering since it offers abstraction and automation. Both are necessary in order to handle the complexity and heterogeneity that are involved in modern AI-enabled software systems, in particular given their distributed nature and the diversity of their hardware and software platforms, APIs and communication protocols. In this manuscript, we concentrate on the domain of the Internet of Things (IoT) since it can reflect the said challenges related to decentralization and heterogeneity very well. We adopt the Domain-Specific Modeling (DSM) methodology as proposed by Kelly and Tolvanen [2] and build on the prior work ML-Quadrat [3]. We enable software system models to become capable of not only generating the code, test cases, build/run scripts, and documentation, but also the ML models. In addition, the generated ML models can be automatically trained, tested, deployed, re-trained and re-configured as necessary. The choice of the ML model family or architecture (e.g., decision trees), as well as the ML model hyper-parameters (e.g., learning algorithm) and other possible configurations may be performed in a fully automated fashion. This is the main innovation of this work compared to the state of the art, namely ML-Quadrat [3].
In the following, we first propose our novel approach. Then, we show the experimental results of our case study for the validation. Finally, we conclude.
AutoML
AutoML aims to automate the practice of ML for real-world applications. It offers end-to-end solutions to non-experts (and experts) in the field of ML in a more efficient manner, compared to the manual approach. The automation might be applied to any part of the ML pipeline: from data pre-processing to hyper-parameter optimization and model evaluation. Most existing AutoML solutions concentrate on one particular part of the ML pipeline [1]. In this work, we focus on ML model family/architecture selection, as well as hyper-parameter optimization for the ML model. To this aim, we construct a tree-structured search space by considering a number of ML methods that have proven useful for the selected use case scenario of the case study that will follow. The set of selected ML methods comprises (i) Decision Trees (DT), (ii) Random Forests (RF), (iii) Gated Recurrent Units (GRU), (iv) Long Short-Term Memories (LSTM), (v) Fully-Connected Neural Networks (FCNN), also known as Multi-Layer Perceptron (MLP), (vi) Denoising Autoencoders (DAE), (vii) Factorial Hidden Markov Models (FHMM), and (viii) Combinatorial Optimization (CO). Figure 1 illustrates the mentioned search space with these ML methods and the possible hyper-parameters that must be automatically tuned for each of them should they get selected by the AutoML engine.
Figure 1. The tree-structured search space for the proposed AutoML approach
We consider the possible choices for setting the hyper-parameters as listed below: (a) criterion ∈ {MSE (Mean Squared Error), Friedman MSE, MAE (Mean Absolute Error)}; (b) min_samples_split ∈ Uniform [2, 200]; (c) n_estimators (i.e., number of estimators) ∈ Uniform [5, 100]; (d) optimizer ∈ {Adam, Nadam, RMSprop}; (e) learning rate ∈ {1e-2, 1e-3, 1e-4, 1e-5}; (f) loss function ∈ {MSE, MAE}; (g) n_layers (i.e., number of layers) ∈ Uniform [5, 8]; (h) dropout probability ∈ Uniform [0.1, 0.6]; (i) sequence length ∈ {64, 128, 256, 512, 1024}.
To enable the automated selection of the best ML model architecture/family and the most appropriate hyper-parameters for the selected ML model given a specific use case and a dataset, we deploy Bayesian Optimization (BO). There exist various open-source libraries that offer BO. We pick the most widely-used library, namely Hyperopt [4]. This library implements a Tree-structured Parzen Estimator (TPE) model and provides distributed, asynchronous hyper-parameter optimization. Moreover, it supports search spaces with different types of variables (continuous and discrete), varied search scales (e.g., uniform vs. log-scaling), as well as conditional variables that are only meaningful for certain combinations of other variables [4].
AutoML-enabled MDSE & MDAE
We augment MDSE with AutoML to support software developers who might not be ML experts in creating smart software systems. We provide them with a Domain-Specific Modeling Language (DSML) that is based on the prior work ML-Quadrat [3], which supported automated source code and ML model generation for smart IoT services. In ML-Quadrat, the practitioners themselves had to specify the target ML model family/architecture, as well as the values of the corresponding hyper-parameters. However, to assist the software developers who use the DSML and have ML requirements, but may not necessarily possess sufficient ML knowledge and skills, we provide the above-mentioned AutoML prototype as open-source software on GitHub.1 Currently, it is tailored to the target use case domain of our case study that is described below. Using this prototype, which also has a web-based Graphical User Interface (GUI), one can select the most appropriate setup for ML and feed it into the ML-Quadrat DSML. Last but not least, we implement a number of constraints and exception handling mechanisms, according to the API documentation of the target ML libraries, namely Scikit-Learn and Keras/TensorFlow, in the model-to-code transformations (code generators) of ML-Quadrat. If the AutoML mode is enabled, we enforce these constraints and allow them to make necessary changes to the existing user-specified parameters of the model instances. Otherwise, we only let them show warnings without overriding any user-specified option. For instance, if standardization of numeric values must have been done before training a certain ML model, but this is missing, we warn the user while generating the implementation of the target IoT service out of the software model. Only when AutoML is ON do we additionally add the proper standardization technique in the data pre-processing phase of the ML pipeline.

1 https://github.com/ukritw/autonialm
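To make the standardization example concrete, the following sketch shows the kind of code a generator could emit for this constraint. It is our own illustration, not ML-Quadrat's actual generated code: the `build_pipeline` helper and the choice of `MLPRegressor` as the scale-sensitive estimator are assumptions.

```python
import warnings
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

def build_pipeline(model, automl_on: bool) -> Pipeline:
    """Enforce the 'standardize numeric values before training' constraint.

    Scale-sensitive estimators (here: a neural network) need standardized
    inputs. With AutoML ON, the missing pre-processing step is silently
    inserted; otherwise the user-specified pipeline is kept unchanged and
    only a warning is shown.
    """
    steps = [("model", model)]
    if isinstance(model, MLPRegressor):
        if automl_on:
            steps.insert(0, ("standardize", StandardScaler()))
        else:
            warnings.warn("Numeric features should be standardized "
                          "before training this model.")
    return Pipeline(steps)

pipe = build_pipeline(MLPRegressor(), automl_on=True)
print([name for name, _ in pipe.steps])  # ['standardize', 'model']
```

With AutoML OFF, the same call leaves the user's pipeline untouched and merely warns, mirroring the warn-versus-override behavior described above.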
Department Head
[…] ML model architectures that were proposed in the […] and requires a much smaller amount of data and computational, as well as energy, resources, compared […] benefits of the proposed AutoML-based approach, even for the ML experts themselves, since they could avoid trial-and-error practices for selecting the ML model and tuning its hyper-parameters. In addition, another deep learning ML model, namely GRU, which is proposed by this work, outperforms the state-of-the-art approaches. Figure 2 illustrates the experimental results of our benchmarking study with the following NIALM approaches: (i) CO [7], (ii) DAE [8], (iii) DT, (iv) Dictionary-based (our implementation of [9]), (v) DNN-HMM (our implementation of [10]), (vi) FCNN, (vii) FHMM [11], (viii) GRU, (ix) LSTM [8]. Note that applying DT, FCNN and GRU to this problem is our initiative. Table 1 shows the average error (MAE) for the mentioned approaches. Note that we used the data of 37 days from house 1 of the REDD dataset [5] with a sampling rate of 0.05 Hz.

REFERENCES
1. […] "…of the state-of-the-art." Knowledge-Based Systems, Vol. […]
2. Kelly, S., Tolvanen, J.P. (2008). Domain-Specific Modeling: Enabling Full Code Generation, 1st edn. Wiley.
3. Moin, A., Challenger, M., Badii, A., Günnemann, S. (2022). A model-driven approach to machine learning and software modeling for the IoT. Softw Syst Model.
4. Bergstra, J., Yamins, D., Cox, D. D. (2013). Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. Proc. of ICML.
5. Kolter, J. Z., Johnson, M. J. (2011). REDD: A public data set for energy disaggregation research. Proc. of the SustKDD Workshop on Data Mining Applications in Sustainability.
6. Kelly, J., Knottenbelt, W. (2015). The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Sci Data 2, 150007. Nature.
7. Hart, G. W. (1992). Nonintrusive appliance load monitoring. Proc. of the IEEE, vol. 80, no. 12, pp. 1870-1891.
8. Kelly, J., Knottenbelt, W. (2015). Neural NILM: Deep Neural Networks Applied to Energy Disaggregation. Proc. of the ACM Int. Conf. on Embedded Systems for Energy-Efficient Built Environments.
9. […]
Figure 2. The disaggregation accuracy using different ML methods. Color codes: dark blue, orange, gray,
yellow, and light blue represent the results for the fridge, lights, sockets, washer/dryer, and average for all
appliances, respectively.
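The average error reported for the benchmark is the Mean Absolute Error. For reference, a minimal sketch of the metric (our own illustration, not the paper's evaluation code), applied to toy per-appliance power readings:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error between ground-truth and predicted power signals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy example: ground-truth vs. predicted appliance power (watts)
print(mae([100, 0, 250], [90, 5, 240]))  # 8.333...
```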
10. Mauch, L., Yang, B. (2016). A novel DNN-HMM-based approach for extracting single loads from aggregate power signals. Proc. of the IEEE ICASSP.
11. Reyes-Gomez, M. J., Raj, B., Ellis, D. R. W. (2003). Multi-channel source separation by factorial HMMs. Proc. of the IEEE ICASSP.

Armin Moin is a doctoral researcher at the department of Informatics of the Technical University of Munich (TUM) in Germany. His research is at the intersection of ML and MDSE for smart IoT services. Contact him at moin@in.tum.de.

Ukrit Wattanavaekin is an alumnus of the Computer Science Master’s program of the department of Informatics of the TUM in Germany. Contact him at ukritwk@gmail.com.

[…] the University of Reading, UK. He has established a track record of key contributions to over 40 projects. Contact him at atta.badii@reading.ac.uk.

Stephan Günnemann is a professor at the department of Informatics of the TUM in Germany and director of the Munich Data Science Institute. His research focuses on making ML robust and reliable. Contact him at guennemann@in.tum.de.

Notice
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.