The Impact of Data Completeness and Correctness on Explainable Machine Learning Models (pp. 218-231)
Shelernaz Azimi and Claus Pahl
DOI: https://doi.org/10.26421/JDI3.2-2
Abstract:
Many systems in the Edge Cloud, the Internet-of-Things (IoT) or Cyber-Physical Systems are built to process data that is delivered from sensors and devices, transported, processed, and consumed locally by actuators. Given the regularly high volume of this data, Artificial Intelligence (AI) strategies such as Machine Learning (ML) can be used to generate the required application and management functions. The quality of both the source data and the machine learning model is unavoidably of high significance here, yet the explicit connection between the quality of ML models created through ML procedures and the quality of the data that these models consume in their construction has not been explored sufficiently. Here, we investigate the link between input data quality for ML function construction and the quality of these functions in data-driven software systems, aiming at explainable model construction through an experimental approach with IoT data using decision trees. We have three objectives in this research: 1. Search for indicators of how data quality factors such as correctness and completeness, as well as model construction factors, influence accuracy, precision, and recall. 2. Estimate the impact of variations in model construction and data quality. 3. Identify change patterns that can be attributed to specific input changes. This ultimately aims to support explainable AI, i.e., a better understanding of how ML models work and what impacts their quality.
Keywords: Explainable AI, AI Engineering, Data Quality, IoT Systems, Machine Learning, Data Correctness, Data Completeness, Decision Trees
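
To make the described experiment concrete, the following is a minimal sketch of the kind of study the abstract outlines, not the authors' actual implementation: it degrades the completeness (missing values) and correctness (corrupted values) of a training set, retrains a decision tree at each degradation level, and records accuracy, precision, and recall. The synthetic dataset, the degrade() helper, and all parameter values are illustrative assumptions standing in for the paper's IoT data.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for IoT sensor readings: a synthetic binary classification task.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def degrade(X, missing_frac=0.0, noise_frac=0.0):
    # Hypothetical degradation helper: lower completeness by dropping values,
    # lower correctness by perturbing values with large random noise.
    Xd = X.copy()
    missing = rng.random(Xd.shape) < missing_frac   # completeness: values removed
    Xd[missing] = np.nan
    noisy = rng.random(Xd.shape) < noise_frac       # correctness: values corrupted
    scale = 3 * Xd[~np.isnan(Xd)].std()
    Xd[noisy] += rng.normal(0, scale, noisy.sum())
    return Xd

for frac in (0.0, 0.1, 0.3, 0.5):
    Xd = degrade(X_train, missing_frac=frac, noise_frac=frac)
    # Decision trees in scikit-learn need complete inputs, so impute first.
    imputer = SimpleImputer(strategy="mean").fit(Xd)
    tree = DecisionTreeClassifier(random_state=0).fit(imputer.transform(Xd), y_train)
    pred = tree.predict(imputer.transform(X_test))
    print(f"degradation={frac:.1f}  "
          f"accuracy={accuracy_score(y_test, pred):.3f}  "
          f"precision={precision_score(y_test, pred):.3f}  "
          f"recall={recall_score(y_test, pred):.3f}")

Plotting the three metrics against the degradation level yields the kind of change patterns the paper's second and third objectives refer to; in the study itself, the same procedure would be applied to real IoT data rather than a synthetic set.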