
1 Introduction

Machine learning within business contexts (advanced business analytics [1, 2]) is a key driver of organizational competitive advantage [3, 4]. Despite the organizational and societal impact of machine learning [5], many challenges remain to both the effectiveness of machine learning and its widespread adoption. These include: interpreting the models used; executing machine learning algorithms effectively; and integrating machine learning into organizational processes. As a result of these challenges, organizations continue to struggle to implement successful machine learning solutions [6]. The objective of this research is to analyze how challenges specific to machine learning can be addressed by incorporating conceptual modeling principles in the machine-learning toolbox. Specifically, we show how many challenges related to machine learning could be potentially addressed with the help of conceptual modeling techniques that have been used for decades to support information systems development and database design.

Conceptual models formally describe “some aspects of the physical and social world around us for the purposes of understanding and communication” [7]. Here, we expand the usefulness of conceptual modeling from the domain of information systems development to the task of improving machine learning processes and outcomes. The contribution of this research is to show that conceptual modeling can improve the application of machine learning algorithms. We illustrate our results by applying conceptual modeling to a case management system for psychotropic drug monitoring. Section 2 reviews relevant machine learning and conceptual modeling research. Section 3 shows how characteristics of conceptual modeling can be applied to a machine learning process. Section 4 summarizes the paper and proposes future research directions.

2 Background: Machine Learning and Conceptual Modeling

2.1 Machine Learning

Machine learning (ML) is considered to have the potential to transform organizations and society [8, 9]. According to Gartner’s 2016 Hype Cycle for Emerging Technologies, machine learning is one of “three key trends that organizations must track to gain competitive advantage” [3]. It has passed the early proof-of-concept phase and is now at the “Peak of Inflated Expectations.” That is, it is now a popular mature technology, complete with major successes and failures [3].

Traditionally, machine learning proceeds without support from external knowledge sources (e.g., domain models) and relies heavily on the training data and learning algorithms. Recent research is exploring ways to encode additional semantics so that (some) rules do not have to be learned from training examples [8, 10, 11]. For example, a model may reference a domain ontology and incorporate rules such as “all birds are animals.” Then, the model does not need to learn the concept of “animal.” Instead, through the domain ontology, it can automatically infer that a new instance labeled as a bird is also an animal. This work is narrowly focused on infusing the learning algorithms with additional knowledge, without requiring assistance from users.
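The bird/animal inference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ontology `SUBCLASS_OF` and the helper `inferred_classes` are hypothetical names introduced here, not part of any cited approach.

```python
# Hypothetical toy ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "sparrow": "bird",
    "bird": "animal",
    "animal": "living_thing",
}


def inferred_classes(label, subclass_of=SUBCLASS_OF):
    """Return the label followed by every superclass reachable in the ontology."""
    classes = [label]
    seen = {label}
    while classes[-1] in subclass_of:
        parent = subclass_of[classes[-1]]
        if parent in seen:  # guard against accidental cycles in the ontology
            break
        classes.append(parent)
        seen.add(parent)
    return classes


# An instance labeled "bird" is also an "animal" without that rule being learned.
bird_classes = inferred_classes("bird")
```

Here the rule "all birds are animals" is looked up rather than learned from examples, which is the kind of externally supplied semantics the paragraph refers to.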

Given the focus of machine learning on data and algorithms, several issues are evident. First, the quality of data inputs is critical to the performance of machine learning techniques [12]. Second, given the complexity of models such as neural networks and deep learning, the ability to explain the decisions or predictions made by ML techniques needs to be improved for such models to be adopted in many settings [13]. Finally, machine learning models should also provide guidelines for developers regarding appropriate deployment (e.g., the populations, cases, or processes to which the results of the model generalize).

2.2 Conceptual Models

Traditionally, conceptual modeling is a major phase of information systems development that formally captures user requirements to support the development of information systems. Research has led to the development of different kinds of conceptual models, including data models, process models, models of business activity and goals, and models of enterprise and systems architecture [14, 16–27].

Conceptual models are typically diagrams that contain graphics and text and are widely used in IS development to facilitate communication, improve domain understanding among stakeholders, and guide IS development activities such as database design, user interface design, and programming [22–28]. Conceptual models have played an especially important role in database design, where grammars such as the Entity-Relationship model have become a de facto standard way of formally representing the structure of data to be stored and are widely used to derive database tables, fields and relationships.

Researchers have investigated the benefits of conceptual models for information systems development. These can be summarized through the three basic needs that conceptual models address [29]:

  1. Need to cope with complexity. Conceptual models reduce the complexity of information systems development by focusing on the relevant aspects, and by structuring and organizing the requirements.

  2. Need for shared understanding. Information systems development typically involves many people with different backgrounds, beliefs, expertise and training. Such diversity creates potential for conflicts which, if left unresolved, may lead to project failures. Conceptual models are designed with the general objective of being boundary objects [30].

  3. Need to solve problems. Information technologies are typically created to address some organizational or societal need, although it is often unclear how best to design a system given the many options that typically exist. Conceptual modeling supports analysis of a domain and supports specific design solutions.

These three needs, although studied in the context of information systems development, are also present in the context of machine learning. Therefore, the conceptual modeling techniques used in systems development might also be useful in resolving challenges related to the development, adoption, and effectiveness of ML techniques by organizations.

However, research at the intersection of conceptual modeling and ML remains rare and lacks an overarching agenda. Machine learning is absent from the authoritative conceptual modeling agenda set by Wand and Weber [28] and has not been part of active discourse in conceptual modeling [31, 32].

In addition to the paucity of efforts to apply conceptual modeling to machine learning, existing efforts appear narrowly focused on some specific issue (e.g., modeling sentiment or supporting business simulations). A general assessment of the potential of conceptual modeling to support machine learning is missing, as is an agenda that can support future research. We propose an approach intended to stimulate broad efforts in the research community to find synergies between the two fields.

3 Using Conceptual Modeling to Support Machine Learning

Conceptual modeling can be used as an effective and standard activity in machine learning to support activities in various phases of the machine learning process. Specifically, conceptual modeling can be useful to identify and specify problems, improve understanding of the data that is used for training, increase the predictive accuracy and performance of machine learning algorithms, and model the process change needed to introduce machine learning into organizational processes.

We demonstrate the application of conceptual modeling to ML in a real case, which uses ML within the context of a US foster care system tasked with placing and monitoring children in foster families. The evidence is based on our direct observations and experience with developing ML solutions for several agencies that are part of the foster care system. A major challenge for the foster care system is the ever-increasing number of children entering foster care. For processes such as monitoring medication intake or criminal activities by foster children, the increased load can be detrimental and may put a child’s mental and physical health at serious risk. The promise of ML algorithms is to enable automated and rapid detection of potential issues with a child in a foster family (e.g., medication overdose, other physical and mental problems), based on past records written by case workers.

We illustrate the potential of conceptual modeling to benefit each stage of the Cross-Industry Standard Process for Data Mining (CRISP-DM) (i.e., Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment) [33]. CRISP-DM was originally developed for data mining, but became widely adopted for ML as well [34, 35].

Business Understanding.

The first phase of the CRISP-DM process seeks to understand the project objectives and requirements from a business perspective [33]. Business understanding is also a phase of information systems development. Traditionally, conceptual models have been used to support this phase. Effective ML is impossible without first carefully examining and understanding the business objectives for a particular ML project, and the specific goals the project seeks to achieve. Acquiring this information is akin to eliciting information systems requirements, a phase of systems development that has benefited historically from the use of conceptual models. Common notations to support this phase could be process and goal-oriented conceptual modeling grammars and methods, such as i*, BPMN, UML, or BIM (Business intelligence model).

Conceptual models can represent specific objectives and goals for an ML project. Monu and Woo [36] demonstrate the value of goal-oriented actor-based modeling to model goals, objectives and resources of intelligent systems. Finally, conceptual models can assist managers and other parties in comprehending the scope of ML interventions (i.e., identify the organizational processes affected by ML).

Returning to the foster care case management example, new mandates required monitoring children who were taking, and may potentially be overprescribed, psychotropic medication (drugs used to treat psychiatric conditions). This adds a new task to the workload of caseworkers, namely, analyzing random samples of existing home-visit notes (notes that document the interaction between the caseworker and the child at their home) to identify cases of children who might be taking these medications. There can be hundreds of notes written for each child, making it difficult to identify the cases of interest. We turn to modeling (using BIM) to document the new objectives set for case workers (see Fig. 1). Based on this example and emerging research [e.g., 37], we propose the following for conceptual modeling research:

Fig. 1. BIM diagram fragment showing case management goals

Direction 1: Investigate the application of conceptual modeling techniques to increase business understanding in the context of machine learning projects.

Data Understanding.

The data understanding phase of CRISP-DM starts with the acquisition of data suitable for the business problem identified in Phase 1. Modelers then review the data, consider any data quality issues, and prepare it to be imported into the ML software. Conceptual modeling is especially well suited for modeling and understanding data. Common notations to support this phase include popular conceptual modeling grammars (e.g., ER, UML).

Modeling can also be used to document data provenance [38] and to help ascertain data quality. Conceptual models can suggest whether there is enough data for learning (i.e., show the scope of the domain and the attributes available for ML), identify where to obtain additional data (e.g., by showing connections between entities), and show how to best use the data in ML (e.g., what attributes to impute). For case management, without a conceptual model we may assume that home-visit notes are written in such a manner that they represent each of the children in a home. However, from the conceptual model in Fig. 2, one home can have multiple children and there might be cases where, under the same household, one child is taking psychotropic medication and another is not. If we assign labels to every given note (for the training set), how do we handle these cases? This may suggest the need to obtain data at the child level, rather than the home level, in order to acquire a more complete data set that can produce reliable models.

Fig. 2. Fragment of a (color-coded) conceptual model showing home-child relationship
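The labeling problem raised by the one-to-many home-child relationship can be made concrete with a small check. The sketch below assumes hypothetical child records of the form (child identifier, home identifier, medication label); the record layout and the function name `homes_with_mixed_labels` are illustrative assumptions, not the structure of the actual case management data.

```python
from collections import defaultdict

# Hypothetical records: (child_id, home_id, takes_psychotropic_medication)
children = [
    ("c1", "h1", True),
    ("c2", "h1", False),
    ("c3", "h2", True),
]


def homes_with_mixed_labels(children):
    """Return ids of homes whose children do not all share the same label."""
    labels_by_home = defaultdict(set)
    for _child_id, home_id, label in children:
        labels_by_home[home_id].add(label)
    return sorted(h for h, labels in labels_by_home.items() if len(labels) > 1)


mixed = homes_with_mixed_labels(children)
```

Any home returned by such a check is one where a single home-level label would be wrong for at least one child, which is the situation that motivates collecting data at the child level.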

Consider another scenario, in which a particular data source could have missing values for an attribute ‘dosage’ of medication. Many ML approaches suggest imputing values for missing data before building a model [39]. However, a conceptual model of the data might indicate that ‘dosage’ is an optional property of medication, as shown in Fig. 3. In this case, some (or all) missing values might simply not be applicable to the instances for which they are missing, rather than being absent from the data source. This can indicate the need for subclasses of the phenomenon of interest (in this case, medication). Moreover, it might be necessary to build different models for medications that have a strict physician-prescribed dosage (e.g., prescription drugs) and medications that do not (e.g., nutrition supplements, where dosage is recommended, but not physician-prescribed), rather than impute missing values for cases where the attribute does not apply.

Fig. 3. Sample conceptual model with optional property shown as unfilled circle
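One way to act on an optional ‘dosage’ property is to restrict imputation to the subclass for which dosage applies. The following is a minimal sketch under assumptions: the records, the `kind` field, and the choice of median imputation are all hypothetical and stand in for whatever subclassing and imputation strategy a project actually adopts.

```python
from statistics import median

# Hypothetical records; None marks a dosage that is absent from the data.
medications = [
    {"name": "A", "kind": "prescription", "dosage_mg": 10.0},
    {"name": "B", "kind": "prescription", "dosage_mg": None},
    {"name": "C", "kind": "prescription", "dosage_mg": 30.0},
    {"name": "D", "kind": "supplement", "dosage_mg": None},
]


def impute_dosage(records, applies_to="prescription"):
    """Impute the median dosage only for the subclass where dosage applies."""
    known = [
        r["dosage_mg"]
        for r in records
        if r["kind"] == applies_to and r["dosage_mg"] is not None
    ]
    fill = median(known) if known else None
    result = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        if r["kind"] == applies_to and r["dosage_mg"] is None:
            r["dosage_mg"] = fill
        result.append(r)
    return result


imputed = impute_dosage(medications)
```

In this sketch the supplement keeps its empty dosage, because the conceptual model indicates the value is not applicable rather than missing.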

Returning to our case management example, if we find references to dosage information within the home-visit notes, this can be an indication that the child is taking a prescribed medication (potentially a psychotropic medication). We propose:

Direction 2: Investigate the application of conceptual modeling techniques to increase understanding of available training data for machine learning tasks and support data preparation activities for machine learning tasks.

Data Preparation.

The data preparation phase of CRISP-DM involves all activities (e.g., data transformations) required to construct the final dataset to be used for the learning algorithm. This phase involves multiple transformations of the original data source by performing extract-transform-load (ETL) procedures. Prior research in conceptual modeling has demonstrated the use of process models (e.g., BPMN, EPCs) to document the ETL process [40, 41]. The transformation process within ML software is analogous to the ETL processes used in the context of data warehousing.

Furthermore, conceptual data modeling grammars can be extended to better reflect the needs of ML. For example, grammars could allow color-coding of attributes included in the ML process as inputs, using one color to indicate a target attribute (e.g., in Fig. 2 the target variable is green) and a different color for attributes that cannot be used in a predictive model in order to comply with regulations (e.g., gender or race). In this way, conceptual modeling can graphically communicate the aspects of the business on which the ML process is based. This, in turn, may contribute to better understanding of the results and compliance with data protection regulations. We propose the following direction for conceptual modeling research:

Direction 3: Investigate the application of conceptual modeling techniques to support attribute selection, transformation, cleaning and other activities involved in preparing training and validation data for machine learning algorithms.
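The color-coding idea discussed above amounts to annotating each attribute of the conceptual model with a role. A rough sketch of how such annotations might drive attribute selection follows; the attribute names, the three role labels, and the helper `split_by_role` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical role annotations mirroring the color-coding of attributes.
ATTRIBUTE_ROLES = {
    "note_text": "input",
    "visit_date": "input",
    "takes_psychotropic_medication": "target",
    "gender": "excluded",  # not usable in a predictive model for compliance reasons
    "race": "excluded",
}


def split_by_role(roles):
    """Return (inputs, target, excluded) derived from the role annotations."""
    inputs = sorted(a for a, r in roles.items() if r == "input")
    targets = [a for a, r in roles.items() if r == "target"]
    excluded = sorted(a for a, r in roles.items() if r == "excluded")
    if len(targets) != 1:
        raise ValueError("expected exactly one target attribute")
    return inputs, targets[0], excluded


inputs, target, excluded = split_by_role(ATTRIBUTE_ROLES)
```

Keeping the roles in one annotated structure means the same source can feed both the diagram and the data preparation step, so the two are less likely to drift apart.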

Modeling.

The dataset that emerges from the transformation procedures of the previous stage is supplied to ML algorithms for training, learning, and validation.

Conceptual models can be used to support the selection of learning algorithms and the modeling process. For example, if knowledge from a certain part of the domain is important, a modeler may choose to select inputs manually based on known domain semantics (as opposed to automatically, using dimension reduction or attribute selection algorithms). This will ensure the learning algorithm considers these inputs.

The decision on input selection can be driven by domain knowledge. The use of conceptual modeling could also improve algorithm performance. This can be accomplished by hard-coding some relationships within the data inputs. To illustrate, in the conceptual model fragment of Fig. 2, a home can accommodate multiple children. Unless this is explicitly encoded in the ML algorithms, models might potentially differ in performance, especially when some attributes of the home-visit notes are used to predict attributes of the child (e.g., has signs of abuse or neglect, lack of focus). For example, for cases where the same home has multiple children there could be a conflicting signal from the home’s attributes mapped to the children, whereas these conflicts would not exist in one-to-one cases. Understanding the conceptual model may help address these conflicts. When analyzing the home-visit notes, if there is a home with two children (one child requires psychotropic medication and the other does not), the home-visit note may: (a) focus on aspects common to both children; (b) focus on the child taking psychotropic medication and report this; or (c) focus on the child not taking psychotropic medication and contain nothing that reflects medication intake. In each of these cases, the same entry would be associated with two instances that have different outcomes, degrading the performance of ML classifiers. Accordingly, we propose the following research direction:

Direction 4: Investigate the application of conceptual modeling techniques to enhance effectiveness of machine learning algorithms.
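One simple way to make the one-to-many home-child relationship visible to a learner is to derive an explicit input from it. The sketch below adds a hypothetical `children_in_home` feature to each training row; the row layout and function name are assumptions for illustration, and whether such a feature actually improves a given classifier would need to be tested empirically.

```python
from collections import defaultdict

# Hypothetical training rows derived from home-visit notes.
rows = [
    {"child_id": "c1", "home_id": "h1", "note": "Medication given at bedtime."},
    {"child_id": "c2", "home_id": "h1", "note": "Medication given at bedtime."},
    {"child_id": "c3", "home_id": "h2", "note": "Child attends school regularly."},
]


def add_children_in_home(rows):
    """Attach the number of distinct children living in each row's home."""
    children_by_home = defaultdict(set)
    for row in rows:
        children_by_home[row["home_id"]].add(row["child_id"])
    return [
        {**row, "children_in_home": len(children_by_home[row["home_id"]])}
        for row in rows
    ]


enriched = add_children_in_home(rows)
```

Rows with a count above one are exactly those where a home-level note may carry a signal that belongs to a different child, so the feature also offers a convenient filter for inspecting potentially conflicting examples.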

Evaluation.

This step of the CRISP-DM method assesses the degree to which the model learned from Phase 4 meets the original business objectives. This phase also involves interpreting the results of the model by converting the structures generated by the learning algorithms into language accessible to business users.

Here, conceptual models can help by highlighting which inputs were used to train the model, relative to all the attributes that exist in the domain. An especially valuable addition that conceptual models can bring to the evaluation phase of ML is to improve the understandability of complex ML models. One prominent concern regarding the use of ML is the lack of transparency of complex models (e.g., neural networks, random forests, support vector machines). This “black box” property of many models constitutes a major barrier for wider adoption of ML, especially in critical or sensitive applications [42].

There is a growing interest in increasing the transparency and interpretability of complex models. Much of this research, however, is effective only with the variables used by a given algorithm. For example, perhaps the most common approach to increasing the interpretability of ML is to show the relative importance of a feature with respect to the target variable [42]. Another approach to obtaining the same result is to build a tool that allows users to interact with individual components within a model (e.g., individual neurons within a neural network), and then measure how highlighting a particular component affects the target variable [42]. However, existing approaches consider only variables that are already part of the model. Another limitation is that these approaches do not typically present the variables as belonging to a particular conceptual structure (e.g., influence is typically shown as ordered lists of variables).

As mentioned above, a principal role of conceptual models is to simplify, abstract and conceptualize a domain. Furthermore, conceptual modeling practice and research have long dealt with the issue of transparency. For example, research on conceptual modeling has been developing and evaluating methods and design approaches to improve the comprehensibility and understandability of models [32, 42–44]. Research on conceptual modeling has investigated these issues within the context of both dynamic models (e.g., process models) and static models (e.g., data models) [15].

Applying conceptual models to the problem of interpreting results may offer additional benefits because conceptual models can depict both variables included in the model, as well as those discarded due to lack of predictive power or those manually excluded for ethical or other reasons [42], providing a more comprehensive view of which aspects of the domain the model affects. Another possibility is to combine existing techniques to show the weights of variables with conceptual models. Then, the variables can be grouped into entity types, as opposed to presenting them as a list sorted by their contribution size. The advantage is in their ability to show relationships among the variables, and their groupings.
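Grouping variable weights by entity type, as suggested above, can be sketched as follows. The importance scores and the variable-to-entity mapping are invented for illustration and do not come from the case study; `importance_by_entity` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical importance scores, e.g., as produced by some feature-importance method.
importance = {
    "dosage": 0.30,
    "medication_name": 0.25,
    "child_age": 0.20,
    "home_region": 0.15,
    "home_size": 0.10,
}

# Hypothetical mapping from each variable to its entity type in the conceptual model.
entity_of = {
    "dosage": "Medication",
    "medication_name": "Medication",
    "child_age": "Child",
    "home_region": "Home",
    "home_size": "Home",
}


def importance_by_entity(importance, entity_of):
    """Group variable importances by entity type and total them per entity."""
    grouped = defaultdict(dict)
    for variable, score in importance.items():
        grouped[entity_of[variable]][variable] = score
    totals = {entity: sum(scores.values()) for entity, scores in grouped.items()}
    return dict(grouped), totals


grouped, totals = importance_by_entity(importance, entity_of)
```

The per-entity view could then be overlaid on the conceptual model diagram, so that a reader sees which entities contribute most instead of scanning a flat, sorted list of variables.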

Some research has used conceptual models (e.g., actor-based, goal-oriented, BIM) to improve the transparency of intelligent agents and text-mining processes, with calls for more research in this area [36, 45]. However, no theory, framework or set of principles has been proposed. Further research is needed to apply conceptual modeling to the problem of ML transparency. This could include studies that investigate specific applications, as well as research that develops general principles and approaches for using conceptual modeling to improve the transparency and interpretability of ML results. We propose the following research direction:

Direction 5: Investigate the application of conceptual modeling techniques to increase transparency, comprehensibility and understandability of the outputs of machine learning algorithms.

Deployment.

Creation of the model is generally not the end of an ML project. If the model addresses business objectives, its performance is sufficient, and its inner workings are sufficiently understood, the model is introduced as an intervention into a real-world process. For example, a neural network can serve as a decision support tool to prioritize cases of interest.

Analogous to Phase 1, conceptual models may be used to document the objectives and goals behind an ML intervention. Once ML deployment occurs, business users can refer back to the original goals and objectives captured in the conceptual models from Phase 1 to assess the compliance of this intervention with the original requirements and goals. Furthermore, because deployment of ML in an organization typically results in changing an existing business process, process models and enterprise models can be used to document this change and communicate to stakeholders which part(s) of the enterprise the process affects (see Fig. 4).

Fig. 4. As-Is and To-Be BPMN Model for psychotropic drug monitoring

The ML classifier serves as a decision support tool for the caseworker. As the BPMN diagram in Fig. 4 shows, the ML process is relatively similar to the traditional way of doing things. Models such as the one in Fig. 4 allow case managers to draw conclusions about the impact of the new process on human and material resources in the US foster care system and may suggest strategies for the deployment of ML within the US foster care system.

Direction 6: Investigate the application of conceptual modeling techniques to support deployment of machine learning in organizations, and to document and support process changes that result from the introduction of machine learning into organizational processes.

4 Conclusion

We propose that conceptual modeling can be used as a standard activity in machine learning applications. After identifying machine learning challenges, we employed a popular cross-industry standard process, CRISP-DM, to highlight the potential ways in which conceptual modeling can make this process more effective, and proposed specific directions for future conceptual modeling research. To illustrate, the ideas were applied to a real case of psychotropic drug monitoring within the US federal foster care system. Machine learning promises to help cope with a severe shortage of caseworkers and might even save lives. The results of using conceptual modeling demonstrate the broad potential of conceptual modeling to advance ML by: (1) supporting the process change associated with introducing ML within organizations; (2) improving the usability of ML as a decision tool (e.g., by making the models and results more transparent); and (3) improving the performance of ML algorithms (e.g., by imbuing them with domain knowledge).

Future research is needed to demonstrate empirically the benefits of combining conceptual modeling and machine learning. Understanding the intersection between conceptual modeling and ML may help extend the development of ML algorithms and best practices for using ML. For example, it might be possible to directly encode the cardinalities of relationships between entities into ML algorithms. In this manner, the learner becomes aware of the entities to which the inputs belong, as well as how the entities (and therefore the inputs) are related to one another. Future research should use conceptual modeling to achieve transparency of ML models. Furthermore, although this research has proposed and illustrated the potential of conceptual modeling for each of the processes of CRISP-DM, each of these processes needs to be examined in detail and empirically assessed.

Finally, ML can add a new impetus to a recent push by conceptual modeling researchers to expand the scope of conceptual modeling (or, issues related to domain representations) beyond the traditional IS development context [e.g., 29, 31, 46, 47]. The application of conceptual modeling within the ML context can further expand the scope of conceptual modeling research and practice, foster new interdisciplinary connections and demonstrate the continued importance and value of conceptual modeling research.