Introduction

Surgical competence requires mastery of a wide range of technical and non-technical skills. Technical skills include manual dexterity, use of instruments, and knowledge of anatomy. Non-technical skills include decision making, situation awareness, communication and teamwork, and leadership (Flin et al., 2007). Surgical skill training covering all these aspects has been traditionally conducted via the Halstedian apprenticeship approach (Wanzel et al., 2002), whereby a surgical trainee is engaged in performing surgical tasks under the supervision of a surgical expert. This training model requires considerable time to ensure that each surgical trainee sees all possible indicated cases and gains the extensive experience required to prepare them for performing unsupervised surgery (Tsuda et al., 2009). But several factors, including patient safety concerns, shortened training programs, reduced working hours for residents, limitations on available operating room time (Palter & Grantcharov, 2010), and increasing introduction of new procedures have strained this model, resulting in a decrease in learning opportunities (Reznek et al., 2002). While efforts have been undertaken to provide some relief from these pressures, interest has simultaneously increased in utilizing virtual reality (VR) simulations for surgical training. VR-based simulation plays a significant role in surgical skill training because it has the potential to provide a student with increased training time outside the operating room, objective assessment of procedures, and formative feedback without the need for the close direct supervision of experts (Tsuda et al., 2009).

While VR simulation has already had a significant impact on training of technical skills, there is still a lack of work on the use of VR simulation to teach surgical decision making (Johnston et al., 2016). This is particularly noteworthy given the recognized importance of decision making in surgical outcomes (Pugh et al., 2010), as articulated in the well-known quote of Spencer: “75% of important events in the operating room are related to decision making” (1978). Our work seeks to fill this gap by developing a high-fidelity surgical simulation integrated with a conversational intelligent tutoring system for teaching surgical decision making, which we call SDMentor (Surgical Decision-making Mentor). In this paper, we focus on the intelligent tutoring system (ITS) component of the system. We demonstrate and evaluate the approach in the domain of endodontic surgery. The design of the system and the teaching approaches are motivated by information gained from an observational study and from interviews with dental instructors. While the system follows the standard architecture of an ITS, in the current implementation less emphasis is placed on the student model and more on the representation of the surgical procedure and on the pedagogical model.

Surgical decision making requires an understanding of causal relations in the surgical domain, which are a function of how actions are performed, the effects of previous actions, and the condition of the patient. This, in turn, requires continuous situation awareness to assess patient state and the effects of previous actions. Capturing these factors for use in an ITS necessitates a representation that captures actions with parameters and conditional effects, changes in the state of the patient, and the situation awareness process of interpreting observable parameters. The pedagogical model must make use of this representation to respond to student errors and to explore the depth of the student’s knowledge. The main contributions of this work are (1) identifying effective strategies for teaching surgical decision making, (2) providing a novel knowledge representation formalism to capture critical aspects of surgical domain knowledge used in teaching, and (3) developing pedagogical algorithms that use the domain knowledge representation to generate tutorial dialog. As the final step of the current study, we evaluated the quality of the tutorial content by having three expert dental instructors rate the quality of tutorial interventions from human instructors and from SDMentor. To determine whether the results might be biased by knowledge of the source of the interventions, we additionally carried out a type of Turing test in which the three experts were asked to guess which interventions came from SDMentor.

Related Work

Simulations for Surgical Training

Surgical procedures consist of three basic stages: preoperative, intraoperative, and postoperative. In the preoperative stage, a surgeon selects a procedure and creates a surgical treatment plan with the necessary equipment and materials. This surgical plan is a high-level plan, which is transformed into low-level actions in the intraoperative stage when the relevant perceptual and other information is available. Decisions made in the intraoperative stage relate to carrying out the procedure to achieve the surgical plan and dealing with unexpected situations. Finally, in the postoperative stage, the surgeon evaluates the outcome of the operation and formulates a follow-up plan.

Intraoperative decision making is a complex and dynamic process. The patient’s status and the outcome of the previously executed actions must be constantly and iteratively evaluated to adjust the operating plan before choosing an action to perform (Cristancho et al., 2016). After acting, the outcome of the executed action becomes a new situation for a surgeon to carry out the next operative task of the planned procedure. In this process, situation awareness plays a crucial role. Situation awareness is “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future” (Endsley, 1995). Perception involves gathering relevant information from the environment at a specific point in time. Comprehension is the ability to interpret and understand the perceived information in a situation, relative to the goal. Projection involves envisioning the future status and understanding the associated consequences. A lack of sufficient situation awareness can lead to mistakes in surgical judgment (Andersen, 2012).

By far, the majority of surgical training simulators have focused on training of technical skills (Badash et al., 2016). Many provide some degree of formative feedback concerning outcome such as quality of result and damage to adjacent tissue but do not include an intelligent tutoring component. An exception is the TELEOS system (Luengo et al., 2011; Toussaint et al., 2015), a simulation-based ITS to promote the learning of percutaneous orthopedic surgery. The focus is on perceptual-gestural knowledge. The system produces a trail of the student’s problem-solving activity and uses Bayesian networks to diagnose the activity. A didactic decision agent uses the diagnosis to generate feedback to the student.

Existing simulations for teaching surgical decision-making follow two basic approaches. The first approach is to ask questions about the appropriate action for a given situation. Rennie et al. (2009) provide short cases to train preoperative decision making based on branching stories. Servais et al. (2006) teach intraoperative decision making by asking students to answer questions regarding the action to be taken. LapSkill (Sarker et al., 2009) provides video clips of situations from laparoscopic surgery and asks the student to answer multiple-choice questions related to the situation shown. SICKO (Lin et al., 2015; Tsui & Edtech, 2014), a non-immersive game, has an intraoperative mode that enables a player to answer a multiple-choice question for a given situation described in text form.

The second approach is to train students by allowing them to carry out a procedure in simulation. SimPraxis (Tran et al., 2013) is a cognitive trainer that allows a student to carry out an operation under the supervision of a virtual mentor. At each step, the student selects an actor (either surgeon or assistant), chooses an instrument to use, and places the instrument in the appropriate anatomical location. If the selections are correct, the user is given positive feedback in the portion of the video clip of that step and moves to the next step. Desitra (Vannaprathip et al., 2016) teaches decision making in the domain of root canal treatment using interactive graphics on a tablet. The student must identify an action to perform and the details of how to execute the action. The system’s pedagogical module intervenes with minimal guidance to keep the student on a productive learning path. Touch Surgery (Sugand et al., 2016) is a cognitive rehearsal tool for surgical procedures. It provides a step-by-step manual such that a student makes the operative decisions by following the predefined steps. When an action is executed, the system plays the virtual reality motion clip relevant to the performed action. While conversational interactions between the student and the instructor are both normal and integral to developing the ability to make the correct surgical decisions (Hill et al., 2017), such interactions are not a feature of the existing simulations. Consequently, there is no explicit focus on teaching the processes underlying decision-making, including situation awareness.

Knowledge Representations

To reason about effects of actions correctly and incorrectly executed under particular conditions, we look for a knowledge representation capable of representing actions with conditional effects, as well as situation awareness components. Computer Interpretable Guidelines (CIG) and Surgical Process Models (SPM) are the two widely used approaches to representing surgical procedures. Computer Interpretable Guidelines represent medical knowledge to be shared across medical institutions for the purpose of standardization of clinical practice (De Clercq et al., 2004). Among the many CIGs that have been developed, Asbru (Miksch et al., 1997) represents medical knowledge as a skeletal plan. It specifies effects in a formula for measurable parameters and the overall effect for non-measurable parameters. Since CIGs are designed for representing and communicating how a procedure should be properly carried out, they do not have facility to represent effects of incorrectly performed actions, which is essential for teaching purposes.

Surgical Process Models represent operational activities related to surgical resources, e.g., time and operators (Lalys & Jannin, 2014; MacKenzie et al., 2001; Neumuth, 2012, 2017). They are intended as a computer-based representation to assist in surgery. SPMs focus on representing knowledge relevant to the surgical context but not elements of situation awareness. SPMs typically represent the hierarchical tasks in a surgical procedure, from the high-level actions to the low-level motor movements. Like CIGs, SPMs focus on the correct activities in a procedure and do not include effects of incorrectly chosen or executed actions.

Knowledge representation formalisms commonly used in ITSs include rule-based models and constraint-based models. Rule-based models (Brachman & Levesque, 2004; Nkambou, 2010) represent a domain in terms of rules and facts. While they can represent causal relations, they do not provide for representation of the structure and components of actions. Constraint-based models (Mitrovic, 2010, 1998, 2005) are based on the idea that correct solutions share common features. They are able to represent a space of possible solutions in terms of constraints. In our work, there is only one correct sequence of actions for a procedure.

AI planning languages have been used for knowledge representation in a few ITSs. Annie (Thomas et al., 2010) applied a STRIPS-like language to generate possible directed graph solutions from a given state of a game world to correct misconceptions and to obtain requisite knowledge. The STRIPS-like language is a simple representation of actions, which does not include conditional effects. Roman Tutor (Belghith et al., 2012) is an intelligent simulator for training astronauts to operate an articulated robot arm mounted on the International Space Station via a robotic working station. The camera planner inside the simulator applies the planning domain definition language (PDDL) for providing a demonstration of the camera shots. Charles et al. (2013) use PDDL to instantiate interactive narratives from patient education documents. Their work focuses on high-level actions and does not cover how the actions are performed.

Teaching surgical decision making has similarities to criticizing a plan since both involve providing feedback on choice of action. ATTENDING (Miller, 1983) critiques a preoperative plan for anesthetic management. The critique is created by concatenating predefined strings from an Augmented Transition Network (Woods, 1970). The structure of a surgical procedure and its components are not explicitly represented. Trauma-TIQ (Gertner, 1997) is a real-time critiquing system that examines a physician’s plan. The knowledge required to critique is based on a purely rule-based approach from the Trauma-AID planner. As with ATTENDING, different types of plan components are not explicitly represented.

A number of representations have been used in ITSs to generate conversational dialogue. ANDES (Gertner et al., 1998) uses Bayesian networks and partially observable Markov decision processes in a troubleshooting spoken dialog system (Williams, 2007). While use of probabilities can be important in diagnostic problems, obtaining the probabilities can be difficult, particularly in domains like surgery where data is not readily available. In addition, while probabilistic reasoning is commonly used in medical diagnosis, it is less commonly used in surgery, particularly in the intraoperative stage. AutoTutor (Graesser et al., 1999; Susarla et al., 2003) utilizes a curriculum script that contains sentence dialogues organized into types of tutorial responses and teaching topics. These dialogues are predefined and managed by the Dialog Advancer Network (Person et al., 2000). Atlas (Freedman, 1999), an extension of ANDES (Gertner et al., 1998), conducts natural language dialogues to promote deep learning. It presents knowledge construction dialogs (Jordan et al., 2001) as pushdown automaton networks. Tutorial utterances and the student’s responses correspond to states and arcs, respectively (Graesser et al., 2001). Manually providing such tutorial utterances for a domain is a time-consuming process. In contrast, our work automates the process of generating tutorial utterances by using a natural plan-based representation of the surgical process.

Application Domain

In this paper, we focus on teaching surgical decision making for the preoperative and intraoperative stages of root canal treatment. The root canal treatment procedure is selected because it is one of the most challenging procedures in dental surgery and involves a number of complex decisions. Furthermore, the procedure is normally carried out by a single surgeon, and we wish to focus on teaching individual decision making before confronting the complexity of group decision making.

The root canal procedure treats diseases and injuries of the dental pulp and periapical tissues and is carried out to preserve the natural dentition and prevent tooth loss (Torabinejad & Walton, 2008). There are four main stages of root canal treatment (Faculty of Dentistry – Thammasat University, n.d.): (1) access preparation or opening the canals; (2) length determination; (3) mechanical instrumentation, irrigation, and trying main cone; and (4) obturation or filling the canals. In this study, we focus on the access preparation step, which is to open the access cavity of the tooth. This step is crucial to the procedure because the quality of access affects the success of the whole procedure. Access preparation has two prerequisite steps: the provision of local anesthesia and the application of a rubber dam. The need for local anesthesia depends on the vitality of the pulp. If the pulp is not vital, providing local anesthesia is optional. Rubber dam application is an aseptic technique that separates the sterile working tooth environment from the non-sterile environment. It also prevents small dental tools and substances from falling into the patient’s throat.

Observational Study

To develop an ITS for surgical training, it is important to understand how surgical instructors teach decision-making skills in the operating room, what teaching strategies are used, the rationale behind these strategies, and what knowledge the instructor brings to bear in teaching. To obtain this information, we carried out an observational study of teaching sessions conducted by experienced human tutors in a dental clinic and subsequently interviewed them.

After obtaining ethical approval from Mahidol University and Thammasat University, the student clinic in the Faculty of Dentistry at Thammasat University was chosen as the venue for conducting this observational study. Twenty-seven teaching sessions of three dental procedures were observed, including nine root canal treatment sessions, nine tooth extraction sessions, and nine surgical tooth extraction sessions. These procedures were selected because they have the commonality that they are procedures focusing on the treatment of injuries or diseases of the oral cavity, yet provide some variety so that we can generalize our observations beyond a single procedure. The participants included 27 dental students training in the teaching hospital and nine dental instructors (three instructors for each procedure) who are responsible for the supervision of their teaching sessions. We selected three instructors for each procedure to identify the strategies commonly used among them. We observed three teaching sessions for each instructor. Actions and discussions during the teaching sessions were recorded and transcribed. At the end of each teaching session, we interviewed each instructor about the rationale of the observed interventions as well as how s/he intervened in error situations. The results of each interview were combined with the transcript of the related teaching session.

Results

Conversational dialogue is the primary approach for developing decision making skills throughout the three operative stages. In the preoperative stage, a dental student diagnoses the assigned patient’s case and proposes a treatment plan to the dental instructor. The instructor asks a series of questions to verify the diagnosis result and the proposed operative plan. After this is completed satisfactorily, the student is permitted to carry out the procedural tasks on the patient. For each action, the instructor may intervene by asking questions or providing tutorial guidance to ensure that the student acts correctly and has an understanding of why it is necessary to perform the action.

We analyzed the transcripts to identify two essential components for the ITS: teaching strategies and knowledge representation requirements. First, the transcribed interventions for actions were analyzed to identify teaching strategies using a thematic analysis similar to the work of Hauge et al. (2001). The transcripts were separated into individual interventions. Repetitive teaching behaviors in the interventions were identified and then grouped into teaching strategies. The coding process was carried out by one researcher in consultation with one dental expert. Teaching strategies were verified via interviews with the observed instructors. To measure agreement among the instructors, we identified 19 situations for which we had obtained strategies from all the instructors. These were of four main types: (1) preoperative discussion of planning actions, which could be either optional or mandatory, (2) intraoperative discussion after the action was performed, (3) question answering, and (4) dealing with an unexpected situation. We found good agreement among the nine observed instructors (Fleiss’ kappa = 0.80). Because the specific content of the intervention for a given strategy varied from instructor to instructor, we asked another dental expert to make the final decision on a content template for each strategy included in SDMentor. The resulting teaching strategies are shown in Table 1 with their classifications in terms of Hauge’s teaching behaviors and their relative frequencies of occurrence.
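For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa can be computed when every item is labeled by the same number of raters. It is not the authors' analysis script. The function name `fleiss_kappa`, the strategy labels, and the toy ratings are all hypothetical; the study itself involved 19 situations and nine instructors, whose raw ratings are not given in the text.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds one category label per rater for item i.
    categories: iterable of all possible category labels.
    """
    categories = list(categories)
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # P_i: proportion of agreeing rater pairs for item i.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # p_j: overall share of ratings that fell into category j.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    expected = sum(p * p for p in shares)

    if expected == 1.0:  # only one category was ever used
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 4 situations, 3 raters, strategy labels invented here.
toy = [
    ["hint", "hint", "hint"],
    ["question", "question", "hint"],
    ["question", "question", "question"],
    ["inform", "inform", "inform"],
]
print(round(fleiss_kappa(toy, ["hint", "question", "inform"]), 3))
```

The statistic compares the observed proportion of agreeing rater pairs with the proportion expected if raters chose categories at random with the overall category frequencies, so a value of 1 indicates complete agreement and values near 0 indicate chance-level agreement.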

Table 1 Thirteen teaching strategies for teaching surgical decision making and their classification to Hauge’s teaching behaviors and percentage of occurrence from the observational study

Hauge et al. (2001) identified four high-level strategies: informing, questioning, responding and setting tone. We found the same strategies in our domain but identified them at a more fine-grained level, which is necessary for design of the interaction of the ITS. In Table 1, teaching strategies 1–4 correspond to informing behaviors of Hauge et al., teaching strategies 5–8 correspond to responding behaviors, and teaching strategies 9–13 are refinements of Hauge’s questioning behaviors. The setting tone behavior is not included in the observational study because our work does not focus on affective aspects of teaching.

A total of 1292 teaching interactions from 27 observed teaching sessions were analyzed. Each teaching session lasted from 1 to 2 h. We found that 61% of the teaching interventions involved asking questions (Strategies 9–13), 22% involved providing information relevant to the operation (Strategies 1–4), and 17% involved responding to the students (Strategies 5–8). The most commonly used strategies were confirming facts related to action details (Strategy 10.1, 11%), raising awareness of relevant facts (Strategy 11.1, 12%), and giving directions for the action to perform (Strategy 4, 11%). The directions were usually given after providing local anesthesia and/or when the operating time was running out.

We further analyzed the transcripts to identify the knowledge and concepts used in the dialogues between instructors and students, so as to inform design of the domain model. The transcribed utterances from the observational study revealed the information that surgical instructors convey to students. The dialogue included discussion of the necessity of an action/sub-plan, how to correctly perform actions, the correct and incorrect effects of the performed actions and how the current situation is related to these effects, and interpretation of perceived facts. Consequently, the knowledge representation to be used in the system must represent: (1) actions with parameters and conditional effects; (2) components of situation awareness; (3) the relationships between actions; (4) the necessity of an action in the surgical procedure and the desired outcomes of the action.

Intelligent Tutoring System

In this section, we describe the design of SDMentor for teaching decision making in the preoperative and intraoperative stages. SDMentor includes a virtual reality simulation that provides an immersive environment for carrying out three intraoperative stages of the root canal procedure: providing local anesthesia, inserting a rubber dam, and access preparation. The simulator uses a head-mounted display and two haptic devices to control a dental drill and a dental mirror and to respond to questions posed by the system (Fig. 1). The system starts at the preoperative stage when the student is in the operating room and has a preoperative discussion with the tutor. It assumes that the patient’s diagnosis is correct and that the procedure is selected – access preparation and its prerequisites: providing local anesthesia and inserting a rubber dam. SDMentor asks a series of questions about the plan details to be performed. In the intraoperative stage, the student carries out the procedure while SDMentor monitors the actions. As a student carries out the procedure, kinematic data is logged and used to recognize actions and errors, which are transmitted to the ITS after the student finishes executing the action. The system evaluates the student’s actions and intervenes with generated tutorial feedback. As shown in Fig. 2, dialog with the system takes place through menus which provide multiple-choice questions, and the laser pointer for selecting answers is controlled by the left haptic device.

Fig. 1

Screenshot of the VR dental simulator with full dental station. The mirror and drill are controlled by two haptic devices

Fig. 2

Sample dialog interaction between SDMentor and the student. When the student correctly performs an action, e.g. inserting a rubber dam on the working tooth, SDMentor asks a question to confirm the objective of the performed action (Strategy 10.2). When the student correctly answers the question, the response dialog of positive feedback (Strategy 5) and a hypothetical situation (Strategy 12) are generated

SDMentor generates tutorial feedback for one error at a time. When the student commits an error, tutorial feedback is generated, and a dialog ensues so that the student understands why the chosen action was incorrect. After that the student must perform the action or make the decision again. The tutorial feedback could be in the form of hints, negative feedback with the undesired outcome, a question-answering process, and a confirmation of how to fix the error. There are two types of errors: incorrectly chosen actions and incorrectly performed actions. If both occur simultaneously, SDMentor considers incorrectly chosen actions to have higher priority than incorrectly performed actions. If an action is performed incorrectly in multiple ways, each error is pointed out to the student. This conforms to teaching in the operating room, where each planned action must be correctly executed, and the student cannot proceed to the next planned action until the current action is performed correctly.
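The prioritization rule described above can be sketched as a small ordering function. This is an illustrative sketch only, not SDMentor's implementation: the names `ErrorType`, `StudentError`, and `tutoring_order`, and the example error descriptions, are hypothetical. It assumes the only ordering constraint is that an incorrectly chosen action is addressed before any incorrectly performed action, with feedback then given for one error at a time.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class ErrorType(IntEnum):
    # Lower value = higher tutoring priority.
    WRONG_ACTION_CHOSEN = 0
    ACTION_PERFORMED_INCORRECTLY = 1


@dataclass(frozen=True)
class StudentError:
    kind: ErrorType
    description: str


def tutoring_order(errors: List[StudentError]) -> List[StudentError]:
    """Return errors in the order they would be addressed, one at a time.

    sorted() is stable, so errors of the same type keep their detection order.
    """
    return sorted(errors, key=lambda error: error.kind)


# Hypothetical errors detected for a single student step.
detected = [
    StudentError(ErrorType.ACTION_PERFORMED_INCORRECTLY, "drill angle too steep"),
    StudentError(ErrorType.WRONG_ACTION_CHOSEN, "started drilling before rubber dam"),
    StudentError(ErrorType.ACTION_PERFORMED_INCORRECTLY, "wrong bur selected"),
]

for error in tutoring_order(detected):
    print(error.kind.name, "-", error.description)
```

Using a stable sort keeps multiple execution errors in the order they were detected, which matches the requirement that each one is pointed out to the student in turn.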

The standard ITS framework was employed to develop the system: domain model, student model, pedagogical module, and user interface (Nkambou et al., 2010) (Fig. 3). The domain model stores surgical procedural knowledge and an initial situation. It is stored in the form of a graph, which maintains a representation of the correct course of action and is used to detect student errors and evaluate the student’s performance. The student model represents the action the student performed and its corresponding effects. The pedagogical module represents the teaching process. It evaluates the student’s action through domain knowledge and applies the student model to provide tutorial feedback. The plan projection engine inside the pedagogical module perceives the current situation or state, accepts a student action from the user interface, projects the action outcomes, and formulates a new state. This executed action and the newly formulated state are assessed to determine whether they satisfy the desired outcomes, and they are used by the strategy controller to select teaching strategies. The intervention generator uses the domain model graph to generate tutorial dialog for the selected teaching strategies.

Fig. 3

Overall system architecture and internal components

Domain Model

The domain model represents three major aspects of surgical decision making: action descriptions, the elements of situation awareness, and the surgical procedure. Actions are described in terms of action names, action parameters, conditional effects, and expected outcomes. The situation awareness components include perception, comprehension, and projection, with the latter being represented as the effects of actions under given conditions. Surgical procedures are represented in terms of a surgical plan and its component actions. The surgical plan has a hierarchical structure, consisting of a series of sub-plans, with the lowest level being a series of single actions. Actions in a sub-plan are sequential and can be conditional. Each action is dependent on the current status of the patient and the effects of the previously executed action.

The PDDL planning representation provides sufficient expressive power to capture the important aspects of actions and situation awareness identified in the observational study. We apply PDDL4J (Pellier, n.d.; Pellier & Fiorino, 2018) as a parser for PDDL version 3.1 (Kovacs, 2011). Figure 4 shows a sample of the PDDL representation of the action (:action) “insert rubber dam” with conditional effects (:effect) and the situation awareness process of comprehending the meaning of perceived facts (:axiom). We represent the situation awareness process using axioms from PDDL 1.2 (Ghallab et al., 1998) to replace derived-predicates from PDDL 3.1 due to their constraints in representing negated propositions. In this example, inserting a rubber dam with clamp number 2 (CLAMP C2) for the premolar (TOOTH_TYPE PREMOLAR) results in the working tooth being separated (TOOTH_SEPARATED). Also, inserting a rubber dam on a strong tooth (TOOTH_STRONG) results in the clamp not being at risk of release (NOT(CLAMP_RISK_TO_RELEASE)). The aspect of situation awareness involving the comprehension of perceived elements in the environment is represented through domain rules, in which the antecedent represents the perceived facts (:context), and the consequent represents the comprehended facts (:implies). As shown in Fig. 4, the domain axiom represents the process of inferring that the tooth is strong by observing that the crown is visible.

Fig. 4

Partial action description: domain rule (:axiom), action (:action) and conditional effects (:effect)
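To make the semantics of the axiom and the conditional effects described above concrete, the following sketch re-expresses them in Python rather than PDDL. It is an illustration under stated assumptions and not the PDDL4J-based implementation: the classes `Axiom` and `ConditionalEffect` and the functions `comprehend` and `project` are hypothetical, propositions are plain strings, the action argument (CLAMP C2) is passed as a separate set, and the starting state is assumed to contain CLAMP_RISK_TO_RELEASE so that the negated effect has something to remove.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Axiom:
    context: FrozenSet[str]   # perceived facts (:context)
    implies: str              # comprehended fact (:implies)


@dataclass(frozen=True)
class ConditionalEffect:
    when: FrozenSet[str]                  # condition attached to the effect
    add: FrozenSet[str] = frozenset()     # propositions made true
    delete: FrozenSet[str] = frozenset()  # propositions made false (NOT ...)


def comprehend(state: Set[str], axioms: Iterable[Axiom]) -> Set[str]:
    """Apply axioms repeatedly until no new comprehended fact can be derived."""
    axioms = list(axioms)
    derived = set(state)
    changed = True
    while changed:
        changed = False
        for axiom in axioms:
            if axiom.context <= derived and axiom.implies not in derived:
                derived.add(axiom.implies)
                changed = True
    return derived


def project(state: Set[str], arguments: Set[str],
            effects: Iterable[ConditionalEffect]) -> Set[str]:
    """Project an action: every condition is tested against the pre-action state."""
    visible = set(state) | set(arguments)
    added, deleted = set(), set()
    for effect in effects:
        if effect.when <= visible:
            added |= effect.add
            deleted |= effect.delete
    return (set(state) - deleted) | added


AXIOMS = [Axiom(frozenset({"CROWN_VISIBLE"}), "TOOTH_STRONG")]
INSERT_RUBBER_DAM = [
    ConditionalEffect(when=frozenset({"CLAMP C2", "TOOTH_TYPE PREMOLAR"}),
                      add=frozenset({"TOOTH_SEPARATED"})),
    ConditionalEffect(when=frozenset({"TOOTH_STRONG"}),
                      delete=frozenset({"CLAMP_RISK_TO_RELEASE"})),
]

# Assumed starting facts; the real initial situation is given in the PDDL problem.
perceived = {"CROWN_VISIBLE", "TOOTH_TYPE PREMOLAR", "CLAMP_RISK_TO_RELEASE"}
understood = comprehend(perceived, AXIOMS)
after = project(understood, {"CLAMP C2"}, INSERT_RUBBER_DAM)
print(sorted(after))
```

Choosing a different clamp in this sketch would leave TOOTH_SEPARATED unachieved, which is the kind of incorrectly executed action whose effects the representation needs to capture for teaching.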

The temporal evolution of the surgical procedure is represented by a sequence of states, where each action maps a state into a state. An initial state, as shown in Fig. 5, represents initial patient information prior to execution of the procedure.

Fig. 5

Partial initial situation with the patient’s diagnosis as pulpitis on premolar number 34 and the crown visible

To represent the surgical procedure, we adopt the plan representation of Bertoli et al. (2003), who augmented PDDL with a representation of plan structure. We make use of their representation of sequential structures and conditional structures. We add additional keywords to identify a sub-plan (Figs. 6 and 7, :subplan), the necessity of a sub-plan (Fig. 6, :optional), desired outcomes (Fig. 7, :desired_outcome) and the primary expectation of the performed action (Fig. 7, :main).

Fig. 6

The representation of the first three steps of the root canal treatment procedure with a relevant condition for the operational plan

Fig. 7

Partial plan of access opening (OC) with conditional action ‘stop_bleeding’

A procedure or plan has a sequence of sub-plans. A sub-plan can be optional under a given condition. For example, providing local anesthesia (Fig. 6, la_plan) is optional if the patient’s diagnosis is necrosis (DIAGNOSIS NECROSIS). A sub-plan is mandatory if the :optional keyword is not present. A sub-plan is composed of a sequence of actions (Fig. 7, oc_plan). An action that changes the state of the patient has a desired outcome (:desired_outcome) which is related to the surgical objective. One desired outcome refers to the primary objective of the performed action (:main), e.g., PULP_CHAMBER_FLOOR_VISIBLE of the action drill_to_open_access (Fig. 7). Moreover, an action can be conditional. For example (Fig. 7), when the pulp chamber floor is perforated (if (PULP_CHAMBER_FLOOR_PERFORATED)), a student is required to stop the bleeding (stop_bleeding) and fill the perforation (fill_perforated_pulp_floor) before continuing further.

The plan is annotated with control information to indicate to the pedagogical module where in the plan it is appropriate to ask questions of the student (:is_questionable). Actions are marked as questionable if they can be interrupted and there are significant enough decisions involved to warrant discussion, e.g., select_bur_to_initial_drill (Fig. 7).
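The plan structure described above can be pictured with a small data model. The following is only a minimal sketch in Python, not the PDDL-based encoding used by SDMentor: the class names, the `required_subplans` helper, and the action name `provide_local_anesthesia` are hypothetical, while the keywords and the remaining identifiers are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    name: str
    desired_outcomes: list = field(default_factory=list)  # :desired_outcome
    main: Optional[str] = None        # :main, the primary objective
    is_questionable: bool = False     # :is_questionable
    condition: Optional[str] = None   # guard of a conditional action ("if ...")


@dataclass
class SubPlan:
    name: str
    actions: list
    optional_if: Optional[str] = None  # :optional condition; None means mandatory


def required_subplans(plan, state):
    """Sub-plans that cannot be skipped in `state` (a set of propositions)."""
    return [sp for sp in plan
            if sp.optional_if is None or sp.optional_if not in state]


# Hypothetical encoding of two sub-plans mentioned in the text.
la_plan = SubPlan("la_plan", [Action("provide_local_anesthesia")],
                  optional_if="DIAGNOSIS NECROSIS")
oc_plan = SubPlan("oc_plan", [
    Action("select_bur_to_initial_drill", is_questionable=True),
    Action("drill_to_open_access",
           desired_outcomes=["PULP_CHAMBER_FLOOR_VISIBLE"],
           main="PULP_CHAMBER_FLOOR_VISIBLE"),
    Action("stop_bleeding", condition="PULP_CHAMBER_FLOOR_PERFORATED"),
    Action("fill_perforated_pulp_floor",
           condition="PULP_CHAMBER_FLOOR_PERFORATED"),
])
plan = [la_plan, oc_plan]
```

In this sketch, a sub-plan without an `optional_if` condition is always required, which mirrors the rule that a sub-plan is mandatory when the :optional keyword is absent.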

Surgical Procedure Graph and Student Solution Graph

After the system initializes and parses the representation of the plan and actions, a surgical procedure graph is created to facilitate generation of tutorial interventions. This graph represents a procedure, its sub-plans, and the component actions (Fig. 8). Each action in the plan matches an action description in PDDL. The action’s conditional effects are represented as projection nodes (PJ) of the situation awareness framework. The conditions of effects (Fig. 4, WHEN) are represented as action condition nodes (AC) which consist of conjunctions of propositions. During the projection process, perception propositions (P), comprehension propositions (C), and other propositions describing the current state may be matched against the propositions in the action conditions. Perception nodes that have no associated comprehension nodes can link directly to action conditions.

Fig. 8 Partial graph of the represented surgical procedure of rubber dam application and access preparation

An example surgical procedure graph is shown in Fig. 9, where inserting a rubber dam (node 1) to separate the working tooth from the oral environment (node 10) before opening the access prevents instruments and substances from falling into the patient’s throat (PJ:ORAL_ENVI_PREVENTED, on node 3). To insert the rubber dam successfully, the crown must be visible (P:CROWN_VISIBLE, node 4), meaning that the tooth is strong enough (C:TOOTH_STRONG, node 5) to be firmly held by the clamp (AC:TOOTH_STRONG, node 6; PJ:NOT CLAMP_RISK_TO_RELEASE, node 7). Each node in the action description of the surgical procedure graph has properties, e.g., is_questionable (node 1 and node 2), main (node 10).

Fig. 9 Partial surgical procedure graph of the three adjoining steps of (1) inserting a rubber dam, (2) selecting a bur for initial drilling, and (3) drilling to the dentine, with action descriptions. The checked nodes represent the student’s solution graph of inserting a rubber dam action

The student model is an overlay of the surgical procedure graph, which we call the student solution graph. It is produced by selecting a subset of this graph (checked nodes), representing the student’s executed actions, the action outcomes projected by the Plan Projection Engine, and the situation at run time. This is similar to knowledge tracing on production rules (Corbett & Anderson, 1995). The checked nodes in Fig. 9 are the student’s graph for the action of inserting a rubber dam with clamp number 2 for a premolar (nodes 1, 8, 9) with a visible crown (nodes 4, 5, 6) and the rubber sheet position under the patient’s nose (node 11). The outcomes of this action are that the working tooth is separated (node 10), the clamp is not at risk of release (node 7), and the patient is not at risk of fainting (node 12).

When an action is executed, the current situation is used together with the action details, through the action conditions (AC; nodes 6, 9, and 11), to project the effects (PJ; nodes 7, 10, and 12). The current situation is represented as fact propositions, perception nodes (P; nodes 4, 8, 13), and comprehension nodes (C; nodes 5, 14), drawn from the initial situation and from the outcomes of the previously correctly executed actions. The projection is done by the Plan Projection Engine, which applies the domain rules to the current state to generate any comprehension propositions. The conditions of the student action are then matched to the propositions in the augmented state description, and the action effects are used to generate facts in the new state. The new state is evaluated against the desired outcomes to identify whether the action was performed successfully. The new state becomes the new situation for further actions in the surgical plan.
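The projection step can be sketched as follows. This is an illustrative simplification, not the Plan Projection Engine itself: propositions are plain strings, action details such as the clamp number are assumed to be added to the state as propositions, effects are only added and never retracted, and the function name `project` and the rule encoding are hypothetical.

```python
def project(state, domain_rules, action_conditions, desired_outcomes):
    """Project one executed action and report whether it succeeded.

    state: set of propositions (facts, perceptions, action details)
    domain_rules: list of (premises, conclusion) pairs that derive
        comprehension propositions
    action_conditions: list of (condition_set, effect) pairs (AC -> PJ)
    desired_outcomes: propositions that must hold in the new state
    """
    augmented = set(state)
    changed = True
    while changed:  # apply the domain rules until nothing new is derived
        changed = False
        for premises, conclusion in domain_rules:
            if premises <= augmented and conclusion not in augmented:
                augmented.add(conclusion)
                changed = True
    # An effect is projected when all propositions of its condition hold.
    effects = {effect for cond, effect in action_conditions if cond <= augmented}
    new_state = augmented | effects
    return new_state, set(desired_outcomes) <= new_state


# Rubber dam example, with proposition spellings assumed for illustration.
rules = [({"CROWN_VISIBLE"}, "TOOTH_STRONG")]
conditions = [
    ({"TOOTH_STRONG"}, "NOT CLAMP_RISK_TO_RELEASE"),
    ({"TOOTH_TYPE PREMOLAR", "CLAMP_NO 2"}, "TOOTH_SEPARATED"),
]
state = {"CROWN_VISIBLE", "TOOTH_TYPE PREMOLAR", "CLAMP_NO 2"}
new_state, success = project(state, rules, conditions, ["TOOTH_SEPARATED"])
```

With these inputs the comprehension proposition TOOTH_STRONG is derived first, both effects are projected, and the desired outcome is satisfied.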

Pedagogical Module

Preoperative Teaching Session

The preoperative teaching session begins with the pedagogical module generating a series of questions regarding the procedural details to be performed in the intraoperative stage. More questions about the rationale behind the procedural details are generated if the student provides incorrect answers. When the system initializes, actions with effects in each sub-plan – providing local anesthesia, inserting a rubber dam, and access preparation – are randomly selected. The Strategy Controller considers whether each selected action is part of the plan that can be optional and selects the strategy to generate a question regarding the necessity of the action in the plan (“Why do you provide local anesthesia?”). If the selected action is mandatory, the Strategy Controller selects the strategy to generate a question about the details of how to perform it correctly (“To insert the rubber dam, what is the appropriate clamp number?”). When the student answers the question, the Strategy Controller selects the corresponding teaching strategies and assigns the Intervention Generator to generate the tutorial content to be returned to the student. The discussion about an action terminates when there are no more questions to ask. The pedagogical module selects the next action in the plan to discuss and continues until all actions are covered.
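The choice between the two opening question types can be illustrated with a short sketch. The function name and the phrase templates are assumptions made for illustration; the strategy numbers follow the sample dialog discussed later (Strategy 10.3 for necessity questions, Strategy 9 for action details).

```python
def select_preoperative_question(action_phrase, optional, detail=""):
    """Return (strategy_id, question) for the first question about an action.

    action_phrase: English phrase for the action (hypothetical input)
    optional: True if the action belongs to a sub-plan that can be optional
    detail: the action detail to ask about when the action is mandatory
    """
    if optional:
        # Optional sub-plan: ask about the necessity of the action.
        return "10.3", f"Why do you {action_phrase}?"
    # Mandatory sub-plan: ask how to perform the action correctly.
    return "9", f"To {action_phrase}, what is the appropriate {detail}?"


q1 = select_preoperative_question("provide local anesthesia", optional=True)
q2 = select_preoperative_question("insert the rubber dam", optional=False,
                                  detail="clamp number")
```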

Intraoperative Teaching Session

The pedagogical module controls the intraoperative teaching process by evaluating the student’s performed actions and generating appropriate interventions and dialog. After the end of the preoperative teaching session, the pedagogical module prepares itself for accepting the student’s actions. It identifies the correct next action in the procedure. The Plan Projection Engine loads the initial situation and runs the domain rules to formulate the initial state.

When a student performs an action within the VR simulation, the pedagogical module evaluates the performed action by verifying whether it conforms to the plan and has been correctly performed. If the performed action does not follow the plan, the pedagogical module identifies this as a procedural error and assigns the Strategy Controller and the Intervention Generator to select the corresponding teaching strategies and generate an intervention. Then the system allows the student another attempt. If the performed action follows the plan, the Plan Projection Engine projects the effects of the performed action in the current state and runs the domain rules to formulate a new state. The pedagogical module evaluates whether this new state satisfies the desired outcomes of the performed action. The new state satisfying the desired outcome indicates that the action has been performed correctly. Even when the action has been correctly performed, the pedagogical module may generate tutorial intervention to explore the knowledge of the student. After that, it points to the next action in the plan and is ready to accept a newly executed action. When the action has not been correctly performed, the pedagogical module generates an intervention to help the student understand the correct way to perform the action. It does not point to the next action in the plan, but rather rolls back to the prior state and stands by for the new attempt.
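The evaluation logic in the preceding paragraph can be summarized in a few lines. This is a sketch under stated assumptions: the function names, the verdict labels, and the toy effect table are hypothetical, and the real system delegates projection to the Plan Projection Engine rather than to a lookup table.

```python
def evaluate_action(action, detail, expected_action, state, project, desired):
    """Return (verdict, resulting_state, advance_to_next_action)."""
    if action != expected_action:
        # The action does not follow the plan: the student retries.
        return "procedural_error", state, False
    new_state = project(state, action, detail)
    if set(desired) <= new_state:
        # Desired outcomes hold: move on to the next action in the plan.
        return "correct", new_state, True
    # Desired outcomes missing: roll back to the prior state and wait.
    return "incorrectly_performed", state, False


def toy_project(state, action, detail):
    # Hypothetical effect table standing in for the projection engine.
    effects = {("insert_rubber_dam", "CLAMP_NO 2"): {"TOOTH_SEPARATED"}}
    return set(state) | effects.get((action, detail), set())


s0 = {"TOOTH_TYPE PREMOLAR"}
goal = ["TOOTH_SEPARATED"]
r_wrong_step = evaluate_action("drill_to_dentine", None,
                               "insert_rubber_dam", s0, toy_project, goal)
r_wrong_clamp = evaluate_action("insert_rubber_dam", "CLAMP_NO 14",
                                "insert_rubber_dam", s0, toy_project, goal)
r_correct = evaluate_action("insert_rubber_dam", "CLAMP_NO 2",
                            "insert_rubber_dam", s0, toy_project, goal)
```

Only the third call advances the plan pointer; the first two return the unchanged prior state, which corresponds to the retry and rollback behavior described above.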

When a teaching strategy is selected, the Intervention Generator, which is responsible for generating content, loads the corresponding template of the selected teaching strategy. Each template has keys, and each key is annotated with the type of information in the graph to use. The Intervention Generator traces the graph for the template-keys to get propositions which are translated into English descriptions via the proposition-English dictionary.
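As a rough illustration of template filling, the sketch below binds template keys to propositions and translates them through a proposition-English dictionary. The template wording, the key names, and the dictionary entries are assumptions reconstructed from the sample dialog for Strategy 11.3, not the system's actual resources.

```python
# Hypothetical template for Strategy 11.3, keyed by strategy identifier.
TEMPLATES = {
    "11.3": "What can happen if you {affected_action} when {negated_outcome}?",
}

# Hypothetical excerpt of the proposition-English dictionary.
DICTIONARY = {
    "DRILL_TO_DENTINE": "drill to the dentine layer",
    "NOT TOOTH_SEPARATED":
        "the working tooth is not isolated from the oral environment",
}


def generate(strategy, bindings):
    """Fill a strategy template; `bindings` maps template keys to propositions."""
    english = {key: DICTIONARY[prop] for key, prop in bindings.items()}
    return TEMPLATES[strategy].format(**english)


question = generate("11.3", {
    "affected_action": "DRILL_TO_DENTINE",
    "negated_outcome": "NOT TOOTH_SEPARATED",
})
```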

The pedagogical module updates the student solution graph. By utilizing the current state, the executed action, and the relevant projected effects (checked nodes, Fig. 9), the student model represents the performed actions, projected effects (PJ), and the relevant current situation (P and C). The surgical procedure graph and the student solution graph facilitate the generation of tutorial feedback. The Strategy Controller in the pedagogical module considers a number of factors as triggers for selecting the teaching strategies. These factors are the stage of the performed action (preoperative stage, intraoperative stage); the necessity of the action (optional, mandatory); whether the executed action follows the plan (procedural error); the appropriateness of asking questions about the performed action; and whether the action is correctly performed.

Sample Tutorial Dialog

To illustrate the tutorial dialog with SDMentor, we provide sample transcripts of interaction with students in the preoperative and intraoperative stages. Table 2 shows a transcript of tutorial interaction in the preoperative stage. Because the first sub-plan of the procedure is to provide local anesthesia and it can be optional, SDMentor asks a question about the necessity of using local anesthesia (Strategy 10.3). When the student answers incorrectly, SDMentor does not immediately give the answer but keeps asking questions about the condition of this necessity (Strategy 11.1) and its meaning to the procedure (Strategy 11.2) to stimulate the student to consider the patient’s situation as relevant information for making a decision. The answers are finally given to the student with a confirmation message (Strategies 7 and 8). If there are more sub-plans, SDMentor selects one action and asks for the action details (Strategy 9). If there are facts or conditions related to the action details, SDMentor generates a hypothetical question about an alternative possible situation (Strategy 12) in order to explore the breadth of the student’s knowledge.

Table 2 A sample preoperative discussion showing how strategies are sequentially selected based on student responses

Tables 3–5 show transcripts of tutorial interaction in the intraoperative stage. In Table 3, when the student fails to provide the rationale for rubber dam insertion, SDMentor asks about the future consequence (Strategy 11.3) to point out the importance of performing this action in the procedure. SDMentor can also generate feedback for actions that do not have an immediate outcome, as shown in Table 4. SDMentor points out the undesired outcome if the action were to be performed with the selected bur size (Strategy 6.2), points the student to the information that needs to be considered in selecting the bur size (Strategy 4.2), and asks for the correct bur size (Strategy 9). Table 5 illustrates how SDMentor helps the student with error recovery. SDMentor first points out the sign of damage to the tooth (Strategy 3) and asks the student about its cause (Strategy 13). When the student answers incorrectly, SDMentor gives negative feedback (Strategy 6.2), informs the student of the cause of the error (Strategy 7), and provides instruction to recover from the error (Strategy 4.1).

Table 3 A sample intraoperative discussion for future consequence after inserting a rubber dam
Table 4 A sample intraoperative discussion after selecting a bur which is larger than the patient’s pulp chamber width
Table 5 A sample intraoperative discussion when perforation, an erroneous situation, occurs

Teaching Strategies

The 13 teaching strategies in Table 1 involve two basic types of intervention: informing through messages and asking questions. Questioning is done by presenting multiple-choice questions. In this section, we describe how the strategies 11.3 and 12 are implemented in the pedagogical module. The descriptions of the algorithms for all the teaching strategies are provided in Appendix 1.

Strategy 11.3: Raise Awareness about the Future Consequences (Stage: Intraoperative)

Even though a student correctly performs an action, s/he may fail to identify why the action must be taken. SDMentor makes the student aware by asking about the future consequences.

Intraoperative situation: A student inserts the rubber dam correctly.

SDMentor: Why do you insert the rubber dam? (Strategy 10.2)

Student: (incorrectly answers)

SDMentor: What can happen if you drill to the dentine layer when the working tooth is not isolated from the oral environment?

(1) The pulp chamber floor is not visible

(2) The patient cries out

(3) The working tooth is overcut

(4) Foreign objects like endodontic instruments or fluid can be easily dropped into the mouth

Consider the graph shown in Fig. 9. The pedagogical module negates the main desired outcome of the correctly performed action (PJ:TOOTH_SEPARATED, node 10) as NOT TOOTH_SEPARATED and uses it as a key to search for the corresponding AC node (AC:TOOTH_SEPARATE, node 15) in the affected action in the procedure. The affected action (DRILL_TO_DENTINE, node 2) and the negated desired outcome are part of the question. The PJ node (PJ: NOT ORAL_ENVI_PREVENTED, node 16) is the answer; the other choices are PJ nodes randomly selected from the affected step.
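The search just described can be sketched as a lookup over later actions. This is a simplified illustration, not the algorithm given in Appendix 1: each AC node is reduced to a single proposition string, the function names are hypothetical, and the distractor conditions and effects (BUR_TOO_LARGE, TOOTH_OVERCUT, NO_ANESTHESIA, PATIENT_IN_PAIN) are invented placeholders.

```python
import random


def negate(prop):
    """Toggle a leading 'NOT ' on a proposition string."""
    return prop[4:] if prop.startswith("NOT ") else "NOT " + prop


def future_consequence_question(main_outcome, later_actions,
                                n_choices=4, rng=random):
    """Build a Strategy 11.3 style question.

    later_actions: maps an action name to its (AC condition, PJ effect) links.
    Returns None when no later action depends on the negated outcome.
    """
    key = negate(main_outcome)
    for action, links in later_actions.items():
        for condition, effect in links:
            if condition == key:
                # Distractors are other PJ effects of the affected action.
                others = [pj for _, pj in links if pj != effect]
                picked = rng.sample(others, min(n_choices - 1, len(others)))
                choices = picked + [effect]
                rng.shuffle(choices)
                return {"action": action, "condition": key,
                        "answer": effect, "choices": choices}
    return None


later_actions = {
    "DRILL_TO_DENTINE": [
        ("NOT TOOTH_SEPARATED", "NOT ORAL_ENVI_PREVENTED"),
        ("BUR_TOO_LARGE", "TOOTH_OVERCUT"),      # placeholder distractor
        ("NO_ANESTHESIA", "PATIENT_IN_PAIN"),    # placeholder distractor
    ],
}
q = future_consequence_question("TOOTH_SEPARATED", later_actions,
                                rng=random.Random(0))
```

The returned action and condition would then be passed to the template of the strategy, and the choices rendered as the multiple-choice options.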

Strategy 12: Pose a Hypothetical Situation (Stage: Preoperative, Intraoperative)

SDMentor can pose hypothetical situations to explore the breadth of understanding of the student. When the student successfully answers a question about the plan details in the preoperative stage, SDMentor poses a hypothetical situation. The failure to answer this type of question causes SDMentor to ask this question again in the intraoperative stage after the student correctly performs an action.

Preoperative situation: The working tooth is premolar.

SDMentor: To insert the rubber dam, what is the appropriate clamp number? (Strategy 9)

Student: (correctly answers)

SDMentor: That’s right. (Strategy 5) Why? (Strategy 10.1)

Student: (answers) (SDMentor provides the answer if the student’s answer is not correct.)

SDMentor: Suppose that the working tooth type is molar, what is the correct clamp number?

(1) Clamp number 9

(2) Clamp number 14

(3) Clamp number 2

Intraoperative situation: In the preoperative stage, a student fails to answer the hypothetical question. In the intraoperative stage, the student inserts the rubber dam correctly.

SDMentor: Why do you insert the rubber dam? (Strategy 10.2)

Student: (correctly answers)

SDMentor: Suppose that the working tooth type is molar, what is the correct clamp number?

(1) Clamp number 14

(2) Clamp number 2

(3) Clamp number 9

Using the graph in Fig. 10, the pedagogical module traces the PJ nodes containing the main desired outcome (nodes 1, 3) to their parent AC nodes (nodes 2, 4), respectively. It selects the AC node that involves a user action detail, e.g., CLAMP_NO, and that contains the perceptual fact satisfied by the current state (AC:TOOTH_TYPE = PREMOLAR && CLAMP_NO = 2, node 2). It then searches for and selects a similar AC node (AC:TOOTH_TYPE = MOLAR && CLAMP_NO = 14, node 4) that satisfies the desired outcome of the action. The TOOTH_TYPE, which is part of the P node, becomes the question, and the action detail, CLAMP_NO, becomes the answer. The possible answers are determined by the other available values of CLAMP_NO in the domain.

Fig. 10 Partial procedure graph for the action ‘inserting rubber dam’ for various clamp sizes of premolar and molar teeth with the correct desired outcomes
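The selection of a hypothetical situation for Strategy 12 can be sketched as follows. The sketch assumes that each AC node leading to the main desired outcome is available as a dictionary of attribute values; the function name, the argument names, and this encoding are hypothetical, while the tooth types and clamp numbers come from the example dialog.

```python
def hypothetical_question(ac_nodes, current_state, domain_values,
                          varied="TOOTH_TYPE", detail="CLAMP_NO"):
    """Pick an alternative situation and the action detail that fits it.

    ac_nodes: AC nodes that lead to the main desired outcome
    current_state: attribute values perceived in the current situation
    domain_values: all available values of each action detail in the domain
    """
    # AC node that matches the perceived fact in the current state.
    current = next(n for n in ac_nodes if n[varied] == current_state[varied])
    # A similar AC node that differs in the perceived fact.
    alternative = next(n for n in ac_nodes if n[varied] != current[varied])
    return {
        "suppose": (varied, alternative[varied]),
        "answer": alternative[detail],
        "choices": sorted(domain_values[detail]),
    }


ac_nodes = [
    {"TOOTH_TYPE": "PREMOLAR", "CLAMP_NO": 2},   # node 2 in Fig. 10
    {"TOOTH_TYPE": "MOLAR", "CLAMP_NO": 14},     # node 4 in Fig. 10
]
question = hypothetical_question(ac_nodes, {"TOOTH_TYPE": "PREMOLAR"},
                                 {"CLAMP_NO": {2, 9, 14}})
```

For a premolar case, the sketch proposes the molar situation, with clamp number 14 as the correct answer among the available clamp numbers.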

Evaluation

Evaluation of Appropriateness of Interventions

Evaluation Design

We aimed to determine the appropriateness of the feedback generated by SDMentor in a variety of situations by comparing it to feedback given by human instructors in those same situations. With the ethical approval of the Mahidol University and Thammasat University ethics committees, ten endodontic instructors (human tutors HT1-HT10), who did not participate in the observational study, each with at least 6 years of teaching experience (two with 10 years, two with 8 years, three with 7 years, and three with 6 years), were recruited to provide tutorial feedback for 20 different situations (ST1 – ST20) for comparison with the interventions generated by the system in these same situations. These situations were triggering events for all 13 teaching strategies and were classified into six categories: (1) discussing whether a sub-plan is optional (the preoperative stage), (2) discussing whether a sub-plan is mandatory (the preoperative stage), (3) a student performs an action causing a procedural error, (4) a student performs an action for which intervention is appropriate (is_questionable), (5) a student performs an action for which intervention is not appropriate (not is_questionable), and (6) a student performs an action to recover from an error, i.e., to fix a perforation caused by a previous incorrectly performed action. The distribution of interventions of the ten human instructors in the evaluation was approximately the same as in the observational study, with the relative frequencies of interventions differing by at most 2%, except in the case of asking for correct details of the action (Strategy 9). See Table 8 in Appendix 2 for details.

We recruited three endodontic experts (EP1-EP3) to blindly rate the interventions of the human instructors and SDMentor by using a 5-point Likert scale to identify the level of appropriateness of the intervention in each situation (Fig. 11). We recruited three experts to mitigate idiosyncrasies in the evaluation of any one expert and to permit analysis of inter-rater agreement. These three experts did not participate in the observational study. They knew the objectives of the assessment as part of the ethical recruiting process. They have extensive teaching experience (two with 20 years and one with 17 years) and are responsible for evaluating the teaching performance of dental instructors.

Fig. 11 Sample scores from three dental experts (EP1, EP2, EP3) for the situational tutorial interventions of a few instructors (HT1, 4, 5, 7) and SDMentor using a 5-point Likert scale with descriptions to evaluate the appropriateness of the tutorial interventions

The interventions from the ten human endodontic instructors and SDMentor were grouped by situations and were prepared for being rated by three experts. Though all three endodontic experts knew the objective of the assessment, they did not know who generated each individual tutorial intervention. Because we took great care to model the verbal responses in SDMentor on human responses recorded in our observational study, we believe that the experts evaluating the tutoring interventions had little (if any) chance to guess whether an intervention was generated by human tutors or by SDMentor. To avoid making the identity of SDMentor and human tutors apparent from incidental aspects, we (1) removed the identity of individual interventions, (2) removed the multiple-choice options from the interventions generated by SDMentor, and (3) shuffled the presentation of interventions such that interventions from the same tutor were presented in different order in individual situations. Despite these efforts, it is still possible that the experts could guess which interventions came from SDMentor, which might influence their scoring. Thus, in the Turing Test Evaluation section we report results of a type of Turing test we conducted to evaluate whether the experts could guess which interventions came from SDMentor.

Evaluation Results

Cohen’s Kappa was used to determine the inter-rater agreement of the three experts by examining each possible pairing of experts: EP1 and EP2, EP1 and EP3, EP2 and EP3. The Kappa scores for the three pairs of experts are 0.83, 0.81, 0.81, respectively, showing that the three endodontic experts have a high level of agreement in their judgment of the appropriateness of the tutorial interventions of the instructors and SDMentor.
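For readers unfamiliar with the statistic, the sketch below computes unweighted Cohen's Kappa for one pair of raters. It is an assumption that the unweighted form was used, since the text does not say whether a weighted variant was applied to the ordinal ratings, and the two short rating lists are made-up values for illustration, not data from the study.

```python
from collections import Counter


def cohens_kappa(r1, r2):
    """Unweighted Cohen's Kappa for two equally long lists of ratings.

    Undefined (division by zero) when chance agreement equals 1, e.g., when
    both raters always give the same single rating.
    """
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the product of the raters' marginal frequencies.
    expected = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up ratings on the 5-point scale, for illustration only.
rater_a = [5, 4, 5, 3]
rater_b = [5, 4, 4, 3]
kappa = cohens_kappa(rater_a, rater_b)
```

In the study, the same computation would be repeated for each of the three expert pairs.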

The descriptive statistics suggest that SDMentor (M = 4.55, SD = 0.08) performs better than the human tutors (M = 4.24, SD = 0.03) on average. Using a Bayesian approach, we analyzed whether the observed advantage for SDMentor is statistically significant. Our approach is motivated by two observations. First, it has repeatedly been remarked that Bayesian data analysis has a number of advantages compared to more traditional frequentist null-hypothesis testing approaches (Kruschke, 2013, 2014; Masson, 2011; Puga et al., 2015). Second, the common procedure of treating ratings as being interval-scaled can lead to serious misinterpretations of rating data (Kruschke, 2013, 2014; Kruschke & Liddell, 2018). Against this background, we treated the experts’ ratings as ordinal data arising from a continuous distribution and investigated whether the data provide evidence for the assumption that the means of the continuous distributions differ between the human tutors (μ1) and SDMentor (μ2) (Kruschke, 2015). Based on the raw rating scores (20 situations * 10 human tutors * 3 dental experts = 600 ratings for human tutors, 20 situations * 1 SDMentor * 3 dental experts = 60 ratings for SDMentor), we computed the posterior probability distribution of the difference between the mean ratings of the human tutors and SDMentor and the corresponding effect size (Fig. 12). Both distributions indicate a significantly higher rating for SDMentor than for the human tutors (μ2 – μ1 > 0). In particular, the 95% Highest Density Intervals lie entirely above zero (Kruschke, 2013). Accordingly, SDMentor performs significantly better than the experienced human tutors.

Fig. 12 Posterior distribution of mean differences and the effect size

To understand why SDMentor was scored higher than human instructors, we asked the three endodontic experts, who rated the interventions in the Evaluation Design section, to provide reasons. The situations in which SDMentor interventions were given the highest rating (5) were selected. From the total of 20 situations, EP1, EP2, and EP3 rated SDMentor as the highest for 12, 13, and 10 situations, respectively. For each selected situation, we removed human interventions that were rated equal to SDMentor’s. The remaining interventions, consisting of the SDMentor intervention and the lower-rated human interventions, were sorted in descending order of score and we asked each expert to explain why they rated SDMentor highest. The identity of the source of interventions was not revealed. The reasons that SDMentor was rated highest are summarized in Table 6. The reasons involve SDMentor providing or asking for more information than the human instructors or providing more relevant information.

Table 6 Reasons given by the three experts for rating SDMentor’s interventions higher than those of the human instructors

To understand the differences in feedback from SDMentor and from the human tutors, Table 7 shows a comparison of the relative frequencies of the uses of the various strategies. We can see that SDMentor makes relatively more use of positive feedback (Strategy 5), negative feedback (Strategies 6.1 and 6.2), and asking for correct details of an action (Strategy 9). The human tutors make more use of confirming the fact related to the action details (Strategy 10.1) and raising awareness of the relevant facts (Strategy 11.1).

Table 7 Percentage of occurrence of teaching strategies of ten dental instructors and SDMentor in the evaluation

Turing Test Evaluation

To assess whether the interventions generated by SDMentor are distinguishable from those of the human instructors, we asked the same three endodontic experts as in the Evaluation Design section to identify the source of the interventions, using the same data as in the evaluation described in that section: ten human instructors and SDMentor each providing interventions for 20 situations. For each situation, each endodontic expert considered the 11 interventions and identified the one that s/he thought belonged to SDMentor. Since each situation had 11 interventions, with one of them generated by SDMentor, the baseline accuracy from random guessing is 9%. The three experts had accuracies of 30% (6/20), 15% (3/20), and 0% (0/20), giving a mean accuracy over the three experts of 15%, which is only marginally better than random guessing.
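The figures in this paragraph follow from simple arithmetic, reproduced below as a check. The variable names are illustrative; the counts are those reported above.

```python
correct_guesses = [6, 3, 0]   # correct identifications by EP1, EP2, EP3
situations = 20               # situations judged by each expert
candidates = 11               # ten human instructors plus SDMentor

# Chance of picking SDMentor's intervention by random guessing.
baseline = 1 / candidates
# Per-expert accuracy and the mean over the three experts.
accuracies = [c / situations for c in correct_guesses]
mean_accuracy = sum(accuracies) / len(accuracies)

print(f"baseline: {baseline:.1%}, mean accuracy: {mean_accuracy:.1%}")
```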

Discussion

The pedagogical module in SDMentor successfully implements the 13 teaching strategies identified in the observational study. The pedagogical module is able to ask questions about a variety of aspects of the surgical procedure such as questions about future consequences of actions, meanings of observed patient characteristics, and posing hypothetical situations. In addition, it provides hints in the form of questions and statements to help to guide the student to discover answers themselves. This tutoring behavior is enabled by the PDDL-based knowledge representation. Conditional action effects permit SDMentor to reason about how the state of the patient and the way an action is carried out (action parameters) influences the action’s effects. The situation awareness component of the representation, in turn, permits SDMentor to reason about the implications that observed patient features have for action conditions. The conditional effects support SDMentor in posing hypothetical questions to explore breadth of student understanding (Strategy 12). The action descriptions and their relation to the surgical procedure permit the pedagogical module to generate sophisticated tutorial feedback, e.g., questions about future consequences (Strategy 11.3), feedback for actions that do not change the state of the patient (Strategy 6.2), hints for procedural errors using expected action outcomes relevant to future actions needed in the procedure (Strategy 2).

SDMentor generates tutorial feedback by using the corresponding template of the selected teaching strategy. It traces the surgical procedure graph to retrieve the propositions corresponding to the template keys, which are then translated into English via a prepared dictionary. Generating tutorial feedback in this way is similar to the process used to generate immediate feedback in ANDES (Graesser et al., 2001). The ANDES immediate feedback provides sequential hints, with more specific hints given when the student requires more help. The ATLAS natural language-based module supports conversational interaction to help students develop deep understanding. ATLAS uses Knowledge Construction Dialogue to help human instructors provide content and specify how the conversation will be carried out. Writing content in Knowledge Construction Dialogue and debugging with real students is labor-intensive work.

Tracing a model to generate conversational dialogue helps decrease the effort in creating tutorial content. AutoTutor (Graesser et al., 2001) uses curriculum scripts containing topics and English teaching content in a set of expectations, hints, and prompts. This content is prepared by human instructors and transformed into vectors. A Dialogue Advancer Network, a finite automaton network, is used to control the conversational interaction between an individual student and AutoTutor.

SDMentor provides a teaching process to support a student in correctly performing an action, especially when there is an error, which is similar to Vanlehn’s inner loop (Vanlehn, 2006). When the student commits an error, SDMentor does not immediately tell the student the correct action to perform. It generates tutorial feedback and allows the student to repeat the current action until it is performed correctly. If the action was incorrectly chosen at first, it generates a hint (Strategy 2). If the same error is repeated, SDMentor tells the student the correct action to perform (Strategy 4.2). If the action was incorrectly performed, SDMentor points out the error and conducts a teaching process (Tables 3–4) that guides the student to correctly perform the current action.

The use of PDDL in SDMentor differs from its use in Roman Tutor (Belghith et al., 2012) and in the interactive narrative instantiation (Charles et al., 2013). SDMentor uses the modified PDDL 3.1 and the modified plan representation of Bertoli et al. (2003) to project action effects and generate tutorial feedback. Both Roman Tutor and the interactive narrative instantiation employed the standard PDDL 3.0, and each utilized PDDL differently. First, the camera planner inside Roman Tutor applied PDDL to represent idioms of how shots in a scene can be filmed (i.e., camera movement) in order to generate a 3D movie demonstration (Kabanza et al., 2008). The idioms were encoded in the form of temporal logic formulas (Belghith et al., 2012). Second, the work of Charles et al. (2013) applied the standard PDDL 3.0 to simulate real-world situations to promote patient education. The simulated real-world clinical situations allowed a patient to explore appropriate behaviors concerning the patient’s actions and decisions.

In evaluating the appropriateness of the generated tutorial interventions by comparing them to human tutorial interventions, the generated interventions were found to be significantly better than those of experienced human tutors even though the system has been modeled on human tutors. This may be due to two reasons. First, SDMentor uses relevant keywords to generate feedback, resulting in an intervention relevant to a particular situation. This follows the guidelines of giving feedback in surgical education (Hoffman et al., 2015; Vickery & Lake, 2005). Second, human tutors may occasionally provide interventions that are overly vague (with or without intention). We found this situation from both the observational study and the evaluation process when we collected interventions from human tutors, e.g.: “Did you forget any steps?” from Fig. 11. Interventions that are too vague may lead to communication failures in teaching (Lingard, 2004).

We took steps to obfuscate the sources of the interventions to avoid the potential for bias that could arise from knowing which interventions came from SDMentor. In addition, we assessed any remaining potential for such bias in the evaluation by asking the three experts to try to attribute which interventions came from SDMentor. The mean accuracy for this task was 15%, which is only marginally better than the 9% that would result from random guessing. While there was quite a bit of difference in the success in guessing of the three experts (0%, 15%, 30%), the Cohen’s Kappa interrater agreements of the expert scores were high, ranging from 0.81 to 0.83. This suggests that any ability to identify which interventions came from SDMentor did not influence the scores.

Conclusions and Future Work

We have studied how surgical decision making is taught in the operating room to find an effective way to improve the training of surgical decision-making skills. By observing several real teaching sessions of the selected surgical procedures, we identified teaching strategies and knowledge representation requirements and applied them to implement SDMentor in the particular domain of endodontic surgery. The techniques presented in this paper are general and applicable to a variety of surgical domains.

The approach used to design and develop SDMentor can serve as a template for development of ITSs in other surgical domains. The approach involved six steps. First, with the objective of developing an ITS for teaching surgical decision making, an observational study was designed in order to observe teaching behaviors of experienced instructors. Second, a variety of instructors and procedures were chosen. Third, the recorded sessions were transcribed, and thematic analysis was applied to identify interventions and their trigger events. This defined the teaching strategies. Fourth, the teaching strategies were verified via interviews with the observed instructors. A content template for each teaching strategy was identified. Fifth, the verified teaching strategies and their content templates were used to design the knowledge representation and the pedagogical algorithms. Here a balance had to be sought between fidelity to the observed behavior and simplicity of implementation. For example, we opted for multiple-choice questions rather than open-ended questions, which would have required natural language processing. Finally, the implementation was evaluated by providing a set of human tutors with a variety of situations and asking how they would intervene. Their interventions in those situations, as well as the interventions of the ITS, were scored by highly experienced human tutors.

This study has a number of limitations which give rise to topics for future research. First, we have a fairly simple student model that does not support diagnosis of student errors. A more sophisticated student model could enhance the system’s ability to generate hints. To do that, we would need to enrich the domain model with an ontology of medical background knowledge. This could then be used in conjunction with the record of student surgical errors and incorrectly answered questions to diagnose misconceptions in high-level concepts. The ontology could also be used to provide more flexible hint strategies (Kazi et al., 2012), as well as to support scaffolding. In the domain of dentistry, use could be made of ontologies like SNODENT (Goldberg et al., 2005), but additional causal information would also be needed.

SDMentor does not currently support scaffolding. In surgical training, there are two dimensions to this: scaffolding of case complexity and scaffolding of student autonomy (Champagne, 2013). Scaffolding of case complexity could be achieved by having a variety of cases to present to the student, with the choice driven by information from the student model. Scaffolding of autonomy could be achieved by adjusting the amount of intervention SDMentor provides, also informed by the student model.

Best practice in surgical training involves retrospective discussion after a procedure or procedure stage (Roberts et al., 2009), involving overarching principles and concepts (Hill et al., 2017). Retrospective discussion helps students with summarizing lessons learned after the operation. In our framework, we have focused on providing immediate feedback after an action has been executed. Adding retrospective discussion to our framework would require an ontology of surgical and medical concepts, as discussed in regard to the student model.

Time is an essential factor affecting surgical decision making. Students must estimate whether they have enough time to finish their procedure, and human tutors often give feedback about the available time. The time used can also affect the appropriateness of actions. For example, the more time students need, the more local anesthesia is required. Incorporating time into the current framework would require a temporal representation of surgical tasks and monitoring of time spent.

Non-routine situations arising from uncertainties can occur during surgery. There are two types of uncertainties: uncertainty from unknown preoperative facts and uncertainty in action outcomes. The first type of uncertainty arises in the intraoperative stage when the performed action produces an unexpected outcome. For example, a patient who does not know that they are allergic to local anesthesia may develop a red rash after anesthesia injection. The second type of uncertainty involves actions with effects that may or may not occur. For example, in treatment of a tooth with an existing crown, the crown may chip off. Such risks could be represented by nondeterministic action effects or through a probabilistic action representation, as used in decision-theoretic planners (Haddawy et al., 1996). A probabilistic representation could then be used to generate outcomes using Monte Carlo simulation. However, obtaining accurate probabilities would be a challenge.
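As an illustration of the second type of uncertainty, the following minimal sketch shows one way a probabilistic action representation could drive Monte Carlo outcome generation. It is not part of the current SDMentor implementation. The class names, the action name, and in particular the probabilities (0.9 and 0.1) are hypothetical placeholders chosen only for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    description: str
    probability: float


@dataclass(frozen=True)
class ProbabilisticAction:
    """An action whose effect is drawn from a discrete outcome distribution."""
    name: str
    outcomes: tuple

    def __post_init__(self):
        # The outcome probabilities must form a valid distribution.
        total = sum(o.probability for o in self.outcomes)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities for {self.name!r} sum to {total}")


def sample_outcome(action, rng):
    """Draw one outcome according to the action's probabilities."""
    weights = [o.probability for o in action.outcomes]
    return rng.choices(action.outcomes, weights=weights, k=1)[0]


def estimate_frequencies(action, trials, seed=0):
    """Monte Carlo estimate of how often each outcome occurs."""
    rng = random.Random(seed)
    counts = {o.description: 0 for o in action.outcomes}
    for _ in range(trials):
        counts[sample_outcome(action, rng).description] += 1
    return {desc: count / trials for desc, count in counts.items()}


# Hypothetical action and probabilities, for illustration only.
treat_crowned_tooth = ProbabilisticAction(
    name="treat tooth with existing crown",
    outcomes=(
        Outcome("crown remains intact", 0.9),
        Outcome("crown chips off", 0.1),
    ),
)

frequencies = estimate_frequencies(treat_crowned_tooth, trials=10_000)
```

In a simulator, a single call to `sample_outcome` per student action would decide whether the non-routine situation occurs in that session, so that different sessions of the same case could unfold differently.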

This study focused on the ITS component of SDMentor and thus did not include an evaluation of the overall teaching effectiveness of the system. Such an evaluation would require a randomized controlled trial with a group of students taught by SDMentor and a control group taught in the traditional way. Differences in learning gains could then be compared. We plan to carry out such an evaluation as future work.