3.1 Conversational Agent for Children's Daily Tasks
This study builds on earlier voicebot interaction research in which the agent was realized as a tangible product for children with ADHD [13, 14]. The voicebot's physical presence increased its effectiveness for children with ADHD, but its limited versatility made it difficult to conduct experiments with a large number of participants. In this study, we therefore redeveloped the agent as a voice-interactive Android application for tablet PCs by integrating a chatbot builder, NAVER CLOVA. The prototype helps children start and complete daily tasks that they set up with their parents (Figure 1(a)). The task list was designed to cover the overall daily routine of children in the lower elementary school grades and comprises 18 tasks, including waking up in the morning, brushing teeth after dinner, preparing bags, and reading books.
In the previous studies, self-instruction and behavioral parenting training were incorporated into the interaction process to assist children with ADHD from a cognitive-behavioral therapy perspective [13]. Cognitive-behavioral therapy enables children to develop strategies for monitoring and managing their own executive function [11]. In particular, self-instruction involves acquiring strategies for self-regulation and has been found to be effective in regulating behavior in children's daily lives and learning domains [58]. Behavioral parenting training is widely recognized as one of the most effective interventions [54]. A representative example is the use of tokens to reinforce positive behaviors and reduce negative behaviors in children [37].
The application developed for this research was designed around a “Goal, Plan, Do, and Check” process based primarily on self-instructional steps [16, 21]. This process enables children to 1) set their own task goals, 2) receive reminders about these goals, 3) carry them out, and 4) confirm their progress. Rewards were provided through a token economy system: a child received a star sticker for completing a daily task, and when all tasks for the day were completed, the star sticker was replaced with a character sticker and a character card was also given (Figure 1(b) and (c)).
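The reward rule of the token economy can be sketched as follows. This is a minimal illustration rather than the application's actual code: the names `DailyReward` and `daily_rewards` and the sticker labels are hypothetical, and the sketch assumes that each completed task earns one star sticker and that the stars are replaced by a single character sticker once every task of the day is done.

```python
from dataclasses import dataclass


@dataclass
class DailyReward:
    """Rewards shown for one day (labels are illustrative, not from the app)."""
    stickers: list[str]
    character_card: bool


def daily_rewards(completed: list[bool]) -> DailyReward:
    """Apply the token-economy rule to one day's task results.

    Assumption: one star sticker per completed task. When every task of
    the day is completed, the stars are replaced by one character sticker
    and a character card is awarded as well.
    """
    if completed and all(completed):
        return DailyReward(stickers=["character"], character_card=True)
    stars = ["star"] * sum(completed)  # True counts as 1
    return DailyReward(stickers=stars, character_card=False)
```

For example, `daily_rewards([True, False, True])` yields two star stickers and no character card, whereas a day with all tasks completed yields the character sticker and the card.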
When impaired executive function disrupts the regulation of behavior through internally represented information, externalized forms of information are helpful [46, 47]. Notably, information must be externalized in a physical form while the task is being performed [47]. We therefore determined that voice interaction might be necessary for children with ADHD. The prototype tablet PC application delivers both visual and auditory cues, which convey information about the child's current situation.
Conversational agents, which are widely used for functions such as ordering goods and providing weather information, are typically structured to answer the user's questions [60, 64]. In our prototype, this relationship between the user and the chatbot is reversed: the agent, ForME, leads the conversation by prompting users to respond with a small set of command words with limited intents, so that the conversation can proceed in a rule-based manner. This core interaction lets each child perform several actions one after another.
Children's utterances typically express one of four main intents [14], although in some situations the dialogue may involve three or five intents, depending on the context. The first is the correct answer, in which the child has completed a task successfully and proceeds to the next one. The second is in-progress, in which the current activity has not been completed or has been performed incorrectly. The third is unable-to-process, in which ForME cannot process the child's response because the child's speech or accent is unclear. The fourth is no-response, in which the child does not respond at all. The children's responses were organized in this way so that the prototype could maintain communication with them based on their responses.
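The four intents can be illustrated with a small rule-based classifier. The sketch below is an assumption-laden simplification, not the prototype's logic: intent recognition in the prototype relies on the chatbot engine, whereas here plain substring matching stands in for it. The names `Intent` and `classify` are hypothetical, `None` is assumed to mean that no speech was detected, and an empty transcript is assumed to mean that speech was detected but could not be recognized.

```python
from enum import Enum
from typing import Optional


class Intent(Enum):
    CORRECT = "correct answer"
    IN_PROGRESS = "in-progress"
    UNABLE_TO_PROCESS = "unable-to-process"
    NO_RESPONSE = "no-response"


def classify(transcript: Optional[str], keywords: set[str]) -> Intent:
    """Map one recognized utterance to one of the four intents.

    Assumptions (not from the paper): `None` means the child said nothing,
    an empty string means the speech could not be transcribed, and
    `keywords` holds lowercase phrases accepted as a correct answer.
    """
    if transcript is None:
        return Intent.NO_RESPONSE
    text = transcript.strip().lower()
    if not text:
        return Intent.UNABLE_TO_PROCESS
    if any(keyword in text for keyword in keywords):
        return Intent.CORRECT
    # Intelligible speech without an accepted keyword: the step is not done yet.
    return Intent.IN_PROGRESS
```

Substring matching is deliberately naive (it would accept "not done" for the keyword "done"), which is one reason an intent-recognition engine is preferable in practice.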
A task can be started in two ways. First, the agent ForME sounds an alarm at a preset time and asks, “What should we be doing right now?”, so that the child can fulfill the goal they set. Second, the child can start working on the goal by tapping the ForME icon in the application.
The cues were designed with reference to smart speakers and displays. Figure 2 shows a child and the ForME agent working together to complete the task of “Brushing teeth after dinner” when the prototype's alarm goes off at the prearranged time. The agent delivers cues to the user through two channels: sound and visuals. The sound consists of two components: a beep, which serves as a cue that helps participants follow the conversation, and a dialog, which contains the actual conversation. There are four beep sounds: ringing, ding, tick tock, and ding-dang-dong. Ringing marks the start and end of the entire task. Ding signals when the user starts and stops talking. Tick tock marks the start of the timer, and ding-dang-dong marks its end.
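The assignment of beeps to interaction events can be written as a simple lookup table. The event names below (`task_start`, `listening_start`, and so on) and the function `beep_for` are hypothetical labels introduced for illustration; only the four beep names and their roles come from the description above.

```python
# Hypothetical event names mapped to the four beep sounds described above.
BEEP_FOR_EVENT = {
    "task_start": "ringing",        # start of the entire task
    "task_end": "ringing",          # end of the entire task
    "listening_start": "ding",      # the user starts talking
    "listening_end": "ding",        # the user stops talking
    "timer_start": "tick tock",     # the timer starts
    "timer_end": "ding-dang-dong",  # the timer ends
}


def beep_for(event: str) -> str:
    """Return the beep that accompanies an interaction event."""
    return BEEP_FOR_EVENT[event]
```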
For each phase, five or more paraphrased versions of the dialogue were prepared for a given scenario, and one of them is chosen at random and presented to the user. Visual cues accompany each step through changes on the screen: the displayed step and the agent's expression change to reflect progress in the task. Figure 2 illustrates children's responses and actions during their interactions with the ForME agent.
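Random selection among paraphrases can be sketched as below. Apart from the first prompt, which is quoted from the alarm dialogue, the prompt texts are invented placeholders, and the phase key `ask_task` and the function `pick_prompt` are hypothetical names.

```python
import random
from typing import Optional

# Five alternatives for one phase. Only the first sentence appears in the
# paper; the others are placeholder paraphrases written for this sketch.
PROMPTS = {
    "ask_task": [
        "What should we be doing right now?",
        "Do you remember what we planned for this time?",
        "It's time! What are we going to do now?",
        "Can you tell me what comes next?",
        "What was our goal for right now?",
    ],
}


def pick_prompt(phase: str, rng: Optional[random.Random] = None) -> str:
    """Return one randomly chosen paraphrase for the given phase."""
    chooser = rng if rng is not None else random
    return chooser.choice(PROMPTS[phase])
```

Passing a seeded `random.Random` instance makes the choice reproducible, which can be convenient when testing dialogue flows.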
When the user does not mention the required keywords or does not speak at all during a step, the agent prompts again so that the user can give an answer that includes the keywords. For example, when the agent asks which task should be performed (see Figure 2) and the user gives a wrong answer, such as “wash my hands” instead of “brush my teeth,” the agent says, “You know…there's something you have to do three times a day to avoid an ache! Tell me what that is!”, giving a hint from which the user can derive the correct answer. The keywords that the agent accepts as the correct answer at each step were defined using the Natural Language Understanding function of the chatbot engine, which analyzes and processes user intents.
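The hint-based re-prompt in this example can be sketched as a single step handler. The sketch replaces the chatbot engine's Natural Language Understanding with plain substring matching, which is an assumption made only to keep the example self-contained; `Step` and `handle_answer` are hypothetical names, while the keyword and hint text are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One dialogue step: accepted keywords and the hint used on a miss."""
    keywords: tuple[str, ...]
    hint: str


def handle_answer(step: Step, transcript: str) -> tuple[bool, str]:
    """Return (advance, agent_reply) for the child's answer.

    Assumption: an answer is correct when it contains any accepted keyword.
    On a miss the agent stays on the step and replies with the hint.
    """
    text = transcript.lower()
    if any(keyword in text for keyword in step.keywords):
        return True, ""
    return False, step.hint


ASK_TASK = Step(
    keywords=("brush my teeth",),
    hint=("You know…there's something you have to do three times a day "
          "to avoid an ache! Tell me what that is!"),
)
```

With this step, `handle_answer(ASK_TASK, "wash my hands")` keeps the conversation on the same step and returns the hint, whereas an answer containing "brush my teeth" advances to the next step.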
We configured the prepared scenarios using the NAVER CLOVA chatbot builder, which provides natural language processing for Korean and a built-in machine learning algorithm. Speech recognition relied primarily on the Speech-to-Text functionality built into Android; when the analysis of a child's speech was ambiguous and intent recognition was difficult, the result was verified using CLOVA Speech-to-Text. The server was divided into a Web App and an API Server. The Web App is deployed to clients through Google Firebase Hosting; it processes client requests, such as conversations and appointment schedule checks, and returns the results. Because the CLOVA Chatbot Builder does not provide built-in responses for waiting until an appointment time or for handling the no-response intent, we implemented Chatbot Middleware to provide these functions.
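The two-stage speech recognition can be sketched as a primary recognizer with a fallback. The sketch does not call the Android or CLOVA APIs: both recognizers are represented by plain callables, and the confidence score, the `min_confidence` threshold of 0.6, and the names `Recognizer` and `transcribe` are assumptions introduced here, since the criterion the prototype uses to decide that a result is ambiguous is not specified.

```python
from typing import Callable, Tuple

# A recognizer takes raw audio and returns (text, confidence in [0, 1]).
# Stand-in type for the on-device and cloud engines; not a real API.
Recognizer = Callable[[bytes], Tuple[str, float]]


def transcribe(
    audio: bytes,
    primary: Recognizer,
    fallback: Recognizer,
    min_confidence: float = 0.6,  # assumed threshold for "ambiguous"
) -> str:
    """Use the primary recognizer and consult the fallback only when needed."""
    text, confidence = primary(audio)
    if text and confidence >= min_confidence:
        return text
    fallback_text, _ = fallback(audio)
    # Keep the primary result if the fallback produced nothing.
    return fallback_text or text
```

Calling the fallback only on ambiguous input keeps the common case on the device and limits the number of requests sent to the second engine.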
3.2 Monitoring Application for Parents
We developed a Unity-based mobile application that lets parents monitor their children's use of ForME. Because Unity can build for multiple platforms, parents can install the parent application regardless of the operating system of their smartphone.
Using the parent application, parents and children can register the child's information, input school details, and enter the corresponding school code, as illustrated in Figure 1(d). They can then obtain a QR code, which the child can use to log in to the child application (Figure 1(e)).
Parents can track their children's daily task performance and view the tasks they have completed through a dedicated web page for parents (Figure 1(f) and (g)). Updates on task performance are not shown in real time but become accessible after a day. Real-time alerts and notifications were deliberately omitted to promote the children's autonomy and to avoid excessive intervention. Instead, parents were encouraged to reach an agreement with their children before using ForME so that the child could maintain a high degree of autonomy. Parents were also informed that they could assist their children if difficulties arose.