1 Introduction

In an effort to modernize training to make it more effective and learner-centered, the U.S. Navy is exploring emerging technologies to provide quality training at the point of need to the fleet. Currently, gamification is a popular topic among researchers and training providers; however, there is little empirical evidence to support its use in training systems to improve training outcomes. The term “gamification” describes the inclusion of game features (e.g., competition and rewards) in training and educational tasks, such as training systems. Although many theorize that gamification will increase learners’ interest and motivation to complete educational content, resulting in better learning outcomes and performance, the research on such benefits is mixed (e.g., Cagiltay et al. 2015; Hays 2005; Mayer 2019; O’Neil and Perez 2008; Sitzmann 2011; Wouters et al. 2013). A potential reason for the disparity in training effectiveness findings is a lack of systematic research investigating which games are beneficial for training, for which learning objectives, and for which kinds of learners, as has been pointed out in a number of reviews (Bedwell et al. 2012; Lister and College 2015; Mayer 2019; Mayer and Johnson 2010; O’Neil and Perez 2008; Plass et al. 2015). Therefore, the goal of this experiment was to establish whether adding game features has a positive impact on performance during training and leads to better learning outcomes. Specifically, we explored whether the presence of game features (i.e., performance gauges and score) and competition features (i.e., leaderboard) affected motivation and learning outcomes within the Periscope Operator Adaptive Trainer (POAT).

1.1 Game Features

Games can include a number of different features such as points, leaderboards, ranking, badges, trophies, time pressure, and levels, among many others (Lister and College 2015; Plass et al. 2015). In this experiment, we chose to explore rewards and competition, because both have been cited in previous studies as reasons games may improve learning outcomes through increased motivation (Cagiltay et al. 2015; Cameron et al. 2005; Hawlitschek and Joeckel 2017; Lister and College 2015; Plass et al. 2013).

Rewards are game features such as points, score, stars, badges, trophies, etc. that are given for meeting performance or achievement standards (Wang and Sun 2011). Reward features are thought to increase a learner’s engagement, leading to higher motivation on a task (Shute et al. 2013). Higher motivation is theorized to improve learning outcomes, because learners are motivated to spend more time actively processing the learning material (Hawlitschek and Joeckel 2017). For example, Cameron and colleagues (2005) conducted an experiment that gave monetary rewards for performance achievements on a puzzle-solving task. The researchers found that the monetary rewards increased interest in the puzzle-solving task and also increased time on task after performance was no longer rewarded. Additionally, Cameron et al. found that rewards did not distract from the problem-solving task.

In the context of computer-based educational games, “competition” is defined as “the activity of students comparing their own performances with the performance of a virtual opponent,” with either a computer or another student serving as the virtual opponent (Vandercruysse et al. 2013, p. 929). This social comparison can take the form of game features such as a leaderboard that presents a score or ranking of an individual or team compared to that of the virtual opponent(s). Competition is hypothesized to increase motivation on a task by increasing attention to the material, fostering a sense of challenge, or creating a “unidirectional drive upward” social comparison (Cagiltay, Ozcelik, and Ozcelik 2015; Garcia et al. 2006). For example, an experiment by Cagiltay and colleagues (2015) compared two versions of an educational computer game for learning database modeling concepts: one version included a leaderboard that updated in real time with the learner’s performance score and rank compared to their peers, and the other version did not present a leaderboard. They found that the competition group scored higher on a post-test of conceptual knowledge and higher on motivational measures compared to the non-competition group.

1.2 Theory

From a theoretical perspective, adding game features could have positive effects on learners. Cognitive Load Theory (van Merriënboer and Sweller 2005) posits that learners have a limited capacity of working memory resources and that instruction should be designed to limit the amount of cognitive load imposed on the learner. There are three different types of cognitive load. (1) Extraneous load arises from poor instructional design, such as using a poor interface or including unnecessary information in the training. (2) Intrinsic load stems from the instructional content itself, such as the number of elements learners must hold in their working memory at one time. (3) Germane load is the level of mental effort the learner exerts in order to understand the material; this load is productive and could include the learner relating learned material to prior knowledge. The three types of cognitive load are traditionally thought of as additive, such that once an individual has reached their capacity (referred to as cognitive overload), learning and performance suffer.

Proponents of gamification have cited the motivational power of games for training (Driskell et al. 2002; Gee 2003; Prensky 2001). They argue that games are intrinsically motivating and that learners will want to continue playing to improve their performance. According to these researchers, motivation evokes productive cognitive processing (i.e., germane load), which increases the player’s potential for learning (van Merriënboer and Sweller 2005). That is, game features serve to increase motivation to play and, therefore, free up additional cognitive resources that can be directed to learning the content and improving learning outcomes.

1.3 Present Experiment

The purpose of this experiment was to compare the value added by game features in a scenario-based trainer in terms of learners’ performance outcomes and motivation. In this experiment, we utilized the Periscope Operator Adaptive Trainer (POAT) as our training testbed. An adaptive trainer provides instruction that is tailored to an individual learner’s strengths and weaknesses (Landsberg et al. 2012). POAT simulates a periscope operator’s task in which the trainee observes a contact (e.g., a ship) in a periscope view and must determine the contact’s angle on the bow (AOB) and range within one minute. AOB is the side (e.g., port or starboard) and direction (i.e., angle between 0–180°) a contact is heading relative to the viewer’s line of sight. Range is the distance to the contact from the viewer’s ownship, which requires a mental math calculation based on the size of the contact in the periscope’s reticle. After each periscope call, POAT provides adaptive feedback to the trainee based on the accuracy of his or her performance. The amount of detail provided in the feedback varies based on the accuracy of the participant’s response; when the periscope call is more accurate, students receive less feedback. If the call is inaccurate, students receive detailed process information to improve their calls. In addition, scenario difficulty is adapted based on trainee performance during training, such that students who are performing well receive more difficult scenarios and those performing poorly receive easier scenarios. These difficulty adaptations occur after each set of 15 scenarios, or “testlet.” POAT includes scenarios of beginner, intermediate, and advanced difficulty.
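To make the adaptation mechanism concrete, the following sketch illustrates the kind of logic described above; the thresholds, score scale, and function names are hypothetical, since POAT’s internal parameters are not specified here.

```python
# Minimal sketch of the adaptive logic described above. All thresholds,
# score scales, and function names are hypothetical; POAT's actual
# parameters are not specified in the text.

DIFFICULTY_LEVELS = ["beginner", "intermediate", "advanced"]
TESTLET_SIZE = 15  # difficulty adapts after each set of 15 scenarios

def feedback_detail(aob_error_deg: float, range_error_yds: float) -> str:
    """More accurate calls receive less feedback; inaccurate calls
    receive detailed process feedback."""
    if aob_error_deg <= 10 and range_error_yds <= 500:  # hypothetical cutoffs
        return "brief outcome feedback"
    return "detailed process feedback on the AOB and range calculation"

def next_difficulty(current_level: str, testlet_scores: list[float]) -> str:
    """Move up a level after a strong testlet, down after a weak one,
    otherwise stay at the current level (hypothetical thresholds)."""
    idx = DIFFICULTY_LEVELS.index(current_level)
    mean_score = sum(testlet_scores) / len(testlet_scores)  # 0-1 scale assumed
    if mean_score >= 0.8 and idx < len(DIFFICULTY_LEVELS) - 1:
        return DIFFICULTY_LEVELS[idx + 1]
    if mean_score <= 0.4 and idx > 0:
        return DIFFICULTY_LEVELS[idx - 1]
    return current_level
```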

The Game Features condition featured performance gauges to display how well students were performing during training. These indicators included stars that filled up with more accurate and timely periscope calls, a score based on performance points that accumulated across scenarios, and a difficulty meter that showed the current difficulty level of the scenario (see Fig. 1). In addition, students in the Game Features condition received a leaderboard after each testlet that displayed the student’s score relative to 23 other players (see Fig. 2). The other players and scores were the same for all students in this condition, but each student’s rank on the leaderboard depended on his or her individual performance. A colored arrow was also displayed on the leaderboard screen to indicate whether a student’s ranking went up, went down, or stayed the same relative to the previous testlet’s leaderboard. For example, if Student A was ranked #12 after the first testlet and #10 after the second testlet, then Student A would see a green arrow next to their ranking on the second testlet’s leaderboard to indicate that their ranking improved over the previous testlet.
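The leaderboard behavior can be sketched as follows; only the ranking among 23 fixed artificial players and the up/down/no-change arrow come from the description above, while the specific scores, colors for worsened rankings, and function names are illustrative.

```python
# Minimal sketch of the leaderboard described above. The artificial player
# scores are placeholders; only the fixed set of 23 artificial players and
# the rank-change arrow are taken from the text.

ARTIFICIAL_SCORES = [9800, 9400, 9100, 8800, 8500, 8200, 7900, 7600,
                     7300, 7000, 6700, 6400, 6100, 5800, 5500, 5200,
                     4900, 4600, 4300, 4000, 3700, 3400, 3100]  # 23 fixed "players"

def leaderboard_rank(student_score: int) -> int:
    """Rank of the student among the 23 artificial players (1 = best)."""
    return 1 + sum(score > student_score for score in ARTIFICIAL_SCORES)

def rank_arrow(previous_rank: int, current_rank: int) -> str:
    """Arrow shown next to the student's ranking on the new leaderboard."""
    if current_rank < previous_rank:
        return "green up-arrow"   # ranking improved
    if current_rank > previous_rank:
        return "red down-arrow"   # ranking worsened (illustrative color)
    return "no change"

# Example from the text: ranked #12 after testlet 1 and #10 after testlet 2
print(rank_arrow(12, 10))  # -> green up-arrow
```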

Fig. 1. Screenshot of Game Features condition in which trainees are shown a contact and input their periscope call (including AOB and range). Specific features are explained in callout boxes.

Fig. 2. Screenshot of Game Features condition’s leaderboard. In this case, username “tutorial” was ranked 14th.

Participants in the Control condition did not receive performance indicators, scores, or leaderboards. Figure 3 displays a screenshot of an example scenario from the Control condition.

Fig. 3. Screenshot of Control condition in which trainees are shown a contact and input their periscope call (including AOB and range). Specific features are explained in callout boxes.

Consistent with Cognitive Load Theory and based on findings from previous gamification studies using competition and rewards, we predicted that the Game Features condition would show higher learning gains on AOB and range than the Control condition from pre- to post-test. We also predicted that having game features during training would encourage faster periscope call times on the post-test. Furthermore, we predicted that individuals in the Game Features condition would report higher motivation than those in the Control condition.

2 Method

2.1 Participants and Design

Fifty-six students from the Submarine Officer Basic Course were assigned to a condition using a randomized block design. Six students were removed from analyses due to a fire alarm interruption during the experiment, and one was removed due to experimenter error. Therefore, a total of 49 students (45 males, 4 females) were included in the analyses, with 25 students in the Game Features condition and 24 students in the Control condition. The students’ average age was 23.94 years (SD = 0.93), with a range of 22–26 years.

2.2 Materials

Testbed.

The two versions of POAT described above were used in this experiment. The Game Features version included scores, stars, and a difficulty level indicator based on performance, as well as a leaderboard displayed after every testlet that showed the student’s score and ranking relative to other (artificial) players. The Control version did not include any game features but was otherwise the same.

POAT Performance Measures.

AOB accuracy, range accuracy, and time to complete the periscope calls were collected in a pre-test and a post-test. Unlike the training scenarios, participants did not receive any feedback on the accuracy of their performance during these tests. The pre-test included 20 periscope call scenarios and was given prior to the POAT training scenarios. Following the training, participants completed 20 post-test scenarios. AOB accuracy was measured as the difference in degrees (0–180) between the trainee’s called AOB and the actual AOB of the contact. Range accuracy was the difference in yards between the trainee’s called range and the actual range of the contact. Smaller numbers indicated more accurate calls (i.e., better performance) for AOB and range. Likewise, a smaller number for call time indicated better performance, because it is desirable for trainees to make faster calls.
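A minimal sketch of how these per-scenario test measures could be computed is shown below; the function and variable names are hypothetical, and how side (port vs. starboard) errors factor into AOB scoring is not specified here.

```python
# Minimal sketch of the per-scenario test measures described above.
# Variable and function names are hypothetical; smaller values indicate
# better performance for all three measures.

def aob_error(called_aob_deg: float, actual_aob_deg: float) -> float:
    """Absolute difference in degrees (0-180) between called and actual AOB."""
    return abs(called_aob_deg - actual_aob_deg)

def range_error(called_range_yds: float, actual_range_yds: float) -> float:
    """Absolute difference in yards between called and actual range."""
    return abs(called_range_yds - actual_range_yds)

def call_time(scenario_start_s: float, call_submitted_s: float) -> float:
    """Seconds from scenario start to the submitted periscope call."""
    return call_submitted_s - scenario_start_s
```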

Intrinsic Motivation Inventory (IMI).

The IMI included 23 items and measured how motivated the trainees were to complete the task (i.e., state motivation) on four subscales: 1. Interest/Enjoyment, 2. Perceived Competence, 3. Effort/Importance, and 4. Pressure/Tension (Ryan and Deci 2000). An example item from the Effort/Importance subscale is, “It is important for me to do well at this task.” Participants responded to each question on a scale from 1 (“Not at all true”) to 7 (“Very true”). Each subscale was averaged to indicate motivation in each factor. High scores on all of the subscales except Pressure/Tension correspond with higher motivation during the task.
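As an illustration of the scoring described above, the sketch below averages the 1–7 ratings within each subscale; the item-to-subscale assignment shown is hypothetical, since the actual item mapping is not reproduced here.

```python
# Minimal sketch of IMI scoring as described above: 1-7 ratings are
# averaged within each subscale. The item-to-subscale mapping below is
# hypothetical.

from statistics import mean

SUBSCALES = {
    "interest_enjoyment":   [1, 2, 3, 4, 5, 6],          # hypothetical items
    "perceived_competence": [7, 8, 9, 10, 11],
    "effort_importance":    [12, 13, 14, 15, 16],
    "pressure_tension":     [17, 18, 19, 20, 21, 22, 23],
}

def score_imi(responses: dict[int, int]) -> dict[str, float]:
    """Average the ratings in each subscale; higher means more motivation
    on every subscale except pressure_tension."""
    return {name: mean(responses[item] for item in items)
            for name, items in SUBSCALES.items()}
```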

Participant Characteristics.

Demographics.

Participants completed a demographic questionnaire that included questions about their age, sex, highest level of education, time in the military, and experiences with periscope training.

Spatial Ability.

The Paper Folding Test (PFT) is a timed measure of spatial visualization in which participants answer 10 items within three minutes (Ekstrom et al. 1976). For each item, participants must match an image of a folded piece of paper with a hole punched through it to an image of how the paper would look when unfolded. Participants select one of five answer choices for each item. Previous research with the POAT testbed has found spatial ability to be a critical individual difference variable for predicting performance (Van Buskirk et al. 2014).

Achievement Motivation Scale (AMS).

The AMS asked participants 14 questions related to their overall motivation prior to the task (i.e., trait motivation; Ray 1979). An example item is, “Are you satisfied to be no better than most other people at your job?” Participants indicated their agreement with each question on a three-point scale from 1 (“No”) to 3 (“Yes”). The sum of the items was calculated to determine overall achievement motivation, with higher scores corresponding to higher motivation.

2.3 Procedure

Students participated in groups of four to six, and the experiment took approximately two hours to complete. Upon consenting to participate, students were assigned to either the Game Features or Control condition using a randomized block design and then completed a demographics questionnaire and the Achievement Motivation Scale. Prior to using POAT, all students completed a tutorial that provided basic information on how to make a periscope call (e.g., defining port and starboard) and how to use the POAT interface. Students then completed a 13-item knowledge quiz to ensure they understood the material in the tutorial before they began the training. If students missed any questions on the knowledge quiz, a researcher went over the correct answers before moving on. Next, all students completed the pre-test, followed by training in POAT that varied by condition. Altogether, students completed five testlets during training, or a total of 75 training scenarios. In the Game Features condition, students received scores and gauges depicting their level of performance for each trial, and a leaderboard was presented after each testlet during training. In the Control condition, students used POAT, but they did not receive any of the game features during training. After the training, participants completed a post-test and the Intrinsic Motivation Inventory to measure state motivation. Finally, participants were thanked and debriefed.

3 Results

3.1 Participant Variables

Initial analyses were performed to determine whether the groups differed on potentially influential characteristics, including spatial ability and trait motivation. Means, standard deviations (SDs), and results from the t-tests are presented in Table 1. First, Paper Folding Test scores, which were used as a measure of spatial ability, did not differ significantly between groups. Second, we used the Achievement Motivation Scale (AMS) as a measure of trait motivation. AMS scores also did not differ by training condition. The scores of both groups were around the midpoint of the scale (midpoint = 21), suggesting that participants were moderately motivated individuals overall.
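For readers who want to reproduce this style of group comparison, a sketch using independent-samples t-tests is given below; the data file and column names are hypothetical.

```python
# Illustrative group comparison for the participant variables described
# above (spatial ability and trait motivation). The file and column names
# are hypothetical.

import pandas as pd
from scipy import stats

df = pd.read_csv("participants.csv")  # hypothetical file, one row per participant
game = df[df["condition"] == "game_features"]
control = df[df["condition"] == "control"]

for measure in ["pft_score", "ams_score"]:
    result = stats.ttest_ind(game[measure], control[measure])
    print(f"{measure}: t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```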

Table 1. Participant variables means and SDs by condition

3.2 AOB Performance

AOB scores on the pre- and post-tests were calculated by using the median absolute difference in degrees (0–180) between the called and actual AOB by scenario for each test. Next, a gain score was computed for AOB for each participant. Gain scores were used to determine the extent to which participants’ performance improved after taking into account how much they could have improved from pre- to post-test. The gain score data are presented in the top row of Table 2. Based on a t-test, there were no significant differences on AOB gain scores by condition.

Table 2. AOB and range gain scores by condition

3.3 Range Performance

Similar to the AOB score calculations, range scores for each test were calculated using the median difference in called and actual range in yards. Then, a gain score was computed by determining the difference in pre- and post-test scores and dividing by the absolute value of the pre-test score. Range scores by condition are presented in the second row of Table 2. Based on a t-test, no significant difference was found between the two conditions on range gain scores.
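The gain-score calculation used in Sects. 3.2 and 3.3 can be expressed as a short sketch; the sign convention (pre-test minus post-test, so that positive values indicate improvement because error scores decrease) is one reasonable reading of the description above, and the names are illustrative.

```python
# Minimal sketch of the gain-score calculation described above. Per-test
# scores are the median absolute error across the 20 test scenarios, and
# the gain is the pre-to-post change divided by the absolute value of the
# pre-test score. Sign convention assumed: positive = improvement.

from statistics import median

def test_score(scenario_errors: list[float]) -> float:
    """Median absolute error (degrees for AOB, yards for range); smaller is better."""
    return median(abs(e) for e in scenario_errors)

def gain_score(pre_score: float, post_score: float) -> float:
    """Improvement relative to the pre-test score."""
    return (pre_score - post_score) / abs(pre_score)
```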

3.4 Call Time Performance

Since reporting accurate periscope calls as quickly as possible is a desirable skill in this domain, we also explored the potential benefits of adding game features during training on call time performance. A repeated measures ANOVA was conducted on call times with test (pre vs. post) as a within-subjects factor and training condition as a between-subjects factor. Means and SDs for call times by test are presented in Table 3. There was a main effect of test, such that time to make a periscope call decreased from pre- to post-test [F(1,47) = 141.643, p < .001, ηp2 = .751]. However, the training groups did not differ on call time [F(1,47) = 0.762, p = .387, ηp2 = .016], nor was there an interaction [F(1,47) = 0.376, p = .543, ηp2 = .008]. This indicates that students made faster periscope calls after adaptive training, regardless of game features condition.
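One way to reproduce this style of analysis is with a mixed ANOVA (test as the within-subjects factor, condition as the between-subjects factor), for example via the pingouin package; the data layout and column names below are assumptions.

```python
# Illustrative re-analysis of the call-time ANOVA described above, using
# pingouin's mixed ANOVA with test (pre/post) as the within-subjects factor
# and condition as the between-subjects factor. File and column names are
# hypothetical.

import pandas as pd
import pingouin as pg

# Expected long format: one row per participant per test,
# with columns participant, condition, test, call_time (seconds).
df = pd.read_csv("call_times_long.csv")  # hypothetical file

aov = pg.mixed_anova(data=df, dv="call_time", within="test",
                     subject="participant", between="condition")
print(aov[["Source", "F", "p-unc", "np2"]])  # F, p, and partial eta-squared
```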

Table 3. Periscope call time means and SDs by test and condition

3.5 State Motivation

The IMI measured motivation on the task (i.e., state motivation) in four subscales: 1. Interest/Enjoyment, 2. Perceived Competence, 3. Effort/Importance, and 4. Pressure/Tension. As shown in Table 4, motivation on the task as measured by the IMI did not differ depending on whether participants received game features during training.

Table 4. State motivation means and SDs by condition

4 Discussion

The goal of this experiment was to examine the impacts of adding game features to a scenario-based training system on performance outcomes and motivation. In particular, we explored game features intended to provide rewards for good performance through the use of performance gauges (i.e., scores, stars, and difficulty level indicators) and to foster competition through the use of leaderboards. Overall, performance results showed that adaptive training was successful at increasing AOB and range accuracy; these results were not surprising given that adaptive training techniques have been found to lead to higher learning outcomes in several different domains, including this one (Landsberg et al. 2012; Marraffino et al. 2019; Van Buskirk et al. 2019). However, contrary to our hypotheses, we did not find evidence that game features led to higher learning gains on AOB or range from pre- to post-test over a control condition with no such features. Furthermore, there was no evidence that including game features led to faster periscope call times relative to the control. The results suggest that when trainees receive adaptive training, game features may not add value to the training, because they did not increase performance or motivation beyond the control condition.

Participants’ trait motivation was measured prior to training to determine how generally motivated the students were before beginning the task. In general, both groups were around the midpoint of the motivation scale. Following the training, motivation on the task was assessed in both groups to understand how the different training conditions affected participants’ state motivation. On all four factors of state motivation (interest/enjoyment, perceived competence, effort/importance, and pressure/tension), the two training groups did not differ significantly from each other. There was no evidence that including game features during training increased motivation or engagement on the task compared to the control group that did not receive game features. Therefore, counter to our prediction, we found no evidence that game features such as performance gauges and leaderboards increased feelings of engagement, competence, or effort toward the task.

It is possible that the way these game features were implemented in the testbed was not rich enough to increase motivation and thereby improve learning outcomes. For example, the leaderboards contained artificial scores and rankings, and perhaps a sense of competition would be heightened if students were aware that they were competing directly with their peers. Likewise, it is possible that one exposure to the training was not enough to foster motivation, and perhaps repeated exposures to the training are necessary for game features to affect learning. That is, in this experiment, participants were not given the opportunity to decide to play again, and future research should explore the impacts of game features on trainees’ willingness to continue to train, which would be a richer assessment of motivation than a self-report questionnaire.

From a practical perspective, although game features did not lead to measurable improvements in performance and motivation relative to a control condition in this case, including game features did not lead to significant decrements in performance either. More research is needed to determine the conditions under which gamification serves to improve learning and increase motivation to inform the training and education communities on when and how to invest in these technologies.