Fusing Quantitative and Qualitative Methods in Virtual Worlds Behavioral Research

Carl Symborski, Gary M. Jackson, Meg Barton, Geoffrey Cranmer, Byron Raines, Mary Magee Quinn
SAIC
4001 N Fairfax Dr.
Arlington, VA 22203
+1 703-558-2958
carl.w.symborski@saic.com, marguerite.r.barton@saic.com, geoffrey.cranmer@saic.com, byron.a.raines@saic.com

Celia Pearce
Georgia Institute of Technology
85 Fifth Street NW
Atlanta, GA 30308
+1 310-866-8014
celia.pearce@lmc.gatech.edu

ABSTRACT
In this study, Science Applications International Corporation (SAIC) and Georgia Institute of Technology (GT) developed a quantitative-qualitative mixed methods research technique to investigate the extent to which real world characteristics of Massively Multiplayer Online Role-Playing Game (MMORPG) players can be predicted based on the characteristics and behavior of their avatars. SAIC used three primary assessment instruments to quantitatively rate videos of participant gameplay sessions, while GT produced detailed qualitative descriptions of avatar activities and behavior. Automated textual analysis was then used to identify conceptual themes across all of the descriptions produced by the qualitative team. Using the themes generated by the automated textual analysis in combination with the quantitative variables, we were able to demonstrate the efficacy of the hybrid method for the prediction of real world characteristics from avatar characteristics and behavior.

Keywords
Mixed methods, behavioral studies, online games, MMORPGs, virtual worlds

INTRODUCTION
Quantitative and qualitative behavioral and social science methods and analyses are often conducted separately. Each method yields its own unique results and serves different purposes.
Mixed methods behavioral studies of virtual worlds offer the possibility of a richer understanding of the context of the data collected, and have the potential to strengthen findings through a combined perspective (Johnson and Onwuegbuzie 2004); furthermore, they have been used successfully in the past on virtual worlds research (Feldon and Kafai 2008). However, combining quantitative and qualitative research methods can also pose unique challenges for researchers. To explore this interdisciplinary approach, Science Applications International Corporation (SAIC) and Georgia Institute of Technology (GT) collaborated to develop a mixed methods virtual worlds research approach, involving the use of text analysis tools to extract quantitative data from qualitative thick description text.

Proceedings of DiGRA 2013: DeFragging Game Studies. © 2013 Authors & Digital Games Research Association DiGRA. Personal and educational classroom use of this paper is allowed, commercial use requires specific permission from the author.

The purpose of this research study was to determine whether or not it is possible to predict a person’s real world characteristics from the characteristics and behaviors of his or her avatar in an MMORPG, using variables generated by the quantitative-qualitative approach.

To investigate the prediction of real world characteristics using avatar characteristics and behavior, an innovative protocol was followed using videos recorded by participants during their regular play activities: (1) SAIC conducted quantitative rating of the video to determine avatar characteristics, behavior, and personality using standardized assessment forms; (2) GT conducted qualitative observations of the video generating thick description notes typically used in ethnographic participant observation (Geertz 1973); and (3) automated text analysis was conducted to extract themes from the qualitative thick description notes, which could be used to supplement the quantitative data.
The statistical technique of discriminant analysis (DA) was then conducted to determine whether or not it was possible to produce predictive models using the collected data. While automated text analysis is common in virtual worlds research, to our knowledge this is the first time that it has been applied to qualitative thick description notes to convert such notes to quantitative and predictive data. In this study, the interdisciplinary team used the quantitative and qualitative variables gathered to generate statistical models for the prediction of real world gender, age, extraversion level, submissive ideology, and aggressive ideology.

METHOD

Participants
Participants were recruited via Facebook/Google advertisements, flyers, and online game forums. All participants were required to be at least 18 years of age and to have a minimum of 50 hours of experience with either Guild Wars® (ArenaNet 2005) or Aion® (NCSOFT 2009), two popular MMORPGs. A total of 80 participants completed the study. Recruitment occurred primarily within the Washington DC region, USA.

Figure 1: Participant demographics. (Panels: Gender, Age, Education, Game; vertical axis: # of Participants.)

More Guild Wars® (45, or 56%) than Aion® players (35, or 44%) and slightly more males (42, or 52%) than females (38, or 48%) participated in the study. As might be expected, the majority of the participants fell into the young adult age ranges, with a dwindling number of participants divided across the middle- to older-adult ranges. A slight majority of participants held a Bachelor’s degree (36, or 45%), while 30 participants (37%) had less than a Bachelor’s degree and 14 participants (18%) held a graduate degree.
Instruments
An important facet of the research design was the distinction between the dependent variables (DVs) to be predicted, or the selected real world (RW) characteristics – gender, age, extraversion, submissive ideology, and aggressive ideology – and the independent variables (IVs) to be used as predictors for the selected DVs, or the characteristics and behavior of the players’ avatars. Several instruments were used to capture these data (see Figure 2).

Figure 2: Sources of real world DVs and virtual world IVs. (Virtual World, Independent Variables: NEO-FFI Form-R, VW-BAF, ACF, Chat Logs. Real World, Dependent Variables: Demographics Form, NEO-FFI Form-S, ASC Scale.)

Data Collection Instruments: Real World Characteristics
The DVs to be predicted, or the RW characteristics of the participants, were measured using a basic demographics form, the NEO Five-Factor Inventory (NEO-FFI) personality assessment form, and the Aggression-Submission-Conventionalism Scale (ASC scale).

The Demographics Form was used to collect detailed demographic information on the participants, including gender, age, and highest educational level achieved.

The NEO Five-Factor Inventory (NEO-FFI) is a standardized and validated personality assessment instrument constructed from the five-factor model of personality. The instrument has been validated to provide an accurate assessment of the so-called “Big Five” traits of neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness. The NEO-FFI has two forms: the NEO-FFI Form-S (NEO-S), a self-report form, and the NEO-FFI Form-R (NEO-R), an observer rating form; the two forms are identical, except that the NEO-S is written from a first person perspective and the NEO-R is written from a third person perspective (Costa and McCrae 1992). To establish ground truth on the participants’ personalities, each individual completed the NEO-S.
Each participant’s ideology was measured using the Aggression-Submission-Conventionalism Scale (ASC scale). The ASC scale includes three subscale domains, as the name implies - submissive, aggressive, and conventional - and has demonstrated high predictive validity (Dunwoody and Funke n.d.).

Data Processing Instruments: Avatar Characteristics and Behavior
Multiple mechanisms for collecting IV data on the characteristics and behaviors of the participants’ avatars were employed. The NEO-R was used to gather information on avatar personality, the Virtual Worlds Behavior Analysis Form (VW-BAF) and Avatar Characteristics Form (ACF) were developed to collect data on avatar behaviors and characteristics, chat logs were analyzed for linguistic data, and qualitative data were recorded. All forms were designed to be generalizable across all MMORPGs.

While the NEO-S was completed by the participants, the NEO-R (observer rating form) was used to assess avatar personality. The NEO-R was completed by trained assessors observing the participant’s avatar, facilitated by a “bridge” developed to resolve inconsistencies or conflicts of any NEO-R item that caused rating difficulty within the virtual world (VW), considering that the original form had been developed to rate humans in the RW.

The Virtual Worlds Behavior Analysis Form (VW-BAF) was created to measure avatar activities and behavior in MMORPGs. The form had two pages. The first allowed a rater to tally occurrences of behaviors on a minute-by-minute basis (e.g., approaches toward other avatars, verbal initiations, instances of verbal humor, battles completed). The second page allowed a rater to record presence or absence of major activity categories, such as “quest activity” or “social activity,” during the interval. The second page of the form also measured how much time an avatar spent in a group versus solo, and whether the mode was Player vs. Player or Player vs. Environment.
The Avatar Characteristics Form (ACF) was developed to capture characteristics of avatars in the VW, such as gender, appearance, character class, and combat role. It also captured information about play style; for example, whether an avatar belonged to a guild, was a social player, or was a strategic player.

The research team also transcribed participant chat logs. Care was taken only to record chat by the target participant/avatar, not initiations or responses by others. These chat logs were run through a parser, which extracted variables such as number of words and average word length.

The remaining IVs were generated through qualitative observation and converted into a form suitable for statistical analysis and prediction using automated textual analysis. This process will be described below in the sub-section titled “Extraction of Quantitative Variables from Qualitative Data.”

Design and Procedure
The basic research design included five primary phases, which will be described here: the initial laboratory session with the participant to establish ground truth, the processing of quantitative data, the processing of qualitative data, the conversion of the qualitative data into quantitative variables that could be combined with the original quantitative data, and the development of predictive statistical models using discriminant analysis. Figure 3 depicts the research design, from the time a recruited participant arrived at the laboratory through analysis of the data.

Figure 3: Research design. (Stages shown: Lab/Home Session, Quantitative Data Processing, Qualitative Data Processing, Theme Extraction, Discriminant Analysis.)

Laboratory/Home Session
Upon arriving at the laboratory, each participant viewed a standard introductory video describing the study and completed the informed consent process, if he/she desired to participate. The participant then completed the demographics form, the NEO-S, and the ASC scale (described above).
These collected data would form the basis for ground truth on the participant’s RW characteristics.

At the participant’s leisure after returning home, the participant recorded a one-hour gameplay session of the selected game using a screen capture program. Participants were requested to choose one avatar and to use only that avatar for the entire one-hour home session and to play the game just as they normally would. Once the home session was completed, the participant saved the recorded session on a USB flash drive and mailed it back to the laboratory. Participants were asked to complete the home session within two weeks of the laboratory session. Once the flash drive was received, participants were compensated for their contribution to the study.

Quantitative Data Processing
Once a participant’s USB flash drive was received at the laboratory, trained assessors rated that session using the NEO-R, the VW-BAF, and the ACF. Separately, chat was transcribed. A minimum of 80% inter-rater agreement was obtained for all rating forms.

For completion of the NEO-R, raters observed the entire hour of the recorded session and then provided avatar ratings using a developed “bridge” to help answer specific items (mentioned above).

The VW-BAF rating occurred across three 10-minute sessions, which were standardized in terms of start times across the full one-hour session to ensure a representative sample of avatar behavior. Observations were recorded once a minute. As described previously, the VW-BAF had two pages: the first was a series of counts for how many times particular behaviors occurred each minute, and the second recorded whether or not certain behaviors occurred within each minute. Both kinds of variable were summed over each 10-minute sample, and then averaged across the three 10-minute samples for each participant before being used for analysis.
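The aggregation just described can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual tooling: the function name `aggregate_vwbaf` and the sample values are hypothetical, and the only logic taken from the text is "sum within each 10-minute sample, then average across the three samples."

```python
from statistics import mean

def aggregate_vwbaf(samples):
    """Aggregate one VW-BAF variable for one participant.

    `samples` holds three lists of 10 per-minute observations each:
    counts for page-one items, or 0/1 presence flags for page-two items.
    (The name and input layout are assumptions made for illustration.)
    """
    if len(samples) != 3 or any(len(minutes) != 10 for minutes in samples):
        raise ValueError("expected three samples of 10 one-minute observations")
    # Sum within each 10-minute sample ...
    totals = [sum(minutes) for minutes in samples]
    # ... then average those totals across the three samples.
    return mean(totals)

# Hypothetical presence flags for an activity such as "social activity".
social_activity = [
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],  # present in 5 of 10 minutes
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],  # present in 3 of 10 minutes
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # present in 4 of 10 minutes
]
print(aggregate_vwbaf(social_activity))
```

For a presence/absence item the result reads directly as minutes out of 10 on average. For a count item the result is the average total per 10-minute sample; dividing it by 10 would express it as a per-minute rate.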
Hence, each VW-BAF variable corresponds either to how many times a behavior occurred in an average minute, or to how many minutes out of 10 included that behavior on average.

At the same time that the VW-BAF ratings were recorded, the ACF was completed. The ACF items consisted of presence or absence variables; for example, presence or absence of a costume or of a ranged damage-per-second (DPS) combat role. Once all of the quantitative data had been gathered, it was integrated into a data array in preparation for analysis.

Qualitative Data Processing
The qualitative team protocol consisted of observing the recorded session and creating a detailed textual description of participant behavior. Researchers conducted and recorded observations using a note-taking technique known as “thick description” (Geertz 1973) to write descriptive text of the participant gameplay sessions. The thick description note-taking technique is significantly different from traditional coding in that it provides a detailed interpretive, contextualized narrative account of events, relying on specified domain knowledge by the authors. Details of avatar appearance, including the use of costumes and whether armor items were set to visible or invisible, were noted. Player activities were noted in detail, including specific behaviors pertaining to modes of navigation, weapons, skills and skill combinations used, as well as combat style and interaction style with teammates.

Theme Extraction of Quantitative Variables from Qualitative Data
The extraction of quantitative variables from the qualitative thick description notes is at the crux of the mixed methods approach taken in this study. By processing thick description notes to generate quantitative variables, the data generated by both qualitative and quantitative methods could be combined to produce statistically predictive models of players’ RW characteristics.
In order to accomplish this, the text from the observational notes was processed with ThemeMate, an Automated Behavior Analysis application (Jackson 2012). ThemeMate is a statistically-based application used to extract “themes” from bulk text. The ThemeMate application produces a data array of all extracted themes. Figure 4 depicts the methodology used to convert qualitative thick description notes into quantitative variables, useful as predictors in predictive statistical analysis.

Figure 4: Methodology used to convert qualitative notes into quantitative variables. (Elements shown: Qualitative Observation Notes, Key Theme Words, ThemeMate, SME Theme Selection, Quantitative Data Array.)

A list of key theme words was chosen based on subject matter expertise of Guild Wars® and Aion® and familiarity with the thick description notes. The ThemeMate software used these key words to perform a directed theme search of all the thick description notes and generated 253 conceptual theme constructs ranked by importance. From these 253 themes, 20 were picked, with consultation from subject matter experts (SMEs), as potential predictors for subsequent statistical modeling. A theme was eliminated from the larger set if it a) was already represented in the existing quantitative data, b) had the same meaning or indicated the same result as another theme as assessed by SMEs, c) was spurious as a result of standard note formatting conventions (e.g., “the,” “he,” etc.), or d) was representative of standard/common avatar behavior (e.g., wearing armor, carrying a weapon) and therefore unlikely to distinguish players from one another. From the remaining list, the 20 most important themes by ThemeMate ranking were selected for use. The data array for the 20 selected themes was added to the quantitative data array and made available for statistical analysis.

The importance of subject matter expertise in vetting these themes cannot be overemphasized.
Without some domain expertise in the genre and specific games being studied, the automated analysis generated a good deal of “noise” that could have easily been misinterpreted.

Statistical Technique: Discriminant Analysis
Discriminant analysis (DA) was used to generate predictive models for participants’ RW characteristics based on their avatars’ characteristics and behavior. The purpose of DA is to predict group membership, based on RW DVs such as gender or extraversion, from a linear combination of VW IVs, such as avatar gender or type of armor worn. DA begins with a data set containing many cases (participants), where both the values of the IVs and the group membership (DVs) are known. The end result is an equation or set of equations that predict group membership for new cases where only the values of the IVs are known (Stockburger).

Specifically, DA using backward stepwise reduction was conducted. This form of stepwise reduction begins with a given set of variables and reduces the set by eliminating the variables that are associated with the DV to a lesser degree than remaining variables (Brace et al. 2009). In this way, only the variables that are most predictive of the DV remain as part of the predictive model when the analysis is complete.

As an additional quality control, leave-one-out cross-validation was used for all stepwise reduction DA runs. In this form of validation, a model is trained on all cases but one and tested on the one case that was withheld. The process repeats until every case has been withheld and tested blindly, so that no case is ever predicted by a model that was trained on that same case.

RESULTS AND DISCUSSION

The Discriminant Function
As described above, DA generates a predictive model in the form of a linear combination of independent, or predictor, variables. There is also a constant term for each equation, which is used as the linear offset in the discriminant functions.
The general form of the discriminant function is as follows:

$$
\begin{aligned}
\text{Group}_0 &= b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c \\
\text{Group}_1 &= b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c
\end{aligned}
$$

Where: $b_n$ = the Fisher coefficient (or weight) for that variable, $x_n$ = the value of the independent variable, and $c$ = the value of the constant. Each of the two groups has its own set of coefficients and its own constant.

Once the values of the variables are substituted in the above equations for the names of the variables, whichever of the two equations evaluates to the greater number will be the prediction for that case, or participant. In other words, whichever of the two Fisher’s discriminant functions produces a higher value “wins,” and the participant will be predicted as a member of the corresponding category.

Accuracy Metrics
Several accuracy metrics are reported for each of the predictive models. Overall accuracy is the number of cases correctly classified divided by the total number of cases. Precision and recall are also presented to provide information about the accuracy of the models.

Overall Results
Table 1 presents the accuracy results of models combining quantitative and qualitative variables predicting RW characteristics from VW observations. Individual results sections that follow present each model including a description of how the DV was defined, an overview of the accuracy of the model, and a brief discussion of the IVs relevant to the prediction of the target DV.

| RW Characteristic | Overall Accuracy | Precision | Recall |
| Gender | 84% | 85% | 83% |
| Approximate Age | 66% | 71% | 63% |
| Extraversion Level | 73% | 68% | 71% |
| Submissive Ideology | 71% | 71% | 79% |
| Aggressive Ideology | 73% | 53% | 75% |

Table 1: Overall accuracy for predicting RW characteristics from VW observations.

Gender

Definition of the DV
In the case of gender, defining the DV was straightforward: males were assigned as one group, and females were assigned as the other group. Forty-two (42) participants were male and 38 participants were female.

Accuracy of Gender Model
The gender model achieved 84% overall accuracy (67 participants correctly classified; see Table 2). Precision and recall were nearly balanced at 85% and 83%, respectively.
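As a worked check of how the three metrics relate, the sketch below computes them from a 2x2 confusion matrix. The paper does not report the confusion matrix; the counts used here are an inference, chosen because they are consistent with the reported group sizes (42 males, 38 females), the 67 correct classifications, and the reported precision and recall, under the assumption that "male" is treated as the positive class. The function name is likewise hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (overall accuracy, precision, recall) for a two-group model."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # correctly classified / all cases
    precision = tp / (tp + fp)     # of cases predicted positive, share correct
    recall = tp / (tp + fn)        # of truly positive cases, share found
    return accuracy, precision, recall

# ASSUMED counts (not reported in the paper), with "male" as the positive class:
# 35 males predicted male, 7 males predicted female (35 + 7 = 42 males),
# 6 females predicted male, 32 females predicted female (6 + 32 = 38 females).
accuracy, precision, recall = classification_metrics(tp=35, fp=6, fn=7, tn=32)
print(accuracy, precision, recall)
```

With these counts, 35 + 32 = 67 of 80 cases are correct (83.75%, reported as 84%), precision is 35/41 (about 85%), and recall is 35/42 (about 83%).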
| Overall Accuracy | Precision | Recall |
| 84% (67) | 85% | 83% |

Table 2: Accuracy of gender model.

Discriminant Function for Gender
As described above, in determining which group a new case should be classified into, the values for each of the IVs would be plugged into each of the following two equations. Whichever equation yields the larger value represents the group that the new case would be classified into. The following are the equations for gender:

$$
\begin{aligned}
\text{Male} &= 4.319\,\text{MaleAV} + 3.643\,\text{T58Heals} + 3.061\,\text{HairAccNA} - 2.789 \\
\text{Female} &= 0.645\,\text{MaleAV} + 0.699\,\text{T58Heals} + 0.160\,\text{HairAccNA} - 0.746
\end{aligned}
$$

Table 3 presents descriptions of the predictor variables in the above equations.

| Variable | Description of Variable |
| MaleAV | Avatar is male |
| T58Heals | Avatar heals other avatars |
| HairAccNA | Hair accessories cannot be observed, likely because head is covered (e.g., by a helmet or costume) |

Table 3: Description of IVs relevant to prediction of gender.

Through interpretation of the discriminant functions above, the models can be described loosely in simple English as follows:

If the avatar is male, heals others, and/or has covered hair, then it is likely that the player’s RW gender is male. Otherwise, it is likely that the RW gender is female.

NOTE: These English sentences are not to be substituted for the above equations in making predictions; they are only intended to aid the reader in understanding the statistical models.

Discussion of IVs Relevant to the Prediction of Gender

Avatar Gender (ACF)
It is well-known that avatar gender is closely related to RW gender (Yee n.d.); therefore, it is not surprising that avatar gender surfaced as a predictor variable in this analysis. In our study worlds, the appearance of a male avatar strongly predicted a RW gender of male. Female avatars required the support of additional IVs to distinguish between female avatars operated by RW females and female avatars operated by RW males.
It should also be noted that while “gender-bending” is common among male players in MMORPGs (Yee 2003; MacCallum-Stewart 2008), our findings replicate other research findings that it is rare among women (Yee 2003): only three participants in our pool were females playing male avatars.

Avatar Heals Others (Theme)
This theme, extracted from the qualitative notes, was identified as present if the notes indicated that an avatar was healing other avatars during the observation period. It is commonly assumed that females gravitate toward healing roles in MMORPGs, since women are considered to be more nurturing and supportive by nature (Bergstrom et al. 2012). Our research suggests the opposite; the presence of this theme was a predictor of RW male gender. Because male avatar gender is such a strong predictor of RW male gender, the remaining variables in the discriminant function above primarily indicate the remaining RW males who were “gender-bending.” Yee et al. conducted a study that offers an explanation as to why the theme of healing others in combination with the presence of a female avatar might predict RW males: they discovered that “[male] players enact this stereotype [of women as healers] when gender-bending” (2011, 776). Hence, a male player creating an avatar for a healing role might tend to choose a female.

Covered Hair (ACF)
The variable “Hair Accessories N/A” indicated if an avatar’s hair accessories could not be observed because the avatar’s hair was covered, due to a helmet or costume that occluded the hair (e.g., a hood). The model indicated that RW male players playing female characters in the sample were less likely to expose their hair and hair accessories than RW female players were.
While there does not appear to be a ready explanation for this phenomenon in the literature, the research team postulated that RW male players operating female avatars might be less concerned with displaying their hair than RW female players, instead being more focused on gameplay activities, while RW female players may have more interest in ensuring that their avatars have a feminine appearance, complete with displayed flowing locks.

Age

Definition of the DV
The DV of age was divided into two groups: under the age of 30 (younger), and 30 and over (older). This is consistent with demographic research (Yee 2006) and qualitative observations indicating that MMORPG players tend to be younger than social game players. Furthermore, the age group of 18-29 is one that is frequently cited in research studies, particularly in voting-related and medical contexts, and thus seemed a logical breakdown to use. Using this division, 43 participants were age 30 or over and 37 participants were under age 30.

Accuracy of Age Model
With 66% overall accuracy, the age model correctly categorized 53 participants (see Table 4). Precision and recall were 71% and 63%, respectively.

| Overall Accuracy | Precision | Recall |
| 66% (53) | 71% | 63% |

Table 4: Accuracy for predicting age.

Discriminant Function for Age
The following are the two equations for age:

[Equations for the two age groups (age 30 or over; under age 30). The variable names, operators, and signs did not survive in the source text; the surviving residue reads: 30 0.999 30 0.426 35 0.984 2.029 44 1.658 35 2.100 2.768 44 2.076]

Table 5 presents descriptions of the variables used.

| Variable | Description of Variable |
| BAF35 | Avatar does not move for the full 60 seconds |
| Mage | Avatar class is Mage |
| T44exploring | Avatar actively traverses the environment |

Table 5: Description of IVs relevant to prediction of age.

In simple English, this model becomes:

If the avatar spends more time stationary, is not a Mage, and/or does not actively traverse the environment, then it is likely that the player’s age is 30 or over. Otherwise, it is likely that the player’s age is under 30.
Discussion of IVs Relevant to Age

Avatar does not move for the full 60 seconds (VW-BAF)
This item measured how many full minutes an avatar spent stationary over the observation period. In the study sample, an avatar that spent more time stationary was more likely to be older (age 30 or over). It is common knowledge that younger people tend to be more active than older people, and perhaps more prone to bouts of fidgeting and nervous activity. In the virtual world, this may translate into moving frequently, walking in circles, or jumping repeatedly, and an avatar that is frequently in motion is unlikely to be stationary for a full minute at a time. Conversely, an avatar operated by an older individual is more likely to spend time comfortably still.

Mage (ACF)
The discriminant function indicated that those who played a Mage class avatar were less likely to be age 30 or over. Mages are generally ranged damage dealers with powerful attacks, capable of quickly bringing down opponents. In the opinion of the research team, ranged damage-per-second (DPS) tends to be one of the least complex and most self-sufficient of the major combat roles. Mages may be attractive to younger players, who are more interested in fire-power and in quickly becoming proficient in the game, whereas older players may be more interested in more strategic, cooperative roles such as healing or tanking.

Avatar actively traverses the environment (Theme)
This theme was identified as present in the qualitative notes if the avatar was actively moving through the environment throughout the observation period. In the study sample, an avatar that did not actively traverse the environment was more likely to be age 30 or older. This indicator is synergistic with the previous movement-related variable from the VW-BAF. The model indicated that an avatar who spends a great deal of time moving through the game environment is likely to be a younger player, actively fighting her/his way across the landscape.
An older player may be less driven, but have a more balanced play experience doing inventory management or other activities along with battling.

Extraversion Level

Definition of the DV
The NEO scoring system groups individuals into very high, high, average, low, and very low categories of extraversion (Costa and McCrae 1992). To split the sample into groups, those with high or very high extraversion were placed in the high extraversion group, and those with very low, low, or average extraversion were placed in the low extraversion group. For this model, 35 participants were considered to have high extraversion and 45 participants were considered to have low extraversion.

Accuracy of Extraversion Level Model
For the extraversion level model, 73% overall accuracy was obtained, with 58 participants correctly classified (see Table 6). Precision and recall were fairly comparable, at 68% and 71%, respectively.

| Overall Accuracy | Precision | Recall |
| 73% (58) | 68% | 71% |

Table 6: Accuracy for predicting extraversion level.

Discriminant Function for Extraversion Level
The following are the two equations for extraversion level:

$$
\begin{aligned}
\text{High} &= 0.967\,\text{T76backtrack} + 0.641\,\text{T78teamheal} + 0.191\,\text{BAF18} - 0.973 \\
\text{Low} &= 2.701\,\text{T76backtrack} + 1.859\,\text{T78teamheal} + 0.615\,\text{BAF18} - 1.770
\end{aligned}
$$

Table 7 presents descriptions of the variables used.

| Variable | Description of Variable |
| T76backtrack | Avatar turns around or goes back |
| T78teamheal | Avatar heals party members |
| BAF18 | # of occasions avatar follows a command issued by another group member |

Table 7: Description of IVs relevant to prediction of extraversion level.

Translated into simple English, this model becomes:

If the avatar does not backtrack, does not heal party members, and/or does not follow commands issued by others, then it is likely that the individual has high extraversion. Otherwise, it is likely that the individual has low extraversion.
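The "whichever function is greater wins" rule can be illustrated with the extraversion model. The sketch below is hedged in several ways: the coefficient magnitudes come from the text, but their signs and their assignment to the high and low groups are assumptions (positive coefficients, negative constants, first equation taken as the high extraversion group), chosen because they agree with the plain-English reading of the model. The dictionary layout, function names, and example cases are hypothetical.

```python
# Coefficient magnitudes from the extraversion equations. ASSUMPTIONS: all
# coefficients positive, both constants negative, first equation = high group.
HIGH = {"T76backtrack": 0.967, "T78teamheal": 0.641, "BAF18": 0.191, "const": -0.973}
LOW = {"T76backtrack": 2.701, "T78teamheal": 1.859, "BAF18": 0.615, "const": -1.770}

def score(function, case):
    """Evaluate one linear discriminant function for a single case."""
    return function["const"] + sum(function[name] * value for name, value in case.items())

def classify(case):
    """Predict the group whose discriminant function yields the larger value."""
    return "high" if score(HIGH, case) > score(LOW, case) else "low"

# Hypothetical avatar that never backtracks, heals, or follows commands.
independent = {"T76backtrack": 0, "T78teamheal": 0, "BAF18": 0.0}
# Hypothetical avatar that backtracks, heals the party, and follows two commands.
supportive = {"T76backtrack": 1, "T78teamheal": 1, "BAF18": 2.0}

print(classify(independent), classify(supportive))
```

Under these assumptions the first case scores -0.973 against -1.770 and is classified as high extraversion, while the second scores about 1.017 against 4.020 and is classified as low extraversion.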
Discussion of IVs Relevant to Extraversion Level

Avatar turns around or goes back (Theme)
This theme was identified as present if the qualitative notes indicated that an avatar turned around and backtracked or returned to a previously visited person or place during the observation period. In the study sample, an avatar that did not turn around or go back was more likely to be operated by a highly extraverted individual. It may be that extraverted players exhibit confidence in the navigation decisions they make during game play by backtracking less, pressing forward more.

Avatar heals party members (Theme)
This second theme was identified as present if the qualitative notes indicated that an avatar was involved with healing other members of his/her group during the observation period. In the study sample, not being involved with healing other party members helped predict high extraversion in the RW. In MMORPGs, the group healer is generally a background figure, healing party members from a distance while preserving his/her own health for the sake of the group’s success, whereas tanking and melee DPS players tend to “lead the charge” into battle and otherwise direct the group on attack strategy (Bergstrom et al. 2012). Therefore, it might make sense that a highly extraverted, leader-like individual would be drawn to tanking/melee DPS roles rather than healing roles.

Number of occasions avatar follows a command issued by another group member (VW-BAF)
This item represented the extent to which an avatar was observed following others’ commands during the session. In the study sample, an avatar who did not spend time following others’ commands was more likely to be highly extraverted in the RW. In keeping with the theme above, it makes sense that the extraverted, who frequently emerge as leaders in groups (Judge et al. 2002), may spend more time issuing commands than following commands issued by others.
Submissive Ideology

Definition of the DV
An item from the ASC scale was used to measure a submissive ideology: ASC 2. Our leaders know what is best for us (Dunwoody and Funke n.d.). This item was selected on the basis that it was the most representative item from the ASC authoritarian submissive subscale. The idea that DV groups could be formed using a specific item from an assessment scale was considered to be particularly interesting; by achieving accuracy in the prediction of submissive ideology, it is possible to demonstrate the prediction of an individual’s response to a specific statement. Those participants who responded neutral, agree, or strongly agree to the item were considered submissive; those who responded disagree or strongly disagree were considered not submissive. Forty-three (43) participants had a submissive ideology and 37 participants did not have a submissive ideology.

Accuracy of Submissive Ideology Model
For this model, 57 of the 80 participants were correctly predicted, for 71% overall accuracy (see Table 8). Precision and recall were 71% and 79%, respectively.

Overall Accuracy    Precision    Recall
71% (57)            71%          79%

Table 8: Accuracy for predicting submissive ideology.

Discriminant Function for Submissive Ideology Model
The following are the two equations for submissive ideology:

$$\text{Submissive} = 1.935\,\mathit{NEOR}_{11} + 2.837\,\mathit{NEOR}_{24} + 2.219\,T_{44} - 5.697$$
$$\text{Not submissive} = 2.705\,\mathit{NEOR}_{11} + 2.291\,\mathit{NEOR}_{24} + 2.968\,T_{44} - 6.212$$

Table 9 presents descriptions of the variables used.

Variable        Description of Variable
NEOR11          When he's under a great deal of stress, sometimes he feels like he's going to pieces.*
NEOR24          He tends to be cynical and skeptical of others' intentions.* [reverse scored]
T44exploring    Avatar actively traverses the environment

Table 9: Description of IVs relevant to prediction of submissive ideology.
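The accuracy figures reported for the submissive ideology model can be cross-checked against the group sizes. The sketch below is an illustration, not part of the study's analysis: it treats the submissive group as the positive class (an assumption, though it is the reading under which the reported precision is reproduced), recovers the implied confusion-matrix counts from the number of correct predictions and the recall, and recomputes the three metrics. The function names are hypothetical.

```python
# Illustrative consistency check; the choice of positive class is an assumption.

def confusion_from_report(n_pos, n_neg, n_correct, recall):
    """Recover (tp, fp, fn, tn) from group sizes, correct count, and reported recall."""
    tp = round(recall * n_pos)   # recall = tp / n_pos
    fn = n_pos - tp
    tn = n_correct - tp          # correct predictions = tp + tn
    fp = n_neg - tn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Overall accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


if __name__ == "__main__":
    # 43 submissive, 37 not submissive, 57 correct, recall reported as 79%.
    counts = confusion_from_report(n_pos=43, n_neg=37, n_correct=57, recall=0.79)
    print(counts)
    print({name: round(100 * value) for name, value in metrics(*counts).items()})
```

With these inputs the implied counts are 34 true positives, 14 false positives, 9 false negatives, and 23 true negatives, which round to the reported 71% accuracy, 71% precision, and 79% recall.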
Stated in simple English, this model becomes: If the avatar does not go to pieces under stress, is not cynical and skeptical of others’ intentions, and/or does not actively traverse the environment, then it is likely that the individual has a submissive ideology. Otherwise, it is likely that the player does not have a submissive ideology.

Discussion of IVs Relevant to Submissive Ideology

“When he's under a great deal of stress, sometimes he feels like he's going to pieces.”* (NEO-R)
This personality assessment item is the rater’s evaluation of whether or not the avatar goes to pieces under stress. As specified on the developed NEO Bridge, a rater agreed with this item if the avatar, in the presence of a stressor (e.g., a battle, an argument), requested attention or assistance by demanding heals or resurrects or by rapidly issuing repeated commands, and/or responded emotionally to the situation. A rater disagreed with this item if the avatar, in the presence of a stressor, successfully retreated from, won, or returned to the battle, and/or responded calmly to or ignored aggressive social situations (e.g., being harassed or insulted by other players). In the study sample, not going to pieces under stress helped predict a RW submissive ideology. The research team postulated that a submissive person might not be particularly unsettled when in a stressful situation, relying on confidence that the designers of the game or leaders in group play “know what is best for [them]” (see “Definition of the DV” section above).

“He tends to be cynical and skeptical of others' intentions.”* (NEO-R)
This NEO-R item was the rater’s evaluation of whether or not the avatar was “cynical and skeptical of others' intentions.” A rater agreed with this item if the avatar voiced doubt; complained about the game's programming or other players; or exerted influence over others’ (avatar or NPC) behavior, particularly by planting flags or issuing commands.
A rater disagreed with this item if the avatar did not exert influence over others’ (avatar or NPC) behavior with commands, and/or did not express complaints or doubts about others. Instead, the avatar might have followed others or complimented other avatars or the game. In the study sample, not being cynical and skeptical of others’ intentions helped predict a submissive ideology. It makes sense that a submissive person would not be particularly cynical and skeptical of others’ intentions, particularly since the ASC scale item that was used to define this DV, “Our leaders know what is best for us” (Dunwoody and Funke n.d.), is clearly devoid of cynicism or skepticism.

Active player traversing the environment (Theme)
This theme also appeared in the age model, and is defined in the section “Discussion of IVs Relevant to Age.” In the study sample, players who did not actively traverse the environment were more often those with a submissive ideology. This suggests that players with a submissive ideology may seek a more balanced play experience, perhaps conforming to game design expectations which include activities beyond focused seek-and-destroy missions. In contrast, a player with an aggressive ideology might be more likely to spend time traversing the environment, on a constant mission to fight enemies along the way.

Aggressive Ideology

Definition of the DV
For aggressive ideology, the most representative item from the ASC authoritarian aggressive subscale was selected: ASC 18. Strong punishments are necessary in order to send a message (Dunwoody and Funke n.d.). In order to be considered to have an aggressive ideology, the individual had to agree or strongly agree with the statement. Those who responded neutral, disagree, or strongly disagree were considered to be non-aggressive. Twenty-four (24) participants had an aggressive ideology and 56 participants did not have an aggressive ideology.
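The two ideology DVs are formed from single Likert items with different cut points: a neutral response counts toward the submissive group but not toward the aggressive group. The short Python sketch below illustrates that grouping rule. The response labels as lowercase strings and the function names are assumptions made for illustration; the study does not describe how responses were encoded.

```python
# Illustrative sketch of the DV grouping rules; string labels and names are hypothetical.

# ASC 2 ("Our leaders know what is best for us"): neutral counts as submissive.
SUBMISSIVE_RESPONSES = {"neutral", "agree", "strongly agree"}

# ASC 18 ("Strong punishments are necessary in order to send a message"):
# only agreement counts as aggressive.
AGGRESSIVE_RESPONSES = {"agree", "strongly agree"}


def has_submissive_ideology(asc2_response):
    """True if the ASC 2 response places the participant in the submissive group."""
    return asc2_response.strip().lower() in SUBMISSIVE_RESPONSES


def has_aggressive_ideology(asc18_response):
    """True if the ASC 18 response places the participant in the aggressive group."""
    return asc18_response.strip().lower() in AGGRESSIVE_RESPONSES


if __name__ == "__main__":
    for response in ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]:
        print(response, has_submissive_ideology(response), has_aggressive_ideology(response))
```

The asymmetric treatment of the neutral response is one reason the two groups differ so much in size (43 of 80 submissive versus 24 of 80 aggressive).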
Accuracy of Aggressive Ideology Model
As shown in Table 10, the overall accuracy for the aggressive ideology model was 73% (58 of the 80 participants correctly classified). Though precision was low at 53%, recall was high at 75%.

Overall Accuracy    Precision    Recall
73% (58)            53%          75%

Table 10: Accuracy for predicting aggressive ideology.

Discriminant Function for Aggressive Ideology Model
The following are the two equations for aggressive ideology:

$$\text{Aggressive} = 1.158\,\mathit{Strategizes} + 2.599\,T_{87} + 3.026\,T_{18} - 2.178$$
$$\text{Not aggressive} = 2.607\,\mathit{Strategizes} + 1.133\,T_{87} + 2.115\,T_{18} - 1.822$$

Table 11 presents descriptions of the variables used.

Variable       Description of Variable
Strategizes    Player has a strategic play style
T87melee       Avatar inflicts melee damage
T18quest       Player checks the quest log

Table 11: Description of IVs relevant to prediction of aggressive ideology.

In simple English, this model becomes: If the avatar does not strategize, inflicts melee damage in combat, and/or checks the quest log, then it is likely that the individual has an aggressive ideology. Otherwise, it is likely that the player does not have an aggressive ideology.

Discussion of IVs Relevant to Aggressive Ideology

Strategizes (ACF)
This item captured the general play style of an avatar over the observation period. An avatar would be coded as one who strategized if the avatar displayed strategic behavior, demonstrating that the player has given some thought to how best to fight enemies in the game. Some examples of strategic behavior include pulling, a behavior in which the avatar lures a small group of enemies into safe territory to fight, giving the player a competitive advantage; and using a gimmick build, which is a character build that exploits game mechanics to render the avatar maximally efficient. In the study sample, not strategizing helped predict an aggressive ideology. One might expect an aggressive individual not to be particularly strategic, especially if he/she is constantly rushing into battle without taking time to consider strategy.
Avatar inflicts melee damage (Theme)
This theme was identified as present if the qualitative notes indicated that an avatar dealt melee damage during battles. In the study sample, inflicting melee damage predicted an aggressive ideology. Avatars involved in melee combat are in close proximity to enemies under attack; as such, this style of “in your face” play is arguably the most physically aggressive in MMORPGs. Therefore, it makes sense that an aggressive individual would gravitate toward melee combat.

Player checks the quest log (Theme)
This theme was identified as present if the qualitative notes indicated that an avatar accessed the quest log during the observation period. Each avatar has a quest log that tracks active quests, progress made toward completing quests, and next step instructions specific to each quest. In the study sample, checking the quest log helped predict an aggressive ideology. Attention to active quests suggests a goal-oriented play style, which in turn suggests that players with an aggressive ideology might be more focused on advancement than on other activities such as socializing.

CONCLUSIONS
The aim of this study was to combine quantitative and qualitative research methods in a mixed methods approach to develop a deeper understanding of the relationship between virtual world avatar behavior and real world characteristics. To that end, the research team generated statistical models for five RW characteristics from VW observations. The average overall accuracy across all five models, 73%, suggests that it is indeed possible to develop predictive models for RW characteristics from observations of avatar characteristics and behavior. Interestingly, though, all of the models require the input of several variables in order to generate predictions, some of which are more easily explained from a face validity perspective than others.
This suggests that the association between an individual’s RW and VW characteristics is not intuitively clear cut: a female avatar is not necessarily being operated by a RW female, and an avatar that appears extraverted is not necessarily representing a person who is extraverted in the RW. An additional finding related to the absence of chat variables from any of the predictive models was noted. The research team, curious as to why none of the chat variables had proved significant predictors of any of the DVs, performed a qualitative investigation of the chat data. Two conclusions were reached: first, in the sample, many participants did not chat at all, and those who did spoke very little, with most conversation content limited to current game activities. Second, qualitative analysis combined with demographic data allowed us to determine that the use of all lowercase letters and absence of punctuation was a universal behavior pattern that transcended real world demographic categories such as age. It is true that there were several limitations in this study. First, the small sample size (N = 80) may be an impediment to the statistical power of the model development. The requirement to bring each participant into the laboratory, while it enhanced certainty in the validity of the ground truth data, limited recruitment – both geographically and in the sense that only MMORPG players willing to come to the laboratory participated. Furthermore, the observation of only one hour of game-play video by the quantitative and qualitative rating teams was less than ideal; however, due to time and staffing constraints associated with the rating process, it would have been very difficult within the scope of this study to process more participant data. Future research should seek to further validate the predictive models developed, with more participants and by rating the participants over multiple hour-long observation sessions. 
That being said, we believe that this study presents several major contributions to the gaming-related research base. For one, our approach included unique methods of virtual world observation, including the use of the NEO-R, bridged for use in the VW, to record avatar personality ratings; the development of a VW behavior assessment instrument (VW-BAF) that standardized ratings of avatar behaviors and allowed for reliable behavioral measures across all avatars; and the consistent recording of observable avatar characteristics via the ACF. The developed NEO bridge provides guidance for how to utilize the NEO-R instrument, originally designed for use with people in RW proximal settings, to rate avatars in a VW by direct observation. To our knowledge, the use of the NEO-R instrument and this form of RW-to-VW rating bridge is new in VW assessment.

The development of a new mixed methods approach for studying avatar behavior in VWs is also a valuable contribution to the research community. This technique allowed researchers to extract IVs that held promise as potential predictors with semi-automated assistance, without having to manually analyze an entire corpus of qualitative thick description notes. The results of the interdisciplinary hybrid study reveal that automated extraction of conceptual themes across thick description notes, guided by subject matter expertise, can be combined with quantitative measures to predict RW characteristics. The importance of the contribution of the SMEs should not be underemphasized. By fusing quantitative and qualitative approaches under the guidance of SMEs, we gain a better understanding of the relative contribution of each and how they may be used together to predict RW characteristics from VW observations. In the future, the mixed methods approach described in this work may be generalized to research in other contexts.
ACKNOWLEDGMENTS
We express our thanks to Kathleen Wipf, Jasmine Pettiford, Nic Watson, Hank Whitson, and Patrick Coursey for their work and contributions on this project. This work was supported by the Air Force Research Laboratory (AFRL). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL or the U.S. Government.

*Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the NEO Personality Inventory-Revised by Paul T. Costa Jr., PhD and Robert R. McCrae, PhD, Copyright 1978, 1985, 1989, 1991, 1992 by Psychological Assessment Resources, Inc. (PAR). Further reproduction is prohibited without permission of PAR.

BIBLIOGRAPHY
ArenaNet. (2005). Guild Wars. [PC Computer, Online Game], NCSOFT, North America: played September, 2011-April, 2012.

K. Bergstrom, J. Jenson, and S. de Castell, “What's 'Choice' Got to Do With It? Avatar Selection Differences between Novice and Expert Players of World of Warcraft and Rift,” In Proceedings of the International Conference on the Foundations of Digital Games, Raleigh, NC: 2012, ACM Press, pp. 97-104.

N. Brace, R. Kemp, and R. Snelgar, SPSS for Psychologists (4th ed.), New York, NY: Routledge, 2009.

P. T. Costa and R. R. McCrae, NEO PI-R Professional Manual, Odessa, FL: Psychological Assessment Resources, Inc., 1992.

P. Dunwoody and F. Funke, “Testing three three-factor authoritarianism scales,” Journal of Social and Political Psychology, submitted for publication.

D. Feldon and Y. Kafai, “Mixed Methods for Mixed Reality: Understanding Users' Avatar Activities in Virtual Worlds,” Educational Technology Research and Development, vol. 56, 2008, pp. 575-593.

C. Geertz, The Interpretation of Cultures, New York, NY: Basic Books, 1973.

G. M. Jackson, Predicting Malicious Behavior: Tools and Techniques for Ensuring Global Security, Hoboken, NJ: John Wiley & Sons, 2012.

R. B. Johnson and A. J. Onwuegbuzie, “Mixed Methods Research: A Research Paradigm Whose Time Has Come,” Educational Researcher, vol. 33, no. 7, 2004, pp. 14-26.

T. A. Judge, J. E. Bono, R. Ilies, and M. W. Gerhardt, “Personality and Leadership: A Qualitative and Quantitative Review,” Journal of Applied Psychology, vol. 87, 2002, pp. 765-780.

NCSOFT. (2009). Aion. [PC Computer, Online Game], NA NCSOFT, North America: played September, 2011-April, 2012.

E. MacCallum-Stewart, “Real Boys Carry Girly Epics: Normalizing Gender Bending in Online Games,” Eludamos, vol. 2, 2008, pp. 27-40.

D. W. Stockburger, “Discriminant Function Analysis,” Multivariate Statistics: Concepts, Models, and Applications, Accessed October 7, 2012, http://www.psychstat.missouristate.edu/multibook/mlt03.htm

N. Yee, “Gender-bending,” The Daedalus Project: Psychology of MMORPGs, Accessed August 2, 2012, http://www.nickyee.com/eqt/genderbend.html#5

N. Yee, “The Demographics of Gender-bending,” The Daedalus Project: Psychology of MMORPGs, September 3, 2003, Accessed August 2, 2012, http://www.nickyee.com/daedalus/archives/000551.php?page=1

N. Yee, “The Demographics, Motivations and Derived Experiences of Users of Massively Multi-user Online Graphical Environments,” PRESENCE: Teleoperators and Virtual Environments, vol. 15, 2006, pp. 309-329.

N. Yee, N. Ducheneaut, M. Yao, and L. Nelson, “Do Men Heal More When in Drag? Conflicting Identity Cues Between User and Avatar,” In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems (CHI ’11), New York, NY: 2011, ACM Press, pp. 773-776.