AUT For PsyArXiv
Authors: Helané Wahbeh,1,2 Cedric Cannard,1 Garret Yount,1 Arnaud Delorme,1,3 Dean Radin1
Corresponding Author:
Helané Wahbeh
Novato, CA 94945-327
hwahbeh@noetic.org
707-779-8230
Title: Creative self-belief responses versus manual and automated Guilford Alternate Use Task scoring
Abstract
While behavioral tasks like the Guilford Alternate Use Task (AUT) are the gold standard for
quantifying creativity levels, clarifying how they relate to subjective, self-report ratings would
contribute to the creativity assessment field. If a single-item measure were available to reliably
and efficiently assess one’s creativity level, researchers and educators with limited time and
resources could use the simpler and shorter self-report item. This study’s primary objective was to
evaluate the construct validity of a single-item creative self-belief (CSB) measure by comparing it
with AUT fluency, flexibility, elaboration, and originality scores that were scored using manual
and automated methods. It also aimed to assess the single-item CSB’s convergent validity and test-
retest reliability. In addition, the relationship between the manual and automated AUT scoring
methods was evaluated. Data from 1,179 adult participants collected in a more extensive parent
study were used for these analyses. CSB was weakly correlated with manual fluency (rho = .13, p
= .004, n = 505) and manual originality (rho = .11, p = .01, n = 505) but with no other creativity
measures. CSB was correlated with the personality indices of openness to experience (rho = .49, p
< .000005, n = 1,022), extraversion (rho = .20, p < .000005, n = 1,021), neuroticism (rho = -.20, p
< .000005, n = 1,018), agreeableness (rho = .14, p < .000005, n = 1,021), and conscientiousness
(rho = .14, p < .000005, n = 1,023). CSB test-retest reliability, assessed using entries from
participants who completed two sessions, was high (intraclass correlation = .79, 95% CI [.75, .82]).
The manual elaboration score was strongly correlated with the automated Open Creativity Scoring
with Artificial Intelligence (OCSAI) elaboration score (rho = .76, p < .000005, n = 520), and
manual originality scores were correlated with OCSAI originality scores but less strongly (rho =
.21, p < .000005, n = 520). These findings support using multiple measures to assess creativity
rather than relying solely on this single-item CSB measure. However, the single-item CSB may be
helpful in limited-time situations and has demonstrated positive convergent validity, test-retest
reliability, and a significant, albeit weak, correlation with AUT fluency and originality. This study
also supports the use of automated methods, such as OCSAI, for scoring the AUT.
Keywords: creativity, subjective, objective, Guilford Alternate Use Task (AUT), creative self-
belief, artificial intelligence, Open Creativity Scoring with Artificial Intelligence (OCSAI),
SemDis
INTRODUCTION
Creativity is a multifaceted concept that manifests itself in various forms. Artistic creativity
involves expressing oneself through various art forms like painting, sculpture, music, dance, and
writing, tapping into emotions and imagination (Morriss-Kay, 2010). Scientific creativity entails
individuals thinking outside the box and employing their innovative thinking and problem-
solving skills to make groundbreaking discoveries, theories, and advancements in scientific fields
(Stumpf, 1995). Practical creativity involves finding innovative and functional solutions to
everyday problems and challenges, often seen in design, engineering, architecture, and fashion
(D. Cropley, 2016). Creativity encompasses a wide range of expressions, and there is no single
agreed-upon definition. In simple terms, creativity can be defined as the ability to produce work
that is both novel (i.e., original, unexpected) and appropriate (i.e., useful, adaptive concerning
task constraints). Creativity is often described as comprising expertise (knowledge, technical
skills, and talent in a particular domain), creative thinking (the ability to think in novel ways,
which includes a flexible and imaginative approach to problem-solving), and motivation (the
drive and passion for the task at hand, which can include intrinsic and extrinsic sources) (A.
Cropley, 2000; Kaufman & Baer, 2012; Runco, 1986). Researchers have developed several
methods to assess creativity, including measuring an individual's ability to
generate multiple solutions or ideas for a given problem (i.e., divergent thinking tests), assessing
an individual’s ability to establish connections between apparently unrelated words (i.e., Remote
Associates Test; Mednick, 1968), rating the quality, originality, and impact of creative works, or
assessing cognitive processes underlying creative thinking, such as insight, associative thinking,
or analogical reasoning (i.e., creative cognition tasks). Creativity assessments are usually
based on quantitative measures, such as counting the number of associations made (Park et al.,
2016). Of course, no single measure can capture all aspects of creativity, so combining different
measurement approaches (e.g., subjective and objective assessments) may provide a more
comprehensive evaluation.
Subjective measures can provide advantages, such as being easy to administer and allowing
insight into personal beliefs, motivations, and confidence levels regarding participants’ creative
abilities (Silvia et al., 2012). However, they also have limitations. Subjective measures are
susceptible to bias (Elsbach & Kramer, 2003; Madjar et al., 2002; Shin et al., 2012) because they
rely on self-perception, can be influenced by self-esteem issues and social desirability (Hébert et
al., 2001) and may not align with objective measures or expert evaluations. Subjective scores can
only be as reliable as the individual's level of meta-awareness regarding their creative abilities.
This means that their capacity to objectively observe their own skills plays an important role in
the accuracy of subjective scores. In addition, creativity measures consisting of one single item
(e.g., creative self-belief questions like “How creative are you?”), may not capture the
multifaceted nature of creativity, provide limited information, and do not allow for a deep
understanding of a person's creative abilities. Ideally, such single-item measures would be
complemented with other measures, such as objective assessments and expert evaluations, for a
fuller picture of creativity.
Creativity is most commonly evaluated with divergent thinking (DT) tests (Kaufman et al., 2008;
Plucker et al., 2010). DT tests assess an individual's ability to generate multiple solutions or
ideas for a given problem. They encourage individuals to break free from conventional thinking
and explore new possibilities, perspectives, and connections. Divergent thinking plays a crucial
role in unlocking and nourishing creative potential, leading to the development of new ideas.
The AUT (Guilford, 1967; Guilford et al., 1978) is a gold standard test for DT. It is a widely
used creativity assessment tool that encourages individuals to generate alternative uses for
common objects. The task involves presenting participants with an image of an everyday object,
such as a brick or a paperclip, and asking them to provide as many uses (or functions) for that
object as possible within a given time frame. The task is usually scored by human raters who are
asked to review the corresponding answers and rate them on fluency (number of ideas),
flexibility (number of categories of ideas), elaboration (number of details provided about ideas),
and originality (number of new and unusual ideas). By requiring individuals to think beyond the
customary purpose of an object and explore novel and imaginative possibilities, the AUT
provides valuable insights into an individual's capacity for originality and fluency of ideas
(Runco, 1999). However, because the responses are text-based, manual data manipulation and
rating are time-intensive and expensive, which limits its use in large populations. Also, human
rater judgments are subjective and influenced by their own creativity levels, introducing bias into
scores. While attempts to reduce variability are made through structured methods, human rater
training, and using multiple raters, the effort required to adequately score the AUT can prevent it
from being used. Some have suggested alternative human rating methods, such as the Top 2
scoring method, which asks participants to choose their most creative responses and then only
considers the top two (Silvia et al., 2008). However, this method has been criticized because it
artificially truncates the creative process (Forthmann et al., 2020). A reliable method of
automated AUT scoring would allow the task to be used more often and in much larger studies,
thus advancing creativity research.
Some researchers have developed automated ways to score the AUT. The performance of an
automated system is typically judged by how strongly its scores correlate with the gold-standard
human ratings. For
example, Beaty et al. (2021) evaluated the use of semantic distance as an automated scoring
method for the AUT. They assessed top-performing automated semantic models and found that a
latent semantic distance factor strongly predicted creativity and novelty ratings across a range of
creativity tasks, including the AUT. They also provided a freely available program for
computing semantic distance (Beaty & Johnson, 2021; Dumas et al., 2021). Others have also
evaluated semantic distance as an automated scoring method, finding that it predicted average
creativity ratings (Hass, 2017). One research group at the University of Amsterdam (Tanis et al.,
2017) developed an algorithm to automatically score AUT responses using a system based on
expert ratings of similar responses and showed that it reliably scored AUT responses similarly to
experts.
While automated methods like text-mining algorithms and semantic distance provided some
advantage in time and resources to manual human rating (and are correlated with human ratings),
a more sophisticated and far superior automated method has recently been developed. With the
explosion of artificial intelligence (AI) natural language processing technology, AI has shown
promise for AUT scoring. Open Creativity Scoring with Artificial Intelligence (OCSAI),
developed by Organisciak and colleagues at the University of Denver, used neural network-
based large language models (LLMs) trained on 27,000 human-judged AUT responses. The
model, tested on new AUT data, showed creativity scores that strongly correlated with human
ratings of the same data (r = .81), far exceeding the performance of any other automated system
(Organisciak et al., 2023).
The current exploratory study addressed two gaps in the literature. While task-oriented
assessments like the AUT are the gold standard measures, clarifying how they relate to
subjective self-report ratings would contribute to the creativity assessment field. If a single-item
creative self-belief rating (“In general, how creative do you consider yourself?”) demonstrated a
strong association with the manually scored AUT, researchers and educators with limited time
and resources could use the simpler and shorter self-report item. Thus, this study evaluated the
construct validity of the single-item creative self-belief (CSB) measure. Construct validity refers
to whether the measure
truly assesses the construct intended to be measured and was assessed by comparing CSB scores
with AUT scores. Convergent validity, an aspect of construct validity, reflects the degree to
which two theoretically related measures are, in fact, related. In this case, most studies
demonstrate that specific personality
indices are related to self-perceptions of creativity, such as openness to experience of the Big
Five personality system (da Costa et al., 2015; Karwowski & Lebuda, 2016). Thus, our CSB item
should correlate with the personality factors already known to relate to creativity. In addition, the
CSB should be consistent over time, reflecting stable scores over multiple administrations.
Finally, with the introduction of automated AUT scoring, the question of how well automated
scoring aligns with manual human-rater scoring can be studied in more detail. The specific
objectives of this study were to evaluate the following research questions and hypotheses:
1. What is the relationship between the CSB item and manual and automated AUT
scores?
Hypothesis: The creative self-belief item will significantly correlate with at least one of the AUT scores.
2. Does the CSB item demonstrate convergent validity with personality indices?
Hypothesis: The CSB item will significantly correlate with personality inventory measures, most
strongly with openness to experience.
3. What is the test-retest reliability of the CSB item?
Hypothesis: The CSB item will demonstrate high test-retest reliability across administrations.
4. What is the relationship between manual and automated AUT scores?
Hypothesis: The manual and automated AUT scores will be significantly correlated.
METHODS
The data for this study were collected as part of a larger study between April 3, 2018 and
November 4, 2020, the results of which are reported elsewhere (Cannard et al., 2021b, 2021a;
Wahbeh, Vieten, et al., 2022; Wahbeh, Yount, et al., 2022). Thus, these analyses are secondary,
exploratory analyses, and the relevant methods are briefly repeated for expedience. Participants
completed pre- and post-workshop questionnaires and tasks assessing various outcome measures,
including creativity (See Measures section). Participants included adults 18 years or older who
could read and understand the consent form, complete the survey and tasks, and had access to the
survey online or at the Institute of Noetic Sciences (IONS) EarthRise Learning Center (Petaluma,
CA). The study excluded minors, those unable to understand the consent form or those with
acute or chronic illnesses that precluded the completion of measurements. All study activities
were approved by the IONS Institutional Review Board (IORG#0003743). For this secondary
analysis, records were included if participants completed the Guilford Alternate Use Task (see
Measures section). Participant characteristics are summarized in Table 1.
Table 1. Participant characteristics (Measure; Category; M ± SD; %; N)
Measures
Creative Self-Belief (CSB) - All participants rated, “In general, how creative do you consider
yourself?” on a slider scale anchored by “Not at all creative” (0) to “Very creative” (100).
Participants did not see the numerical value associated with their answer choice, and the slider
was always initially positioned in the middle of the scale. This measure resulted in one value ranging
from 0-100.
Guilford Alternate Uses Task - (Guilford, 1967) The task was administered on Google
Chromebooks at the EarthRise Learning Center or online on participants' own devices.
Participants were shown one of four randomly selected images of a common item (Newspaper,
Brick, Envelope, Wire Clothes Hanger). Participants were given 2 minutes to type into a text
field as many uses of the item as possible, as quickly as possible. A bar indicated how much time
was left in real
time. The instructions were, “You are about to be shown an object or objects. Please list as many
ways to use the object(s) as possible within 2 minutes.” The answers were manually reviewed for
validity and separated by commas. A “use” is defined as one named use for the item, and a
“response” is defined as all the uses a participant gave in one AUT trial.
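As an illustration of this response format, here is a minimal Python sketch (the function and variable names are ours for illustration, not from the study's code) that splits one comma-separated response into its individual uses:

```python
# Minimal sketch: split one comma-separated AUT response into individual uses.
# The response format follows the description above; names are illustrative only.

def split_response(response: str) -> list[str]:
    """Return the individual uses contained in one AUT response."""
    return [use.strip() for use in response.split(",") if use.strip()]

response = "doorstop, paperweight, build a wall"
uses = split_response(response)
print(uses)       # ['doorstop', 'paperweight', 'build a wall']
print(len(uses))  # 3 uses in this response
```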
Manual Scoring of AUT data - Three reviewers manually scored a subset of the total AUT
responses for fluency, flexibility, and elaboration. We took a qualitative analysis approach to
scoring, where two student volunteers first reviewed the responses and made independent
assessments. Then, a third, more experienced reviewer evaluated all uses where the students’
values did not match to choose the final value. Inter-rater reliability was assessed with
Krippendorff's alpha (Zapf et al., 2016). Please see the Supplemental Data for the scoring codebook.
The fluency score is the number of valid uses and was considered a continuous variable.
Unacceptable uses are ones that are not possible with that item. The two student raters'
scores matched for 64% (n = 479) of the records. Of those that did not match, the
difference in scores was M = 1.18, SD = 1.39 (range 1-7). Interrater reliability was good
(percent agreement 72%, 95% CI [69%, 74%]; Krippendorff's alpha 0.70, 95% CI [0.66, 0.73]).
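For readers who want to reproduce such agreement statistics, here is a hedged sketch using the open-source `krippendorff` Python package; the study's own analyses were run in Stata, so this is illustrative only, and the rater values are hypothetical:

```python
# Sketch: percent agreement and Krippendorff's alpha for two raters' fluency scores.
# Assumes the open-source `krippendorff` package (pip install krippendorff);
# the rater values below are hypothetical, not the study's data.
import numpy as np
import krippendorff

rater_a = [9, 7, 12, 5, 8]  # hypothetical fluency scores from rater A
rater_b = [9, 6, 12, 5, 9]  # hypothetical fluency scores from rater B

percent_agreement = np.mean(np.array(rater_a) == np.array(rater_b))
alpha = krippendorff.alpha(reliability_data=[rater_a, rater_b],
                           level_of_measurement="interval")
print(f"Percent agreement: {percent_agreement:.2f}")  # 0.60 for this toy data
print(f"Krippendorff's alpha: {alpha:.2f}")
```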
The flexibility score is assessed by categorizing the uses by category and was considered
a continuous variable. For the brick example, “building a house,” “building a chimney,”
and “building a stove” would all be in the same “building” category, whereas “building a
house” (building), “throwing at a person” (weapon), and “a doorstop” (weight) would be
three separate categories. The two student raters' scores matched for 51% (n = 379) of the
responses. Of those that did not match, the difference in scores was M = 1.76, SD = 1.55
(range 1-15). Interrater reliability was fair (percent agreement 62%).
The elaboration score is assessed by rating the amount of detail in each participant's
response. For example, “a doorstop” would receive a score of 0, whereas “a doorstop to
prevent a door slamming shut in a strong wind” would receive a score of 2 (one for the
explanation of the door slamming and two for the further detail about the wind); a rating
of 1 would be in between those examples. The elaboration for all the listed uses in a
response was taken into account, and one elaboration score was ascribed to each response.
Scores matched for 76% (n = 568) of the responses. Of those that did not match, the
difference in scores was M = 0.69, SD = 0.48 (range 1-2). Interrater reliability was good.
The originality score is evaluated by assessing the originality of the participant's uses
overall compared to those of others in the dataset. Originality was team scored as part of
a group data-analysis event, a combination of a hackathon and a vacation, that occurred
on December 15 and 16, 2018. Participants were invited to analyze a collection of data
collected at IONS, one of which was the AUT. The team reviewed all the responses for
each image and generated three originality response categories (0, 1, 2) with a list of uses
representing each category. Then, the team viewed all the uses in each response and gave
it an overall originality rating (i.e., ideational pool scoring). Ideational pool scoring has
advantages and disadvantages. It reduces the burden on raters by requiring fewer ratings,
but it also makes each judgment more complex, leading to disagreement among raters
(Reiter-Palmon et al., 2019). The team was required to come to a consensus on the
originality score for each response, and the final consensus score was used.
Automated Scoring of AUT data - Automated scores for elaboration and originality were obtained for all valid responses.
The automated elaboration score was calculated using the 'whitespace' method and was
considered a continuous variable. This method simply counts words based on spaces. For
example, “the cat chased the dog” would count as five space-separated words (i.e., the,
cat, chased, the, dog). Contractions and hyphenated words count as one word.
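A minimal sketch of this whitespace method (our illustration, not the OCSAI implementation itself):

```python
# Whitespace elaboration scoring: count space-separated words in a single use.
# Contractions and hyphenated words contain no spaces, so each counts as one word.

def whitespace_elaboration(use: str) -> int:
    """Count whitespace-separated tokens in one use."""
    return len(use.split())

print(whitespace_elaboration("the cat chased the dog"))          # 5
print(whitespace_elaboration("don't slam the well-oiled door"))  # 5
```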
The automated originality scoring was conducted using Open Creativity Scoring with Artificial
Intelligence (OCSAI), a fine-tuned set of large language models (LLMs, of the kind underlying
ChatGPT) that greatly improves on semantic distance scoring (Organisciak et al., 2023)
and is freely available. Automated scoring utilizing LLMs operates through supervised
learning, where models are trained using prior examples of scored uses. As an alternative
to semantic distance scoring, this approach was first proposed by Organisciak et al. (2023).
The current models (ada, babbage, curie, davinci) are based on the GPT-3
architecture, as reported in the same study (Organisciak et al., 2023). The OCSAI Davinci
LLM model was used in the current study because it performed substantially better than
semantic scoring and the other three AI models and correlated more strongly with human
judges, as Organisciak et al. (2023) demonstrated. For example, across more than 20
different AUT word prompts, the OCSAI Davinci LLM had an r value of .80 in relation to
human judgments, compared to .19 for SemDis (Organisciak et al., 2023). Considering the
far superior performance of LLM models, this was the only automated scoring method used
in this analysis. The OCSAI Davinci LLM originality scores range from 1 to 5, where 1
indicates a highly unoriginal use, 5 represents a highly original use, and 3 represents the
midpoint of the scale.
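To illustrate how per-use scores can be combined into one response-level score (the aggregation we describe later in the Discussion), here is a hedged sketch; `score_use` is a hypothetical placeholder for a request to the OCSAI service, not its actual API:

```python
# Sketch: response-level originality from per-use originality scores.
# `score_use` is a hypothetical stand-in for a call to the OCSAI service
# (https://openscoring.du.edu); it is NOT the real OCSAI API.
from statistics import mean

def score_use(prompt: str, use: str) -> float:
    """Hypothetical stand-in returning a 1-5 originality score for one use."""
    return 3.0  # neutral placeholder; a real call would query OCSAI

def response_originality(prompt: str, uses: list[str]) -> float:
    """Score each use separately, then average across the response."""
    return mean(score_use(prompt, use) for use in uses)

print(response_originality("brick", ["doorstop", "habitat for a lizard"]))  # 3.0
```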
Personality - The convergent validity of the CSB item was evaluated against the Big Five
Inventory-10 (BFI-10; Rammstedt, 2007). The BFI-10 measures the "Big Five" personality
traits and consists of ten items, with two items for each of the five personality traits. This
psychometric measure has undergone rigorous research and exhibits high internal consistency,
validity, and reliability. It has been extensively used in various research settings.
Statistical Analyses
Means, standard deviations (SD), percentages, and frequencies were calculated for continuous
and categorical variables. Continuous variables were evaluated for normality and found to be
non-normal (Shapiro–Wilk p < 0.05); thus, non-parametric analyses were performed, such as the
Spearman rank-order correlation, which can be used to compare the relationship between
differing variable types (e.g., one continuous variable and one ordinal variable). Statistical
analyses were conducted using Stata 15.0 (StataCorp, LLC, College Station, TX).
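Although the analyses were run in Stata, the same checks can be sketched in Python with SciPy (the data and variable names below are synthetic, for illustration only):

```python
# Sketch: Shapiro-Wilk normality check followed by a Spearman rank-order
# correlation, mirroring the analysis plan described above. The paper's
# analyses used Stata 15.0; the data below are synthetic.
import numpy as np
from scipy.stats import shapiro, spearmanr

rng = np.random.default_rng(0)
csb = rng.integers(0, 101, size=200)   # hypothetical 0-100 CSB slider values
fluency = rng.poisson(9, size=200)     # hypothetical fluency counts

w, p_norm = shapiro(csb)               # normality test
rho, p = spearmanr(csb, fluency)       # non-parametric correlation
print(f"Shapiro-Wilk p = {p_norm:.3f}; Spearman rho = {rho:.2f}, p = {p:.3f}")
```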
Data Cleaning - Records were reviewed for duplicates. If there were multiple records by the
same person, the first two iterations were retained regardless of the image. Two datasets were
created for the analyses: one with unique individuals and their scores before their workshop for
research questions 1, 2 and 4, and another with paired data for individuals with scores before and
after their workshop for research question 3. There were no evident outliers upon data
visualization, and minimum and maximum values constrained data values for most measures.
Missing Data - Participants were not required to complete all questions; thus, there were missing
values. In addition, manual scoring was not conducted on all records because of limited
resources, and the non-random nature of the missing data did not allow for multiple imputation.
Research Question #1: What is the relationship between the CSB item and the manual and automated AUT scores?
Spearman rank-order correlations were conducted for the CSB item and the following variables:
manually scored fluency, flexibility, elaboration, originality, and automated elaboration and
originality scores. We hypothesized that CSB would significantly correlate with at least one of
the AUT values, indicative of at least one dimension of creativity captured by subjective self-
report. A False Discovery Rate (FDR) multiple comparison correction was applied to control for
false positives across the multiple tests (Benjamini & Hochberg, 1995).
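The Benjamini-Hochberg FDR procedure can be sketched as follows, here applied to the six RQ1 p-values reported in the Results (using the statsmodels package for illustration; the paper's correction was run in Stata):

```python
# Sketch: Benjamini-Hochberg FDR correction across a family of tests.
# The p-values come from the six RQ1 correlations reported in the Results.
from statsmodels.stats.multitest import multipletests

p_values = [0.004, 0.01, 0.83, 0.30, 0.49, 0.10]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p_raw, p_adj, sig in zip(p_values, p_adjusted, reject):
    print(f"p = {p_raw:.3f} -> adjusted p = {p_adj:.3f}, significant: {sig}")
# Only the two smallest p-values (fluency and originality) survive correction.
```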
Research Question #2: Does the CSB item demonstrate convergent validity with
personality indices?
Spearman rank-order correlations were conducted between CSB and the BFI-10 personality
inventory measures aligned with previous research in the following order: openness to
experience > extraversion > conscientiousness > neuroticism > agreeableness (da Costa et al.,
2015; Karwowski & Lebuda, 2016). FDR multiple comparison correction was applied.
Research Question #3: What is the test-retest reliability of the CSB item?
Test–retest reliability was assessed with an Intraclass Correlation Coefficient (ICC; Aldridge et
al., 2017), evaluating the relationship between the two administrations of the CSB. ICC values
range from 0 to 1, where higher values indicate greater reliability. We hypothesized that the two
administrations would be highly correlated.
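An ICC of this kind can be sketched with the pingouin package (the data layout and column names here are hypothetical; the paper's analysis used Stata):

```python
# Sketch: test-retest ICC for two administrations of the CSB.
# Uses pingouin (pip install pingouin); data and column names are hypothetical.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "session": ["t1", "t2"] * 4,
    "csb":     [75, 78, 60, 62, 90, 88, 40, 45],
})
icc = pg.intraclass_corr(data=df, targets="subject",
                         raters="session", ratings="csb")
print(icc[["Type", "ICC", "CI95%"]])  # single and average-measure ICC variants
```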
Research Question #4: What is the relationship between manual and automated AUT
scores?
Spearman rank-order correlations (rho) were used to evaluate the relationship between manual
and automated scores for elaboration and originality. Elaboration and originality were the only
parameters evaluated because the OCSAI program does not provide fluency or flexibility scores.
We hypothesized that the manual and automated AUT scores would be significantly correlated.
FDR multiple comparison correction was applied.
RESULTS
The means and standard deviations (SD) of the subjective and objective creativity scores are
shown in Table 2. The frequency for the AUT prompts was as follows: Newspaper 299 (25.4%),
Brick 297 (25.3%), Envelope 263 (22.3%), and Hanger 320 (27.1%).
Table 2. Means, standard deviations of subjective and objective creativity scores for unique
records
Measure Mean SD N
Note: The Ns differ because some participants completed the AUT but did not answer the
CSB. Only 520 AUT responses were manually scored, whereas automated scores were obtained
for all responses (n = 1,139).
Research Question #1: What is the relationship between the CSB item and the manual and
automated AUT scores?
CSB was significantly but weakly correlated with manual fluency (rho = .13, p = .004, n = 505)
and manual originality (rho = .11, p = .01, n = 505); both remained significant after a multiple
comparison correction. CSB was not correlated with the other creativity measures: manual
flexibility (rho = -.10, p = .83, n = 505), manual elaboration (rho = .05, p = .30, n = 505), OCSAI
elaboration (rho = -.02, p = .49, n = 1,139), or OCSAI originality (rho = .05, p = .10, n = 1,139).
Research Question #2: Does the CSB item demonstrate convergent validity with
personality indices?
CSB was correlated with openness to experience (rho = .49, p < .000005, n = 1,022), extraversion
(rho = .20, p < .000005, n = 1,021), neuroticism (rho = -.20, p < .000005, n = 1,018),
agreeableness (rho = .14, p < .000005, n = 1,021), and conscientiousness (rho = .14, p < .000005,
n = 1,023). These correlations remained significant after multiple comparison correction.
Research Question #3: What is the test-retest reliability of the CSB item?
CSB values were similar for tests one and two (1: M = 74.9, SD = 20.3; 2: M = 76.1, SD = 19.2).
The two CSB administrations were highly correlated (ICC = .79, 95% CI [.75, .82]). The time
between administrations one and two ranged from 0 days (i.e., participants completed the CSB a
second time on the same day after a few hours) to 407 days (i.e., participants completed it more
than a year later). The mean number of days between administrations was 34.5 (SD = 54.2).
Research Question #4: What is the relationship between manual and automated AUT
scores?
The manual elaboration score was strongly correlated with the OCSAI elaboration score (rho =
.76, p < .000005, n = 520). Manual originality scores were correlated with OCSAI originality
scores but less strongly (rho = .21, p < .000005, n = 520). These remained significant after a
multiple comparison correction.
DISCUSSION
These analyses evaluated the construct validity, convergent validity, and test-retest reliability of a
single-item CSB measure and the relationship between manual and automated AUT scoring
methods. We found that
the CSB significantly correlated with manual fluency and originality, multiple personality
indices, and was reliable across administrations. Further, we found significant correlations
between manual and automated AUT scores.
Our sample's mean fluency score of 9.4 aligned with a recent meta-analysis encompassing 114
effects from 31 studies, which estimated mean fluency at 9.08, 95% CI [7.54, 10.61] (Ogurlu et
al., 2023). This metric is the most congruent with the task instructions because we asked
participants to list as many uses as they could think of as quickly as possible. Instruction nuances have
demonstrated differences in scoring metrics. For example, instructions that ask participants to list
as many responses as possible result in higher fluency but lower originality. In contrast,
instructions that explicitly ask for creative responses result in the opposite pattern (Nusbaum &
Silvia, 2011). Our scoring approach emphasized the quality of the response because
inappropriate uses were excluded from the fluency count, as has been recommended (Reiter-
Palmon et al., 2019). Thus, we focused on response quality rather than just productivity. The
flexibility scores of our participants (M = 4.4, SD = 1.5) were similar to some values reported by
others (M = 4.094, SD = 1.435, Ramakrishnan et al., 2022; M = 4.9, SD = 2.2, Organisciak et al.,
2023) but not all (van Hooijdonk et al., 2022). Likely, this variation is due to variability in
scoring by study and population differences. Similar variation is observed for the elaboration
scores, with our values (M = 0.7, SD = 0.6) being higher than another study (M = 0.1, SD = 0.3;
Organisciak et al., 2023), perhaps reflecting differences between our community sample and the
college-aged students often recruited for creativity studies (Said-Metwaly et al., 2020). Future
studies would benefit from establishing normative values in large populations and publishing
them for comparison.
Research Question #1: What is the relationship between the CSB item and the manual and automated AUT scores?
We observed a significant correlation between the single-item CSB and the AUT manual fluency
score, meaning that people who thought they were more creative could generate more uses in
two minutes than those who did not. These results support our hypothesis that at least one AUT
score would correlate with the CSB; however, the correlation coefficient strength attribution by
Akoglu would consider this relationship weak
(Akoglu, 2018). Therefore, while there was a relationship between participants’ perception of
their creativity and the number of appropriate uses they generated for the AUT, it is uncertain
how meaningful this significant relationship actually is in practical terms. There were no
significant relationships with the other scores, either manually or automatically generated. In
addition, when reflecting on their creativity level, people may consider fluency but also likely
consider other dimensions, such as arts and flow states, that the AUT does not capture. The AUT
is highly cognitive and may be influenced by unknown factors such as participants’ expertise.
For instance, a contractor may display more creativity when prompted with a brick than an artist
prompted with a newspaper. Thus, the significant but weak correlation between the single-item
CSB and the AUT, as evaluated in this study, likely reflects this nuanced, complex relationship.
Research Question #2: Does the CSB item demonstrate convergent validity with
personality indices?
The CSB item had a significant relationship with all the subjective self-reported personality
indices reflecting previous studies and supporting our hypotheses. For example, it was most
strongly correlated with openness to experience. One second-order meta-analysis of seven meta-
analyses found that openness to experience was related to creativity (r = .22; da Costa et al.,
2015), and another more recent meta-analysis found an even stronger relationship (r = .47;
Karwowski & Lebuda, 2016). Similarly, our results aligned with those of Karwowski and Lebuda
for the other personality indices (Karwowski & Lebuda, 2016). These similarities are especially
pertinent because Karwowski and Lebuda
evaluated these personality indices with CSB, which encompasses the constructs of creative self-
efficacy, creative personal identity, and self-rated creativity, the last of which is most relevant to
our study. So, despite the CSB item not strongly correlating with the AUT scores, as shown in the
results for Research Question #1, it demonstrated convergent validity with personality indices
known to relate to creativity.
Research Question #3: What is the test-retest reliability of the CSB item?
Finally, the subjective creativity item demonstrated strong test-retest reliability, highlighting that
participants' perceptions of their creativity were consistent and stable over time. This consistency
supports the CSB's use as a stable measure of perceived creativity.
Research Question #4: What is the relationship between manual and automated AUT
scores?
We found that the manual and automated elaboration scores were strongly correlated (Akoglu,
2018), supporting our hypothesis. Elaboration is likely more straightforward to evaluate since
there were only three levels for the manual scoring, and the automated methods used word
counts for each use to generate the score. Thus, less nuanced judgment was needed. These results
add to the literature demonstrating that automated methods are useful to evaluate elaboration.
Our manual originality scores were also significantly correlated with the automated originality
scores, but weakly. This may be due to how the human raters aggregated uses as part of their
process. That is, they gave one score per response regardless of the variation encompassed by the
uses listed for each response. For the automated originality scoring, we separated each use,
obtained an automated score, and then averaged the use originality scores for each response.
Perhaps the human raters' less structured process for evaluating the originality of the whole
response created greater differences between the manual and automated scores. Regardless, these
results supported our hypothesis that the manual and automated elaboration and originality scores
would be correlated.
Limitations
Certain limitations may affect the interpretation of these results. First, the study involved
secondary analyses of a previously collected dataset. Therefore, future studies should aim to
repeat these analyses as primary research questions incorporating optimal study design to answer
those questions. Second, the dataset was incomplete, as not all participants answered all the
questions, and the number of participants varied for different research questions. These missing
data may have influenced the results. Multiple imputation methods were considered but deemed
inappropriate due to the amount and non-randomness of the missing data for the manual scoring.
Some have suggested that averaging scores of four reviewers allows for treating the values as
continuous rather than ordinal, thus more accurately reflecting the spectrum of uses (Dumas et al., 2021).
Perhaps most importantly, the entire dataset could not be manually scored due to limited
resources. Ideally, we would have manually scored all of the responses, but this was not possible,
and this limitation motivated our use of automated scoring methods. We also did not have other
CSB questionnaire
data (e.g., the 11-item Short Scale of Creative Self; Karwowski, 2012; Karwowski et al., 2013)
to compare against the single item we did collect. Future studies should attempt to validate the
single item against a longer, more nuanced scale. We chose the gold-standard AUT as our
comparator for construct validity. However, the AUT comes with its own limitations, such as
cultural bias favoring certain cultural groups over others (Kaufman et al., 2008) and capturing
the divergent thinking aspect of creativity but not others (Reiter-Palmon et al., 2019).
Moreover, we did not remove repeated uses for the OCSAI score; scoring was done in
aggregate. That is, if multiple similar uses were listed in one response, they were not deleted but
were all included in the response-level aggregate. Lastly, OCSAI was the only automated method
included in
the study based on recent evidence that it outperformed other automated methods (Organisciak et
al., 2023). Future research may wish to include other automated methods (e.g., SemDis; Beaty,
2021) and compare their relationship to the manual ratings of the AUT.
Conclusions
These results highlight the importance of using the CSB in combination with other creativity
measures. While understanding that a person’s self-perception of their creativity may inspire or
predict their ability to produce alternate uses, it is not a reliable metric to replace tasks like the
AUT. That being said, in situations where time is short and extensive questionnaires or objective
measures are not feasible to administer, the single-item CSB may still be useful and has
demonstrated positive convergent validity, test-retest reliability, and a significant, albeit weak,
correlation with AUT fluency and originality. Further, this study provides support for the continued
use of OCSAI as a valid method to score originality in the AUT. The AUT is challenging to
score manually, and automated methods would certainly support its wider use in creativity
research. Despite some criticisms (Gilhooly, 2024), automated methods for AUT scoring,
especially AI-based versions, are gathering increasing evidence for their validity as compared to
human raters (Organisciak et al., 2023). These automated methods will be a huge boon to the field.
Ultimately, the goal of this research is to support human creativity research. Creativity fosters
innovation, problem-solving, and personal growth. In a world faced with numerous intractable
challenges, supporting human creativity is of paramount importance.
Funding
This work was supported by the John Sperling Foundation, the John Brockway Huntington
Foundation, and the Patricia Beck Phillips Foundation, who had no involvement in the study
design, in the collection, analysis, and interpretation of data, in the writing of the report, or in the
decision to submit the article for publication.
Acknowledgments
The authors would like to thank the following for their support of this project: Institute of Noetic
Sciences staff and volunteers, including Sitara Taddeo, Mason Pritchard, Angel Vazquez, Mayank
Ranti, Kim Davis, and Tiffany Dickerson.
During the preparation of this work the author(s) used Grammarly in order to improve readability
and language. After using this tool/service, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the content of the publication.
REFERENCES
Akoglu, H. (2018). User's guide to correlation coefficients. Turkish Journal of Emergency
Medicine, 18(3), 91–93.
Aldridge, V. K., Dovey, T. M., & Wade, A. (2017). Assessing test-retest reliability of
psychological measures: Persistent methodological problems. European Psychologist, 22(4), 207–218.
Amabile, T. M. (1983). The social psychology of creativity: A componential conceptualization.
Journal of Personality and Social Psychology, 45(2), 357–376.
https://doi.org/10.1037/0022-3514.45.2.357
Beaty, R. E., & Johnson, D. R. (2021). Automating creativity assessment with SemDis: An open
platform for computing semantic distance. Behavior Research Methods, 53(2), 757–780.
https://doi.org/10.3758/s13428-020-01453-w
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B
(Methodological), 57(1), 289–300.
Cannard, C., Wahbeh, H., & Delorme, A. (2021a). Electroencephalography correlates of well-
being using a low-cost wearable system. Frontiers in Human Neuroscience, 15, 736.
https://doi.org/10.3389/fnhum.2021.745135
Cannard, C., Wahbeh, H., & Delorme, A. (2021b). Validating the wearable MUSE headset for
EEG spectral analysis and frontal alpha asymmetry. 2021 IEEE International Conference
on Bioinformatics and Biomedicine (BIBM).
https://doi.org/10.1109/BIBM52615.2021.9669778
Cropley, A. J. (2000). Defining and measuring creativity: Are creativity tests worth using?
Roeper Review, 23(2), 72–79.
Cropley, D. H. (2016). Creativity in engineering. In G. E. Corazza & S. Agnoli (Eds.),
Multidisciplinary contributions to the science of creative thinking. Springer.
da Costa, S., Páez, D., Sánchez, F., Garaigordobil, M., & Gondim, S. (2015). Personal factors of
creativity: A second order meta-analysis. Revista de Psicología del Trabajo y de las
Organizaciones, 31(3), 165–173.
Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality
with human raters and text-mining models: A psychometric comparison of methods.
Psychology of Aesthetics, Creativity, and the Arts. https://doi.org/10.1037/aca0000319
Elsbach, K. D., & Kramer, R. M. (2003). Assessing creativity in Hollywood pitch meetings:
Evidence for a dual-process model of creativity judgments. Academy of Management
Journal, 46(3), 283–301.
Forthmann, B., Szardenings, C., & Holling, H. (2020). Understanding the confounding effect of
fluency in divergent thinking scores. Psychology of Aesthetics, Creativity, and the Arts,
14(1), 94–112.
Guilford, J. P. (1967). The nature of human intelligence. McGraw-Hill.
Guilford, J. P., Christensen, P. R., Merrifield, P. R., & Wilson, R. C. (1978). Alternate Uses
Manual & Sample Manual, Test Booklet (B & C), Scoring Key (B & C). Mind Garden,
Inc. www.mindgarden.com
Hass, R. W. (2017). Tracking the dynamics of divergent thinking via semantic distance: Analytic
methods and theoretical implications. Memory & Cognition, 45(2), 233–244.
https://doi.org/10.3758/s13421-016-0659-y
Hébert, J. R., Peterson, K. E., Hurley, T. G., Stoddard, A. M., Cohen, N., Field, A. E., &
Sorensen, G. (2001). The effect of social desirability trait on self-reported dietary
measures among multi-ethnic female health center employees. Annals of Epidemiology,
11(6), 417–427.
Karwowski, M. (2012). Did curiosity kill the cat? Relationship between trait curiosity, creative
self-efficacy and creative personal identity. Europe’s Journal of Psychology, 8(4), Article
4. https://doi.org/10.5964/ejop.v8i4.513
Karwowski, M., & Lebuda, I. (2016). The big five, the huge two, and creative self-beliefs: A
meta-analysis. Psychology of Aesthetics, Creativity, and the Arts, 10(2), 214–232.
https://doi.org/10.1037/aca0000035
Karwowski, M., Lebuda, I., Wisniewska, E., & Gralewski, J. (2013). Big five personality traits
as the predictors of creative self-efficacy and creative personal identity: Does gender
matter? The Journal of Creative Behavior, 47(3), 215–232. https://doi.org/10.1002/jocb.32
Kaufman, J. C., & Baer, J. (2012). Beyond new and appropriate: Who decides what is creative?
Creativity Research Journal, 24(1), 83–91. https://doi.org/10.1080/10400419.2012.649237
Kaufman, J. C., Plucker, J. A., & Baer, J. (2008). Essentials of creativity assessment. John
Wiley & Sons.
Madjar, N., Oldham, G. R., & Pratt, M. G. (2002). There's no place like home? The contributions
of work and nonwork creativity support to employees' creative performance. Academy of
Management Journal, 45(4), 757–767. https://doi.org/10.5465/3069309
Mednick, S. A. (1968). The Remote Associates Test. The Journal of Creative Behavior, 2(3),
213–214.
Morriss-Kay, G. M. (2010). The evolution of human artistic creativity. Journal of Anatomy,
216(2), 158–176.
Nusbaum, E. C., & Silvia, P. J. (2011). Are intelligence and creativity really so different? Fluid
intelligence, executive processes, and strategy use in divergent thinking. Intelligence,
39(1), 36–45.
Ogurlu, U., Acar, S., & Ozbey, A. (2023). Does word frequency impact ideational fluency in
divergent thinking? A meta-analytic exploration with the alternate uses test. Thinking
Skills and Creativity.
Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance:
Automated scoring of divergent thinking greatly improves with large language models.
Thinking Skills and Creativity.
Park, N. K., Chun, M. Y., & Lee, J. (2016). Revisiting individual creativity assessment:
Triangulation in subjective and objective assessment methods. Creativity Research
Journal, 28(1), 1–10.
Plucker, J. A., Makel, M. C., & Qian, M. (2010). Assessment of creativity. In The Cambridge
handbook of creativity (pp. 48–73). Cambridge University Press.
Ramakrishnan, S., Robbins, T. W., & Zmigrod, L. (2022). Cognitive rigidity, habitual
tendencies, and obsessive-compulsive symptoms: Individual differences and
compensatory interactions. Frontiers in Psychiatry, 13.
https://www.frontiersin.org/articles/10.3389/fpsyt.2022.865896
Rammstedt, B. (2007). The 10-item Big Five Inventory: Norm values and investigation of
sociodemographic effects based on a German population representative sample.
European Journal of Psychological Assessment, 23(3), 193–201.
Reiter-Palmon, R., Forthmann, B., & Barbot, B. (2019). Scoring divergent thinking tests: A
review and systematic framework. Psychology of Aesthetics, Creativity, and the Arts, 13,
144–152. https://doi.org/10.1037/aca0000227
Runco, M. A. (1986). The discriminant validity of gifted children's divergent thinking test
scores. Gifted Child Quarterly, 30(2), 78–82.
Runco, M. A. (1999). Divergent thinking. In M. A. Runco & S. R. Pritzker (Eds.), Encyclopedia
of creativity. Academic Press.
Said-Metwaly, S., Fernández-Castilla, B., Kyndt, E., & Van den Noortgate, W. (2020). Testing
conditions and creative performance: Meta-analyses of the impact of time limits and
instructions. Psychology of Aesthetics, Creativity, and the Arts.
https://doi.org/10.1037/aca0000244
Shin, S. J., Kim, T.-Y., Lee, J.-Y., & Bian, L. (2012). Cognitive team diversity and individual
team member creativity: A cross-level interaction. Academy of Management Journal,
55(1), 197–212.
Silvia, P. J., Wigert, B., Reiter-Palmon, R., & Kaufman, J. C. (2012). Assessing creativity with
self-report scales: A review and empirical evaluation. Psychology of Aesthetics,
Creativity, and the Arts, 6(1), 19–34.
Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J.
L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks:
Exploring the reliability and validity of new subjective scoring methods. Psychology of
Aesthetics, Creativity, and the Arts, 2(2), 68–85.
Stumpf, H. (1995). Scientific creativity: A short overview. Educational Psychology Review, 7(3),
225–241. https://doi.org/10.1007/BF02213372
Tanis, C., Stevenson, C., & Baas, M. (2017, October). Automatic scoring of the Alternative Uses
Task [Poster].
http://modelingcreativity.org/blog/wp-content/uploads/2018/11/2017_Poster_Nijmegen_automated-scoring-AUT.pdf
van Hooijdonk, M., Ritter, S. M., Linka, M., & Kroesbergen, E. (2022). Creativity and change of
Wahbeh, H., Vieten, C., Yount, G., Cartry-Jacobsen, A., Radin, D., & Delorme, A. (2022).
https://digitalcommons.ciis.edu/advance-archive/47
Wahbeh, H., Yount, G., Vieten, C., Radin, D., & Delorme, A. (2022). Exploring personal
development workshops' effect on well-being and interconnectedness. Journal of
Integrative and Complementary Medicine. https://doi.org/10.1089/jicm.2021.0043
Zapf, A., Castell, S., Morawietz, L., & Karch, A. (2016). Measuring inter-rater reliability for
nominal data – which coefficients and confidence intervals are appropriate? BMC
Medical Research Methodology, 16, 93.
SUPPLEMENTAL DATA: AUT SCORING CODEBOOK
General Instructions
Participants are given 2 minutes to write down as many uses of a common item as possible.
Images
Image 1 - Newspaper
Image 2 - Brick
Image 3 - Envelope
Image 4 - Hanger
Step 1. Fluency - look at the response of each participant and count the number of acceptable
uses. An unacceptable use is one that is not possible with the item.
Step 2. Flexibility – categorize each use in the response by category of use. For the brick
example, building a house, building a chimney, and building a stove would all be the same
category, whereas building a house (building), throwing at a person (weapon), and a doorstop
(weight) would be three separate categories.
Step 3. Elaboration – rate the responses for the amount of detail. For example, "a doorstop" = 0,
whereas "a doorstop to prevent a door slamming shut in a strong wind" = 2 (one for the
explanation of the door slamming, two for the further detail about the wind); a rating of 1 would
be in between those examples.
Step 4. Originality – The overall objective is to evaluate the originality of the person's
responses compared to the responses of the other people in the dataset. We divided answers into
3 groups: original, somewhat original, and non-original. So, for instance, a person who came up
with 10 uses, none of them original, would receive a 0, whereas a person who came up with just
3 uses, one of which was unique, would receive a 2.
1 - NEWSPAPER
Fluency
Read*, Shred, start a fire, throw*, burn*, Schools, clinics, library, hand-on, Smell/Taste,
wrap*, learn, cut*, crumple, tear, stack*, paint, acting, look at, spell*, build a tent, share*,
papier-mache, hat, sit*, create*, write*, grateful that i know how to read, Feel*, taste*,
origami, protect*, pack*, step, fan*, drying, put in water
clean*, hit*, recycle, garbage, fold, wipe*,
paper mache, paper mâché, lining, decor*,
kindling, wrap*, cover*, recycle, potty train,
tear*, confetti, layer*, sell, stack, Blanket,
wipe*, swat*, catch urine/pee
* represents any number of characters. For example, Ex* could mean Excel, Excels, Example,
Expert, etc.
Categories
1. Kindling / Fire starting
2. Reading (Knowledge, gaining information)
3. Stationery (Writing, Craft, Stacking)
4. Specific Crafts (Origami, Collage, Papier Mache, Confetti)
5. Organization (Sorting, Isolation)
6. Packing / Lining / Insulation / Covering
7. Weapon (Throwing, Hitting, Violence)
8. Stabilization (sit on, place under a table leg)
9. Getting rid of/Giving (recycle, gift, shred)
10. Cleaning (wipe up messes, clean windows)
11. Weight (lift, bench press)
12. Garden (mulch, planter box lining, weed prevention)
Elaboration
0 “To read”, “kindling”, “Packing”
2 “Blocking wind below a door”, “Using as a cover for your artwork”, “keep the cat off the
couch”, “fan yourself on a hot day”
Originality
0. Non-Original
1. Reading / information
2. Kindling / fire starting
3. Packing material
4. Wrapping
5. Folding
6. Throw away / recycle
7. Pet clean up
8. Window cleaning
9. Stacking, booster seat
10. Art, papier mâché, collages
11. Games, play (with people or animals)
12. Using as a table, chair, or shelf
13. Covering a window / shade
14. Weight
15. Throwing, scattering
16. Protect the floor from paint, oil, etc
17. Gardening, weed control, compost
18. Discussion with others
19. Fan
20. Self defense, throw at someone, make a projectile
21. Stabilization, door stop, wedge
22. Hiding, covering
23. Lining shelves
1. Somewhat Original
1. Record keeping - birthdays, funerals, historical events
2. Wrap food (specific or general)
3. Drive or bike over
4. Make Cushions / kneel on
5. Wear as clothes
6. Crossword puzzles
7. Teach someone to read / read to others
8. Tent / teepee / human shelter
9. Make an ad / advertisement / sales
10. Stuff into shoes / hats for shaping clothes
2. Original
1. Blocking wind below a door
2. Donation to offices for reading
3. Food for termites
4. Coffee filter
5. Wet ball to shoot through straw
6. pretending to be reading while scopi…
7. sitting on coloring easter eggs
8. seeing how times have changed by comparing old and new newspapers
9. keep the cat off the couch
10. creating silly putty prints
11. Cover fruit to ripen
12. Use to wrap a wound
13. learning a new language if paper is in foreign language
14. use the ink to make finger prints
15. photo opp
16. ballast for hot air balloon
17. identify all the words in articles starting with p
18. ear funnel to hear better
19. use to compare heights of children, measuring heights / distance
2 - BRICK
Fluency
Hit, Build*, Weight, Throw*, Hammer, step*, Ow, Ice skate, Note*, Listening to the earth,
break*, border, path, sit, block, sink, prop, Observe*,
decor, enclosure, destroy, stack, path,
doorstop, cooking, heat*, smash, lift, door
stop, stand on,
Categories
1. Tool
2. Build (Constructive)
3. Weapon
4. Athletic (lifting, throwing)
5. Weight
6. Noise
7. Stand / Stacking (Sitting)
8. Artistic (Decoration)
9. Interactive
10. Misc. (Scraping, etc.)
11. Measurement (Counting, Straight Edge)
12. Destructive (smashing, breaking)
Elaboration
0 “To throw”, “hit”, “build”, “bookend”, “weight”
2 “To use as a weight to keep the door shut from a big gust of wind”
Originality
0. Non-Original
1. Building - houses, buildings, fire pits, roads, etc
2. Steps, stairs, pathway, road
3. Border, fence, barrier, divider
4. Weight - holding this down / up, car block, paper, door stop, anchor, sink something
5. Exercise / yoga / martial arts
6. Smashing, crushing, breaking something (including food)
7. Hammering
8. Weapon
9. Stacking
10. Art
11. Pressing things like flowers, cheese, tofu
12. Measuring, drawing, straightedge
13. Stand / sit on
14. Using it to keep warm / heat transfer
15. Play with, make games, imagination, children
16. Drainage for plants
17. Break it up and use the powder / crumbles
18. Writing a message on
19. Scraping, scratching, digging
20. Pillow, bed, headrest
1. Somewhat Original
1. Making a pattern?
2. Give away / gift
3. Color comparison
4. As a mold to create more bricks
5. Making marks, signaling where something is
6. Meditate on
7. Nail file
8. Block for mice / raccoons / pests
9. Make noise
10. Experiment / science experiments
2. Original
1. smooth wood polish
2. qi gong tool
3. sink in the water to raise water level
4. use to sharpen a knife
5. Eat food off of
6. habitat for a lizard
7. engrave it to commemorate someone
8. Teaching math
9. A washing surface
10. Burning incense
11. Cooling rack
3 - ENVELOPE
Fluency
Create art, mail, send, post, throw, fold, blow on as an instrument, a plate, as a shoe
origami, paper airplane, pick up*, wrap*, for a small person, make noise with, apply
paint, write*, bookmark, container, cover, medicine with, paperclips, paper pins, money,
scratch paper, coaster, clean*, napkin, purse, glue, pet, imaginary friend, mitten
lick, stamp, unfold, give, notepad, hat,
stabilizer, kindling, fire, receptacle, transport,
securing, list, art, tear, bookmark,
Categories
1. Mailing / Sending
2. Containing (Coin holder, hair locks)
3. Wrapping (Including gum)
4. Crafts (Origami, art, child play, toy)
5. Stabilizer (like under a short table leg)
6. Misc. (placement, napkin, coaster, recycle)
7. Stationery (Writing, Using as notepad, filing, organizing)
8. Kindling (Fire)
9. Tool (Straight Edge, crevice cleaner, picking, cleaning, measurement, slicing, Bookmark)
Elaboration
0 “Mail”, “Bookmark”, “Notepad”, “Origami”
Originality
0. Non-Original
1. Mailing, sending
2. Writing on
3. Recycling, throwing away, compost
4. Arts and crafts, drawing, painting, origami, make a toy
5. Holding various objects, organization, filing
6. Hiding / revealing
7. Blocking light, sun, wind
8. Cleaning, catching grease / oil, etc., picking up something you don’t want to touch
9. Measuring, straight line, ruler
10. Burning, kindling
11. Stabilization
12. Fan
13. Carry / move objects and insects
14. Kill insects
15. Coaster, placemat
16. Bookmark
17. Sign, label, nametag
18. Throw it
19. Play, confetti
1. Somewhat Original
1. Make noise with, whistle
2. Papercut, slicing / cutting
3. Use the sticky part as glue / to hold something else
4. Let animal chew on
5. As an email icon
6. Toothpick
7. roll a cigarette
8. MI6 trick - I'm not sure what this is; I didn't find anything on a preliminary Google search.
(943) gave a point because no one else mentioned it, so it has an element of originality,
but I'm not sure if it falls under another category.
9. chop stick stand
2. Original
1. Retrieve an object under a thin opening
2. use for sewing corners
3. dance on a pirouette in the center
4. cover dressing for a wound
5. Divination
6. to make a trail while hiking
7. Blot lipstick
8. Mousepad
9. keep a door from locking
10. use it as a dixie cup for drinking water
11. Funnel
12. testing nail polish colors
4 - HANGER
Fluency
Hang, poke, unwind, wire, wrap, toy, hook, “Hanging oneself”, “fix things”
wind together, carry, prod, weave, open,
unclog, hit, decoration, melt, retrieve, unlock,
kill, drain, connect, wreath, tool
Categories
1. Hanging
2. Reaching / Tool (Abortion [intense but tool use])
3. Holder (Tie holder, watch holder)
4. Weapon
5. Art (Wire Art)
6. Tying / Binding
7. Games (keep-away game)
8. Electricity
9. Misc. (Scrape)
Elaboration
0 “Hanging clothes”, “remove hair from drain”, “to poke eyes”
Originality
0. Non-Original
1. Hang coats, clothes, ties, jewelry
2. Unlocking a car / picking a lock
3. Crafts
4. TV antenna
5. Roasting marshmallows, hot dogs, cooking
6. Cleaning, unclogging a drain
7. Reaching
8. Back scratcher
9. Scratching surfaces, paint off something, making a mark etc
10. Hitting someone / weapon
11. Use as wire, conducting electricity
12. Binding, connecting, tying
13. Fence
14. Poking someone or something (including fire stick)
15. Recycling, donation, trash, get rid of
16. Fishing pole
17. Gardening, trellis, support potted plants
18. Bend into another shape
19. Make jewelry, clothes, belt, hat, or costume
20. Straightedge, tracing, drawing, measurement
21. Games / play / sports - bow and arrow, sports, basketball hoop, or playing with an animal
22. Make a hole in something
23. Hanging other objects
24. Musical instruments, drum sticks
25. As a hook
26. Animal trap
27. Flagpole
28. Abortion
29. Divining / dowsing rod
30. Drying herbs / flowers
31. Melt and make something else
1. Somewhat Original
1. Jaw / mouth opening
2. Door stopper window stopper
3. move things that are hot
4. Hang photo on the wall
5. Bubble blower
6. Divining / dowsing rod
7. Rodent trap
2. Original
1. Make furniture
2. stop sleep apnea by having it on the back
3. Bookmark
4. use it to relieve tension by twisting
5. make a magnetic coil, make a compass
6. Tanning hide
7. hold open a plastic bag
8. posture reminder
9. use as a splint