1 Introduction
Despite wide recognition, research, and advocacy, autistic adults continue to face critical levels of employment challenges [45, 90]. Prior work has shown that up to 85-90% of autistic adults face unemployment and underemployment issues [71, 80], among the highest of all major disability groups in the United States [48]. When seeking employment and navigating the workplace, autistic workers report that they often face challenges such as understanding non-autistic social norms, handling limited or unclear information and expectations, discerning unspoken meaning in communication, coping with sensory overload, and managing with limited support and accommodations. Furthermore, many face difficulties navigating issues such as disclosure and advocating for roles which take advantage of their unique characteristics [10, 48, 67].
Given that autistic individuals experience challenges with social interaction in the workplace [5, 10, 67], researchers have noted the importance of having a champion or advocate who is able to mediate and assist the autistic individual through social or communication challenges at work [59]. Some research has found positive effects from specialized job coaching for autistic job-seekers and from counseling with speech and language pathologists (SLPs) [21, 59, 74], but given these challenges and research highlighting the need for more affordable and better support [45, 67, 74, 90], it appears unlikely that autistic workers will have access to such essential resources in most jobs. This underscores the desire for a readily accessible resource that autistic workers can turn to in moments of social difficulty – a desire that motivates our exploration of whether large language models (LLMs) can fill this role.
Our work assesses the capability of LLMs to intermediate and otherwise assist tactfully with workplace communication and related acts. This is motivated by LLMs' recent meteoric rise in popularity and adoption, coupled with their apparent ability to simplify and explain social interactions. These include reports of the broader public utilizing LLMs in contexts such as writing "how-tos" for social interactions, interpreting and explaining social situations, explaining humor in jokes, and proofreading workplace communications [34]. Much of this is attributable to the fact that LLMs encode a wide range of human behavior in their training corpora [8]. More recent chatbot incarnations (e.g. OpenAI's ChatGPT [63] and Google's Gemini / Bard [66]) have employed techniques like reinforcement learning from human feedback (RLHF) [47] to achieve an emergent capability to generate outputs with increased perceived social awareness and coherence compared to prior language models. In autism-allied and -focused spaces in social media and online forums (e.g. Facebook, X (formerly Twitter), TikTok, Reddit), this apparent capability has not gone unnoticed: a growing number of posts now involve users opining, speculating, and relaying their experiences after having used LLMs in social communication contexts [76]. Many explicitly note utilizing LLM-based tools like ChatGPT to understand and prepare for social situations at work [76, 84, 85], write emails and messages to supervisors and coworkers [83, 84], and understand vague communications and instructions [83, 86] – with one thread dubbing it "a gamechanger for people on the spectrum" [86]. Given this, it can no longer be assumed that this use case is merely hypothetical.
To better envision the opportunities and concerns with autistic users utilizing LLMs for assistance in workplace communication, we aim to answer the following research questions.
RQ 1: What communication challenges and resource availabilities (or deficits) do autistic workers experience?
RQ 2: Do autistic workers believe that an LLM's advice could be helpful for addressing workplace communication challenges?
RQ 2b: If so, how would autistic workers utilize LLMs?
RQ 3: Can an LLM's advice be considered good? (and by what, or whose, definition of good?)
We conducted a within-subjects study with (n = 11) autistic individuals where we encouraged participants to maintain a free dialogue with two chatbots: an LLM (utilizing OpenAI's GPT-4 [64] via API with some prompt engineering) and a human confederate in disguise. During the study, participants (1) shared with us their prior experiences with workplace communication challenges and available resources, (2) engaged in exchanges with the chatbots, and (3) rated and described their preferences between the LLM and the confederate.
Overall, our data show a strong participant affinity for the LLM: nine (9) out of 11 \((\simeq 82\%)\) expressed a desire to continue using it for communication aid and social advice, and the same nine participants preferred interactions with the LLM over those with the human confederate in both our quantitative and qualitative analyses. Participants' motivations to continue using the LLM were informed by a lack of available resources at work, with many relying on friends and family despite prior experiences of emotional harm.
Participants valued the LLM for its potential to communicate in ways which aligned with their preferences, untangle implicit neurotypical norms, and allow for the freedom to ask questions without fear of reprisal – providing a sense of control in navigating the neurotypical world. To ground the LLM's answers vis-à-vis alternative resources available to autistic workers, we had a professional counselor and job coach (LPC, NCC) specializing in workforce readiness training for neurodivergent individuals evaluate LLM and confederate responses. While the practitioner placed value on the potential for LLMs' ease-of-access in times of need, she noted the LLM's tendency to make ungrounded assumptions and assume neurotypicality, and raised concerns about harms resulting from the LLM's misaligned advice and participant acceptance of potentially harmful recommendations.
From our results, we note a divergence in attitudes towards the LLM: autistic participants express a desire to use an LLM because it appears to provide agency for independent exploration, while our practitioner urges caution due to its misleading advice. This division symbolizes and foreshadows imminent societal concerns as LLMs are considered by autistic users [2] and are being developed explicitly for interpersonal advice [29]. Despite its novel façade, we show how the disagreement between participants and practitioner mirrors existing conflicts in HCI literature from accessibility, disability theory, and medical/social models of disability — ones involving the relative privileging of disabled experiences against normative authorities. We believe acknowledging this parallel allows for progress towards truly assistive aids for autistic workers which offer access to grounded advice while centering their lived experience.
This work presents the following exploratory contributions which we believe merit further consideration:
(1) we illuminate the practice by autistic workers of obtaining social communication advice from LLMs (§ 1),
(2) we gauge autistic participants' preferences for receiving advice from an LLM versus a human confederate and current resources in a user study (§ 4.2),
(3) we evaluate the quality of LLM advice through a discussion with a specialized counselor and job coach (§ 4.3),
(4) we provide discussions of potential reasons, opportunities, and concerns for future use of LLMs for autistic workers' social communication assistance (§ 5),
(5) and we elucidate design considerations as LLMs are considered for social communications, and demonstrate how designing LLMs for social advice is fraught with entanglements and relative privileging (§ 6).
3 Methodology
In the study, autistic workers were asked to share prior workplace communication challenges and resources, and to interact with and rate two chatbots: (1) Paprika, a chatbot which utilized OpenAI's GPT-4 [64] API (with some modifications described in § 3.1.4), and (2) Pepper, a human confederate. A study design with a human confederate was chosen as a comparative baseline for evaluating the LLM, as it represents a close analog for a readily available resource that provides human-like advice, akin to an anonymous help hotline (in chat form), Reddit comments, or the social network Blind [1], as well as the interaction behaviors described in social media posts and threads. To ensure participants would feel comfortable providing candid feedback without worrying about potentially offending a real person, we communicated that both Paprika and Pepper were automated agents. After the conclusion of the study period, we consulted a professional counselor and job coach (LPC, NCC) specializing in workforce readiness training for neurodivergent individuals to provide additional context on the efficacy and safety of LLM-generated advice.
3.1 Study Protocol
The study, whose design was approved by an Institutional Review Board (IRB), was conducted remotely and was designed to take 90 minutes across three parts for each participant. Data were recorded via video conferencing software, capturing video and audio from participants as well as text entered into the Discord space used for chatbot interactions and surveys (§ 3.3). Each participant encountered the following three stages:
(1) a 10-minute semi-structured interview (§ 3.1.1),
(2) a 65-minute session interacting with and evaluating two chatbots (§ 3.1.2), which included
(a) a 5-minute onboarding to familiarize the participant with Discord and with interacting with Paprika (LLM) and Pepper (human confederate), and
(b) four (4) sessions of (up to) 10-minute interactions with a chatbot, each followed by a rating survey,
(3) and a 15-minute follow-up semi-structured interview (§ 3.1.3).
After the study, participants were debriefed about the study's purpose and informed that Pepper was a human confederate, and any further follow-up questions were addressed.
3.1.1 Assessing Prior Experiences.
In an initial 10-minute semi-structured interview, participants were asked about prior issues they had encountered with workplace communication, either in person or online, as well as the context surrounding any of these challenges. Participants were also asked about types of assistance they had sought or considered (in or outside of work) when faced with these issues. Separately, participants were asked whether they had used chatbots or LLMs in the past.
3.1.2 Interaction and Evaluation of Chatbots.
Participants were navigated through the process of opening the Discord client on their device and connecting to the server used for the experiment. They were then introduced to the different rooms: one for testing/trialling (connected to an LLM) and four for interactions with the two chatbots (two per chatbot), with ordering done in an alternating and counterbalanced fashion (including between participants). Participants were introduced to a scenario in which they would ask various chatbots for advice, and were told to consider the chatbots disassociated from any workplace software: the agents would not have context of who they were, and would not report data back to their workplace.
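To make the ordering concrete, the sketch below (in Python; a hypothetical reconstruction, not the study's actual tooling) shows one way such an alternating, counterbalanced assignment could be generated.

```python
# Illustrative sketch (not the study's tooling) of an alternating chatbot
# order, counterbalanced across participants by flipping which chatbot
# a participant encounters first.
def interaction_order(participant_index: int) -> list[str]:
    """Return the four-session chatbot order for one participant."""
    first, second = "paprika", "pepper"
    if participant_index % 2 == 1:  # every other participant starts with Pepper
        first, second = second, first
    return [first, second, first, second]

for i in range(4):
    print(f"P{i + 1}: {interaction_order(i)}")
# P1: ['paprika', 'pepper', 'paprika', 'pepper']
# P2: ['pepper', 'paprika', 'pepper', 'paprika'] ...
```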
Onboarding – Test Interaction Room. In the first room, participants were encouraged to ask the chatbot any kind of question (e.g. "What's your favorite recipe for chocolate chip cookies?") to acclimate them to the idea that the chatbots in this study could be more conversant than those found in commercial applications, such as customer service chatbots. Most participants (P1 - P3, P6 - P11) opted to ask workplace communication-related questions ("What are some strengths of neurodivergent people in technology?" - P8), while some (2) chose to ask philosophical questions to challenge the system ("What is the meaning of life?" - P4, P5).
Interactions with Paprika and Pepper. Before each individual interaction, participants were informed that each room was connected to either Paprika (LLM) or Pepper (human confederate), and participants were encouraged to discuss their prior workplace communication-related questions with the chatbots. Generic questions were prepared, but all participants opted to ask their own questions. After each of the four (4) chatbot interactions, participants completed a short survey (described in § 3.3). Participants interacted with Paprika and Pepper consecutively and in alternating fashion, with a 10-minute limit per interaction.
3.1.3 Collecting Overall Impressions.
After completing the interactions and evaluations of both chatbots, participants engaged in a follow-up semi-structured interview. During this interview, they were asked to reflect on their experiences with both Paprika and Pepper, provide comparisons between the two, and discuss any preferences or suggestions for improvements. Additionally, participants were asked to share their thoughts on the potential utility of chatbots in assisting with workplace communication issues and any concerns they might have regarding the use of such technology. Upon conclusion of the follow-up interview, participants were debriefed about the true nature of Pepper being a human confederate and of the purpose of the study. They were given the opportunity to ask any questions or share any additional thoughts about the study design and their experiences.
3.1.4 Paprika: the Large Language Model Chatbot.
Paprika was developed using OpenAI's GPT-4 [64] API and was prompted with the following before each 10-minute interaction:
(1)
You are a helpful assistant named Paprika. Provide clear and thorough answers but be concise.
(2)
Use a more conversational but still workplace appropriate style. Make sure your answers are short, make sure your responses are around two paragraphs.
(3)
Also, if I am not asking a question that is workplace-communication related, let me know that I am off-topic and steer the conversation back on-topic to workplace communication. Do not attempt to answer the question if it is off-topic.
to ensure some parity with the human confederate Pepper in response length and writing style. (Note that the prompt is demarcated in sections for reading clarity – in the study, the sentences of the prompt were formatted as a single prompt, in a single paragraph.)
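As a minimal sketch of how such a setup could be operationalized, assuming the official openai Python client and that the prompt is supplied as a system message (an assumption; the paper does not specify the mechanism), a Paprika-style call might look like the following. The helper name ask_paprika is ours, not the study's.

```python
# A minimal sketch of how Paprika's prompt could be supplied to the GPT-4
# API. We assume the official `openai` Python client; the helper name
# `ask_paprika` and the use of a system message are our own assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The study's prompt, formatted as a single paragraph as described above.
PAPRIKA_PROMPT = (
    "You are a helpful assistant named Paprika. Provide clear and thorough "
    "answers but be concise. Use a more conversational but still workplace "
    "appropriate style. Make sure your answers are short, make sure your "
    "responses are around two paragraphs. Also, if I am not asking a question "
    "that is workplace-communication related, let me know that I am off-topic "
    "and steer the conversation back on-topic to workplace communication. "
    "Do not attempt to answer the question if it is off-topic."
)

def ask_paprika(history: list[dict], user_message: str) -> str:
    """Send one conversational turn to the model, keeping prior turns."""
    messages = [{"role": "system", "content": PAPRIKA_PROMPT}]
    messages += history  # prior {"role": "user"/"assistant", ...} turns
    messages.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content
```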
Fine-Tuning Paprika's Prompt and Behavior. Initially, the study considered only the text at the beginning of the final prompt (i.e. (1)). After some testing with our confederate, the second phrase (2) was added to help ensure responses were comparable in style to those the human confederate (Pepper) provided. Finally, the last phrase (3) was added so that Paprika (LLM) would only respond to questions germane to workplace communication, as our confederate (Pepper) was likewise instructed.
For the first six participants (P1 through P6), responses from Paprika were posted after being received from the API and undergoing human review by the interviewing researcher for safety. In the latter five trials, after noting that some participants were sensitive to the response-time difference between Paprika and the confederate (mean delay for the first six participants: Paprika = 165 seconds, Pepper = 189 seconds), output from Paprika was delayed to match or exceed the delay observed from Pepper (the confederate) in previous trials.
Similarly, Paprika's verbosity was manually limited to outputs within 30 words of the confederate's average reply length from previous participants (mean verbosity for the first six participants: Paprika = 205 words, Pepper = 153 words). This was achieved by continuously re-sending API queries until a response of desirable length was received. (Note that this only occurred once in all interactions with the last five participants.)
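A minimal sketch of how this pacing might be implemented follows; the constants mirror the means reported above, but the control flow and names are our hypothetical reconstruction rather than the study's actual code.

```python
# A hypothetical reconstruction of the pacing logic described above; the
# constants mirror the reported means for the first six participants, but
# the control flow is our own sketch, not the study's actual code.
import time
from typing import Callable

TARGET_REPLY_WORDS = 153    # Pepper's mean reply length (words)
TARGET_DELAY_SECONDS = 189  # Pepper's mean response delay (seconds)
MAX_WORD_GAP = 30           # allowed deviation from Pepper's mean length

def paced_reply(generate: Callable[[], str]) -> str:
    """Re-query until the reply length is within MAX_WORD_GAP words of the
    confederate's average, then hold the message so Paprika responds no
    faster than Pepper did in earlier trials."""
    start = time.monotonic()
    while True:
        reply = generate()  # e.g. lambda: ask_paprika(history, message)
        if abs(len(reply.split()) - TARGET_REPLY_WORDS) <= MAX_WORD_GAP:
            break
    remaining = TARGET_DELAY_SECONDS - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return reply
```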
3.1.5 Pepper: the Human Confederate.
The human confederate, who posed as the chatbot Pepper, is a graduate student with three (3) years of prior work experience. In preparation for this study, they consulted with a practitioner (§ 3.4) about best practices for supporting a neurodivergent coworker. During the study, the confederate did not have access to content from conversations with Paprika, and was tasked with answering questions from participants as they were entered into the Discord channel.
3.2 Participant Details
We recruited eleven (11) participants, whose ages ranged from 22 to 50 (mean = 30, SD = 8.7), and with a wide variance of educational and work experience from 0.5 to 28 years (mean = 9, SD = 8.1). We report demographics and work experiences of our participants in Table 1. Participants were recruited by posts to general-purpose and special-purpose (e.g. autism advocacy related) email lists, as well as with direct outreach. Promotional and recruiting materials for the study referenced the potential of interacting with and rating chatbots based on LLMs which may be similar to ChatGPT. Participants were informed that their responses to surveys and chatbots, as well as video and audio from the interview, may be used for data analysis. After the study, participants were compensated $20 in gift card credit for their time.
3.3 Data Collection
Overall, data collected from this study included (1) transcripts from recordings of the semi-structured interviews, (2) text from the chatbot interactions, and (3) participant responses to the post-interaction surveys.
3.3.1 Post-Interaction Survey Design.
The post-interaction survey given to participants after every (up to) 10-minute interaction included questions on a 7-point Likert scale assessing participants' perceptions of the chatbots' utility, understanding, likelihood of use, and dependability. Details about the questions and the anchors of the Likert scales are provided in Table 2.
As part of the survey, we also asked participants the following questions, to which they could provide long-form written answers.
(1)
If you could change this chatbot’s behavior, or give it feedback, what would you tell it?
(2)
What concerns, if any, do you have about your interaction with this chatbot?
(3)
How does this chatbot’s response differ from what you may get from a coworker / friend / supervisor if you asked them for advice? (and tell us what/who you are comparing against)
3.3.2 Quantitative - Overall.
When participants were asked which chatbot they preferred overall, 9 out of 11 (\(\simeq 82\%\)) expressed a preference for the LLM (Paprika) over the confederate. We conducted a two-tailed Mann-Whitney U / Wilcoxon signed-rank test on the participants' Likert scale ratings for Paprika and Pepper to determine whether there was a statistically significant difference in participant ratings between the two. Given that Paprika's (LLM) response time and verbosity were altered for the latter five participants, we conducted additional two-tailed Mann-Whitney U / Wilcoxon signed-rank tests on the first six and latter five participants separately to ascertain whether statistically significant differences were observed in either or both groups.
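As a concrete illustration of the paired comparison, the sketch below runs scipy's two-tailed Wilcoxon signed-rank test; the ratings shown are fabricated placeholders demonstrating the data format, not the study's data.

```python
# Illustration of the paired, two-tailed Wilcoxon signed-rank comparison.
# The ratings below are fabricated placeholders showing the data format,
# NOT the study's data.
import numpy as np
from scipy.stats import wilcoxon

# One 7-point Likert rating per participant for each chatbot (paired).
paprika = np.array([6, 7, 5, 6, 7, 6, 5, 7, 6, 4, 6])
pepper = np.array([5, 5, 5, 4, 6, 5, 5, 5, 4, 5, 4])

stat, p = wilcoxon(paprika, pepper, alternative="two-sided")
print(f"all participants: W = {stat}, p = {p:.3f}")

# Separate tests mirroring the first-six / latter-five split.
for label, idx in [("first six", slice(0, 6)), ("latter five", slice(6, 11))]:
    stat, p = wilcoxon(paprika[idx], pepper[idx], alternative="two-sided")
    print(f"{label}: W = {stat}, p = {p:.3f}")
```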
3.3.3 Qualitative - Overall.
Long-form answers to the survey questions, as well as transcribed participant dialogue from the semi-structured interviews (§ 3.1.1 and § 3.1.3), were utilized for thematic analysis [11] to identify patterns and themes in the qualitative data. The analysis began with two researchers jointly reviewing three (3) transcripts and corresponding survey responses and discussing the development of a codebook. Once the codebook was agreed upon, one of the two researchers coded the remaining transcripts. After completing this coding, both researchers met again to review the coded data, discuss any discrepancies, and finalize the identified themes and patterns. Our thematic analysis of the participant survey responses and interview transcripts yielded a codebook with 29 codes, which we report in a table in the Appendix.
3.4 Practitioner Evaluation
After completing all 11 participant sessions, we sought additional grounding and points of comparison for LLM-generated advice versus other known resources, and invited a professional counselor and job coach (LPC, NCC) who specializes in workforce readiness training for neurodivergent individuals to review the responses from both Paprika and Pepper and provide expert validation [54] of the LLM responses from the study.
Over a 150-minute session, the practitioner reviewed chat transcripts from the participant interactions with both the LLM and the confederate and gave open-ended feedback on the quality of the responses and comparisons to advice she would give in her practice. The practitioner engaged in back-and-forth dialogue with the researcher, was encouraged to think aloud while she reviewed the chat transcripts, and took part in an unstructured, exploratory, problem-centered interview [96] about the practical effects of more widespread access to and use of LLMs and of access to advice of this form.
5 Discussion
The motivation behind this work stemmed from the recent surge in popularity of large language models and the desire to better understand the opportunities and risks arising from autistic workers' usage of LLMs. Our findings show unambiguously that the participants we interviewed displayed strong preferences for utilizing LLMs as social communication aids at work, signalling more widespread usage to come, alongside caution and warnings from a practitioner regarding their adoption. We reflect on why we may have observed the results we did, address the difference in opinion between participants and practitioner, and consider how it portends difficulties in creating an equitable and practical LLM for providing social advice.
Positionality Statement.
We disclose that some of the authors identify as neurodivergent, though none identify as autistic. We approach this work from the perspective of accessibility researchers who subscribe to the social model of disability. Our approach was informed by a recognition that our participants were likely to be situated in contexts where medical model framing and norms would be prevalent.
5.1 The Appeal of LLMs for an Autistic Worker and Envisioning Downstream Effects
In this section, we attempt to interpret the relationship between the myriad factors which contributed to our participants' preferences for interacting with the LLM over the human confederate. We describe our best understanding of the immediate and near-term ramifications as we forecast possible outcomes from future use.
5.1.1 LLMs Could Be Seen As Better Than Existing Resources.
We observed that many participants reported limited or a total lack of reliable resources (§ 4.1.3), and this became one of the major bases of comparison from which participants evaluated their willingness to seek and adopt social advice from an automated agent. Given that participants were not initially made aware that only one of the two agents they interacted with was automated, we noted that several participants (7: P1 - P4, P8 - P10) expressed a desire to have either agent available to them. In many cases, the experiences that participants had with the LLM held far more promise than the status quo.
As P9 remarked:
P9: I think it [the LLM] is more willing to take the time to provide explanations for things I don’t understand [...] which isn’t the case in real-life.
5.1.2 Many of the Positively-Rated LLM Attributes Had Little to Do with Social Advice.
From our findings, we observe that many of the LLM qualities that participants reacted positively towards (§ 4.2.2) concerned affective or communicative style rather than substantive social guidance itself. While it may not be possible to distinguish how each quality affected participants' overall attitudes, this nonetheless shows that participants are looking for more than social advice from LLMs and are placing significant importance on the manner and context in which advice is delivered. Given this and our participants' perception of resource deficits (§ 4.1.3), we observe that positively rated LLM attributes have the potential to give insight into addressing autistic and non-autistic dyadic or group social communication.
Specifically, behavior exhibited by LLMs gives us explicit, actionable cues by which conversations could be adapted to improve current-day communication challenges. Regrettably, we believe that the preference for LLMs in this respect also likely reflects a deficit in participants' current workplaces of qualities (§ 4.2.2) like open-mindedness, considerateness (taking extra steps to make communication more comprehensible and legible), and courteousness. This conclusion is likely generalizable, as it dovetails with well-founded existing research establishing greater rates of "workplace incivility" experienced by minoritized groups in the workplace [17]. We hope that positive human-LLM interactions, rather than reinforcing the commonly held ableist notion that "autistics prefer to communicate with robots" [70], can provide guidance on reframing these notions as simply communication preferences.
5.2 LLM Advice Appears Intractable, Even With Disclosure
The researchers, participants, and practitioner observed that the LLM frequently suggested advice which could be problematic for the autistic user. Some advice included employer-friendly language, while other strategies encouraged autistic individuals to engage in behaviors which would necessitate masking, such as maintaining eye contact, smiling, or participating in large group discussions (§ 4.3). This was perhaps to be expected, given that only a minority of participants (2) ever disclosed to the LLM that they were autistic. However, even in cases where disclosure occurred, our participants and counselor observed that these recommendations persisted, though to a lesser degree. We provide a representative example of this type of conversation in Appendix B. We see that its adjustments include the highly questionable suggestion of disclosing one's autism, which prior research has shown to be perilous [48].
Given the probabilistic and, as yet, poorly characterized relationship between prompt engineering, variation in user input, and LLM output, we consider it unlikely that prompt engineering alone could mitigate the risks identified by the practitioner. It is clear to the authors that future LLM-based solutions to assist autistic workers' social communication ought to consider a more systematic and comprehensive construction process for models which may be used for this purpose.
6 Design Considerations for Autistic Workers’ Use of LLMs
Such comprehensive efforts, which may include ensuring representativeness of autistic perspectives in training corpora and carefully balancing the diversity of perspectives involved in value-sensitive processes such as training reward functions [65], begin to address larger sociotechnical issues which are part of the current discourse around technology and stakeholder representation. We present the following design considerations, which we hope provide guidance for system builders, the autism and broader accessibility community, and researchers with interest in this space.
6.1 Using LLMs for Social Advice Necessarily Involves Relative Privileging
We note that a practitioner's role is generally representative of an occupation aligned with a medicalized model of disability and autism, wherein typical goals emphasize management or mitigation of autism-related challenges. The tendency, then, would be to encourage clients to adhere to neurotypical norms, sometimes at the expense of their natural inclinations. As our job coach and counselor often emphasized, this is not because practitioners seek to deny autistic expression, but because they must nevertheless encourage a practicable level of conformity to existing workplace expectations from clients. In contrast, our participants sought an enabling support that understood their perspective and assisted them in making sense of an unfamiliar world. Participants preferred LLM interactions despite flaws like neurotypical-centered approaches (even after disclosure), employer-centered advice, and a Pollyanna-esque outlook on difficult social situations.
As such, we observe that a coach-like LLM which provides advice adhering to practitioners' best practices would not fulfill the same role as the kind of empathetic assistant that participants expect. In fact, we believe that an LLM aligned with the perspective espoused by our participants will not effectively serve the broader goal of making guidance and advice from practitioners more accessible to autistic workers. Instead, it could exacerbate existing tensions between certain segments of the autistic community and practitioners. These tensions often stem from perceptions of practitioners' practices as reinforcing an oppressive status quo, forcing autistic individuals into a restrictive mold, and promoting masking.
This division raises the question of how to consider such a conflict. We believe it is imperative to discuss how to ameliorate, if not resolve, the issue:
Should the goal of LLMs as an assistive technology be to advance the interests of the disabled individual or the normative social good?
With this question, we find that existing perspectives (§ 2.4) illustrate the gamut of different possible positions. Those identifying with the medical model of disability would argue that the goal of an LLM in this role should be to replicate existing options for therapy and encourage workplace assimilation. For those aligned with a standpoint theory perspective, the answer may vary depending on the priority that an autistic user places between an LLM which encourages fitting in to existing workplace structures and one which supports advocating for one's own needs and expressing one's own individuality. Disability studies and critical disability advocates may argue that an LLM should be explicitly constructed to prioritize and empower disabled individuals' desires first and foremost, leery of solutions which promote existing power dynamics and social structures like the deficit-based view of disability prevalent in many workplaces.
We believe that understanding perspectives on a values-based question such as this one is important as we increasingly utilize automated agents like LLMs for value judgments, particularly in identifying where and how such value judgments are informed in the system development process. As LLM research increasingly focuses on issues of value alignment [65], we encourage an introspection of the following question:
Whose experience should we privilege with LLMs? And to what end?
LLM and foundation model developers are continually developing new and improved models, including those that are explicitly tuned to dispense social advice [29]. We see that deciding which types of response to designate as the "good" or "appropriate" ones is a choice which necessarily privileges the perspectives of certain parties over others.
More concretely, we see that developing models in which practitioners are consulted for determining "goodness" would result in a relative deprioritization of the interests of the autistic user, and vice-versa. This conundrum is not easily answered. A natural and subsequent question that system developers should ideally address is whether the intention of the system is to partake in the normalization of social change or to preserve the status quo of the current minoritization of autistic individuals. We note that this is likely secondary to the question of whether LLM developers will involve stakeholders like autistic users and practitioners in LLM development, which would be a prerequisite to addressing this question of relative privilege.
6.2 Assistive, Rather Than Curative Solutions
We advocate for a focus on developing LLMs as assistive tools that support autistic individuals — one which centers the individual's perspective in achieving their personal and professional goals. Many prior technological interventions for autistic individuals have had a curative perspective, aiming to "fix" perceived deficits or challenges associated with autism. This perspective, while perhaps well-intentioned, can reinforce harmful stereotypes and stigmas, and overlook the unique strengths and perspectives that autistic individuals bring to social and workplace environments [95].
In contrast, an assistive approach would view LLMs as tools that can empower autistic individuals by providing support in areas where they may struggle, while also respecting and valuing their unique experiences and perspectives. This approach aligns with the Neurodiversity movement [89], which advocates for viewing autism as a natural variation of the human experience rather than a disorder to be cured. We encourage researchers and developers to adopt this assistive perspective when designing and evaluating LLMs. This could involve focusing on how LLMs can provide practical support in areas such as social interaction or employment, while also ensuring that the tools respect and value the experiences and perspectives of autistic individuals.
6.3 Re-imagining Inclusive Communication Norms Involving Autistic Individuals
Our explorations with LLMs highlight concrete realizations about the potential for re-imagining inclusive communication norms involving autistic individuals. The LLM's promptness, clear formatting, and other qualities appreciated by participants underscore the possibility of creating conversational spaces that are more accommodating and respectful of autistic communication styles.
This aligns with Milton's Double Empathy perspective [56], which posits that communication difficulties between autistic and non-autistic individuals stem from mutual misunderstandings due to differing perceptions and experiences of the world. LLMs could help bridge this gap by providing neurotypicals an accessible method of adapting to a communication style in line with the needs and preferences of autistic individuals, thereby fostering greater mutual understanding.
We encourage further exploration of how the positive attributes of LLMs can be leveraged to promote more inclusive communication norms. Yet at the same time, we note the potential for LLMs to become a mere translation layer bridging neurotypicals and autistic individuals, which demands further exploration of its consequences.
6.4 Speculative Futuring with LLMs
If LLMs become more tightly integrated into workplace communication practices, and the phenomenon of utilizing LLMs for communications becomes commonplace, we anticipate potential difficulties and confusion surrounding the ultimate role for LLMs. We are particularly concerned with the possibility that LLMs may be viewed as (or explicitly dictated by supervisors as) a necessary crutch by autistic workers in low-resource occupations.
We also note the connotation and social signalling involved with using, or being suspected of utilizing, an LLM. One such example, which we provide in Figure 3, illustrates a possible consequence: in it, an autistic user remarks that another individual mistook them for having used an LLM, to which the user responds "I'm just autistic." We believe further research is warranted to determine the permutations of social blame which those who utilize LLMs may experience. However, we do not dismiss the possibility that ubiquitous and rampant LLM usage may dissolve those concerns altogether in a scenario where the provenance of online communicative content becomes unimportant.
7 Limitations
We note the following as potential factors which may affect the internal and external validity of this work, and describe relevant mitigations where warranted.
Paprika’s Application and Operation. We first note that the LLM used for the study,
Paprika, may exhibit different behavior compared to an unprompted version of the GPT-4. We employed a specific prompt (§
3.1.4) because we observed it gave responses similar in content to the unprompted version of the GPT-4 API, while respecting guidelines on verbosity and tone to better align with the confederate’s output. We further note that as we utilized the GPT-4 API, that its cutoff date of September 2021 may have had an effect on
Paprika’s ability to provide acceptable advice and respond to autistic perspectives.
Additionally, we implemented functionality which limited the LLM's verbosity and response time to be similar to that of the confederate (§ 3.1.4) for half (5 of 11) of the participant sample. When comparing ratings for participants without (first six) and with (latter five) the adjustment (§ 4.2), we observe that participants in the latter group rated the LLM higher, and the confederate lower, than the participant group encountering Paprika without adjustment. Despite the different configurations of the LLM, this result appears to reinforce our perception that participants' general preference for the LLM agent rested on factors beyond increased verbosity and lower response times.
Study Design and Participant Demographics. We note that our study was designed for 90 minutes – given this length, we acknowledge the role that participant fatigue may have played in chatbot interactions towards the end of the study, though we did not observe any significant (statistically or otherwise) differences in quantitative or qualitative results from later interactions. Relatedly, given the potential effect that chatbot interaction order may have (e.g. whether a participant encounters Paprika or Pepper first), we counterbalanced the order across our participant group.
We observed that our participant group included 9 women and 2 men, a notably higher proportion of women than in the diagnosed population of autistic people at large [48]. While dimensions of gender identity are known to play large roles in workplace experiences and navigating those challenges, these difficulties are amplified for underrepresented autistic adult employees, especially when finding suitable support [37, 40, 55, 58], and are often understudied in research [19]. While this aspect was not specifically explored in this work, we believe a participant group with more women could help illuminate a more diverse range of possible viewpoints than one with more men than women, considering how deeply gender identity relates to social communication issues and norms. Relatedly, we note that our participant group is almost certainly not representative of the larger autistic community owing to the high prevalence of individuals with bachelor's or advanced degrees. Likewise, none of our participants identified as nonverbal, nor did it seem likely that a substantial part of our group had an intellectual disability.
Though our human confederate (Pepper) tried their best to apply a consistent approach to answering questions from participants, they did not ask follow-up questions of all participants. To mitigate this potential issue, we validated our confederate's responses with the practitioner (§ 4.3), a professional counselor and job coach. The practitioner rated the confederate's responses as satisfactory, without caveats.
While we believe triangulation from participants' preferences for Paprika (LLM) over Pepper (confederate) across (1) quantitative and (2) qualitative data (§ 4.2), alongside (3) participants' desire to use Paprika as a sole or primary resource (§ 4.2.3), provides grounding for the conclusions presented in this work, we acknowledge that this work is exploratory in nature, and that the presence of a single confederate, along with a limited sample size (n = 11), precludes our ability to make durable conclusions. Relatedly, we acknowledge limitations related to having a single expert review assess the quality of the LLM's advice, and expect that future work into the applicability and hazards of LLM advice can benefit from agreement among multiple experts; including more job coaches could reduce the risk of over-reliance on a single expert opinion.
As this study explores potential outcomes of autistic workers relying on publicly available general-purpose LLMs, such as ChatGPT and Gemini (née Bard), we do not make direct or specific claims about future or special-purpose models. Rather, we aim to illustrate the current and potential outcomes, opportunities, and risks from autistic workers' use of widely-available LLMs, and to demonstrate this use as a matter of exigent concern given the current progress in LLM development and affordability, existing and growing interest from autism-related communities, and our participants' enthusiasm for adopting LLM advice despite its flaws.
Overall, this study examines the possible results for autistic workers who use general-purpose LLMs, like OpenAI's ChatGPT and Google's Gemini (née Bard). Given our study limitations and the rapidly-changing landscape of state-of-the-art language and foundation models, we do not attempt to critique or otherwise analyze current models, nor speculate about future or specialized models. Our work instead aims to highlight and center autistic workers' needs and desire for greater empathy and agency.
Acknowledgments
We would like to thank Amy Tavares, Frank Elavsky, Laura Dabbish, Faria Huq, Jeffrey Bigham, Brianna Blaser (and AccessComputing), the College Autism Network (CANVAS), Cella Sum, Lea Albaugh, Katie Oswald, Paulette Penzvalto, Yunzhi Li, Neeta Khanuja, and Alice Tang for their help in ideation, development, analysis, and recruitment for this work. We are especially grateful to our anonymous CHI reviewers for their constructive guidance and feedback through the review process.