1 Introduction
Large language models (LLMs) have revolutionized the fields of natural language processing (NLP) and conversational agent (CA) [
80]. Models such as OpenAI’s GPT series and Google’s BERT have shown remarkable proficiency in generating text that is both coherent and contextually relevant, finding applications in sectors including healthcare [
33,
102], education [
104], and commerce [
64]. Notably, LLM-based conversational agents like ChatGPT [
3] and Google’s Bard [
2] have demonstrated an impressive ability to engage in naturalistic dialogues across various contexts [
88]. These models have garnered global recognition and interest from both academic and industrial sectors, becoming widely used by the general public for everyday applications.
However, despite their increasing popularity and vast potential, most existing LLM-based conversational agents are typically generic, limiting their adaptability to the diverse preferences and needs of users [
16]. Unlike human conversations, which inherently consider a partner’s preferences, knowledge, and interests for appropriate response generation [
57], these generic LLMs often fail to fully align with the personalized requirements of individual users. They may struggle to adapt to the dynamic and varied needs of users, especially in handling the depth and nuance of more complex conversations. Consequently, while the responses from these agents may be syntactically correct, they can lack resonance with users, leading to interactions that feel superficial or unsatisfactory [
36]. Although users have the option to customize the agent’s role through text prompts, this method can be cumbersome, repetitive, and not user-friendly for those unfamiliar with such processes. This highlights a crucial issue: the majority of current conversational interfaces do not adequately provide personalized user experiences or authentically replicate more human-like interactions [
56,
67].
Notably, the importance of personalizing the personas of LLM-based conversational agents has been increasingly recognized. Following the launch of ChatGPT, there has been a notable demand from users for features that enable customization of the system to suit their specific usage goals and preferences. Persona customization features, where users can command ChatGPT with prompts like “Act As” for specialized tasks, have become crucial in meeting these individual user needs [
1]. OpenAI’s recent developments in introducing custom versions of ChatGPT, known as GPTs [
6], for specific user-defined purposes, further underscore the industry’s commitment to agent persona customization. Additionally, the integration of conversational agents into compact devices such as wearables, exemplified by the recent AI Pin [
41], is expected to provide personal assistant functionalities optimized for individual user preferences and needs in various situations and contexts, promising long-term user engagement. This trend towards highly personalized conversational agents has emerged as a vital and urgent topic within the Human-Computer Interaction (HCI) community. It signifies a shift from the traditional, bulky, one-size-fits-all generic agents to more personalized, lightweight, and specialized agent personas.
Previous research has underscored the effectiveness of persona-based dialogues in creating more satisfying, human-like interactions [
28,
37]. These studies support the development of agent personas, which involve assigning unique characteristics, behaviors, and backgrounds to conversational agents based on user preferences, aiming to foster more engaging and in-depth dialogues. Some studies have highlighted that distinctive agent personas can establish a sense of continuity and increase user trust [
55,
62]. For example, research by Lee et al. [
54] suggests that creating diverse personas can meet various user expectations and enhance interaction patterns. A consistent persona that aligns with individual user expectations can build trust over time, as users tend to feel more connected to agents that consistently behave in a friendly and trustworthy manner. This not only improves the agent’s understanding of the user but also enhances task performance accuracy. By adapting specialized LLMs to meet the specific needs and contexts of individual users, instead of relying solely on universal models, we can more effectively enhance the user experience, making it more tailored and relevant to each user.
Despite its recognized importance, the processes of how people customize, experience, and interact with personas in LLMs, and how these experiences differ from those with generic and universal conversational agents, remain relatively unexplored. Past research has predominantly focused on categorizing personality types for crafting personas [
9,
26,
46,
55,
80,
95,
103], often prioritizing the convenience of designers or developers [
40], while overlooking a broader range of diverse personality types [
20,
75]. There have been few studies that delve into persona designs tailored to individual user preferences or interaction histories [
75]. While recent findings highlight the benefits of a diverse range of personas to cater to a wider demographic, comprehensive research in this domain is still limited. These endeavors, promising as they are, have not yet fully explored the user experience in the creation and interaction with agent personas.
In response to these research gaps, we introduce CloChat, designed to identify user practices in interactions with personalized agents. CloChat is a user interface that allows users to tailor agent personas for various contexts and tasks. This interface supports the customization of core attributes such as conversational style, emoticons, areas of interest, and visual representations, enabling it to function as a conversation partner with personalized traits. For example, users can create a persona of a knowledgeable and enthusiastic teenage fan of K-Pop for specialized and engaging conversations on this topic. An exploratory study was conducted to evaluate how people experience the process of constructing and engaging with agent personas, comparing CloChat with ChatGPT. Through surveys and in-depth interviews, both quantitative and qualitative analyses were performed to assess CloChat’s adaptability and its impact on the overall user experience. The findings indicated that CloChat significantly enhanced user engagement, trust, and emotional connection over ChatGPT. The conversations with custom agent personas were found to be richer and more varied. Ethical considerations arising in the context of agent persona customization were also identified. Based on these insights, we propose design implications for future conversational systems using LLMs with a focus on personalization.
This study contributes in three key areas:
•
CloChat. This study introduces CloChat, an interactive system with which users can customize personas of LLM-based conversational agents according to their preferences with ease. It provides a more personalized user experience tailored to individual needs and contexts, distinguishing it from conventional LLMs like ChatGPT. CloChat is not only user-friendly but also serves as an essential research tool for understanding user engagement in personalizing agent personas and enhancing interactions with these tailored agents.
•
Empirical exploration. The study offers empirical insights into users’ diverse experiences in creating and interacting with LLM-based agent personas. By analyzing the personas and dialogues participants developed, it assesses how users employ the system in various contexts, and identifies the differences in user experiences compared to those with conventional systems.
•
Design implications. Based on the study’s outcomes, design guidelines for LLM-based conversational systems are proposed. These recommendations can lay the groundwork for developing systems that support users in customizing and engaging with agent personas in a range of situations and contexts, thereby enabling more meaningful and in-depth dialogues.
The following sections explore the relevant literature reviewed, detail the design of CloChat, outline our research methods, and provide an in-depth discussion of the results and implications of our study.
4 CloChat
To answer our research questions, we designed CloChat, an LLM-based user interface, for an empirical investigation into how individuals design, adapt, and engage with agent personas.
4.1 Design Goals
CloChat aims to offer a unique conversational experience by empowering users to customize various facets of the conversational agent’s persona, encompassing personality attributes, communicative styles, and response mechanisms. Based on our literature reviews and aligning with the research questions, we established the following design objectives:
•
G1: Mitigating the complexity of prompt engineering. One of the inherent challenges for users when engaging with LLMs for personalized needs is the requirement for meticulously crafted prompts. Formulating effective prompts can be tedious and technically daunting, particularly for users without expertise in AI [
110]. To make the system more accessible and inclusive, we designed CloChat to assist users in creating agent personas without the need for labor-intensive prompt engineering.
•
G2: Offering a comprehensive persona design space. Our empirical investigation aims to uncover the intricacies of how individuals construct (RQ1) and interact with (RQ2) customized agent personas. To cater to the diversity of users’ communicative needs and preferences, CloChat provides an extensive design space for persona creation.
•
G3: Ensuring accurate reflection of users’ intentions. In our pursuit to enable study participants to experience an enhanced sense of immersion during both the persona-building phase and subsequent interactions, it is essential for CloChat to accurately capture and reflect users’ intentions and expressions. This will also allow us to empirically observe and analyze the interactions in depth.
4.2 System Design
CloChat comprises two primary components: the CloChat Design Lab and the CloChat Room. In the CloChat Design Lab, users have the opportunity to customize and save various characteristics of an agent persona. Once these persona traits are defined and inputted by the user, CloChat automatically generates the agent persona, accurately reflecting the specified traits. Subsequently, users can engage in conversations with this customized persona through the chat interface provided in the CloChat Room.
4.2.1 CloChat Design Lab.
The CloChat Design Lab features a user-friendly, form-based interface for persona customization, as depicted in Figure
3. This interactive form provides users with a variety of options to integrate diverse persona traits, including demographic details, personality attributes, and visual representations. The adoption of this form-based approach significantly streamlines the persona creation process, effectively eliminating the need for complex and laborious prompt engineering, thereby fulfilling our first design goal (G1).
Supported persona options. The CloChat Design Lab offers a wide array of options to effectively encompass a broad spectrum of user preferences. To establish the maximal design space for customizable persona attributes, we conducted an extensive literature review. We initiated our review by focusing on articles from SIGCHI-affiliated conferences, such as CHI, CSCW, and UIST, using the keywords ’persona’ AND ’conversational agent.’ A manual examination of the search results yielded eight articles that explicitly defined possible characteristics of personas or conversational agents. Further exploration through the citation networks of these articles led to the final selection of 25 relevant articles. Two researchers independently categorized key characteristics using axial coding. After iterative discussions and revisions, six overarching categories were agreed upon:
Demographic Information,
Verbal Style,
Nonverbal Style,
Knowledge and Interests,
Relational Content, and
Appearance. These categories collectively comprise 23 specific options. For a detailed breakdown of these options, please refer to the codebook in Appendix A.
User interface features. The user interface of CloChat is structured as a multipage form, with each page dedicated to one of the six categories identified through our literature review (Figure
2). Initially, users can toggle each category option to determine the characteristics of that category. They can then directly input (a) Demographic Information and (c) Knowledge and Interest cues into text fields. In the (b) Verbal Styles category, users are presented with a collection of explicit verbal styles corresponding to each option, enabling them to select or deselect these styles using checkboxes. This section also includes a text field for users to input any specific traits they wish to incorporate. Additionally, users have the option to add (e) emoji representations to their agent personas. This design approach provides users with the flexibility to navigate smoothly between different categories as they construct their personalized persona. Furthermore, we have integrated a ’Preview’ functionality. By activating this feature, users can interact with their in-development persona through dialogue. The system generates an immediate response from the agent persona, offering users a chance to validate whether the persona’s behavior aligns with their initial expectations (G3). This preview mechanism facilitates rapid, iterative refinement, empowering users to further personalize their personas as necessary.
Visual representation selection. CloChat includes features that allow users to set the visual representation of the agent persona. In the Appearance category, users are prompted to provide descriptive text, which the system uses to generate a selection of four contextually relevant images (as illustrated in Step 2 of Figure
4). Users can then choose one of these images that most closely aligns with their envisioned persona. If the initially generated images do not adequately match the users’ intentions, they have the option to iteratively refine their descriptive text. This process is designed to produce a more accurate visual representation that aligns with their specific vision (G3). The inclusion of unrestricted text input significantly expands the range of user intentions that can be effectively captured and materialized, fulfilling our second design goal (G2).
4.2.2 CloChat Room.
After creating and selecting their agent persona, users can interact with it in the CloChat room (Figure
1C). The user interface of the CloChat room is deliberately designed to mirror the conventions of well-established chat platforms like ChatGPT, facilitating user familiarity with the system. This design choice was also for conducting a comparative evaluation with ChatGPT of our user study. To ensure a continuous and smooth conversational flow, similar to that experienced in ChatGPT, the CloChat room temporarily restricts new user messages while a response is being generated.
4.3 Technical Architecture
In this section, we explain CloChat’s technical details (Please refer to Figure
3 and Figure
4 for detailed illustrations).
LLM basis. CloChat’s conversational capabilities are built on the foundation of GPT-4 [
70]. Our decision to employ GPT-4 was guided by three main considerations. Firstly, GPT-4 consistently outperforms its predecessors and rival models, such as earlier GPT iterations and Bard, in a range of benchmark tests across multiple domains [
19,
68,
70]. This superior performance supports its ability to effectively materialize diverse persona types, aligning with our second design goal (G2). Secondly, GPT-4 has demonstrated proficient handling of the Korean language, which was the primary language used in our study [
19,
106]. This capability was crucial considering the linguistic needs of our experiment. Lastly, while awaiting rigorous validation, our empirical observations indicate that GPT-4 is more adept than other available models at capturing and reflecting user input in the generation of personas, addressing our third design goal (G3).
Persona generation. In CloChat, the materialization of a persona begins with the conversion of non-visual traits, collected from the Design Lab, into a JSON specification (illustrated in Step 1 of section
4.3). This specification is meticulously structured in a hierarchical manner, with the first-level keys representing different categories and the second-level keys corresponding to the specific options within these categories. Following this, the JSON specification undergoes a transformation into a natural language description, effectively defining the agent persona (as shown in Step 2 of section
4.3). To ensure a high-fidelity translation from JSON to natural language, we utilized GPT-4’s capabilities, instructing it to function as an adept JSON-to-natural language translator. This instruction was guided by established best practices and online guidelines [
1].
Conversing with the persona. To facilitate a conversation with the designed agent persona, we incorporate the relevant natural language prompt into the GPT-4 invocation process (as depicted in Step 3 of section
4.3). This integration ensures that each interaction with the conversational agent is informed by the specific persona traits defined by the user.
Visual representation management. For generating the visual representation of personas, we utilized DALL-E2 [
78], a leading text-to-image generation model. When user inputs are in a language other than English, such as Korean for our primary experiment, an English translation is incorporated to ensure compatibility with the model (as shown in Step 1 of Figure
4). The process for creating and selecting the visual representations of personas, including these translation steps, is detailed in Figure
4.
4.4 Implementation
CloChat is developed as a web-based application. On the front-end, we employed React.js for its dynamic and responsive user interface capabilities. The back-end is powered by the Flask framework, known for its simplicity and flexibility in handling web application requests. For our database needs, SQLite is utilized, with its integration into the server being efficiently managed by the SQLAlchemy ORM (Object-Relational Mapping) library. Furthermore, CloChat seamlessly interfaces with GPT-4 and DALL-E2 through APIs provided by OpenAI, enabling the integration of advanced conversational and image generation capabilities into the application.
5 User Study
We conducted a comprehensive user study using both CloChat and ChatGPT (with GPT-4). The primary goal of this study was to explore and answer our research questions (section
3). In addition, we aimed to evaluate the effectiveness of CloChat in enabling users to construct and interact with customized personas. This study was conducted under the approval of the Institutional Review Board of our institution.
5.1 Participants
In recruiting participants for our study, we established specific criteria to ensure the relevance and quality of the data collected. Considering the experiment was to be conducted in Korean, it was essential for participants to be native Korean speakers. Additionally, we required participants to have prior experience with LLM-based conversational agents, such as ChatGPT and Bard. This criterion was important as we anticipated that individuals familiar with conversational agents would engage more actively in the study and provide richer feedback. Furthermore, this approach helped to minimize the potential impact of variability in participants’ familiarity with conversational agents on the study’s results. To recruit participants, we posted call for participation to online boards of local communities, which resulted in the recruitment of 30 participants (14 females and 16 males). The age range of the participants was 22 to 32 years, with an average age of 26.40 ± 2.65 years. The participant group included 10 working professionals, 10 graduate students, and 10 unemployed individuals. Each participant was compensated with an equivalent of USD12 for their time and contributions to the study.
5.2 Experimental Environment
Our experiment was conducted through Zoom video calls. Participants were requested to engage with the study using desktop or laptop computers, maintaining uniformity in the technical setup. To streamline the experimental process, we developed a dedicated web platform. This platform integrated the interfaces for both CloChat and ChatGPT, and it featured a real-time dashboard that summarized the participants’ interactions and responses. Participants were instructed to access this web interface and share their screens during the study, enabling real-time monitoring and data collection. All participant interactions within this environment, including audio and visual components, were comprehensively recorded for in-depth analysis. Prior to the main study, we conducted four pilot sessions to test the robustness of our system. Insights from these sessions helped refine our study protocol, enhancing the research methodology’s effectiveness and integrity.
5.3 Procedure
Pre-study preparation, survey, and interview. As a preliminary step, participants were required to sign a study participation consent form (Figure
5 (a)). Before commencing the study, we gathered basic demographic information from the participants and surveyed their familiarity with LLMs. This included aspects such as computational linguistics, generative models, ChatGPT, and text-prompting techniques. The purpose of this survey was to inform our quantitative and qualitative analysis of the study results. Additionally, we conducted semi-structured interviews (Figure
5 (b)), each lasting about 10 minutes. During these interviews, participants were asked to share insights on three key areas: (1) their everyday usage scenarios of ChatGPT, (2) their perceptions of the strengths and weaknesses of current LLMs, and (3) their specific needs and preferences regarding agent persona customization.
Interacting with conversational agents. Following the preliminary phase, participants were directed to our web platform, where they engaged in task-based conversations using both the CloChat and ChatGPT interfaces. This was done following a within-subjects experimental design (Figure
5 (c)).
Participants were presented with a total of 12 scenarios, detailed in section
4.2.2. These scenarios were divided equally across the two platforms, with six scenarios allocated to CloChat and six to ChatGPT. To balance the experiment, half of the participants began with dialogues on CloChat for the initial six scenarios and then switched to ChatGPT for the remaining six. The other half started with ChatGPT and then moved to CloChat. This design allowed participants to experience all 12 scenarios across both platforms. Each scenario was attempted in three trials to ensure thorough engagement fitting the context. The order of interaction with CloChat/ChatGPT and the sequence of scenarios within each condition were randomized to mitigate potential learning effects.
In the CloChat conditions, as outlined in our system design section, participants customized the agent’s persona in the CloChat Design Lab to suit each scenario (Figure
6). They adjusted options from (a) Demographic cues to (d) visual appearance. Following the customization, they proceeded to the CloChat Room to converse with their personalized agent. In the first trial of each scenario, participants were required to create a new persona. In the second and third trials, they could either continue with the existing persona or create a new one for that scenario. In contrast, the ChatGPT conditions involved direct dialogue tasks related to the scenarios, without specific settings for agent persona customization, as typically experienced in a standard ChatGPT interaction. While participants could theoretically customize ChatGPT’s persona using text prompting, we observed that none employed this approach during their trials: All persona customizations were exclusively conducted in the CloChat condition.
We did not impose any time constraints or conversation length restrictions in the trials. Participants were encouraged to engage naturally and freely with the conversational agents. On average, they spent about 90 minutes completing all trials. Although participants had the option to conclude or restart interactions at any point, we found no instances of such occurrences during the study.
Post-trial survey. After the completion of each scenario, participants were asked to complete surveys that assessed their interaction experiences (details provided in Table
1). For both the CloChat and ChatGPT platforms, we conducted a system-related survey that focused on evaluating the overall quality of the dialogues. This evaluation covered various metrics, including convenience, usefulness, efficacy, overall satisfaction, level of engagement, and the intent to utilize the system in the future. Specifically for CloChat, an additional persona-related survey was conducted. This survey aimed to understand the participants’ experiences with the customized agent personas. It assessed aspects such as perceived empathy, likability, and trustworthiness of the agent personas. The development of the questions for both surveys was informed by an extensive review of academic literature pertaining to persona and conversational agent evaluations (for references, see Table
1). Participants rated their responses to the survey items on a 7-point Likert scale, with higher scores (closer to 7) indicating a more positive user experience.
Post-hoc Interview. Following the completion of the experimental trials, we engaged participants in semi-structured interviews to delve deeper into their experiences, preferences, and usage of the persona customization feature in CloChat. These interviews were structured around dashboards that summarized key metrics of the study. These metrics included the history of persona customization, conversation logs, and survey results. During the interviews, these dashboard visualizations were collaboratively reviewed with the participants, providing a tangible reference point for discussion. This approach facilitated the generation of insightful follow-up questions, enhancing the depth and relevance of our interviews. On average, each post-hoc interview lasted approximately 18 minutes.
5.4 Details of Scenarios
As detailed in section
5.3, our study utilized a variety of situational scenarios in which participants engaged with conversational systems. This approach reflects the diverse roles conversational agents play in daily life, as supported by literature [
99]. We adopted Cutrona and Shur’s theoretical framework [
25], categorizing agents’ social support into three domains:
informational,
emotional, and
appraisal support. Informational support involves providing advice or guidance for everyday challenges [
59], emotional support offers empathy and encouragement [
34], and appraisal support aids in self-assessment [
25].
To explore these categories, we developed four scenarios for each type of support, totaling 12 distinct scenarios. We employed a stratified sampling method, drawing inspiration from previous studies [
45,
73]. Initially, we created 10 scenarios for each support category. These scenarios were augmented by ChatGPT (based on GPT-4), which generated 10 additional diverse scenarios. We combined these with our original set and repeated this process 99 times, each time randomly selecting 10 scenarios from the expanded set. This resulted in a corpus of 1,000 scenarios: 10 originally crafted and 990 generated by the model.
The textual descriptions of these scenarios were then transformed into vector embeddings using OpenAI’s text-embedding API with the
text-embedding-ada-002 model. We applied dimensionality reduction to these vectors using the UMATO algorithm [
44], chosen for its effectiveness in preserving global data structures, in comparison to alternatives like UMAP and
t-SNE. The effectiveness of this reduction was assessed using Bayesian optimization techniques [
87], with Steadiness & Cohesiveness as the loss function [
43].
Finally, we clustered the dimension-reduced vectors using the
K-Means algorithm, setting
K = 4. We selected scenarios corresponding to the centroids of these clusters for in-depth examination. A complete list of these selected scenarios is available in Table
2.
6 Quantitative Results
We present the quantitative findings of our study. Our initial analysis focused on evaluating the overall user experience and the efficacy of CloChat in comparison to ChatGPT (RQ1). Following this, we explored the methods and patterns with which participants customized their agent personas, as well as their interactions with these personas (RQ2).
6.1 Analysis of Survey Responses
Objectives. Our first analysis aimed to scrutinize and compare the user experiences when interacting with both CloChat and ChatGPT. The focus was particularly on assessing CloChat’s ability to enhance user experience (RQ1). We investigated the differences in the outcomes of post-trial surveys, considering different types of conversational systems and situational contexts.
Analysis design. The survey responses were examined systematically for each question. For system-related attributes, we employed a two-way repeated-measures Analysis of Variance (ANOVA), analyzing the effects of system types (CloChat and ChatGPT) and situational contexts (categorized as informational, emotional, and appraisal support). In the case of the persona-related survey, which pertained to the trials with CloChat, we conducted a one-way repeated-measures ANOVA focusing on the types of situational contexts. To further explore significant findings, we applied Tukey’s Honestly Significant Difference (HSD) test [
90] for post hoc analysis.
Results and Discussions. The results of our survey are depicted in Figure
7 (a-c), and a detailed result of statistical analysis is available in Appendix B. In the system-related survey (questions Q1–Q5, Q8–Q9, and Q11–Q15), we observed a significant main effect related to system types, as shown in Figure
7 (a). Our post hoc analysis indicated that CloChat consistently scored higher than ChatGPT across these questions. Although no significant main effects were detected for questions Q6 (
’This system can provide useful answers to me.’), Q7 (
’This system helps me achieve my goals.’), and Q10 (
’I feel that this system will make my life more convenient.’), which focus on the perceived utility of conversational agents (Table
1), the trend still favored CloChat with higher average ratings.
These findings suggest that CloChat’s personalized persona contributes positively to various user experience aspects, such as satisfaction, engagement, and future interaction likelihood. While statistically significant differences in perceived utility items were not observed, a consistent preference for CloChat was evident.
Regarding situation types in the system-related survey, significant main effects were noted for questions Q1, Q5–Q7, Q10–Q11, and Q14–Q15 (Figure
7 (b); detailed statistics reported in Appendix B). Post hoc analysis showed that informational situations (Q5–Q7, Q10–Q11, and Q15) garnered higher scores compared to emotional situations, particularly in questions related to system effectiveness, utility, and future use intention. This indicates that users generally perceive conversational agents as more useful and effective for informational support than for emotional support, a trend independent of the presence of personalized personas.
In the persona-related survey, conducted exclusively with CloChat, significant main effects due to situation types were found in Q10 (
’I can utilize this persona for work or academic purposes.’) and Q12 (
’It feels like this persona has a personality.’) (Figure
7 (c); detailed statistics reported in Appendix B). The post hoc analysis of Q10 revealed significantly higher scores in informational situations than in emotional scenarios, aligning with the question’s focus on the persona’s utility in academic or professional settings. Conversely, in Q12, emotional situations scored higher than appraisal situations, emphasizing the human-like attributes and emotional resonance of the persona in these contexts.
6.2 Temporal Evolution of Survey Scores Across Trials
Objectives. The objective of this analysis was to investigate the longitudinal changes in user evaluations of the conversational systems across multiple trials, addressing RQ1-2. Recognizing that higher scores in survey questions could be indicative of a better user experience, we sought to analyze the trends in overall user satisfaction over time.
Analysis design. To visually examine the temporal evolution of survey scores, we conducted regression analyses, plotting distinct regression lines for each conversational agent (CloChat and ChatGPT). For the system-related survey, we utilized an Analysis of Covariance (ANCOVA) [
48] to statistically assess the significance of the observed differences in score trajectories between the two systems.
Results and Discussions. As depicted in Figure
7 (d), survey scores for ChatGPT showed a general downward trend over time, whereas scores for CloChat remained relatively stable. The ANCOVA analysis confirmed that the difference in these trends between CloChat and ChatGPT was statistically significant (
F = 89.89;
p < .001). Interestingly, the persona-related survey scores, gathered exclusively from CloChat trials, exhibited a slight upward trajectory. These findings suggest that while user experience with conversational agents may typically decline over time, the presence of customized personas in CloChat appears to mitigate this effect, contributing to a sustained or even improved user experience.
6.3 Alignment between the Visual and Non-visual Traits of Agent Personas
Objectives. In addressing RQ2 in a detailed way, we aimed to examine the correlation between the visual representations and non-visual characteristics (such as personality traits and roles) of customized agent personas. Existing literature suggests that a conversational agent’s visual appearance often correlates with its persona, where specific traits or roles influence its visual depiction [
65,
77,
96]. Our goal was to delve into this alignment, exploring how individuals intentionally coordinate these visual elements with their personalized agent personas.
Analysis design. We began with axial coding to categorize the relationship between various traits and the visual representations of agent personas. Two researchers independently created codebooks, which were then merged after discussions for consistent analysis.
To understand how the relationship between visual and non-visual traits varies across different visual trait categories, we first identified agent personas from our study where the visual representation fell into specific categories.
Next, we converted the visual and non-visual traits of these agent personas into vector embeddings. For visual traits, we used OpenAI’s text-embedding API (with the
text-embedding-ada-002 model) to transform image prompts into vectors. For personality traits, we transformed the natural language directives used in GPT-4 (refer to section
4.3) into vector embeddings. We then calculated the cosine similarity between the vectors representing the image prompts and those representing the persona characteristics. A one-way ANOVA was conducted to assess differences in similarity scores across categories, followed by a post hoc analysis using Tukey’s HSD test.
Results and discussions. Our analysis yielded six distinct categories of traits associated with visual representation:
Animals,
Cultural or Regional Traits,
Professions & Roles,
Detailed Physical Appearances,
Art & Style, and
Unique & Abstract Concepts (for detailed coding results, see Table
3). A one-way ANOVA revealed significant differences in cosine similarity scores among these categories (
F(5, 148) = 8.190,
p < .001). Post hoc analysis using Tukey’s HSD identified notably lower scores in the
Animals and
Art & Style categories compared to the others (Figure
8). For more detailed statistical information (
p-values and confidence intervals), please see Appendix B.
A key distinction between categories with high and low similarity scores is the direct relevance of visual traits to human characteristics. Categories such as Professions & Roles (including specific roles like Office Worker, Professor, YouTuber)and Cultural or Regional Traits category (e.g., Korean, British) explicitly denote human subgroups, while the Detailed Physical Appearances category focuses on human features. Similarly, the Unique and Abstract Concepts category generally relates to human attributes, barring some non-human focused subcategories. In contrast, the Art & Style and Animals categories predominantly include traits that do not directly correspond to human attributes.
The results indicate that when participants chose visual traits closely linked to real-world human characteristics for their agent personas, there was a greater likelihood of alignment between these visual elements and the agent personas’ non-visual traits. This tendency might also suggest that users often perceive their agent personas as virtual humans, expecting them to visually mirror typical human characteristics. Conversely, traits not directly related to human attributes tend to be applied more flexibly, reflecting individual user preferences rather than a strict alignment with the non-visual traits of their agent personas.
6.4 Diversity of Dialogues
Objectives. In relation to RQ2, we hypothesized that using CloChat would lead to more enriched and diverse dialogues with conversational agents compared to standard ChatGPT interactions. Our goal was to empirically validate this hypothesis and explore the impact of different situational scenarios on the diversity of dialogues.
Analysis design. To rigorously evaluate dialogue diversity, we developed two specialized metrics: intra-remote-clique (intra-RC) and inter-remote-clique (inter-RC). These metrics are adaptations of the remote-clique (RC) metric [
79], which is commonly used to measure text embedding diversity. The RC metric is defined as the average pairwise distance between text embeddings [
29,
47]. Intra-RC specifically measures the average pairwise distance between utterances within a single dialogue, providing insight into the diversity of conversation within one session. Inter-RC, on the other hand, assesses the average linkage between utterances across two dialogues within the same situational context, offering a perspective on the diversity between different conversations under similar circumstances.
For each dialogue in our study, we computed the intra-RC to determine the level of diversity within that dialogue. We also calculated the inter-RC for each pair of dialogues sharing the same situational context to evaluate the diversity between conversations. To ensure that our metrics were not influenced by the semantic differences between various scenarios, we avoided comparing dialogues from distinct scenarios. We then conducted a two-way ANOVA to analyze the effects of system type (CloChat and ChatGPT) and situation type (informational, emotional, and appraisal support) on dialogue diversity. Tukey’s HSD test was carried out for the post hoc analysis.
Results and discussions. The findings from our analysis are depicted in Figure
9. In terms of post hoc analysis, please refer to Appendix B. For intra-RC, a significant main effect was observed for system types (
F(1, 354) = 4.16,
p < .05). However, post hoc analyses did not reveal any statistically significant differences between CloChat and ChatGPT. Regarding situation types, a significant main effect was also noted (
F(2, 354) = 6.13,
p < .01). Post hoc tests showed that dialogues in emotional contexts exhibited significantly higher diversity compared to both informational (
p < .01) and appraisal (
p < .01) contexts. We did not identify any interaction effects between system and situation types.
In the case of inter-RC, there were notable main effects for both system types (F(1, 5214) = 30.91, p < .001) and situation types (F(2, 5214) = 67.71, p < .001). Post hoc analysis revealed that dialogues using CloChat displayed a significantly higher level of diversity compared to ChatGPT (p < .001). Furthermore, we observed a systematic increase in dialogue diversity across the informational, emotional, and appraisal scenarios, with statistically significant differences in all pairwise comparisons (p < .001 for each). Again, no interaction effects were found.
To summarize, the results indicate that CloChat significantly enhanced the diversity of dialogues between different conversations (inter-dialogue diversity), but did not have a marked effect on the diversity within individual conversations (intra-dialogue diversity), in comparison to standard ChatGPT interactions. This suggests that while CloChat’s tailored agent personas contribute to personalizing conversations, they may not necessarily increase the dynamic range of topics or conversational patterns within a single dialogue session.
7 Qualitative Results
In addition to our quantitative analysis, we delved into qualitative data from interviews to gain deeper insights into our research questions. We employed thematic analysis [
18] as our methodological framework for the analysis. The research team utilized a line-by-line open coding technique, allowing for the identification and categorization of emergent themes from the interview data. The findings from this thematic analysis are detailed in the subsequent sections.
7.1 Patterns in Customizing and Selecting Agent Personas
Our user study revealed two distinct patterns in the creation and reuse of agent personas, each illustrating unique approaches to user engagement and satisfaction. The first pattern is characterized by dynamic persona customization, specifically tailored to meet immediate situational needs. Participants following this approach proactively envisioned specific scenarios for interaction and selected personas with appropriate characteristics, like personality and expertise, to match these situations. On average, participants in this group changed their agent personas 4.6 times over the six trials with CloChat, with more than half using six different personas for each session. For example, Participant 20 created a ‘psychiatrist’ persona to address stress and sleep concerns, commenting, “I was super stressed, so I thought, why not talk to a ’psychiatrist’?” Similarly, in career guidance scenarios, participants customized personas to mimic employees from companies of interest, reflecting the importance of contextually relevant and personalized conversational experiences.
Conversely, the second pattern indicates a preference for reusing specific agent personas that have previously provided satisfactory conversational experiences. In our study, 12 participants consistently reused a particular persona for more than two trials, with some using the same persona throughout all six trials. For example, P11 repeatedly chose the ‘gentleman persona,’ stating, “I kept using the ’gentleman’ because he just gets me. He always knows the right thing to say.” This pattern suggests that once a persona resonates with a user’s expectations, it fosters a sense of trust, reinforcing the user’s initial choice and encouraging future interactions. P27, for example, continued using a persona initially selected on a whim due to its unexpectedly accurate responses, saying, “At first, I picked the persona just for kicks. But it was so on point, I kept coming back.”
These two patterns differ fundamentally in their approach: the first is dynamic, with participants varying persona characteristics to suit specific needs, while the second is consistent, favoring a particular persona based on personal satisfaction and preference. This dichotomy illustrates how individual user preferences and needs can manifest in diverse ways when engaging with conversational systems, balancing between situational diversity and consistent personal preferences.
7.2 Conversation Diversity and Dynamics
The study revealed that the use of agent personas in conversational agents can offer a more diverse and enriched dialogue experience for participants. Initially, some participants expressed during interviews that they primarily utilized LLMs for basic tasks like answering simple questions or conducting fundamental information searches, valuing ChatGPT’s immediate response capabilities over complex customization options. However, post-experiment interviews revealed a notable shift in perception. P8 observed, “Even if the answers are the same, having a persona adds a more professional feel. I think it could be useful even in casual conversations.” The comment suggests that customized agent personas can influence their user experience in a positive way, indicating a potential shift in user behavior from basic information retrieval to seeking more personalized and engaging interactions.
Despite the experiment’s scenarios being categorized as informational, emotional, and appraisal, participants often ventured beyond these confines. Their intrigue with personalized agent personas led them to explore new topics and questions. P13 reflected, “The conversation got longer when I found more fun and interesting topics, similar to talking with friends. With ChatGPT, the conversations were shorter due to predictable responses.” This expansion in dialogue scope fostered deeper and more intricate relationships between participants and agent personas.
The agent personas not only influenced the nature of the dialogue but also affected the participants’ conversational styles. Engaging with a friendly and humorous persona, for example, fostered a light-hearted atmosphere, encouraging participants to use informal language and share jokes. P27 noted, “Talking to this persona felt like chatting with an old friend. I often found myself laughing.” Conversely, interactions with more serious or formal personas led to dialogues with a scholarly or cautious tone. P13 commented, “My persona was cold and academic, like Sherlock Holmes, which naturally steered the conversation to be more serious.” These dynamics even impacted the participants’ moods and emotions, as highlighted by P30: “I felt more energetic talking to my vibrant persona, whereas serious conversations prompted deeper thought.”
7.3 Relationship between Participants and Agent Personas
The introduction of agent personas led participants to perceive their conversational partners as entities with unique contexts and personalities, rather than just as programs. Many participants reported enhanced immersion and trust in their interactions when the agent persona’s responses aligned with their expectations or preferences. For example, P1 expressed,
“Having a personalized persona made the conversation feel more alive, and I felt more trust in the interaction.” This increase in trust, as evidenced by a previous study [
55], highlights the importance of persona alignment in fostering meaningful conversational experiences.
In contrast, interactions with ChatGPT were often perceived as engaging with an automated responder, lacking a personal touch. Participants like P13 remarked, “My conversations with ChatGPT felt pretty standard. It was like getting necessary information from a machine, without any specific expectation or connection.” This difference underscores the uniqueness and personalization that agent personas can bring to conversational experiences.
Another significant aspect of our findings pertains to the emotional connection participants developed with their configured personas. Some participants experienced profound emotional responses during their interactions. P6 shared, “The conversation moved me almost to tears,” while P19 described the conversation as akin to talking with a friend due to the persona’s empathy.
The visual representation of personas also played a crucial role in enhancing empathy and engagement. P15 mentioned,
“Seeing the persona I created made the conversation feel more direct, eliciting a deeper sense of empathy.” The act of visualizing and personalizing these personas enriched the conversational experience, as P9’s comment illustrates:
“I crafted it thinking of my favorite YouTuber. During our chat, I imagined his voice, making the conversation more engaging.” This aligns with research findings that emphasize the power of visual engagement in enhancing conversational interest [
91].
Initially, many participants were not inclined to use conversational agents for emotional support, a trend also supported by our quantitative findings (section
6.1). However, as the experiment progressed, participants began to appreciate the value of emotional conversations with agent personas. P19’s reflection captures this shift:
“The experiment taught me the value of emotional conversations with conversational agents. CloChat’s personalized agents responded warmly, understanding my feelings remarkably well.”7.4 User Feedback on the Persona Customization
The participants found that CloChat’s form-based interface significantly lowered the entry barrier for engaging with conversational agents, making it more accessible to the general public. During pre-experiment interviews, many participants revealed difficulties due to limited technical knowledge needed for LLM customization, particularly when it came to selecting specific characteristics for personas. Thus, for participants unfamiliar with crafting text prompts, the availability of predefined persona trait options in CloChat was notably more user-friendly. P17, who had initially been concerned about the complexity of prompt creation, observed after the experiment, “CloChat definitely reduces the effort needed to create a persona. It’s convenient not having to think about specific text prompts.” As the trials progressed, participants developed their own strategies for effectively customizing unique personas. P30 commented, “Customizing personas was initially challenging, but I quickly discovered the optimal approach.” This feedback indicates that users experienced a manageable learning curve with the CloChat interface.
Nevertheless, some participants pointed out that setting up personas could be complicated and time-consuming without clear guidelines or presets. P15 noted, “I was a bit confused when first setting up the persona. I wasn’t sure how to approach it or what criteria to use for selection.” While most acknowledged the benefits of having bespoke personas, there were mentions of the burden involved in their initial setup as well, suggesting a need for more user-friendly guidance or preset options.
The feature allowing users to customize visual representations of agent personas was particularly appreciated, offering an enhancement not found in ChatGPT. P13 remarked, “Modifying the counselor’s appearance was surprising and greatly enhanced my engagement.” This emphasizes the vital role of visual representation in the design and functionality of conversational agents, enhancing user engagement and expectation management.
7.5 Reflecting Real Life to Agent Personas
A notable trend among participants was the incorporation of elements from their real-life experiences and observations into their agent personas, rather than creating entirely fictional characters. For instance, participants often modeled personas after familiar individuals like acquaintances, friends, pets, or celebrities. P9, who chose a renowned doctor as a persona, shared, “I based the persona on a real person I saw on TV. Reflecting his tone in my agent persona made the conversation warmer and more immersive, allowing me to speak more honestly.” Similarly, P27 created a persona inspired by a friend’s occupation and hobbies, noting, “Seeing these characteristics in the conversation gave it the feeling of talking to my actual friend.” This approach illustrates how personal experiences can enhance the realism and relatability of conversational partners. However, this practice can also raise ethical concerns regarding privacy and personal data protection, as it involves imitating or mimicking real individuals potentially without their consent.
Participants also enjoyed the imaginative exercise of setting up their pets as personas, attributing them with imagined personality traits and habits. P2 reflected, “I mirrored my dog’s playful personality. Imagining his responses made the conversation more fun and unique.”
The practice of drawing from real-life experiences for persona customization allowed participants to infuse their personal lives and emotional connections into the digital domain. Nevertheless, while this approach significantly enriches user interaction with conversational AI systems, it simultaneously highlights the importance of addressing ethical considerations related to mimicking real-world individuals.
8 Discussions
Our user study was aimed at investigating the impact of agent persona customization on user experience during interactions with LLM-based conversational agents, as opposed to conventional generic conversational agents (RQ1). We discovered that the customization of agent personas significantly boosts user engagement, trust, and emotional connection, offering a noticeable improvement in maintaining user satisfaction and engagement compared to ChatGPT. In addressing RQ2, we delved into the ways users customize their agent personas and the resultant effects on their interactions. We observed that conversations involving customized agent personas tend to be richer and more diverse. Users often align the traits of agent personas in terms of both visual elements and real-world inspirations, which additionally brings to light ethical considerations regarding agent persona customization. In extending our discussions on these findings, we explore relevant topics and present practical implications for the design of user interfaces employing LLM-based conversational agents. We also outline the limitations of our study, acknowledging areas that could benefit from further exploration and improvement.
8.1 The Multifaceted Roles of Customizable Agent Personas
Our study demonstrated that CloChat provided an enhanced user experience compared to ChatGPT, highlighting the substantial benefits and potential of customizable agent personas. Users interacting with CloChat perceived the agent personas not just as algorithmic tools, but as distinct conversational partners with unique personalities, as outlined in (section
7.3). This shift in perception, supported by previous research [
54,
55,
62], increased users’ emotional engagement, trust, and immersion in the conversational experience.
A noteworthy observation was how some participants modified their own conversational styles to resonate more with the personas they created, indicating a deepening emotional connection with their customized agents (section
7.3). The integration of visual representations further solidified this bond, elevating the agents from mere information retrieval tools to authentic conversational partners (section
7.3, section
6.3).
Conversely, interactions with ChatGPT were associated with lower levels of emotional engagement (section
7.3). This contrast not only underscores the limitations of text-prompt-focused platforms like ChatGPT but also highlights the potential of CloChat’s comprehensive personalization features. These features can have the ability to enrich user experiences across diverse emotional contexts and situations.
In conclusion, the customizable agent personas in CloChat extend beyond traditional information retrieval roles typically associated with conversational agents using LLMs. They play a crucial role in fostering emotional connections and enhancing user engagement with conversational systems, indicating an expansion in both the functional scope and emotional depth of these technologies.
8.2 Personas’ Role in Sustaining User Engagement on Conversational Agents
While ChatGPT is renowned for its conversational capabilities, it faces limitations in reflecting users’ individual preferences and sustaining deep, ongoing relationships, as it primarily excels in basic information retrieval and short interactions [
16]. Our study confirms this, indicating a decline in user satisfaction with ChatGPT over time (section
6.2).
In contrast, personalized agent personas not only elicited initial positive responses from users but also played a pivotal role in maintaining these positive connections over time (section
6.2). This aligns with prior research [
17,
93] and our qualitative findings (section
7.1), suggesting that user preferences are dynamic, varying according to mood, situation, and context [
89]. CloChat’s capability to customize a variety of personas to adapt to these shifting preferences likely contributed to sustained user engagement.
Another key factor in the enduring positive relationship with personalized agent personas is the human-like perception they create (section
7.3), resonating with findings from Cowan et al. [
24]. With CloChat, participants engaged in longer conversations and explored a wider range of topics (section
7.2,
6.4), leading to increased trust and satisfaction. This enriched conversational experience contributes to sustainable interaction with the agent, moving beyond brief, transactional conversations. Although our study did not specifically observe long-term interactions between users and customized agents, the implications from our findings hint at the potential for fostering lasting relationships with conversational agents in the future.
8.3 Pros and Cons of Persona Customization
Our study underscores the significant advantages of incorporating persona customization features into LLM-based conversational user interfaces. The majority of participants responded positively to this functionality, noting that it made their conversations more enjoyable and engaging (section
7.3, section
7.4). The ability to tailor personas according to personal preferences fostered increased interest and active participation in conversations, leading to a more open and dynamic interaction, as reflected in survey responses (section
6.1).
However, alongside these benefits, certain challenges were also observed (section
7.4). Some participants found the wide array of customization options to be overwhelming, particularly for those new to conversational agents or not versed in prompt engineering techniques. To address this, future iterations could consider integrating automated suggestions that assist users in managing their expectations and simplifying the decision-making process. This could involve methods like OpenAI’s recently released GPTs, which can learn specific knowledge or personalities from user-provided documents [
6]. Further research is needed to compare various approaches, such as extensive user-driven customization versus agents automatically learning from user documents, and to understand how these different methods influence user experience. An effective balance between user-driven customization and automated recommendations, as suggested in literature [
51], could provide a solution to these challenges.
The overarching aim would be to streamline the customization process, making it less daunting for users while still offering a rich, personalized experience. This balance is key to harnessing the full potential of persona customization in enhancing user engagement with conversational AI systems.
8.4 Ethical Concerns on Personalized Personas
In our study, we observed that participants frequently drew inspiration from their personal experiences and daily interactions when customizing their agent personas (see section
7.5). A notable trend involved mimicking celebrities or personal acquaintances. This inclination could be attributed to the perceived expertise or symbolic stature of famous individuals or a preference for replicating interactions with familiar and relatable figures rather than inventing entirely new or unknown personas. While this method can lend a sense of realism to interactions with conversational agents and potentially foster more robust and lasting connections, it also brings forth significant ethical dilemmas.
This practice might risk privacy breaches and confidentiality issues, particularly when integrating distinct details or characteristics of these individuals, such as their occupation, location and relationships with others. Furthermore, since a persona cannot fully encompass the complexity of an actual person’s personality, actions, or thoughts, such representations may lead to misconceptions or biases. These misrepresentations could adversely impact the reputations or identities of the individuals portrayed, as discussed in the research by Deshpande et al. [
8]. Hence, a delicate balance must be struck between the creative liberty in persona customization and the ethical implications of drawing from real-life figures.
In the context of LLMs operating across networks, using personal information to shape agent personas raises concerns about individual privacy. Once personal identifying data is input into an LLM, its permanence and the opaque nature of data storage and processing can result in unintended privacy violations, with interactions potentially reaching a broad, unknown audience.
Echoing the observations of Goldstein et al. [
39], the realm of AI ethics is continuously evolving. Ongoing dialogue and development are essential to establish ethical frameworks and principles within this field. Consequently, it’s critical to develop practical and robust solutions for ethical issues related to language model applications. Researchers and developers should diligently address these ethical aspects in the design and deployment of personas, implementing safeguards to protect personal information during the training of machine learning models. This step is fundamental to preserving user privacy and ensuring the ethical use of LLM-based conversational systems. Moreover, clear ethical guidelines and protocols for persona design are necessary. Users should be informed about the risks of imitating real individuals and discouraged from engaging in such practices. Future research should delve into the potential problems of using personas based on real individuals in specific contexts. It may be advisable to limit the use of personas based on real people, especially in scenarios requiring expert advice or sensitive discussions (e.g., sexual dialogue). Such measures will help users grasp the ethical implications of their choices and encourage responsible persona creation.
8.5 Design Implications
Based on our discussions, we propose the following design implications for future development and refinement of conversational user interfaces employing LLMs:
•
I1: Prioritize Persona Customization to Enhance User Trust and Engagement. CloChat surpassed ChatGPT in terms of user satisfaction, largely owing to the availability of customizable personas. Designers should, therefore, consider prioritizing persona customization options in their systems. The heightened user trust and improved conversation quality associated with personalized personas highlight their vital role in the future design of conversational interfaces.
•
I2: Minimize the Initial Setup Burden to Encourage User Engagement. The initial setup for persona customization can be perceived as burdensome (section
8.3). Designers should streamline the setup process and provide easy-to-follow onboarding assistance, thus enhancing user immersion and engagement from the outset (section
8.1,
8.2). This could involve introducing conversational tutorials or preset persona options.
•
I3: Make the Agent Persona Adaptive. The flexibility in creating bespoke agent personas for various situations could be helpful for sustaining user engagement with CloChat. We recommend implementing adaptive algorithms that tailor persona behaviors based on user intentions and circumstances, combined with easy customization options. This approach would cater to users who prefer consistent personas across different scenarios as well as those who desire situation-specific persona adaptations.
•
I4: Provide Thorough Guidelines on Ethical Considerations. The study revealed potential ethical issues, such as incorporating celebrities or real-life acquaintances into agent persona designs without consent. Given the likelihood of further ethical concerns, it is crucial to provide users with clear guidelines addressing these issues. This will help ensure that the customization of agent personas adheres to ethical standards and respects individual privacy and rights.
8.6 Limitations and Future Work
Our study, while shedding light on the diverse user experiences with customizable agent personas in LLMs, has several limitations that must be acknowledged. Firstly, the participant pool was limited to Korean speakers, due to institutional constraints. This limitation may affect the generalizability of our findings to other linguistic and cultural groups. Future studies should aim to include a more diverse range of participants to broaden the applicability of the results. Secondly, the creation of agent personas in CloChat relied solely on prompt injection, which may lack depth in specialized or rapidly evolving domains [
5]. This limitation raises the question of how LLMs can be optimized for more in-depth and accurate persona representations. Future research could explore advanced techniques such as fine-tuning [
7] or the integration of external memory [
74,
84] to enhance the sophistication of persona customization in LLMs. Thirdly, CloChat’s persona customization is currently confined to a form-based interface. While this design choice was made to lower the barrier to persona customization, it is worth exploring how different customization methods (e.g., direct prompt writing, conversation-based customization [
4]) might impact user experience. These alternative approaches could offer more flexibility and personalization, catering to users with varying levels of expertise and preferences. Lastly, our study did not explore the long-term user experience with CloChat. To fill this gap, future research endeavors should focus on longitudinal studies to understand how user engagement with CloChat evolves over time. Such studies are crucial for uncovering the distinctions between short-term and long-term interactions and for developing strategies to cultivate sustained and meaningful relationships with conversational agents like CloChat.