Work in Progress

Affective Typography: The Effect of AI-Driven Font Design on Empathetic Story Reading

Authors:

Hae Won Park

CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

Article No.: 26, Pages 1 - 7

https://doi.org/10.1145/3544549.3585625

Published: 19 April 2023

Abstract

When people listen to stories, the words are colored with emotions conveyed through prosodic features beyond the text alone. Visual font design provides an opportunity to enhance the empathic quality of a story compared to plain text. In this paper, we present the design, implementation, and evaluation of Affective Typography (AffType), an AI-driven system that extracts prosodic information and sentiment from speech and maps these properties to typographic styles.¹ We conduct a crowdsourced study (N = 140) to assess how different font design elements impact readers’ empathy with personal stories. While our empathy survey results were not statistically significant, we found that participants had a preference for color to express emotion and saw an increase in average empathy for stories with color-based text alterations. In addition, we offer design insights as to what display features best convey emotional qualities of personal stories for future applications that use affective fonts to create more expressive digital text.

1 Introduction

Many of the most meaningful human-human connections are fostered through sharing of personal stories and empathizing with emotional experiences [11, 21, 27]. Although people consume pages of digital content every day, digitized text can reduce the true empathic impact of a story to a stale, desensitized version if the design of the text is not carefully considered. In contrast, speech conveys information beyond words alone: there is emotional richness in the way people tell stories. Representing these prosodic features as visual cues could enhance empathy in reading experiences by injecting the human quality back into the text. In addition, such a system could help convey emotion in the voice to people with hearing impairments or trouble with emotion identification.

In this paper, we specifically answer the question: What design elements should affective fonts have in order to visually convey emotion and improve empathy for spoken stories? We present the design and evaluation of Affective Typography (AffType), an AI-driven system that converts prosodic and sentiment-related information present in speech into typographic properties and emojis. Prior works have explored how to convey speech features through text in order to improve accuracy of the perceived emotion [2, 6]. However, we are specifically interested in how affective typography impacts a reader’s ability to empathize with a storyteller and create more expressive text, not just accuracy of the conveyed emotion. Our system implementation leverages deep learning and lexicon-based sentiment models, as well as speech feature extraction pipelines to generate affective text. In addition, our system is easily transferable to digital text, as we use boldness, spacing, color, and emojis, which are available across most front-end applications and web browsers. Through a crowdsourced study (N = 140), we evaluate our system’s ability to improve empathy on personal stories with different emotional arcs. We do not find statistically significant results across font conditions, but we do identify a preference for color to express emotions. Our preliminary findings motivate more extensive work on using intelligent visual design to convey affective information from modalities that may be less accessible in digital settings and which can enhance the emotions in text.

2 Related Work

Stories have the ability to elicit empathy and help us connect with others [1, 4, 10]. In fact, some research has shown that when telling a story to a second listener, speakers and listeners couple their brain activity, indicating the neurological underpinnings of story sharing [12, 24]. Our work is interested in how AI-driven fonts can be used to convey emotional information present in spoken stories, potentially improving salience and emphasis of a storyteller’s emotions, and ultimately, empathy towards the narrator.

Within the human-computer interaction community, many existing interfaces have explored how to creatively convey emotion information to users. Example interfaces include intelligent mirrors that generate emotionally-relevant poetry [20], personalized animated movies for self reflection [19], and visual user interfaces as prosthetics to improve emotional memory [17]. We focus on using fonts as our visual communication tool, since text is low resource and easy to transfer across a wide variety of applications.

Designers understand the importance of typography in conveying meaning [5, 14], and using computational techniques to render intelligent typography has been explored by prior research. These works used acoustical features of speech such as loudness, pitch, and speed, which were represented by corresponding variations in font, and found that changing the appearance of text over time through kinetic typography enabled users to add emotional content to texts during instant messaging [3, 16, 25]. However, most of these early works were not automatically voice-driven, as the text effects were manually applied. More recent advances in affective computing allow computer systems to better capture the emotions present in speech and text, and automatically detecting emotion-related features in this way can be leveraged to create more expressive digital fonts.

Most relevant to our work are prior works that have explored how to bring text to life through speech modulated typography and voice-driven type design, as well as understanding what qualities of font better convey information from speech [7, 26]. For example, more recent works used automatically generated chat balloons for texts shared during instant messaging [2, 6]. These designs show promising results towards using automated systems to support personal expression and emotion understanding through visually conveying information such as arousal and valence. Our work builds on existing work by incorporating both prosodic features and sentiment analysis of the user’s recorded speech, which we automatically translate into variations in typographic design. We combine methods used in existing interfaces and convey affective information through font weight (loudness), font spacing (pace), color (positive/negative sentiment), and emojis (general sentiment). In contrast to prior works, we focus on measuring the effect of font design elements on readers’ empathy, not just accuracy of the perceived emotion.

3 System Design

Our system takes in speech, extracts text and prosodic information, and then renders the text with a font conveying the predicted emotion and speech features (Fig. 1). Sentiment is represented by font color and emojis, and prosodic features, such as loudness and pace, are represented by letter boldness and letter spacing. We chose these mappings based on prior research in voice-driven and kinetic typography [7, 26], and because they are intuitive: people readily interpret bolder text as louder speech, and letter spacing maps naturally to speech pacing. Additionally, prior work demonstrates that people associate certain colors with different types of emotions, such as green, which evokes positive emotions [15]. An example output of a story is shown in Fig. 2.

Figure 1:

Figure 2:

The story audio file is passed to the Google Cloud Speech API for automatic speech recognition (ASR) in order to extract the text, as well as timestamps for the approximate start and end times of each individual spoken word. We then extract the prosodic features of the speech using openSMILE [8]. Specifically, we use the loudness feature, where values are normalized over the entire audio clip using min-max normalization. We then calculate a loudness factor for each word by merging the timestamps from the transcript and the openSMILE outputs. This factor is used to determine font weight. Next, the text transcript is passed into two sentiment models: DeepMoji [9], which selects the top emoji for the emotion of a given sentence, and VADER [13], which we use to choose text color based on whether the sentence is positive, negative, or neutral. Originally, we used speech affect recognition to map these sentences to colors based on the six basic emotions, but found that this made the text too difficult to read. Font weight and spacing are applied at the word level, and color and emoji are applied at the sentence level in order to better contextualize the emotion. Note that we have redundancy in how sentiment is conveyed, as we wanted our system to offer multiple ways to interpret emotion, whether through emojis or colors.
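The loudness step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice to average normalized frames that fall inside each word's time interval, and the 300-900 font-weight range are all assumptions made here for the example.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] over the whole clip (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no loudness contrast.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def word_loudness(frames, words):
    """Merge frame-level loudness with word timestamps.

    frames: list of (time_sec, loudness) pairs, e.g. from a prosody extractor.
    words:  list of (word, start_sec, end_sec) tuples, e.g. from ASR.
    Returns a list of (word, factor) with factor in [0, 1].
    Averaging the frames inside each word is an assumption of this sketch.
    """
    times = [t for t, _ in frames]
    norm = min_max_normalize([v for _, v in frames])
    result = []
    for word, start, end in words:
        inside = [n for t, n in zip(times, norm) if start <= t < end]
        factor = sum(inside) / len(inside) if inside else 0.0
        result.append((word, factor))
    return result


def font_weight(factor, lightest=300, heaviest=900):
    """Map a loudness factor to a CSS font-weight, rounded to a multiple of 100.

    The 300-900 range is hypothetical, chosen only for illustration.
    """
    raw = lightest + factor * (heaviest - lightest)
    return int(round(raw / 100.0)) * 100


if __name__ == "__main__":
    frames = [(0.0, 0.2), (0.1, 0.4), (0.2, 1.0), (0.3, 0.6)]
    words = [("so", 0.0, 0.2), ("loud", 0.2, 0.4)]
    for word, factor in word_loudness(frames, words):
        print(word, round(factor, 3), font_weight(factor))
```

Normalizing over the whole clip, rather than per sentence, keeps the weights comparable across the story, so a quiet passage stays visibly lighter than a loud one.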

We synchronize the text and audio using timestamps output by the ASR and calculate a proxy for speaker pace by taking the number of letters divided by the length of the time interval. While this approximation is not as accurate as using the number of syllables divided by the length of the time interval, since syllables are more closely related to speaker pace than number of letters, we found that this approximation was good enough for capturing the information needed to display pace in an intelligible way. Based on the model outputs from the server, we render the final font using CSS properties. Our system is fully automated, and we open source our code to aid further research in this area. In addition, we implemented a frontend web interface in React where users can upload audio stories to our Flask server and see their responses in real time.
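As a rough sketch of the pace proxy and CSS rendering described above, the example below computes letters per second for each word and emits inline CSS. Several details are assumptions made for illustration and are not confirmed by the paper: slower speech is mapped to wider letter spacing, the pace bounds and maximum spacing are hypothetical, the sentiment cutoffs of ±0.05 follow the convention commonly used with VADER's compound score, and red and black are placeholder colors for negative and neutral sentences (the text only mentions green for positive emotion).

```python
from html import escape


def pace_letters_per_second(word, start, end):
    """Pace proxy from the text: number of letters divided by word duration."""
    duration = end - start
    if duration <= 0:
        return 0.0
    letters = sum(ch.isalpha() for ch in word)
    return letters / duration


def letter_spacing_em(pace, slowest=2.0, fastest=20.0, max_spacing=0.3):
    """Map pace to CSS letter-spacing in em (assumed: slower means wider)."""
    clamped = min(max(pace, slowest), fastest)
    slowness = (fastest - clamped) / (fastest - slowest)
    return round(slowness * max_spacing, 3)


def sentiment_color(compound, threshold=0.05):
    """Pick a sentence color from a compound sentiment score in [-1, 1]."""
    if compound >= threshold:
        return "green"   # positive
    if compound <= -threshold:
        return "red"     # negative (placeholder color)
    return "black"       # neutral (placeholder color)


def render_sentence(words, compound, emoji=""):
    """Render one sentence as HTML with inline CSS.

    words: list of (text, start_sec, end_sec, font_weight) tuples.
    Weight and spacing are applied per word; color and emoji per sentence.
    """
    spans = []
    for text, start, end, weight in words:
        spacing = letter_spacing_em(pace_letters_per_second(text, start, end))
        spans.append(
            f'<span style="font-weight:{weight};letter-spacing:{spacing}em">'
            f"{escape(text)}</span>"
        )
    suffix = f" {emoji}" if emoji else ""
    body = " ".join(spans)
    return f'<p style="color:{sentiment_color(compound)}">{body}{suffix}</p>'


if __name__ == "__main__":
    sentence = [("I", 0.0, 0.2, 400), ("made", 0.2, 0.5, 700), ("it", 0.5, 0.9, 500)]
    print(render_sentence(sentence, compound=0.6, emoji="🙂"))
```

Only `font-weight`, `letter-spacing`, and `color` are used, which matches the paper's point that the styling transfers to any front end that supports basic CSS.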

4 Experiment

To evaluate the effect of our generated fonts on empathy during story reading, we conducted a crowdsourced study with N = 140 participants through Prolific [18]. Our study procedure was approved by our university’s Institutional Review Board committee as an exempt study. The largest age group was 18-24 (32 participants), followed by 25-29 (27), 55+ (22), 30-34 (20), 35-39 (15), 40-45 (11), 50-55 (7), and 45-49 (6). Participants were predominantly women (88 women, 46 men, 4 non-binary, 1 transgender woman, and 1 transgender man) and predominantly white (102 white, 11 Hispanic, 11 Black, 6 Asian, 2 Middle Eastern, and 8 other).

At the beginning of the study, we asked for demographic information of the reader, including age range, gender, and ethnicity. We then randomized participants into one of seven conditions (20 participants per condition): (1) control/regular text (abbr. regular), (2) font conveying loudness (abbr. bold), (3) font conveying pace (abbr. spacing), (4) font conveying sentiment with emojis (abbr. emoji), (5) font conveying sentiment with color (abbr. color), (6) font conveying loudness, pace, and sentiment with color (abbr. bold + spacing + color), and (7) font conveying loudness, pace, and sentiment with both emojis and color (abbr. all features). We hypothesized that compared to the regular text condition, participants’ empathic responses would increase in all other conditions. Participants in each condition were asked to read two different stories that were run through AffType, rate the extent to which they empathized with each story, and answer a free response question about what they empathized with in the story. To assess empathy towards the narrator, we used the 12-item State Empathy Scale [22], which takes into account the affective, cognitive, and associative components of empathy when receiving messages. The stories we ran through the interface were chosen from StoryCorps,² a site containing short recordings of personal stories. Our selected stories are included in the Appendix. We selected two stories with different emotional trajectories to control for variations in empathy based on the overall emotional tone of the story. Note that for each participant, we randomized the order the stories were presented in to control for ordering effects, but the two stories were rendered with the same features. At the end of the task, we asked participants to rate how well the font properties corresponded to speech features, and collected free responses on their likes, dislikes, and suggested improvements for the way the text was presented.
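One way to obtain equal-sized conditions with a randomized story order is sketched below. The paper does not describe its randomization procedure, so the shuffled-slots approach, the function name, and the story labels `story_a` and `story_b` are assumptions made purely for illustration.

```python
import random

CONDITIONS = [
    "regular",
    "bold",
    "spacing",
    "emoji",
    "color",
    "bold + spacing + color",
    "all features",
]


def assign_conditions(participant_ids, conditions=CONDITIONS, seed=None):
    """Assign participants to equal-sized conditions and randomize story order.

    Builds one slot per participant (the same number per condition),
    shuffles the slots, and pairs them with participants in order.
    """
    ids = list(participant_ids)
    if len(ids) % len(conditions) != 0:
        raise ValueError("participants must divide evenly across conditions")
    rng = random.Random(seed)
    per_condition = len(ids) // len(conditions)
    slots = [c for c in conditions for _ in range(per_condition)]
    rng.shuffle(slots)
    return {
        pid: {
            "condition": cond,
            # Both stories use the same condition; only the order varies.
            "story_order": rng.sample(["story_a", "story_b"], 2),
        }
        for pid, cond in zip(ids, slots)
    }


if __name__ == "__main__":
    plan = assign_conditions(range(140), seed=0)
    counts = {}
    for entry in plan.values():
        counts[entry["condition"]] = counts.get(entry["condition"], 0) + 1
    print(counts)
```

With 140 participants and seven conditions, every condition receives exactly 20 participants regardless of the seed.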

5 Results

5.1 Effects on Empathy

We first analyze results for the State Empathy Scale [22] across conditions. To calculate p-values, we use a one-sided Mann-Whitney U test, as a Shapiro-Wilk test across conditions indicates that the data are not normally distributed. We determine statistical significance using a p-value threshold of 0.0083, adjusted using Bonferroni correction for six comparisons (the control/regular condition compared to each of the six experimental conditions). Note that reported p-values are relative to the control/regular text condition. The experimental conditions, bold (mean = 2.72, std = 0.80, p = 0.11), spacing (mean = 2.72, std = 0.68, p = 0.071), color (mean = 2.81, std = 0.82, p = 0.014), emoji (mean = 2.73, std = 0.68, p = 0.043), and bold + spacing + color (mean = 2.57, std = 0.74, p = 0.31), show average increases in empathy, although none is statistically significant when compared to the regular text condition (mean = 2.47, std = 0.75). Interestingly, the only condition where participants’ empathy decreased relative to the regular text condition was the all features condition (mean = 2.18, std = 1.28, p = 0.71), and this last condition had the greatest standard deviation in empathy scores. From looking at our qualitative data, we hypothesize that this is because participants found the combination of all features jarring and distracting from the underlying emotional meaning of the story. As shown in Fig. 3, participants in the color condition had the greatest increase in average empathy over the regular text condition, followed by emoji, spacing, bold, bold + spacing + color, and all features.
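The threshold and test used above can be illustrated with a small, dependency-free sketch. It assumes a base significance level of 0.05 (consistent with 0.05 / 6 ≈ 0.0083) and uses the normal approximation with tie correction and no continuity correction, which may differ from the exact procedure the authors ran. The sample scores are made up for the example and are not study data.

```python
import math


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u_greater(treatment, control):
    """One-sided Mann-Whitney U test (H1: treatment tends to exceed control).

    Returns (U, p) using the normal approximation with tie correction.
    """
    n1, n2 = len(treatment), len(control)
    combined = list(treatment) + list(control)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every observation is identical
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u1, p


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 6)
    # Illustrative scores only; these are not the study's data.
    color = [3.1, 2.8, 3.4, 2.2, 3.0, 2.9]
    regular = [2.5, 2.1, 2.7, 2.4, 3.2, 1.9]
    u, p = mann_whitney_u_greater(color, regular)
    print(f"threshold={threshold:.4f} U={u} p={p:.3f} significant={p < threshold}")
```

Under this correction, a comparison such as the color condition's p = 0.014 falls below 0.05 but above 0.0083, which is why it is reported as not significant.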

Figure 3:

Looking at psychometric survey data alone offers us one view on how the font elements affected empathy with the stories. In addition, we look at what participants say in their free responses about what they empathized with in the stories. To analyze this, we use dimensions from LIWC (Linguistic Inquiry and Word Count) [23]. In particular, we look at the total count of emotional language (emo_pos + emo_neg) used in free responses across conditions (Fig. 4). We find that when compared to the regular text condition (mean = 1.45, std = 2.49), participants in the color condition use the most emotional language on average (mean = 2.73, std = 3.18, p = 0.017), followed by all features (mean = 1.8, std = 3.0, p = 0.37), emoji (mean = 1.77, std = 3.23, p = 0.54), spacing (mean = 1.66, std = 2.82, p = 0.41), and bold (mean = 1.5, std = 3.83, p = 0.74). Again, although the differences are not statistically significant, participants in the color condition used, on average, the most emotional language relative to the regular text condition. We hypothesize that this could be because color draws attention to the emotions present in the text. To validate this, we also asked participants if they felt that the font design helped them perceive emotions in the text. We found that, consistent with the LIWC results, participants in the color condition reported the highest average agreement with this statement (mean = 1.95, std = 0.94) when compared to the regular text condition (mean = 1.5, std = 0.93), although the difference was not statistically significant (p = 0.03).

Figure 4:

While we found that the participants in the all features condition were distracted by too many changes in the font, many participants in this condition wrote meaningful responses to the empathy free response survey question. For example, one participant self-disclosed, “I live in a bordertown and I often think about my grandparents coming to America. They fled the pogroms in Russia. I know we have people fleeing their homeland for varies humanitarian reasons. I worry about how unwelcoming we have become. I do not know what the solution is and how we can actually help other’s be able to stay safely in their homelands.” Another wrote, “I somewhat empathized with the feelings of guilt over escaping a bad situation. This story tangentially reminded me of my experience as an LGBTQ+ individual and how although I’ve experienced oppression and hate, others in the community have experienced it to a much harsher extent.” Future work can further explore this relationship between the font properties and self disclosure when empathizing with another person’s story.

5.2 Design Considerations

We hypothesize that some challenges in conveying emotion through AffType are limitations in the speech to font mappings. At the end of the study, we asked each participant to rate their agreement with how effective each font design change (bold, spacing, color, or emoji) was in capturing the intended speech characteristic (loudness, pace, positive/negative sentiment, or general sentiment). Overall, we found that participants were, on average, neutral to these mappings, with a slight preference towards boldness (mean = 2.19, std = 1.22), followed by color (mean = 1.94, std = 1.31), emoji (mean = 1.86, std = 1.28), and spacing (mean = 1.82, std = 1.24). Although the alterations we made to the text were motivated by prior works [7, 26], in our application, the effectiveness of these mappings could be improved. In the rest of the section, we provide insight from participants’ responses on their likes, dislikes, and suggestions for what font design elements could improve empathy with personal stories.

For each of the following analyses, we used qualitative coding to identify core themes in participants’ responses. Three researchers independently coded the survey responses, and commonalities were extracted as major themes. As shown in Table 1, participants liked the more natural and human quality to the text, commenting on how "it felt like someone was speaking to me" and "it looked more personal." Some participants preferred spacing for matching the pace of the story and readily understood that color was associated with emotion. Other participants found boldness helpful in drawing emphasis to specific points. From Table 2, we see that participants disliked the way the font interrupted the flow of the story and commented that there was a lack of correlation between the style and meaning of the story due to too many text alterations. In addition, participants commented that emojis were not effective in promoting empathy, as they affected how the writing style was perceived and made it appear more childish.

Table 1: What participants liked about AffType with respect to empathetic story reading.

Theme	Example
Styles gave the text more personality	"It looked more personal, like handwriting, since it doesn’t look like the typical typed text"
Styles conveyed spoken elements to some participants	"I think it felt like someone was speaking to me. Completely telling the story from their view."
Spacing sometimes helped users match the pacing of the story	"I think it is helpful for some to feel the pauses in the text as spaces that were placed."
Color was easily understood to emphasize emotion	"Colored text I read more passionately and felt more emotion from it"
The style helped draw attention to certain parts of the story	"The boldness of text was helpful for emphasizing points."

Table 2: What participants disliked about AffType with respect to empathetic story reading.

Theme	Example
Styles visually interrupted the flow of the story	"Not that it was necessarily a hinderance, but some of the spacing was different which made me think about that rather than the story."
Emojis affected how the writing style was perceived	"The emojis made it feel a but silly and not serious, like I was reading a facebook post."
There was sometimes a lack of correlation between style and meaning	"...emphasizing all kinds of words in the story, not just the ones that made sense for the emotional impact."

Finally, participants expressed what they wished was different about the way the text was displayed in order to increase empathy with the story. As shown in Table 3, a major theme was using standard writing to convey emotion in the story with minimal unnatural text alterations. For example, participants suggested including explanations of how something was spoken, such as the tone the narrator used, whether they sighed, or what their facial expression was. Others suggested using common text formatting like italics and paragraphs, indicating that seamless integration of the narrator’s spoken emotions into the text is an important property of the system. Finally, participants suggested using photos instead of emojis to augment the story and preserve the formal quality of the writing, as well as using the text to better contextualize the narrator’s experiences.

Table 3: What participants would change about AffType with respect to empathetic story reading.

Theme	Example
Explanations of how something was spoken	Adding in explanations of tone, pauses, sighs, and facial expressions.
Use of other text formatting like italics, indentation, and paragraphs	"I would break up the stories into multiple paragraphs, and I would use italics to emphasize important points."
Use photos to augment the story	"I would maybe add a photo/image to help people visualize the person who is telling the story in some way"
Contextualizing the story	"more information about the author, and information about the setting"

Based on participant feedback and survey results, we summarize the following design insights for AI-driven empathetic fonts: (1) readable – alterations in text should not distract from clarity of the story, (2) natural – speech to font mappings should be intuitive, (3) colorful – colors represent emotions well, (4) appropriate – alterations to text should not affect how the writing style is perceived (e.g., emojis make writing more informal), (5) explainable – speech characteristics should be explained directly by the text, and (6) personalized and culturally sensitive – use of features could be interpreted differently across people and cultures.

6 Conclusion and Future Work

Our work expands on interfaces that can better convey human qualities through computational means, without the reduction that often occurs when human stories are digitized. While our results did not confirm our original hypothesis, we found that participants preferred colored text for empathetic story reading and desired more features such as explanations of how something was spoken and improved readability.

There are limitations to our interface, which we identified through our user study. In particular, it is possible that long-term exposure to empathetic text could lead to fatigue or loss of sensitivity to the empathy the text is intended to foster. Our current design does not take this into account. Furthermore, our user study was limited by its short duration and small scale. If participants had interacted with the system over a longer period of time, perhaps some of the text features would not have been so jarring, or they might have become desensitized to the font changes. In addition, we only asked participants to read two hand-curated stories. Using stories with more diverse emotional trajectories could also have improved participants’ understanding of how the font features correlate with emotions present in the narrator’s voice. Finally, we did not look at how different demographic characteristics affect the usability of the system. For example, younger people might be more receptive to features like emojis.

Based on participant feedback, there is future work that could improve the capabilities of our system in expressing emotions and fostering empathy in spoken stories. In particular, the text can be integrated with spoken emotions in a more natural and seamless manner. For example, one participant commented that the way something is written is important for empathizing with a piece of text. One idea could be to use prompting methods with large language models to generate explanations of the way something is spoken. Therefore, instead of using uncommon visual elements like letter spacing or boldness, the language itself could bring life to the emotions and contextualize the narrator’s experiences.

For future applications, it would be interesting to explore how this work could be used in video captioning to generate more empathic captions. Additionally, this system could be used to help individuals reflect on emotions present in stories and help them notice communication patterns. For example, one could easily look at the affective text and see if their words are laced with negativity or if their voice became quieter at the end of sentences due to lack of confidence. Rendering text in this way can be a creative means for users to engage with personal, spoken stories. Further work can explore using the system to convey elements of speech to people with hearing impairments, trouble with emotion identification, and more generally, in storytelling applications to improve empathy and human connection.

Acknowledgments

We would like to thank our participants and all of our teammates who have contributed constructive comments to our project. This work was supported by an NSF GRFP under Grant No. 2141064.

Footnote

² https://storycorps.org/

References

[1] Mary E. Andrews, Bradley D. Mattan, Keana Richards, Samantha L. Moore-Berg, and Emily B. Falk. 2022. Using first-person narratives about healthcare workers and people who are incarcerated to motivate helping behaviors during the COVID-19 pandemic. Social Science & Medicine 299 (2022), 114870. https://doi.org/10.1016/j.socscimed.2022.114870
