Research paper
Social Signal Processing
Emotion recognition through voice analysis
Master CIS 2014-2015

880243 Social Signal Processing
Elka Popova [ANR 216553]
Ilona Isaeva [ANR 291124]
prof. dr. E.O. Postma, dr. M. Postma

Abstract
One of the most important pieces of information that speech acoustics provide is the expression of emotions. The purpose of this research is to identify the pitch differences between two basic emotions: anger and joy. To answer this question, vocal data were collected from a small group of participants. Results from Friedman's Two-Way Analysis of Variance by Ranks revealed differences in pitch levels, as well as in jitter (rap), between the expression of anger and joy.

Introduction
It is well known that speech is an acoustically rich signal that provides a great deal of information about the speaker during vocal interaction. The expression and recognition of emotions are extremely important steps in the human communication process, and for this reason voice recognition is useful for detecting and identifying specific affective characteristics of the speakers. Moreover, it has been scientifically shown that basic acoustic features are an indicator of a speaker's vocal profile.

The human voice is a reliable source of emotional signalling. Thus, the capability of recognizing vocal emotional expressions in speech is crucial for creating a more detailed "decoding" of the message, which leads to a better understanding of the expresser's social signals. Previous studies argue that the six basic emotions (sadness, anger, surprise, disgust, fear and happiness) are very well recognized from prosody and voice quality (Pell & Kotz, 2011).

In linguistics, prosody includes the intonation, stress and rhythm of speech, whereas voice quality refers to pitch, energy and tempo. In this research paper the attention will mainly be devoted to pitch analysis. According to the article "Intonation and Emotion: Influence of Pitch Levels and Contour Type on Creating Emotions", intonation and certain pitch levels indicate the true emotions people are expressing while talking (Rodero, 2010). To illustrate, the majority of people speak in an uncharacteristically high-pitched voice when they are excited, affected or overwhelmed. In contrast, a low-pitched voice expresses neutral feelings, calmness, sadness and boredom.
Moreover, jitter is another acoustic characteristic that plays a crucial role in the identification of particular voices. There are different kinds of jitter (absolute, relative, rap and ppq5), but the methodological part of this paper will mainly focus on analyzing jitter (rap). Jitter (rap) is defined as the relative average perturbation: the average absolute difference between a period and the average of it and its two neighbours, divided by the average period (Farrus, Hernando & Ejarque, 2006). In other words, jitter is an acoustic characteristic of a voice signal that quantifies the cycle-to-cycle variation of the fundamental frequency. It is mainly measured on long sustained vowels, and significant differences can be detected between different speaking styles.
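To make this definition concrete, the rap measure can be sketched as follows. This is an illustrative stdlib-only implementation of the formula above, not Praat's implementation, and the period values in the example are hypothetical.

```python
def jitter_rap(periods):
    """Relative Average Perturbation: the mean absolute difference between
    each period and the average of it and its two neighbours, divided by
    the mean period. `periods` are pitch-period durations (any unit)."""
    n = len(periods)
    if n < 3:
        raise ValueError("need at least three periods")
    perturbations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, n - 1)
    ]
    mean_period = sum(periods) / n
    return (sum(perturbations) / (n - 2)) / mean_period

# Perfectly regular voicing yields zero rap jitter; irregular voicing does not.
print(jitter_rap([5.0] * 10))          # -> 0.0
print(jitter_rap([5.0, 6.0] * 5) > 0)  # -> True
```

Because the perturbations are divided by the mean period, the measure is relative: it does not depend on the unit in which the periods are expressed.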
All vocal features used for emotion recognition are influenced by gender, culture and affective state. In most cases, it is challenging to distinguish between two emotions that convey high intensity, such as anger and happiness. Studying the relationship between speech and emotional states is difficult, and progress depends on finding forms of description that apply to those states (Cowie, 2000).
However, authors have not paid full attention to comparing the two basic emotions of anger and happiness. Therefore, the aim of this research is to analyze them by posing the question: what are the voice pitch differences between expressing joy and anger? Based on this research question, one hypothesis is formulated: voice pitch increases when expressing an emotion of joy.
There is a lot of scientific literature on the topic, but there is also a lot of individual variation in emotion recognition through voice analysis. This is the reason behind the decision to use a within-participant comparison.

Method

Participants
In order to collect vocal recordings, we asked 18 participants (10 females, 8 males) to voluntarily take part in the experiment. The participants were chosen on a random basis, as they were students encountered on campus. All of them were above 18 years of age and were promptly informed about the conditions of the experiment and the way their data would be used. However, age was not used as a variable in this research.

Design
A within-subject design was chosen for this research, as the subjects had to participate in both conditions. Each of the participants read out loud two short sentences, the same for every participant (happiness: "I always love spending time with you"; anger: "Get out of my sight, I don't want to see you again"). The participant was asked to imagine a situation where a person dear to them was in front of them and to read out the first sentence, which contained a positive message and provoked positive emotions in the participant while reading it. This indicated the emotion of joy.
Then, we asked the participant to imagine a situation where a person they despise was in front of them and to read out the second sentence, which contained a negative message and provoked negative emotions in the participant while reading it. This indicated the emotion of anger.

Instrumentation
Every sentence was recorded with a mobile device. The analysis was carried out only with the permission of the participant. Since most mobile devices record in the .m4a format, a conversion of the files was necessary. After compiling the corpus, we converted each of the recordings into the .wav format so that PRAAT could recognise it.

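A minimal sketch of how such a batch conversion could be scripted, assuming the ffmpeg command-line tool is available; the file name is hypothetical. The function only builds the command, which keeps the conversion step explicit and easy to log.

```python
from pathlib import Path
import subprocess

def conversion_command(src: Path) -> list:
    """Build the ffmpeg invocation that converts one .m4a recording
    to the .wav format that PRAAT can read."""
    dst = src.with_suffix(".wav")
    return ["ffmpeg", "-i", str(src), str(dst)]

cmd = conversion_command(Path("participant01_joy.m4a"))
print(cmd)  # -> ['ffmpeg', '-i', 'participant01_joy.m4a', 'participant01_joy.wav']
# To actually run the conversion: subprocess.run(cmd, check=True)
```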
Preprocessing
The obtained results were analysed with behavioural statistical methods. PRAAT was used for analysing the recordings and SPSS was used for the data analysis. For each of the recordings, data on maximum and minimum pitch were extracted, as well as on jitter rap (Relative Average Perturbation).
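PRAAT's pitch tracker is based on autocorrelation; the stdlib-only sketch below illustrates the principle on a synthetic tone. It is not Praat's actual algorithm, and the sample rate and pitch search range (75-500 Hz) are illustrative choices.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Crude autocorrelation pitch estimate: the best lag is the shift at
    which the signal correlates most strongly with a copy of itself."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

# A synthetic 200 Hz tone sampled at 8 kHz is recovered correctly:
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(2000)]
print(estimate_pitch(tone, sr))  # -> 200.0
```

Real speech requires windowing, voicing detection and interpolation on top of this idea, which is why a dedicated tool such as PRAAT was used for the actual analysis.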
Results
This analysis comprises three dependent variables (min pitch, max pitch and jitter) and two independent variables (gender and emotion). Firstly, we calculated whether the variables are normally distributed. There are 36 valid cases and 0 missing. None of the dependent variables was found to be normally distributed: min pitch (M = 118.72, SD = 41.28, Zskewness = 1.38, Zkurtosis = -1.21), max pitch (M = 279.23, SD = 84.60, Zskewness = -.49, Zkurtosis = -1.24) and jitter (M = .01, SD = .004, Zskewness = 1.90, Zkurtosis = -.56).
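The normality screen above relies on skewness and kurtosis z-scores, i.e. the statistic divided by its standard error. A stdlib sketch of the skewness part, using the common sqrt(6/n) approximation to the standard error (SPSS reports an exact version) and hypothetical pitch values:

```python
import math

def skewness_z(values):
    """Skewness z-score: sample skewness divided by the approximate
    standard error sqrt(6/n). |z| > 1.96 suggests non-normality at
    the 5% level."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return (m3 / m2 ** 1.5) / math.sqrt(6 / n)

print(skewness_z([1, 2, 3]))                  # symmetric -> 0.0
print(skewness_z([100] * 10 + [300]) > 1.96)  # right-skewed -> True
```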
The descriptive statistics showed that males express joy [min pitch (M = 83.06, SD = 9.93), max pitch (M = 200.02, SD = 40.66)] with a lower pitch than anger [min pitch (M = 108.55, SD = 37.81), max pitch (M = 236.64, SD = 60.33)]. The opposite is observed for females, where the expression of joy [min pitch (M = 154.92, SD = 29.45), max pitch (M = 381.83, SD = 37.50)] has a higher pitch than anger [min pitch (M = 130.81, SD = 44.08), max pitch (M = 297.18, SD = 57.19)]. From Figure 1, it can also be observed that on average males use a lower pitch than females for both emotions.

Figure 1: Pitch levels according to emotion and gender
There are two independent variables, Emotion (anger, joy) and Gender (male, female), and three dependent variables (max pitch, min pitch and jitter rap). Due to the small sample size (36) and the non-normal distribution of the variables, it was decided to perform a non-parametric test. We used a related-samples non-parametric test, where SPSS determines the right type of test according to the variables, computed at the 95% confidence level.
Friedman's Two-Way Analysis of Variance by Ranks is an alternative to the factorial ANOVA, which would have been used if the variables had been normally distributed. Friedman's test ranks variables according to their mean per related group. However, the only data we need from Friedman's test to evaluate our hypothesis are the chi-square, the degrees of freedom and the significance level. From the test performed, we can conclude that there was a large statistically significant difference in pitch depending on which type of emotion was vocally expressed, χ2(4) = 140.308, p < .001. Therefore, we reject the null hypothesis and retain the alternative hypothesis.
Conclusion and Discussion
Based on the results above, it can be concluded that there were statistically significant differences between the pitch levels for anger and joy. The data support the hypothesis that "voice pitch increases when expressing an emotion of joy". However, our research established that this is only valid for females, as males increase their voice pitch when they express anger. Despite the large numerical differences in pitch levels among emotions and genders, these numbers may be biased by the small sample size. It is therefore recommended that such an analysis be performed with an increased sample size of at least 50 participants.

Research implementation
This research generated useful insight into the vocal properties of males and females when expressing the emotions of anger and joy. Such research can be applied in plenty of areas. For instance, in eHealth, patients with heart conditions (and previous cardiac arrests) may benefit from sensors detecting and recording their voice activities. When the pitch of an angry voice reaches a critical level (for males, approximately 500 Hz) for a certain period of time, necessary interventions can be made in order to decrease blood pressure in a timely manner. Such preventive actions may prove useful especially for patients who live alone or do not have access to immediate healthcare.
The findings on joy pitch levels for women can be used in advertising, for instance by creative agencies. If people react to joyful voices by mirroring the emotion, the advertised product is likely to generate higher revenue. When mirroring joy, the customer generates higher levels of endorphins, which may lead to spontaneous purchase decisions.

Limitations
There are some limitations in this study that cannot be ignored. The main limitation concerns the empirical part of the study: the data analysis was conducted on a small number of participants (only 9 participants per emotion).
Another limitation relates to the fact that none of the participants was an actor, so their vocal recordings were absolutely genuine. However, it is much more difficult to analyze genuine vocal expressions than posed ones. It is known that when people know their voice is being recorded, they feel stressed or unable to show their true emotions. A third limitation is that only the difference between anger and happiness was taken into account. More significant differences could be measured if more than two basic emotions were compared with one another. Addressing these issues could raise the accuracy of the results.

Future research
Voice analysis is useful for a great variety of other scientific fields, such as healthcare, affective gaming, computer science, education, telecommunications and security. All of these fields could benefit from future research aimed at improving human-computer interaction. Future studies may focus on measuring and comparing vocal differences between basic emotions other than the two examined in this research. Scientists could also focus on analyzing voice parameters in order to detect a speaker's age, culture or ethnicity.
For other future research, it could also be interesting to detect people's personalities (extrovert/introvert) according to their vocal cues. This would be useful in education: teachers would be able to understand a student's personality and help them release stress, or motivate them to participate more in different school activities. Moreover, the voice could be an indicator of intention and predict people's actions in a situation. In security, voice detection machines could be useful to predict criminals' moves while they are being interrogated.
Bibliography
Bachorowski, J. (1999). Vocal expression and perception of emotion. Current Directions in Psychological Science, 8(2), 53-57.
Bachorowski, J., & Owren, M. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6(4), 219-224.
Cowie, R. (2000). Emotional states expressed in speech. ITRW on Speech and Emotion, Newcastle, 5-7.
Farrus, M., Hernando, J., & Ejarque, P. (2006). Jitter and shimmer measurements for speaker recognition. TALP Research Center, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya.
Gorisch, J., Wells, B., & Brown, G.J. (2011). Pitch contour matching and interactional alignment across turns: An acoustic investigation. Language and Speech, 55(1), 57-76.
Henton, C. (1995). Pitch dynamism in female and male speech. Language & Communication, 15(1), 43-61.
Pell, M.D., & Kotz, S.A. (2011). On the time course of vocal emotion recognition. PLoS ONE, 6(11).
Simon-Thomas, E.R., Keltner, D.J., Sauter, D., Sinicropi-Yao, L., & Abramson, A. (2009). The voice conveys specific emotions: Evidence from vocal burst displays. Emotion, 9(6), 838-846.
Sobin, C., & Alpert, M. (1999). Emotion in speech: The acoustic attributes of fear, anger, sadness and joy. Journal of Psycholinguistic Research, 28(4).