Carlo Strapparava

2024

Big-Five Backstage: A Dramatic Dataset for Characters Personality Traits & Gender Analysis
Marina Tiuleneva | Vadim A. Porvatov | Carlo Strapparava
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

This paper introduces a novel textual dataset comprising fictional characters’ lines with annotations based on their gender and Big-Five personality traits. Using psycholinguistic findings, we compared texts attributed to fictional characters and real people with respect to their genders and personality traits. Our results indicate that imagined personae mirror most of the language categories observed in real people while demonstrating them in a more expressive manner.

Context Matters: Enhancing Metaphor Recognition in Proverbs
Gamze Goren | Carlo Strapparava
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Despite the remarkable achievements of Large Language Models (LLMs) in various Natural Language Processing tasks, their competence in abstract language understanding remains relatively under-explored. Figurative language interpretation serves as an ideal testbed for assessing this competence, as it requires models to navigate beyond the literal meaning and delve into the underlying semantics of figurative expressions. In this paper, we examine the performance of GPT-3.5 in a zero-shot setting through word-level metaphor detection. Specifically, we frame the task as the annotation of word-level metaphors in proverbs. To this end, we employ a dataset of English proverbs and evaluate the model's performance under different prompting strategies. Our results show that the model performs satisfactorily at identifying word-level metaphors, particularly when it is prompted with a hypothetical context preceding the proverb. This observation underscores the pivotal role of well-designed prompts in zero-shot settings, through which these models can be leveraged as annotators for subjective NLP tasks.

Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos
Anna Kuznetsova | Carlo Strapparava
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents the development of a novel multimodal, multilingual dataset in Russian and English, with a particular emphasis on the exploration of laughter detection techniques. Data was collected from YouTube stand-up comedy videos with manually annotated subtitles, and our research covers data preparation and laughter labeling. We explore two laughter detection approaches presented in the literature: peak detection, which applies an energy-based algorithm to preprocessed voiceless audio, and a machine learning approach that uses pretrained models to identify laughter presence and duration. While the machine learning approach currently outperforms peak detection in accuracy and generalization, the latter shows promise and warrants further study. Additionally, we explore unimodal and multimodal humor detection on the new dataset, showing the effectiveness of neural models in capturing humor in both languages, even with textual data alone. Multimodal experiments indicate that even basic models benefit from visual data, which improves detection results. However, further research is needed to improve the quality of laughter detection labeling and to fully understand the impact of different modalities in a multimodal and multilingual context.

2022

Making People Laugh like a Pro: Analysing Humor Through Stand-Up Comedy
Beatrice Turano | Carlo Strapparava
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The analysis of humor using computational tools has gained popularity in the past few years, and many resources have been built for this purpose. However, most of these resources focus on standalone jokes or on occasional humorous sentences during presentations. In this paper I present a new dataset, SCRIPTS, built from stand-up comedy show transcripts: the humor that this dataset collects is embedded in a larger narrative, composed of daily events made humorous by the skill of the comedian. This different perspective on the humor problem can allow us to think about and study humor in a different way, and possibly open the path to new lines of research.

CorEDs: A Corpus on Eating Disorders
Melissa Donati | Carlo Strapparava
Proceedings of the RaPID Workshop - Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments - within the 13th Language Resources and Evaluation Conference

Eating disorders (EDs) constitute a widespread group of mental illnesses affecting the everyday life of many individuals in all age groups. One of the main difficulties in the diagnosis and treatment of these disorders is the interpersonal variability of symptoms and the variety of underlying psychological states that are not considered in traditional approaches. In order to gain a better understanding of these disorders, many studies have collected data from social media and analysed them from a computational perspective, but the resulting datasets were very limited and task-specific. Aiming to address this shortage by providing a dataset that could be easily adapted to different tasks, we built a corpus collecting ED-related and ED-unrelated comments from Reddit, focusing on a limited number of topics (fitness, nutrition, etc.). To validate the effectiveness of the dataset, we evaluated the performance of two classifiers in distinguishing between ED-related and unrelated comments. The high accuracy of both classifiers indicates that ED-related texts are separable from texts on similar topics that do not address EDs. For explorative purposes, we also carried out a linguistic analysis of word class dominance in ED-related texts, whose results are consistent with the findings of psychological research on EDs.

NewYeS: A Corpus of New Year’s Speeches with a Comparative Analysis
Anna Tramarin | Carlo Strapparava
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

This paper introduces the NewYeS corpus, which contains the Christmas messages and New Year’s speeches delivered at the end of the year by the heads of state of different European countries (namely Denmark, France, Italy, Norway, Spain and the United Kingdom). The corpus was collected via web scraping of the speech transcripts available online. A comparative analysis was conducted to examine some of the cultural differences emerging from the texts, namely a frequency distribution analysis of the term “God” and the identification of the three most frequent content words per year, with a focus on years in which significant historical events happened. An analysis of positive and negative emotion scores, examined along with the frequency of religious references, was carried out for those countries whose languages are supported by LIWC, a tool for sentiment analysis. The corpus is available for further analyses, both comparative (across countries) and diachronic (over the years).

2020

DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text
Pasquale Capuozzo | Ivano Lauriola | Carlo Strapparava | Fabio Aiolli | Giuseppe Sartori
Proceedings of the Twelfth Language Resources and Evaluation Conference

In recent years, increasing interest in the development of automatic approaches for unmasking deception in online sources has led to promising results. Nonetheless, two major issues remain unsolved: the stability of classifier performance across different domains and across languages. Tackling these issues is challenging, since labelled corpora that span multiple domains and are compiled in more than one language are scarce in the scientific literature. To fill this gap, in this paper we introduce DecOp (Deceptive Opinions), a new language resource developed for automatic deception detection in cross-domain and cross-language scenarios. DecOp is composed of 5000 examples of both truthful and deceitful first-person opinions, balanced across five different domains and two languages, and, to the best of our knowledge, is the largest corpus allowing cross-domain and cross-language comparisons in deceit detection tasks. In this paper, we describe the collection procedure of the DecOp corpus and its main characteristics. Moreover, we report human performance on the DecOp test set and preliminary experiments with machine learning models based on the Transformer architecture.

EmoEvent: A Multilingual Emotion Corpus based on different Events
Flor Miriam Plaza del Arco | Carlo Strapparava | L. Alfonso Urena Lopez | Maite Martin
Proceedings of the Twelfth Language Resources and Evaluation Conference

In recent years, emotion detection in text has become more popular due to its potential applications in fields such as psychology, marketing, political science, and artificial intelligence, among others. While opinion mining is a well-established task with many standard data sets and well-defined methodologies, emotion mining has received less attention due to its complexity. In particular, the available annotated gold standard resources are insufficient. To address this shortage, we present a multilingual emotion data set based on different events that took place in April 2019. We collected tweets from the Twitter platform. Each tweet was then labeled by three Amazon MTurkers with one of seven emotions: Ekman’s six basic emotions plus “neutral or other emotions”. A total of 8,409 Spanish and 7,303 English tweets were labeled. In addition, each tweet was labeled as offensive or non-offensive. We report some linguistic statistics about the data set in order to observe the differences between English and Spanish speakers when they express emotions related to the same events. Moreover, to validate the effectiveness of the data set, we also propose a machine learning approach for automatically detecting emotions in tweets in both languages, English and Spanish.

VROAV: Using Iconicity to Visually Represent Abstract Verbs
Simone Scicluna | Carlo Strapparava
Proceedings of the Twelfth Language Resources and Evaluation Conference

For a long time, philosophers, linguists and scientists have been keen on finding an answer to the mind-bending question “what does abstract language look like?”, which has also sprung from the phenomenon of mental imagery and how this emerges in the mind. One way of approaching the matter of word representations is by exploring the common semantic elements that link words to each other. Visual languages like sign languages have been found to reveal enlightening patterns across signs of similar meanings, pointing towards the possibility of identifying clusters of iconic meanings. With this insight, merged with an understanding of verb predicates achieved from VerbNet, this study presents a novel verb classification system based on visual shapes, using graphic animation to visually represent 20 classes of abstract verbs. Considerable agreement between participants who judged the graphic animations based on representativeness suggests a positive way forward for this proposal, which may be developed as a language learning aid in educational contexts or as a multimodal language comprehension tool for digital text.

2019

Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Roman Klinger | Veronique Hoste | Carlo Strapparava | Orphee De Clercq
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Personality Traits Recognition in Literary Texts
Daniele Pizzolli | Carlo Strapparava
Proceedings of the Second Workshop on Storytelling

Interesting stories are often built around interesting characters. Finding and detailing what makes a character interesting is a real challenge, but a significant cue is certainly the character’s personality traits. Our exploratory work tests the adaptability of current personality trait theories to literary characters, focusing on the analysis of utterances in theatre scripts. Conversely, we also try to find significant traits for interesting characters. The preliminary results demonstrate that our approach is reasonable: using machine learning to gain insight into the personality traits of fictional characters can make sense.

Anglicized Words and Misspelled Cognates in Native Language Identification
Ilia Markov | Vivi Nastase | Carlo Strapparava
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we present experiments that estimate the impact of specific lexical choices of people writing in a second language (L2). In particular, we look at misspelled words that indicate lexical uncertainty on the part of the author, and separate them into three categories: misspelled cognates, “L2-ed” (in our case, anglicized) words, and all other spelling errors. We test the assumption that such errors contain clues about the native language of an essay’s author through the task of native language identification. The results of the experiments show that the information brought by each of these categories is complementary. We also note that while the distribution of such features changes with the proficiency level of the writer, their contribution towards native language identification remains significant at all levels.

2018

Punctuation as Native Language Interference
Ilia Markov | Vivi Nastase | Carlo Strapparava
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we describe experiments designed to explore and evaluate the impact of punctuation marks on the task of native language identification. Punctuation is specific to each language, and is one of the indicators that overtly represent the manner in which each language organizes and conveys information. Our experiments are organized into various set-ups: the usual multi-class classification for individual languages, classification by language groups, and classification across different proficiency levels, across topics and even across corpora. The results support our hypothesis that punctuation marks are persistent and robust indicators of the native language of the author, which do not diminish in influence even when a high proficiency level in a non-native language is achieved.

Metaphor: A Computational Perspective by Tony Veale, Ekaterina Shutova and Beata Beigman Klebanov
Carlo Strapparava
Computational Linguistics, Volume 44, Issue 1 - April 2018

The Role of Emotions in Native Language Identification
Ilia Markov | Vivi Nastase | Carlo Strapparava | Grigori Sidorov
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

We explore the hypothesis that emotion is one of the dimensions of language that surfaces from the native language into a second language. To check the role of emotions in native language identification (NLI), we model emotion information through polarity and emotion load features, and use document representations built on these features to classify the native language of the author. The results indicate that emotion is relevant for NLI, even for high proficiency levels and across topics.

A Computational Exploration of Exaggeration
Enrica Troiano | Carlo Strapparava | Gözde Özbal | Serra Sinem Tekiroğlu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Several NLP studies address the problem of figurative language, but among non-literal phenomena, they have neglected exaggeration. This paper presents a first computational approach to this figure of speech. We explore the possibility of automatically detecting exaggerated sentences. First, we introduce HYPO, a corpus containing overstatements (or hyperboles) collected on the web and validated via crowdsourcing. Then, we evaluate a number of models trained on HYPO, and provide evidence that the task of hyperbole identification can be successfully performed based on a small set of semantic features.

2017

Word Etymology as Native Language Interference
Vivi Nastase | Carlo Strapparava
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present experiments that show the influence of native language on lexical choice when producing text in another language (in this particular case, English). We start from the premise that non-native English speakers will choose lexical items that are close to words in their native language. This leads us to an etymology-based representation of documents written by people whose mother tongue is an Indo-European language. Based on this representation, we grow a language family tree that closely matches the Indo-European language tree.

A Computational Analysis of the Language of Drug Addiction
Carlo Strapparava | Rada Mihalcea
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a computational analysis of the language of drug users when talking about their drug experiences. We introduce a new dataset of over 4,000 descriptions of experiences reported by users of four main drug types, and show that we can predict with an F1-score of up to 88% the drug behind a certain experience. We also perform an analysis of the dominant psycholinguistic processes and dominant emotions associated with each drug type, which sheds light on the characteristics of drug users.

To Sing like a Mockingbird
Lorenzo Gatti | Gözde Özbal | Oliviero Stock | Carlo Strapparava
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Musical parody, i.e. the act of changing the lyrics of an existing and very well-known song, is a commonly used technique for creating catchy advertising tunes and for mocking people or events. Here we describe a system for automatically producing a musical parody, starting from a corpus of songs. The system can automatically identify characterizing words and concepts related to a novel text, which are taken from the daily news. These concepts are then used as seeds to appropriately replace part of the original lyrics of a song, using metrical, rhyming and lexical constraints. Finally, the parody can be sung with a singing speech synthesizer, with no intervention from the user.

Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Octavian Popescu | Carlo Strapparava
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

CIC-FBK Approach to Native Language Identification
Ilia Markov | Lingzhen Chen | Carlo Strapparava | Grigori Sidorov
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research, i.e., word n-grams, lemma n-grams, part-of-speech n-grams, and function words, with recently introduced character n-grams from misspelled words, and features that are novel in this task, such as typed character n-grams, and syntactic n-grams of words and of syntactic relation tags. We use the log-entropy weighting scheme and perform classification using the Support Vector Machines (SVM) algorithm. Our system achieved a macro-averaged F1-score of 0.8808 and shared first place in the NLI Shared Task 2017 ranking.

Improving Native Language Identification by Using Spelling Errors
Lingzhen Chen | Carlo Strapparava | Vivi Nastase
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are very indicative of the native language of the author. In combination with other lexical features, spelling error features lead to a 1.2% improvement in accuracy on classifying texts in the TOEFL11 corpus by the author’s native language, compared to systems participating in the NLI shared task.

2016

Learning to Identify Metaphors from a Corpus of Proverbs
Gözde Özbal | Carlo Strapparava | Serra Sinem Tekiroğlu | Daniele Pighin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Emotions and NLP: Future Directions
Carlo Strapparava
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Innovative Semi-Automatic Methodology to Annotate Emotional Corpora
Lea Canales | Carlo Strapparava | Ester Boldrini | Patricio Martínez-Barco
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Detecting depression or personality traits, tutoring and student behaviour systems, and identifying cases of cyberbullying are a few of the wide range of applications in which the automatic detection of emotion is a crucial element. Emotion detection has the potential for high impact, benefiting business, society, politics and education. Given this context, the main objective of our research is to contribute to the resolution of one of the most important challenges in the textual emotion detection task: the annotation of emotional corpora. We tackle this challenge by proposing a new semi-automatic methodology. Our methodology consists of two main phases: (1) an automatic process that pre-annotates the unlabelled sentences with a reduced number of emotional categories; and (2) a manual refinement process in which human annotators determine the predominant emotion among the emotional categories selected in phase 1. In this paper, we present and evaluate the pre-annotation process in order to analyse the feasibility and benefits of the proposed methodology. The results are promising: they show a substantial reduction in annotation time and cost, and confirm the usefulness of our pre-annotation process for improving the annotation task.

PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
Gözde Özbal | Carlo Strapparava | Serra Sinem Tekiroğlu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Proverbs are commonly metaphoric in nature, and cross-domain mappings are frequently established in them. The abundance of metaphors in proverbs makes them an extremely valuable linguistic resource, since they can be used as a gold standard for various metaphor-related linguistic tasks such as metaphor identification or interpretation. In addition, a collection of proverbs from various languages annotated with metaphors would also be essential for social scientists exploring the cultural differences between those languages. In this paper, we introduce PROMETHEUS, a dataset consisting of English proverbs and their equivalents in Italian. In addition to the word-level metaphor annotations for each proverb, PROMETHEUS contains other types of information, such as the metaphoricity degree of the overall proverb, its meaning, the century in which it was first recorded and a pair of subjective questions answered by the annotators. To the best of our knowledge, this is the first multilingual and open-domain corpus of proverbs annotated with word-level metaphors.

2015

SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events
Irene Russo | Tommaso Caselli | Carlo Strapparava
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SemEval 2015, Task 7: Diachronic Text Evaluation
Octavian Popescu | Carlo Strapparava
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Echoes of Persuasion: The Effect of Euphony in Persuasive Communication
Marco Guerini | Gözde Özbal | Carlo Strapparava
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Exploring Sensorial Features for Metaphor Identification
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the Third Workshop on Metaphor in NLP

2014

Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources
Spandana Gella | Carlo Strapparava | Vivi Nastase
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present the mapping between WordNet domains and WordNet topics, and the emergent Wikipedia categories. This mapping leads to a coarse alignment between WordNet and Wikipedia, useful for producing domain-specific and multilingual corpora. Multilinguality is achieved through the cross-language links between Wikipedia categories. Research in word-sense disambiguation has shown that within a specific domain, relevant words have restricted senses. The multilingual, and comparable, domain-specific corpora we produce have the potential to enhance research in word-sense disambiguation and terminology extraction in different languages, which could enhance the performance of various NLP tasks.

Creative language explorations through a high-expressivity N-grams query language
Carlo Strapparava | Lorenzo Gatti | Marco Guerini | Oliviero Stock
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In computational linguistics, a combination of syntagmatic and paradigmatic features is often exploited. While the former are typically handled by information present in large n-gram databases, domain and ontological aspects are more properly modeled by lexical ontologies such as WordNet and semantic similarity spaces. This interconnection is even stricter when we are dealing with creative language phenomena, such as metaphors, prototypical properties, pun generation, hyperbolae and other rhetorical phenomena. This paper describes a way to focus on and accomplish some of these tasks by exploiting NgramQuery, a generalized query language on the Google N-gram database. The expressiveness of this query language is boosted by plugging in semantic similarity acquired both from corpora (e.g. LSA) and from WordNet, and by integrating operators for phonetics and sentiment analysis. The paper reports a number of examples of usage in some creative language tasks.

Enriching the “Senso Comune” Platform with Automatically Acquired Data
Tommaso Caselli | Laure Vieu | Carlo Strapparava | Guido Vetere
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports on research activities on automatic methods for the enrichment of the Senso Comune platform. At this stage of development, we report on two tasks, namely word sense alignment with MultiWordNet and automatic acquisition of Verb Shallow Frames from sense-annotated data in the MultiSemCor corpus. The results obtained are satisfactory. We achieved a final F-measure of 0.64 for noun sense alignment and an F-measure of 0.47 for verb sense alignment, and an accuracy of 68% on the acquisition of Verb Shallow Frames.

Aligning an Italian WordNet with a Lexicographic Dictionary: Coping with limited data
Tommaso Caselli | Carlo Strapparava | Laure Vieu | Guido Vetere
Proceedings of the Seventh Global Wordnet Conference

A Computational Approach to Generate a Sensorial Lexicon
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Automatic Domain Assignment for Word Sense Alignment
Tommaso Caselli | Carlo Strapparava
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Sensicon: An Automatically Constructed Sensorial Lexicon
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automation and Evaluation of the Keyword Method for Second Language Learning
Gözde Özbal | Daniele Pighin | Carlo Strapparava
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Bridging Languages through Etymology: The case of cross language text categorization
Vivi Nastase | Carlo Strapparava
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

BRAINSUP: Brainstorming Support for Creative Sentence Generation
Gözde Özbal | Daniele Pighin | Carlo Strapparava
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Aligning Verb Senses in Two Italian Lexical Semantic Resources
Tommaso Caselli | Carlo Strapparava | Laure Vieu | Guido Vetere
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Linguistic Linked Data for Sentiment Analysis
Paul Buitelaar | Mihael Arcan | Carlos Iglesias | Fernando Sánchez-Rada | Carlo Strapparava
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

Behind the Times: Detecting Epoch Changes using Large Corpora
Octavian Popescu | Carlo Strapparava
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

A Computational Approach to the Automation of Creative Naming
Gözde Özbal | Carlo Strapparava
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Ecological Evaluation of Persuasive Messages Using Google AdWords
Marco Guerini | Carlo Strapparava | Oliviero Stock
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Corpus-based Explorations of Affective Load Differences in Arabic-Hebrew-English
Carlo Strapparava | Oliviero Stock | Ilai Alon
Proceedings of COLING 2012: Posters

Brand Pitt: A Corpus to Explore the Art of Naming
Gözde Özbal | Carlo Strapparava | Marco Guerini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The name of a company or a brand is a key element of a successful business. A good name is able to state the area of competition and communicate the promise given to customers by evoking semantic associations. Although various resources provide distinct tips for inventing creative names, little research has been carried out to investigate the linguistic aspects behind the naming mechanism. Moreover, there may be latent methods that copywriters use unconsciously. In this paper, we describe the annotation task that we conducted on a dataset of creative names collected from various resources to create a gold standard for linguistic creativity in naming. Based on the annotations, we compile common and latent methods of naming and explore the correlations among linguistic devices, provoked effects and business domains. This resource represents a starting point for a corpus-based approach to exploring the art of naming.

A Parallel Corpus of Music and Lyrics Annotated with Emotions
Carlo Strapparava | Rada Mihalcea | Alberto Battocchi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we introduce a novel parallel corpus of music and lyrics, annotated with emotions at line level. We first describe the corpus, which consists of 100 popular songs, each including a music component, provided in MIDI format, and a lyrics component, made available as raw text. We then describe our work on enhancing this corpus with emotion annotations using crowdsourcing. We also present some initial experiments on emotion classification using the music and the lyrics representations of the songs, which led to encouraging results, demonstrating the promise of joint music-lyric models for song processing.

NgramQuery - Smart Information Extraction from Google N-gram using External Resources
Martin Aleksandrov | Carlo Strapparava
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the implementation of a generalized query language for the Google N-gram database. This language allows for very expressive queries that exploit semantic similarity acquired both from corpora (e.g. LSA) and from WordNet, as well as phonetic similarity available from the CMU Pronouncing Dictionary. It introduces a large number of new operators which, combined in a suitable query, can help users extract n-grams with similar syntactic and semantic relational properties. We also characterize the operators with respect to their corpus affiliation and their functionality. We then give the query syntax in terms of Backus-Naur Form rules, followed by a few examples of how the tool can be used. We also describe the command-line arguments available to the user and compare them with those for retrieving n-grams through the interface of the Google N-gram database. Finally, we discuss possible improvements to the extraction process and some relevant query completeness issues.

Lyrics, Music, and Emotions
Rada Mihalcea | Carlo Strapparava
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2010

Proceedings of the 5th International Workshop on Semantic Evaluation
Katrin Erk | Carlo Strapparava
Proceedings of the 5th International Workshop on Semantic Evaluation

Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text
Diana Inkpen | Carlo Strapparava
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text

The Color of Emotions in Texts
Carlo Strapparava | Gözde Özbal
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

Evaluation Metrics for Persuasive NLP with Google AdWords
Marco Guerini | Carlo Strapparava | Oliviero Stock
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Evaluating systems and theories about persuasion represents a bottleneck for both theoretical and applied fields: experiments are usually expensive and time-consuming. Still, measuring the persuasive impact of a message is of paramount importance. In this paper we present a new "cheap and fast" methodology for measuring the persuasiveness of communication. This methodology makes it possible to conduct experiments with thousands of subjects in a few hours and for a few dollars, by tweaking and using existing commercial web-advertising tools such as Google AdWords. The central idea is to use AdWords features to define metrics of message persuasiveness. Along with a description of our approach, we report some pilot experiments, conducted with both text- and image-based ads, that confirm the effectiveness of our ideas. We also discuss how research on persuasive systems could be applied to Google AdWords to add more flexibility in the wearing out of persuasive messages.

Studying the Lexicon of Dialogue Acts
Nicole Novielli | Carlo Strapparava
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Dialogue Acts have long been studied in linguistics and have attracted computational linguistics research for many years: they constitute the basis of everyday conversations and can be identified with the communicative goal of a given utterance (e.g. asking for information, stating facts, expressing opinions, agreeing or disagreeing). Although it does not amount to a deep understanding of the dialogue, automatic dialogue act labeling is relevant to a wide range of applications in both human-computer and human-human interaction. We present a qualitative analysis of the lexicon of Dialogue Acts: we explore the relationship between the communicative goal of an utterance and its affective content, as well as the salience of specific word classes for each speech act. The experiments described in this paper are part of a research study whose long-term goal is to build an unsupervised classifier that simply exploits the lexical semantics of utterances to automatically annotate dialogues with the proper speech acts.

Predicting Persuasiveness in Political Discourses
Carlo Strapparava | Marco Guerini | Oliviero Stock
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In political speeches, the audience tends to react or resonate to signals of persuasive communication, including an expected theme, a name or an expression. Automatically predicting the impact of such discourses is a challenging task. Nowadays, with the huge amount of textual material that flows on the Web (news, discourses, blogs, etc.), it can be useful to have a measure for testing the persuasiveness of what we retrieve or of what we may want to publish on the Web. In this paper we exploit a corpus of political discourses collected from various Web sources and tagged with audience reactions, such as applause, as indicators of persuasive expressions. In particular, we use this data set in a machine learning framework to explore the possibility of classifying transcripts of political discourses according to their persuasive power, predicting the sentences that are likely to trigger applause. We also explore differences between Democratic and Republican speeches, and experiment with the resulting classifiers by grading some of the discourses from the Obama-McCain presidential campaign available on the Web.

2009

Towards Unsupervised Recognition of Dialogue Acts
Nicole Novielli | Carlo Strapparava
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

Kernel Methods for Minimally Supervised WSD
Claudio Giuliano | Alfio Massimiliano Gliozzo | Carlo Strapparava
Computational Linguistics, Volume 35, Number 4, December 2009

The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language
Rada Mihalcea | Carlo Strapparava
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

Resources for Persuasion
Marco Guerini | Carlo Strapparava | Oliviero Stock
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents resources and strategies for persuasive natural language processing. After introducing a specifically tagged corpus, we present some techniques for affective language processing and for persuasive lexicon extraction, together with prospective application scenarios.

Valentino: A Tool for Valence Shifting of Natural Language Texts
Marco Guerini | Carlo Strapparava | Oliviero Stock
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a first implementation of a tool for valence shifting of natural language texts, named Valentino (VALENced Text INOculator). Valentino can modify existing textual expressions towards more positively or negatively valenced versions. To this end, we built specific resources gathering various valenced terms that are semantically or contextually connected, and implemented strategies that use these resources to substitute input terms.

2007

SemEval-2007 Task 14: Affective Text
Carlo Strapparava | Rada Mihalcea
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

FBK-irst: Lexical Substitution Task Exploiting Domain and Syntagmatic Coherence
Claudio Giuliano | Alfio Gliozzo | Carlo Strapparava
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

The Affective Weight of Lexicon
Carlo Strapparava | Alessandro Valitutti | Oliviero Stock
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

This paper presents resources and functionalities for the recognition and selection of affective evaluative terms. First, an affective hierarchy was developed as an extension of the WordNet-Affect lexical database. Second, a semantic similarity function was developed; it is acquired automatically, in an unsupervised way, from a large corpus of texts and allows us to relate concepts to emotional categories. The integration of the two components is a key element for several applications.

Direct Word Sense Matching for Lexical Substitution
Ido Dagan | Oren Glickman | Alfio Gliozzo | Efrat Marmorshtein | Carlo Strapparava
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization
Alfio Gliozzo | Carlo Strapparava
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Syntagmatic Kernels: a Word Sense Disambiguation Case Study
Claudio Giuliano | Alfio Gliozzo | Carlo Strapparava
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications