The Review Opinion Diversification (Revopid-2017) shared task focuses on selecting the top-k reviews from a set of reviews for a particular product based on a specific criterion. In this paper, we describe our approaches and results for modeling the ranking of reviews based on their usefulness score, the first of the three subtasks under this shared task. Instead of posing this as a regression problem, we modeled it as a classification task in which we identify whether a review is useful or not. We employed a bi-directional LSTM to represent each review, followed by a softmax layer to predict the usefulness score. We chose the review with the highest usefulness score and then computed its cosine similarity with the rest of the reviews, in order to ensure diversity in the selection of top-k reviews. On the top-5 list prediction we finished 3rd, while on the top-10 list prediction we were placed 2nd in the shared task. We have discussed the model and the results in detail in...
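The diversity step in the abstract above can be illustrated with a minimal sketch. The abstract only says that cosine similarity to the highest-scoring review is computed; the greedy loop, the `max_sim` threshold, and the function names below are assumptions made for illustration, not the authors' published procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_diverse_top_k(vectors, scores, k, max_sim=0.8):
    """Greedily pick up to k review indices, best usefulness score first.

    A review is skipped when it is too similar (cosine > max_sim) to any
    review already selected. max_sim is a hypothetical tuning parameter.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) == k:
            break
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in selected):
            selected.append(i)
    return selected


# Toy review vectors (e.g. pooled LSTM states) with predicted usefulness scores.
vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
scores = [0.9, 0.8, 0.7]
print(select_diverse_top_k(vectors, scores, k=2))
```

In this toy input the second review is nearly identical to the first, so the sketch passes over it in favour of the third, lower-scored but dissimilar review.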
Online social media channels are everywhere, and almost everyone uses them to post parts of their lives that they are not comfortable sharing in person. Large amounts of data can be collected from these channels and used to study and develop systems that detect depression and a tendency to self-harm in individuals on the internet. It is generally believed that a user's social posts, taken in time order, can be used to predict depression or signs of self-harm. Deep learning models for Natural Language Processing tasks are already producing better results on language-based analysis such as sentiment, offensive-language, or depression detection for a collection of text. BERT and other transformer-based architectures have been shown to achieve state-of-the-art results on Natural Language Processing tasks. We apply transfer learning and a supervised learning algorithm, Logistic Regression, to the data for early detection of signs of self-harm for a user. BERT is used to generate word embeddings of sentences, has a limitat...
Chapter 4 investigates interaction between languages in the performing arts – theatre, stand-up comedy, grime, rap, opera – and the types of creativity this generates in response to cultural contexts and audiences, drawing on media and performance studies, and working with artists ranging from Russian dramatists to Black British and British Asian musicians from Birmingham and Leicester.
The article focuses on two parallel studies aimed at validating an original automatic tool (RusAC) designed to determine the level of abstractness of Russian texts. The studies were conducted on (a) the Russian Academic Corpus (RAC), compiled from textbooks used in middle and high schools of the Russian Federation, and (b) students’ recalls of academic texts. The design of RusAC is based on the Russian Dictionary of abstractness / concreteness compiled by the authors in previous studies, which lists abstractness ratings for over 88,000 tokens. The pilot studies carried out on the Russian Academic Corpus (circa 3 million tokens) showed that the ratio of abstract words grows in textbooks of all disciplines across grades 5 to 11. We also confirmed that the share of abstract words in Science textbooks is lower than that in the Humanities textbooks and that abstractness of readers’ recalls is typically lower than that of the original text as the respondents tend to omit more abstract words t...
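As a rough illustration of what a dictionary-based "share of abstract words" could look like, here is a minimal sketch. It is not RusAC itself: the rating scale, the assumption that a higher rating means more abstract, the `threshold` value, and the toy ratings are all hypothetical.

```python
def abstract_share(tokens, ratings, threshold=3.0):
    """Fraction of rated tokens whose abstractness rating is >= threshold.

    tokens:  list of word tokens from a text.
    ratings: dict mapping a token to an abstractness rating
             (assumed here: higher means more abstract).
    Tokens missing from the dictionary are ignored.
    """
    rated = [ratings[t] for t in tokens if t in ratings]
    if not rated:
        return 0.0
    return sum(1 for r in rated if r >= threshold) / len(rated)


# Toy dictionary with made-up ratings, for illustration only.
toy_ratings = {"идея": 4.2, "стол": 1.3, "свобода": 4.6}
print(abstract_share(["идея", "стол", "свобода", "и"], toy_ratings))
```

Computing this share per textbook and grade would give the kind of ratio the abstract compares across grades and disciplines.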
The automatic classification of stage directions is a little-explored topic in computational drama analysis (CDA), in spite of their relevance for plays' structural and stylistic analysis. We developed a 13-class stage direction typology, based on annotations in the FreDraCor corpus (French-language plays), but abstracting away from their huge variability while still providing classes useful for literary research. We fine-tuned transformer-based models to classify against the typology, gradually decreasing training-corpus size to compare model efficiency with reduced training data. A comparison of the results speaks in favour of distilled monolingual models for this task and, unlike earlier research on German, shows no negative effects of model case-sensitivity. The results have practical relevance for computational literary studies: comparing classification results across complementary stage direction typologies, while limiting the amount of manual annotation needed to apply them, would help towards a systematic study of this important textual element.
The eHealth-KD challenge hosted at IberLEF 2020 proposes a set of resources and evaluation scenarios to encourage the development of systems for the automatic extraction of knowledge from unstructured text. This paper describes the system presented by team UH-MatCom in the challenge. Several deep learning models are trained and ensembled to automatically extract relevant entities and relations from plain text documents. State-of-the-art techniques such as BERT, Bi-LSTM, and CRF are applied. The use of external knowledge sources such as ConceptNet is explored. The system achieved average results in the challenge, ranking fifth across all evaluation scenarios. The ensemble method produced a slight improvement in performance. Additional work is needed for the relation extraction task to benefit from external knowledge sources.
Matching text and images based on their semantics plays an important role in cross-media retrieval. In news in particular, the connection between text and images is highly ambiguous. In the context of the MediaEval 2020 Challenge, we propose three multi-modal methods for mapping the text and images of news articles to a shared space in order to perform efficient cross-retrieval. Our methods show systematic improvement and validate our hypotheses, while the best-performing method reaches a recall@100 score of 0.2064.
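For readers unfamiliar with the metric, the sketch below shows one common way to compute recall@k for text-to-image retrieval. It assumes exactly one correct image per article, which may not match the challenge's exact evaluation protocol; the function name and the toy identifiers are hypothetical.

```python
def mean_recall_at_k(rankings, gold, k):
    """Share of queries whose correct item appears in the top-k ranked results.

    rankings: dict mapping a query id to a list of candidate ids, best first.
    gold:     dict mapping a query id to its single correct candidate id
              (assumption: one relevant image per article).
    """
    if not gold:
        return 0.0
    hits = sum(1 for q, target in gold.items() if target in rankings.get(q, [])[:k])
    return hits / len(gold)


# Toy example: two articles, three candidate images each.
rankings = {"a": ["i1", "i2", "i3"], "b": ["i3", "i1", "i2"]}
gold = {"a": "i2", "b": "i2"}
print(mean_recall_at_k(rankings, gold, k=2))
```

Under this definition, a recall@100 of 0.2064 would mean that for roughly one article in five the matching image was ranked among the first 100 candidates.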
English. In this article, we present the results of applying a Stacking Ensemble method to the problem of hate speech classification proposed in the main task of HaSpeeDe 2 at EVALITA 2020. The model was then compared to a Logistic Regression classifier, along with two other benchmarks defined by the competition’s organising committee (an SVM with a linear kernel and a majority class classifier). Results showed our Ensemble to outperform the benchmarks to various degrees, both when testing in the same domain as training and in a different domain. Italiano. In questo articolo, ci presentiamo i risultati dell’applicazione di un modello di Stacking Ensemble al problema della classificazione dei discorsi di incitamento all’odio nel compito A di EVALITA (HaSpeeDe 2). Il modello è stato quindi confrontato con un modello di regressione logistica, insieme ad altri due benchmark definiti dal comitato organizzatore della competizione (un SVM con un kernel lineare e un classificatore di classe...