Liviu P. Dinu

Papers by Liviu P. Dinu

The Minimum Entropy Submodular Set Cover Problem

Lecture Notes in Computer Science, 2016

University of Bucharest

Total rank distance and scaled total rank distance:

Discriminating between Indo-Aryan Languages Using SVM Ensembles

Cornell University - arXiv, Jul 9, 2018

A Computational Approach to the Study of Portuguese Newspapers Published in Macau

This paper applies text classification methods to investigate diatopic variation in Portuguese journalistic texts. We compare the language used in Portuguese newspapers written in Brazil, Macau, and Portugal under the assumption that the more similar language varieties are, the more difficult it is for algorithms to discriminate between them. We present two sets of experiments: the first uses original texts, and the second uses texts with blinded named entities to remove country-specific expressions. Our results indicate that the language of Portuguese newspapers published in Macau is substantially more similar to the language used in European newspapers than to that used in Brazilian newspapers.

Native Language Identification on Text and Speech

Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017

An Efficient Algorithm for Rank Distance Consensus

Lecture Notes in Computer Science, 2013

In various research fields, a common task is to summarize the information shared by a collection of objects and to find a consensus among them. In many scenarios, the objects for which a consensus needs to be determined are rankings, and the process is called rank aggregation. Common applications include electoral processes, meta-search engines, document classification, and selecting documents based on multiple criteria. This paper focuses on a particular application of such aggregation schemes: finding motifs or common patterns in a set of given DNA sequences. Among the conditions that a string should satisfy to be accepted as a consensus are the median string and the closest string. These approaches have been intensively studied separately, but only recently has the work of [1] tried to combine both problems: to solve the consensus string problem by minimizing both distance sum and radius. The aim of this paper is to investigate the consensus string in the rank distance paradigm. Theoretical results show that it is not possible to identify a consensus string via rank distance for three or more strings. Thus, an efficient genetic algorithm is proposed to find the optimal consensus string. To show an application of the studied problem, this work also presents a clustering algorithm based on the consensus string that builds a hierarchy of clusters based on distance connectivity. Experiments on DNA comparison are presented to show the efficiency of the proposed genetic algorithm for the consensus string. Phylogenetic experiments were also conducted to show the utility of the proposed clustering method. In conclusion, the consensus string is an interesting problem with many practical applications.

Identifying Source-Language Dialects in Translation

Mathematics

In this paper, we aim to explore the degree to which translated texts preserve linguistic features of dialectal varieties. We release a dataset of augmented annotations to the Proceedings of the European Parliament that cover dialectal speaker information, and we analyze different classes of written English covering native varieties from the British Isles. Our analyses aim to discuss the discriminatory features between the different classes and to reveal words whose usage differs between varieties of the same language. We perform classification experiments and show that automatically distinguishing between the dialectal varieties is possible with high accuracy, even after translation, and propose a new explainability method based on embedding alignments in order to reveal specific differences between dialects at the level of the vocabulary.

Faculty of Foreign Languages and Literature

On the behavior of Romanian syllables related to minimum effort laws

and Literatures

We applied hierarchical clustering using rank distance, previously used in computational stylometry, to literary texts written by Mateiu Caragiale and by a number of different authors who attempted to impersonate Caragiale after his death, or simply to mimic his style. Their pastiches were consistently clustered opposite to the original work, thereby confirming the performance of the method and proposing an extension of the method from simple authorship attribution to the more complicated problem of pastiche detection. The novelty of our work is the use of frequency rankings of stopwords as features, showing that this idea yields good results for pastiche detection.

System, the Neuter

A Computational Exploration of Pejorative Language in Social Media

Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Towards a Map of the Syntactic Similarity of Languages

In this paper we propose a computational method for determining the syntactic similarity between languages. We investigate multiple approaches and metrics, showing that the results are consistent across methods. We report results on 16 languages belonging to various language families. The analysis that we conduct is adaptable to any language, as long as resources are available.

A Computational Approach to Measuring the Semantic Divergence of Cognates

ArXiv, 2020

Meaning is the foundation stone of intercultural communication. Languages are continuously changing, and words shift their meanings for various reasons. Semantic divergence in related languages is a key concern of historical linguistics. In this paper we investigate semantic divergence across languages by measuring the semantic similarity of cognate sets in multiple languages. The method that we propose is based on cross-lingual word embeddings. In this paper we implement and evaluate our method on English and five Romance languages, but it can be extended easily to any language pair, requiring only large monolingual corpora for the involved languages and a small bilingual dictionary for the pair. This language-agnostic method facilitates a quantitative analysis of cognates divergence -- by computing degrees of semantic similarity between cognate pairs -- and provides insights for identifying false friends. As a second contribution, we formulate a straightforward method for detectin...

Tracking Semantic Change in Cognate Sets for English and Romance Languages

Semantic divergence in related languages is a key concern of historical linguistics. We cross-linguistically investigate the semantic divergence of cognate pairs in English and Romance languages, by means of word embeddings. To this end, we introduce a new curated dataset of cognates in all pairs of those languages. We describe the types of errors that occurred during the automated cognate identification process and manually correct them. Additionally, we label the English cognates according to their etymology, separating them into two groups: old borrowings and recent borrowings. On this curated dataset, we analyse word properties such as frequency and polysemy, and the distribution of similarity scores between cognate sets in different languages. We automatically identify different clusters of English cognates, setting a new direction of research in cognates, borrowings and possibly false friends analysis in related languages.

RED: A Novel Dataset for Romanian Emotion Detection from Tweets

Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, 2021

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Exploring Optimism and Pessimism in Twitter Using Deep Learning

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

Temporal Text Ranking and Automatic Dating of Texts

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, 2014

Systems and Computational Biology - Bioinformatics and Computational Modeling, 2011

Romanian Syllabication Using Machine Learning

Lecture Notes in Computer Science, 2013
