Author: Zampieri, Marcos : Search

research-article

Health text simplification: An annotated corpus for digestive cancer education and novel strategies for reinforcement learning

Journal of Biomedical Informatics (JOBI), Volume 158, Issue C, https://doi.org/10.1016/j.jbi.2024.104727

Abstract

Objective: The reading level of health educational materials significantly influences the understandability and accessibility of the information, particularly for minoritized populations. Many patient educational resources surpass widely accepted ...

research-article

A survey of multimodal sarcasm detection

IJCAI '24: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Article No.: 887, Pages 8020–8028, https://doi.org/10.24963/ijcai.2024/887

Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication motivating the use of computational models to identify it ...

Article

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Foundations of Intelligent Systems, Pages 45–54, https://doi.org/10.1007/978-3-031-62700-2_5

Abstract

Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ...

extended-abstract

Overview of the HASOC Subtracks at FIRE 2023: Hate Speech and Offensive Content Identification in Assamese, Bengali, Bodo, Gujarati and Sinhala

FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 13–15, https://doi.org/10.1145/3632754.3633278

The evaluation of content moderation systems requires reliable benchmark data. This task becomes particularly formidable for low-resource languages, where obtaining or curating such data poses significant challenges. Addressing this issue, HASOC 2023 ...

research-article

Offensive language identification with multi-task learning

Journal of Intelligent Information Systems (JIIS), Volume 60, Issue 3, Pages 613–630, https://doi.org/10.1007/s10844-023-00787-z

Abstract

The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language ...

survey

Open Access

Lexical Complexity Prediction: An Overview

ACM Computing Surveys (CSUR), Volume 55, Issue 9, Article No.: 179, Pages 1–42, https://doi.org/10.1145/3557885

The occurrence of unknown words in texts significantly hinders reading comprehension. To improve accessibility for specific target populations, computational modeling has been applied to identify complex words in texts and substitute them for simpler ...

abstract

Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

FIRE '22: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 4–7, https://doi.org/10.1145/3574318.3574326

In recent years, the spread of online offensive content has become of great concern, motivating researchers to develop robust systems capable of identifying such content automatically. To carry out a fair evaluation of these systems, several ...

research-article

Predicting lexical complexity in English texts: the Complex 2.0 dataset

Language Resources and Evaluation (SPLRE), Volume 56, Issue 4, Pages 1153–1194, https://doi.org/10.1007/s10579-022-09588-2

Abstract

Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred ...

abstract

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech

FIRE '21: Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 1–3, https://doi.org/10.1145/3503162.3503176

The HASOC track is dedicated to the evaluation of technology for finding Offensive Language and Hate Speech. HASOC is creating a multilingual data corpus mainly for English and under-resourced languages (Hindi and Marathi). This paper presents one HASOC ...

research-article

Multilingual Offensive Language Identification for Low-resource Languages

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 21, Issue 1, Article No.: 4, Pages 1–13, https://doi.org/10.1145/3457610

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, ...

masters_thesis

Machine-assisted Translation by Human-in-the-loop Crowdsourcing for Bambara

Abstract

Language is more than a tool of conveying information; it is utilized in all aspects of our lives. Yet only a small number of languages in the 7,000 languages worldwide are highly resourced by human language technologies (HLT). Despite African ...

article

Automatic language identification in texts: a survey

Journal of Artificial Intelligence Research (JAIR), Volume 65, Issue 1, Pages 675–682, https://doi.org/10.1613/jair.1.11675

Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines,...

Article

Portuguese Native Language Identification

Computational Processing of the Portuguese Language, Pages 115–124, https://doi.org/10.1007/978-3-319-99722-3_12

Abstract

This study presents the first Native Language Identification (NLI) study for L2 Portuguese. We used a sub-set of the NLI-PT dataset, containing texts written by speakers of five different native languages: Chinese, English, German, Italian, and ...

article

Improving translation memory matching and retrieval using paraphrases

Machine Translation (KLU-COAT), Volume 30, Issue 1-2, Pages 19–40, https://doi.org/10.1007/s10590-016-9180-0

Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which ...

Article

Investigating Genre and Method Variation in Translation Using Text Classification

TSD 2015: Proceedings of the 18th International Conference on Text, Speech, and Dialogue - Volume 9302, Pages 41–50, https://doi.org/10.1007/978-3-319-24033-6_5

In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments described in this paper are carried out in two steps. ...

Article

P-AWL: academic word list for Portuguese

PROPOR'10: Proceedings of the 9th international conference on Computational Processing of the Portuguese Language, Pages 120–123, https://doi.org/10.1007/978-3-642-12320-7_15

This paper presents and discusses the methodology for the construction of an Academic Word List for Portuguese: PAWL, inspired by its English equivalent. The aim of this linguistic resource is to provide a solid base for future studies and applications ...
