- research-article, October 2024
Health text simplification: An annotated corpus for digestive cancer education and novel strategies for reinforcement learning
- Md Mushfiqur Rahman,
- Mohammad Sabik Irbaz,
- Kai North,
- Michelle S. Williams,
- Marcos Zampieri,
- Kevin Lybarger
Journal of Biomedical Informatics (JOBI), Volume 158, Issue C
https://doi.org/10.1016/j.jbi.2024.104727
Abstract: Objective: The reading level of health educational materials significantly influences the understandability and accessibility of the information, particularly for minoritized populations. Many patient educational resources surpass widely accepted ...
- research-article, August 2024
A survey of multimodal sarcasm detection
IJCAI '24: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Article No.: 887, Pages 8020–8028
https://doi.org/10.24963/ijcai.2024/887
Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication, motivating the use of computational models to identify it ...
- Article, June 2024
CSEPrompts: A Benchmark of Introductory Computer Science Prompts
- Nishat Raihan,
- Dhiman Goswami,
- Sadiya Sayara Chowdhury Puspo,
- Christian Newman,
- Tharindu Ranasinghe,
- Marcos Zampieri
Abstract: Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ...
- extended-abstract, February 2024
Overview of the HASOC Subtracks at FIRE 2023: Hate Speech and Offensive Content Identification in Assamese, Bengali, Bodo, Gujarati and Sinhala
- Tharindu Ranasinghe,
- Koyel Ghosh,
- Aditya Shankar Pal,
- Apurbalal Senapati,
- Alphaeus Eric Dmonte,
- Marcos Zampieri,
- Sandip Modha,
- Shrey Satapara
FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 13–15
https://doi.org/10.1145/3632754.3633278
The evaluation of content moderation systems requires reliable benchmark data. This task becomes particularly formidable for low-resource languages, where obtaining or curating such data poses significant challenges. Addressing this issue, HASOC 2023 ...
- research-article, April 2023
Offensive language identification with multi-task learning
Journal of Intelligent Information Systems (JIIS), Volume 60, Issue 3, Pages 613–630
https://doi.org/10.1007/s10844-023-00787-z
Abstract: The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language ...
- survey, January 2023
Lexical Complexity Prediction: An Overview
ACM Computing Surveys (CSUR), Volume 55, Issue 9, Article No.: 179, Pages 1–42
https://doi.org/10.1145/3557885
The occurrence of unknown words in texts significantly hinders reading comprehension. To improve accessibility for specific target populations, computational modeling has been applied to identify complex words in texts and substitute them for simpler ...
- abstract, January 2023
Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
- Shrey Satapara,
- Prasenjit Majumder,
- Thomas Mandl,
- Sandip Modha,
- Hiren Madhu,
- Tharindu Ranasinghe,
- Marcos Zampieri,
- Kai North,
- Damith Premasiri
FIRE '22: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 4–7
https://doi.org/10.1145/3574318.3574326
In recent years, the spread of online offensive content has become of great concern, motivating researchers to develop robust systems capable of identifying such content automatically. To carry out a fair evaluation of these systems, several ...
- research-article, December 2022
Predicting lexical complexity in English texts: the Complex 2.0 dataset
Language Resources and Evaluation (SPLRE), Volume 56, Issue 4, Pages 1153–1194
https://doi.org/10.1007/s10579-022-09588-2
Abstract: Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred ...
- abstract, January 2022
Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech
- Sandip Modha,
- Thomas Mandl,
- Gautam Kishore Shahi,
- Hiren Madhu,
- Shrey Satapara,
- Tharindu Ranasinghe,
- Marcos Zampieri
FIRE '21: Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 1–3
https://doi.org/10.1145/3503162.3503176
The HASOC track is dedicated to the evaluation of technology for finding Offensive Language and Hate Speech. HASOC is creating a multilingual data corpus mainly for English and under-resourced languages (Hindi and Marathi). This paper presents one HASOC ...
- research-article, November 2021
Multilingual Offensive Language Identification for Low-resource Languages
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 21, Issue 1, Article No.: 4, Pages 1–13
https://doi.org/10.1145/3457610
Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, ...
- masters_thesis, January 2020
Machine-assisted Translation by Human-in-the-loop Crowdsourcing for Bambara
- Allahsera Auguste Tapo,
- Christopher Homan,
- Michael Leventhal,
- Marcos Zampieri,
- Sarah Luger,
- Julia Kreutzer
Abstract: Language is more than a tool of conveying information; it is utilized in all aspects of our lives. Yet only a small number of languages in the 7,000 languages worldwide are highly resourced by human language technologies (HLT). Despite African ...
- article, May 2019
Automatic language identification in texts: a survey
Journal of Artificial Intelligence Research (JAIR), Volume 65, Issue 1, Pages 675–682
https://doi.org/10.1613/jair.1.11675
Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines,...
- Article, September 2018
- article, June 2016
Improving translation memory matching and retrieval using paraphrases
Machine Translation (KLU-COAT), Volume 30, Issue 1-2, Pages 19–40
https://doi.org/10.1007/s10590-016-9180-0
Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which ...
- Article, September 2015
Investigating Genre and Method Variation in Translation Using Text Classification
TSD 2015: Proceedings of the 18th International Conference on Text, Speech, and Dialogue - Volume 9302, Pages 41–50
https://doi.org/10.1007/978-3-319-24033-6_5
In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments described in this paper are carried out in two steps. ...
- Article, April 2010
P-AWL: academic word list for Portuguese
PROPOR'10: Proceedings of the 9th international conference on Computational Processing of the Portuguese Language, Pages 120–123
https://doi.org/10.1007/978-3-642-12320-7_15
This paper presents and discusses the methodology for the construction of an Academic Word List for Portuguese: PAWL, inspired by its English equivalent. The aim of this linguistic resource is to provide a solid base for future studies and applications ...