Elena Tutubalina

Kazan Federal University, Institute of Computer Mathematics and Information Technologies, Graduate Student

Address: Kazan, Tatarstan, Russian Federation

Papers by Elena Tutubalina

RUREBUS-2020 Shared Task: Russian Relation Extraction for Business

Computational Linguistics and Intellectual Technologies, 2020

In this paper, we present a shared task on core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we try to move away from strictly academic rigor and rather model a business case. As a source for textual data we choose the corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploit various active learning techniques. In total we ended up with more than two hundred annotated documents. Thus we managed to create a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction and 3) joint named entity recognition and relation extraction. We provided the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper we overview and compare solutions, submitted by the ...

Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models

Proceedings of the 28th International Conference on Computational Linguistics, 2020

The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews

Bioinformatics

Motivation: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analysis and comparison of patients' health conditions and adverse drug reactions reported on the Internet with traditional sources such as drug labels, we present a new corpus of Russian language health reviews. Results: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus itself consists of two parts, the raw one and the labeled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consume...

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications

Medical concept normalization in clinical trials with drug and disease representation learning

Bioinformatics

Motivation: Clinical trials are the essential stage of every drug development program for the treatment to become available to patients. Despite the importance of well-structured clinical trial databases and their tremendous value for drug discovery and development, such instances are very rare. Presently, large-scale information on clinical trials is stored in clinical trial registers which are relatively structured, but the mappings to external databases of drugs and diseases are increasingly lacking. The precise production of such links would enable us to interrogate richer harmonized datasets for invaluable insights. Results: We present a neural approach for medical concept normalization of diseases and drugs. Our two-stage approach is based on Bidirectional Encoder Representations from Transformers (BERT). In the training stage, we optimize the relative similarity of mentions and concept names from a terminology via triplet loss. In the inference stage, we obtain the closest concep...

AspeRa: Aspect-Based Rating Prediction Model

Lecture Notes in Computer Science

Automated Detection of Adverse Drug Reactions from Social Media Posts with Machine Learning

Lecture Notes in Computer Science

RuSimpleSentEval-2021 Shared Task: Evaluating Sentence Simplification for Russian

DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

Journal of the American Medical Informatics Association

Objective: Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, it has been impeded largely by limited end-to-end solutions for large-scale analysis of social media reports for different drugs. Materials and Methods: We present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average ‘natural balance’ with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks. Results: The system presented achieved state-of-the-art performance on comparable datasets and scored a class...

RecVAE: A New Variational Autoencoder for Top-N Recommendations with Implicit Feedback

Proceedings of the 13th International Conference on Web Search and Data Mining

A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

Epidemiologia

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This dataset provides a freely available additional data source for researchers worldwide to conduct a wide and diverse number of research projects, such as epidemiological...

KFU NLP Team at SMM4H 2021 Tasks: Cross-lingual and Cross-modal BERT-based Models for Adverse Drug Effects

Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021

Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

DeepADEMiner: A Deep Learning Pharmacovigilance Pipeline for Extraction and Normalization of Adverse Drug Effect Mentions on Twitter

Objective: Research on pharmacovigilance from social media data has focused on mining adverse drug effects (ADEs) using annotated datasets, with publications generally focusing on one of three tasks: (i) ADE classification, (ii) named entity recognition (NER) for identifying the span of ADE mentions, and (iii) ADE mention normalization to standardized vocabularies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, it has been impeded largely by limited end-to-end solutions to the three tasks for large-scale analysis of social media reports for different drugs. Materials and Methods: We present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average ‘natural balance’ with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all three tasks. Results: The system presented achieved a classifi...

RecVAE: A New Variational Autoencoder for Top-N Recommendations with Implicit Feedback

Proceedings of the 13th International Conference on Web Search and Data Mining, Jan 20, 2020

AspeRa: Aspect-Based Rating Prediction Model

Advances in Information Retrieval, 2019

Multiple features for clinical relation extraction: A machine learning approach

Journal of Biomedical Informatics

Detecting Adverse Drug Reactions from Biomedical Texts with Neural Networks

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Deep Neural Models for Medical Concept Normalization in User-Generated Texts

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

KFU NLP Team at SMM4H 2019 Tasks: Want to Extract Adverse Drugs Reactions from Tweets? BERT to The Rescue

Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
