-
Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies
Authors:
Seyed Soroush Karimi Madahi,
Gargya Gokhale,
Marie-Sophie Verwee,
Bert Claessens,
Chris Develder
Abstract:
A continuous rise in the penetration of renewable energy sources, along with the use of single imbalance pricing, provides a new opportunity for balance responsible parties to reduce their costs through energy arbitrage in the imbalance settlement mechanism. Model-free reinforcement learning (RL) methods are well suited to the energy arbitrage problem owing to their strong performance on complex stochastic sequential decision problems. However, RL is rarely deployed in real-world applications, since the learned policy does not necessarily guarantee safety during the execution phase. In this paper, we propose a new RL-based control framework for batteries that yields a safe energy arbitrage strategy in the imbalance settlement mechanism. In our proposed control framework, the agent first learns to maximize the arbitrage revenue. Subsequently, in a post-processing step, we correct (constrain) the learned policy through a knowledge distillation process based on properties that follow human intuition. Our post-processing step is a generic method and is not restricted to the energy arbitrage domain. We use the Belgian imbalance prices of 2023 to evaluate the performance of our proposed framework. Furthermore, we deploy our proposed control framework on a real battery to show its capability in the real world.
Submitted 30 April, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
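Under single imbalance pricing, positive and negative imbalances are settled at the same price, so a battery's arbitrage revenue reduces to (discharged energy minus charged energy) times the imbalance price. The minimal sketch below illustrates that bookkeeping together with a simple feasibility projection that clips a requested setpoint to the battery's power and state-of-charge (SoC) limits. All parameters (1 MWh capacity, 0.5 MW power limit, 15-minute settlement period, lossless battery) are hypothetical, and the clipping layer is a generic illustration of "safe execution", not the paper's knowledge-distillation-based policy correction.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    capacity_mwh: float = 1.0   # hypothetical energy capacity
    max_power_mw: float = 0.5   # hypothetical charge/discharge power limit
    soc_mwh: float = 0.5        # currently stored energy
    dt_h: float = 0.25          # 15-minute settlement period


def safe_setpoint(battery: Battery, requested_mw: float) -> float:
    """Clip a requested power (positive = discharge, negative = charge)
    so that neither the power limit nor the SoC bounds are violated."""
    p = max(-battery.max_power_mw, min(battery.max_power_mw, requested_mw))
    max_discharge = battery.soc_mwh / battery.dt_h
    max_charge = (battery.capacity_mwh - battery.soc_mwh) / battery.dt_h
    return max(-max_charge, min(max_discharge, p))


def arbitrage_revenue(battery: Battery, requests_mw, prices_eur_mwh) -> float:
    """Revenue under single imbalance pricing (lossless battery assumed):
    discharged energy earns the price, charged energy pays the same price."""
    revenue = 0.0
    for requested, price in zip(requests_mw, prices_eur_mwh):
        p = safe_setpoint(battery, requested)
        energy = p * battery.dt_h        # MWh; > 0 when discharging
        battery.soc_mwh -= energy
        revenue += energy * price
    return revenue
```

For example, charging at 20 and 30 EUR/MWh and discharging at 200 and 150 EUR/MWh with full 0.5 MW setpoints yields 37.5 EUR and returns the battery to its initial SoC.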
-
Probabilistic forecasting of power system imbalance using neural network-based ensembles
Authors:
Jonas Van Gompel,
Bert Claessens,
Chris Develder
Abstract:
Keeping the balance between electricity generation and consumption is becoming increasingly challenging and costly, mainly due to the rising share of renewables, electric vehicles, and heat pumps, and the electrification of industrial processes. Accurate imbalance forecasts, along with reliable uncertainty estimates, enable transmission system operators (TSOs) to dispatch appropriate reserve volumes, reducing balancing costs. Further, market parties can use these probabilistic forecasts to design strategies that exploit asset flexibility to help balance the grid, generating revenue with known risks. Despite its importance, the literature on system imbalance (SI) forecasting is limited. Moreover, existing methods do not focus on situations with high imbalance magnitude, which are crucial to forecast accurately for both TSOs and market parties. Hence, we propose an ensemble of C-VSNs, which are our adaptation of variable selection networks (VSNs). Every minute, our model predicts the imbalance of the current and the upcoming two quarter-hours, along with uncertainty estimates for these forecasts. We evaluate our approach by forecasting the imbalance of Belgium, where high imbalance magnitude is defined as |SI| > 500 MW (which occurs 1.3% of the time in Belgium). For high imbalance magnitude situations, our model outperforms the state of the art by 23.4% in terms of the continuous ranked probability score (CRPS), which evaluates probabilistic forecasts, while also attaining a 6.5% improvement in overall CRPS. Similar improvements are achieved in terms of root-mean-squared error. Additionally, we develop a fine-tuning methodology to effectively include new inputs with limited history in our model. This work was performed in collaboration with Elia (the Belgian TSO) to further improve their imbalance forecasts, demonstrating the relevance of our work.
Submitted 24 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
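The CRPS mentioned in the abstract above can be estimated for an ensemble (sample) forecast via the identity CRPS(F, y) = E|X − y| − ½·E|X − X′|, with X, X′ drawn independently from the forecast distribution. The sketch below is the generic sample-based estimator (a "fair" variant with a slightly different normalization also exists); the abstract does not state how the C-VSN ensemble outputs are scored, so this is an assumption. The optional threshold mimics evaluating only on high-magnitude situations such as |SI| > 500 MW.

```python
def crps_ensemble(members, observation):
    """Sample-based CRPS: mean |x_i - y| - 0.5 * mean |x_i - x_j| (lower is better)."""
    m = len(members)
    term1 = sum(abs(x - observation) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return term1 - 0.5 * term2


def mean_crps(forecasts, observations, threshold_mw=None):
    """Average CRPS, optionally only over time steps with |SI| > threshold_mw."""
    pairs = [(f, y) for f, y in zip(forecasts, observations)
             if threshold_mw is None or abs(y) > threshold_mw]
    return sum(crps_ensemble(f, y) for f, y in pairs) / len(pairs)
```

With a single member the CRPS reduces to the absolute error, which is one reason it is often reported alongside MAE/RMSE.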
-
HomeLabGym: A real-world testbed for home energy management systems
Authors:
Toon Van Puyvelde,
Marie-Sophie Verwee,
Gargya Gokhale,
Mehran Zareh Eshghdoust,
Chris Develder
Abstract:
Amid growing environmental concerns and resulting energy costs, there is a rising need for efficient Home Energy Management Systems (HEMS). Evaluating such innovative HEMS solutions typically relies on simulations that may not model the full complexity of a real-world scenario. Real-world testing, on the other hand, is more accurate but labor-intensive, particularly when dealing with diverse assets that each use a distinct communication protocol or API. Centralizing and synchronizing the control of such a heterogeneous pool of assets thus poses a significant challenge. In this paper, we introduce HomeLabGym, a real-world testbed that eases such real-world evaluations of HEMS, and of flexible asset control in general, by adhering to the well-known OpenAI Gym paradigm. HomeLabGym allows researchers to prototype, deploy, and analyze HEMS controllers within the controlled test environment of a real-world house (the IDLab HomeLab), providing access to all its available sensors and smart appliances. The easy-to-use Python interface hides the intricate communication protocols of the sensors and appliances, streamlining the evaluation of various control strategies. We present an overview of HomeLabGym and demonstrate its usefulness to researchers by comparing real-world and simulated environments for controlling a residential battery in response to real-time prices.
Submitted 22 April, 2024;
originally announced April 2024.
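The Gym paradigm referred to above exposes an environment through `reset()` and `step(action)` calls, so the same controller loop can drive a simulator or real hardware. The toy class below is a self-contained sketch of that interface for a simulated residential battery following real-time prices; it uses the Gymnasium-style 5-tuple `(obs, reward, terminated, truncated, info)` without importing any library. `ToyBatteryEnv`, its parameters, and the threshold policy are hypothetical and are not HomeLabGym's actual API.

```python
class ToyBatteryEnv:
    """Simulated residential battery responding to real-time prices."""

    def __init__(self, prices, capacity_kwh=10.0, max_power_kw=3.0, dt_h=1.0):
        self.prices = list(prices)        # EUR/kWh, one value per step
        self.capacity_kwh = capacity_kwh
        self.max_power_kw = max_power_kw
        self.dt_h = dt_h

    def _obs(self):
        # Observation: (state of charge, current price); clamp index at episode end.
        return (self.soc_kwh, self.prices[min(self.t, len(self.prices) - 1)])

    def reset(self, soc_kwh=5.0):
        self.t = 0
        self.soc_kwh = soc_kwh
        return self._obs(), {}

    def step(self, action_kw):
        """action_kw > 0 charges the battery, < 0 discharges it."""
        p = max(-self.max_power_kw, min(self.max_power_kw, action_kw))
        new_soc = min(self.capacity_kwh, max(0.0, self.soc_kwh + p * self.dt_h))
        grid_kwh = new_soc - self.soc_kwh      # energy bought (+) or sold (-)
        reward = -grid_kwh * self.prices[self.t]
        self.soc_kwh = new_soc
        self.t += 1
        terminated = self.t >= len(self.prices)
        return self._obs(), reward, terminated, False, {}


def run_threshold_policy(env, threshold=0.2):
    """Charge at full power below `threshold`, discharge otherwise."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        _, price = obs
        action = env.max_power_kw if price < threshold else -env.max_power_kw
        obs, reward, done, _, _ = env.step(action)
        total += reward
    return total
```

Because the controller only touches `reset` and `step`, swapping this simulator for a real-world backend would leave `run_threshold_policy` unchanged, which is the practical appeal of the Gym convention.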
-
Explainable Reinforcement Learning-based Home Energy Management Systems using Differentiable Decision Trees
Authors:
Gargya Gokhale,
Bert Claessens,
Chris Develder
Abstract:
With the ongoing energy transition, demand-side flexibility has become an important aspect of the modern power grid for providing grid support and allowing further integration of sustainable energy sources. Besides traditional sources, the residential sector is another major and largely untapped source of flexibility, driven by the increased adoption of solar PV, home batteries, and EVs. However, unlocking this residential flexibility is challenging, as it requires a control framework that can effectively manage household energy consumption and maintain user comfort while being readily scalable across diverse houses. We aim to address this problem and introduce a reinforcement learning-based approach using differentiable decision trees. This approach combines the scalability of data-driven reinforcement learning with the explainability of (differentiable) decision trees. The result is a controller that can be easily adapted across different houses and provides a simple control policy that can be explained to end-users, further improving user acceptance. As a proof of concept, we analyze our method on a home energy management problem, comparing its performance with a commercially available rule-based baseline and standard neural network-based RL controllers. Through this preliminary study, we show that the performance of our proposed method is comparable to that of standard RL-based controllers, outperforming baseline controllers by ~20% in terms of daily cost savings while being straightforward to explain.
Submitted 18 March, 2024;
originally announced March 2024.
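In a differentiable (soft) decision tree, each inner node routes an input to its left child with probability σ(w·x + b) and to its right child otherwise; the output is the probability-weighted mixture of leaf values, which makes the tree trainable by gradient descent. Replacing each soft split by a hard comparison yields a crisp, readable rule set. The depth-2 sketch below uses hand-picked, hypothetical weights over the features (price, state of charge); the abstract does not specify the paper's architecture, so this only illustrates the general technique.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_predict(x, nodes, leaves):
    """Depth-2 soft decision tree.

    nodes:  [(w, b)] for the root (index 0), its left child (1) and right child (2).
    leaves: outputs ordered left-left, left-right, right-left, right-right.
    Returns the expected output over all root-to-leaf paths."""
    def p_left(node):
        w, b = node
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    p0, p1, p2 = (p_left(n) for n in nodes)
    path_probs = [p0 * p1, p0 * (1 - p1), (1 - p0) * p2, (1 - p0) * (1 - p2)]
    return sum(p * leaf for p, leaf in zip(path_probs, leaves))


def hard_tree_predict(x, nodes, leaves):
    """Crisp (explainable) version: follow the more likely branch at each node."""
    def go_left(node):
        w, b = node
        return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

    if go_left(nodes[0]):
        return leaves[0] if go_left(nodes[1]) else leaves[1]
    return leaves[2] if go_left(nodes[2]) else leaves[3]


# Hypothetical example tree over x = (price, state of charge in [0, 1]):
EXAMPLE_NODES = [
    ((-50.0, 0.0), 10.0),   # root: "price below 0.2?"
    ((0.0, -50.0), 45.0),   # left child: "SoC below 0.9?"
    ((0.0, 50.0), -5.0),    # right child: "SoC above 0.1?"
]
EXAMPLE_LEAVES = [1.0, 0.0, -1.0, 0.0]  # charge, idle (full), discharge, idle (empty)
```

Steep weights make the soft tree behave almost like its hard counterpart, which is what allows the crisp version to be shown to end-users without changing the policy much.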
-
Distill2Explain: Differentiable decision trees for explainable reinforcement learning in energy application controllers
Authors:
Gargya Gokhale,
Seyed Soroush Karimi Madahi,
Bert Claessens,
Chris Develder
Abstract:
Demand-side flexibility is gaining importance as a crucial element in the energy transition process. Accounting for about 25% of final energy consumption globally, the residential sector is an important (potential) source of energy flexibility. However, unlocking this flexibility requires developing a control framework that (1) easily scales across different houses, (2) is easy to maintain, and (3) is simple to understand for end-users. A potential control framework for such a task is data-driven control, specifically model-free reinforcement learning (RL). Such RL-based controllers learn a good control policy by interacting with their environment, purely based on data and with minimal human intervention. Yet, they lack explainability, which hampers user acceptance. Moreover, the limited hardware capabilities of residential assets form a hurdle (e.g., for running deep neural networks). To overcome both challenges, we propose a novel method to obtain explainable RL policies using differentiable decision trees. Using a policy distillation approach, we train these differentiable decision trees to mimic standard RL-based controllers, leading to a decision tree-based control policy that is data-driven and easy to explain. As a proof of concept, we examine the performance and explainability of our proposed approach in a battery-based home energy management system aimed at reducing energy costs. For this use case, we show that our proposed approach can outperform baseline rule-based policies by about 20-25%, while providing simple, explainable control policies. We further compare these explainable policies with standard RL policies and examine the performance trade-offs associated with this increased explainability.
Submitted 18 March, 2024;
originally announced March 2024.
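Policy distillation, as described above, trains a student to reproduce a teacher's actions on states the teacher encounters. The sketch below distills into the smallest possible differentiable tree, a single soft split p(charge | price) = σ(w·price + b), by full-batch gradient descent on the binary cross-entropy against the teacher's actions. The teacher here is a hypothetical stand-in rule ("charge below 0.3"), whereas the paper's teacher is a trained RL controller; the prices, learning rate, and step count are also assumptions.

```python
import math


def teacher_policy(price):
    """Hypothetical stand-in for a trained RL teacher: charge (1) below 0.3."""
    return 1 if price < 0.3 else 0


def distill_soft_split(prices, lr=1.0, steps=20000):
    """Fit p(charge | price) = sigmoid(w * price + b) to the teacher's actions
    by full-batch gradient descent on the binary cross-entropy."""
    labels = [teacher_policy(x) for x in prices]
    w, b, n = 0.0, 0.0, len(prices)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(prices, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x / n      # d(BCE)/dw, averaged over the batch
            gb += (p - y) / n          # d(BCE)/db
        w -= lr * gw
        b -= lr * gb
    return w, b
```

The distilled split reads directly as a rule, "charge if price < −b/w", which is the kind of explanation a decision-tree student makes possible.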
-
In-Context Learning for Extreme Multi-Label Classification
Authors:
Karel D'Oosterlinck,
Omar Khattab,
François Remy,
Thomas Demeester,
Chris Develder,
Christopher Potts
Abstract:
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt. We propose a general program, Infer–Retrieve–Rank, that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems. We implement this program using the DSPy programming model, which specifies in-context systems in a declarative manner, and use DSPy optimizers to tune it towards specific datasets by bootstrapping only tens of few-shot examples. Our primary extreme classification program, optimized separately for each task, attains state-of-the-art results across three benchmarks (HOUSE, TECH, TECHWOLF). Applying the same program to a benchmark with vastly different characteristics (BioDEX), we attain competitive performance as well. Unlike prior work, our proposed solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples. Our code is public at https://github.com/KarelDO/xmc.dspy.
Submitted 22 January, 2024;
originally announced January 2024.
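The three stages named above can be pictured as a plain pipeline: an LM first infers free-form label queries from the input, a retriever maps those queries onto the large label space, and an LM finally reranks the retrieved candidates. In the sketch below, both LM calls are replaced by hypothetical keyword-based stubs and the retriever by token overlap, so it shows only the data flow; it does not use DSPy and is not the authors' implementation.

```python
import re


def infer(text):
    """Stub for the LM 'Infer' step: propose free-form label queries.
    Here (hypothetically): the lower-cased content words of the input."""
    stop = {"the", "a", "and", "of", "with", "in", "to", "for"}
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop]


def retrieve(queries, label_space, k=5):
    """Map queries onto the (large) label space by token overlap."""
    scored = []
    for label in label_space:
        overlap = len(set(label.lower().split()) & set(queries))
        if overlap:
            scored.append((overlap, label))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [label for _, label in scored[:k]]


def rank(text, candidates, top_n=2):
    """Stub for the LM 'Rank' step: rerank the retrieved candidates.
    Here: prefer labels whose words occur most often in the input."""
    words = re.findall(r"[a-z]+", text.lower())

    def score(label):
        return sum(words.count(t) for t in label.lower().split())

    return sorted(candidates, key=lambda c: (-score(c), c))[:top_n]


def infer_retrieve_rank(text, label_space):
    return rank(text, retrieve(infer(text), label_space))
```

The point of the decomposition is that the retriever, not the prompt, has to "know" all labels, so only a handful of candidates ever reach the final ranking step.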
-
Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism
Authors:
Seyed Soroush Karimi Madahi,
Bert Claessens,
Chris Develder
Abstract:
Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q-learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform the other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when it is more certain about the price.
Submitted 23 December, 2023;
originally announced January 2024.
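The risk-sensitive objective described above, a weighted sum of expected arbitrage profit and a risk measure, can be illustrated on a sampled profit distribution such as the quantiles a distributional critic outputs. The abstract does not name the risk measure, so the sketch below assumes CVaR (the mean of the worst α-fraction of outcomes); the weight β, α, and the example distributions are hypothetical.

```python
def cvar(samples, alpha=0.1):
    """Mean of the worst alpha-fraction of profit samples (lower tail)."""
    ordered = sorted(samples)
    k = max(1, int(round(alpha * len(ordered))))
    return sum(ordered[:k]) / k


def risk_weighted_value(samples, beta=0.5, alpha=0.1):
    """(1 - beta) * expected profit + beta * CVaR_alpha; beta = 1 is fully risk-averse."""
    mean = sum(samples) / len(samples)
    return (1 - beta) * mean + beta * cvar(samples, alpha)


def choose_action(profit_samples_per_action, beta=0.5, alpha=0.1):
    """Pick the action whose sampled profit distribution scores best."""
    return max(profit_samples_per_action,
               key=lambda a: risk_weighted_value(profit_samples_per_action[a], beta, alpha))
```

For instance, with `{"idle": [0.0] * 10, "discharge": [-50.0] + [20.0] * 9}`, a risk-neutral agent (β = 0) discharges because the expected profit is 13, while a fully risk-averse agent (β = 1) stays idle because the 10% CVaR of discharging is −50, mirroring the hedging behavior reported in the abstract.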
-
Demand response for residential building heating: Effective Monte Carlo Tree Search control based on physics-informed neural networks
Authors:
Fabio Pavirani,
Gargya Gokhale,
Bert Claessens,
Chris Develder
Abstract:
To reduce global carbon emissions and limit climate change, controlling energy consumption in buildings is an important piece of the puzzle. Here, we specifically focus on using a demand response (DR) algorithm to limit the energy consumption of a residential building's heating system while respecting users' thermal comfort. In that domain, reinforcement learning (RL) methods have been shown to be quite effective. One such RL method is Monte Carlo Tree Search (MCTS), which has achieved impressive success in playing board games (Go, chess). A particular advantage of MCTS is that its decision tree structure naturally allows exogenous constraints to be integrated (e.g., by trimming branches that violate them), while conventional RL solutions need more elaborate techniques (e.g., indirectly adding penalties to the cost/reward function, or using a backup controller that corrects constraint-violating actions). The main aim of this paper is to study the adoption of MCTS for building control, since this has (to the best of our knowledge) remained largely unexplored. A specific property of MCTS is that it needs a simulator component that can predict subsequent system states based on the actions taken. A straightforward data-driven solution is to use black-box neural networks (NNs). We instead extend a Physics-informed Neural Network (PiNN) model to deliver multi-timestep predictions, and show the benefit it offers in terms of lower prediction errors (−32% MAE) as well as better MCTS performance (−4% energy cost, +7% thermal comfort) compared to a black-box NN. A second contribution is to extend a vanilla MCTS version with the ideas applied in AlphaZero (i.e., learned prior and value functions and an action selection heuristic) to obtain lower computational costs while maintaining control performance.
Submitted 21 May, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
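Two ideas from the abstract above fit in a few lines: AlphaZero-style action selection, which scores each child by Q(s, a) + c·P(s, a)·√N(s) / (1 + N(s, a)) using a learned prior P, and constraint handling by trimming branches whose predicted next state violates the constraint. In the sketch below, the predicted indoor temperatures stand in for the simulator (a PiNN in the paper); the comfort band, `c_puct`, the action names, and summing visits over feasible children only are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a) from a learned policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up returns through this child

    @property
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(children, predicted_temp, comfort=(19.0, 23.0), c_puct=1.5):
    """PUCT selection over actions whose predicted next indoor temperature
    stays within the comfort band; violating branches are pruned."""
    feasible = {a: ch for a, ch in children.items()
                if comfort[0] <= predicted_temp[a] <= comfort[1]}
    if not feasible:
        raise ValueError("all branches violate the comfort constraint")
    total = sum(ch.visits for ch in feasible.values())

    def puct(ch):
        return ch.q + c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits)

    return max(feasible, key=lambda a: puct(feasible[a]))
```

The exploration term favors rarely visited actions with a high prior, so a good learned prior lets the search spend fewer simulations, which is where the computational savings of the AlphaZero-style variant come from.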
-
Flexible Model Interpretability through Natural Language Model Editing
Authors:
Karel D'Oosterlinck,
Thomas Demeester,
Chris Develder,
Christopher Potts
Abstract:
Model interpretability and model editing are crucial goals in the age of large language models. Interestingly, there exists a link between these two goals: if a method is able to systematically edit model behavior with regard to a human concept of interest, this editor method can help make internal representations more interpretable by pointing towards relevant representations and systematically manipulating them.
Submitted 17 November, 2023;
originally announced November 2023.
-
Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study
Authors:
Maarten De Raedt,
Semere Kiros Bitew,
Fréderic Godin,
Thomas Demeester,
Chris Develder
Abstract:
The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well studied for English, yet remains unexplored for multilingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing the performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to be beneficial in the monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with three multilingual models (LaBSE, mBERT, and XLM-R) trained on English IMDb movie reviews, and evaluate them on OOD test sets in 13 languages covering Amazon product reviews, tweets, and restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and restaurant reviews.
Submitted 11 November, 2023;
originally announced November 2023.
-
Personality Style Recognition via Machine Learning: Identifying Anaclitic and Introjective Personality Styles from Patients' Speech
Authors:
Semere Kiros Bitew,
Vincent Schelstraete,
Klim Zaporojets,
Kimberly Van Nieuwenhove,
Reitske Meganck,
Chris Develder
Abstract:
In disentangling the heterogeneity observed in psychopathology, the personality of patients is considered crucial. Given that personality traits have been shown to be reflected in the language used by a patient, we hypothesize that this enables automatic inference of the personality type directly from speech utterances, potentially more accurately than through a traditional questionnaire-based approach explicitly designed for personality classification. To validate this hypothesis, we adopt natural language processing (NLP) and standard machine learning tools for classification. We test this on a dataset of recorded clinical diagnostic interviews (CDI) from a sample of 79 patients diagnosed with major depressive disorder (MDD), a condition for which differentiated treatment based on personality styles has been advocated, who were classified into anaclitic and introjective personality styles. We start by analyzing the interviews to see which linguistic features are associated with each style, in order to gain a better understanding of the styles. Then, we develop automatic classifiers based on (a) standardized questionnaire responses; (b) basic text features, i.e., TF-IDF scores of words and word sequences; (c) more advanced text features, using LIWC (linguistic inquiry and word count) and context-aware features using BERT (bidirectional encoder representations from transformers); and (d) audio features. We find that automated classification with language-derived features (i.e., based on LIWC) significantly outperforms questionnaire-based classification models. Furthermore, the best performance is achieved by combining LIWC with the questionnaire features. This suggests that more work should be put into developing linguistically based automated techniques for characterizing personality; however, questionnaires still complement such methods to some extent.
Submitted 7 November, 2023;
originally announced November 2023.
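Feature set (b) in the abstract above, TF-IDF scores of words and word sequences, can be computed from unigram and bigram counts weighted by inverse document frequency. The sketch below uses the smoothed idf formula idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is scikit-learn's default, but omits its L2 row normalization; the paper's exact TF-IDF configuration is not given in the abstract, and the example utterances are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lower-cased word n-grams of length 1..n_max (whitespace tokenization)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(documents, n_max=2):
    """Return one {term: tf-idf} dict per document (raw counts x smoothed idf)."""
    counts = [Counter(ngrams(d, n_max)) for d in documents]
    df = Counter(term for c in counts for term in c)
    n_docs = len(documents)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
```

Terms used by every speaker (such as "i") receive the minimum weight, while bigrams specific to one transcript are up-weighted; the resulting sparse vectors can then be fed to any standard classifier.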
-
Transfer Learning in Transformer-Based Demand Forecasting For Home Energy Management System
Authors:
Gargya Gokhale,
Jonas Van Gompel,
Bert Claessens,
Chris Develder
Abstract:
Increasingly, homeowners opt for photovoltaic (PV) systems and/or battery storage to minimize their energy bills and maximize renewable energy usage. This has spurred the development of advanced control algorithms that best achieve those goals. However, a common challenge faced while developing such controllers is the unavailability of accurate forecasts of household power consumption, especially at short time resolutions (15 minutes) and in a data-efficient manner. In this paper, we analyze how transfer learning can help by exploiting data from multiple households to improve a single house's load forecasting. Specifically, we train an advanced forecasting model (a temporal fusion transformer) using data from multiple different households, and then finetune this global model on a new household with limited data (i.e., only a few days). The resulting models forecast the household's power consumption for the next 24 hours (day-ahead) at a time resolution of 15 minutes, with the intention of using these forecasts in advanced controllers such as Model Predictive Control. Using real-world household data, we show the benefit of this transfer learning setup versus solely using the new household's own data, both in terms of (i) forecasting accuracy (~15% MAE reduction) and (ii) control performance (~2% energy cost reduction).
Submitted 29 October, 2023;
originally announced October 2023.
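The pretrain-then-finetune workflow described above can be shown with a much simpler model than a temporal fusion transformer. The sketch below swaps in a one-feature linear model trained by gradient descent: it is pretrained on pooled data from three hypothetical source households, then finetuned on only three observations of a new household, and compared with training from scratch under the same small training budget (5 steps). The synthetic data, learning rate, and step counts are assumptions; the toy only illustrates why starting from a global model helps when target data and training effort are limited, and reproduces none of the paper's numbers.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.5, steps=2000):
    """Full-batch gradient descent on mean squared error for y ~ w * x + b."""
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b


def mae(xs, ys, w, b):
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def transfer_demo():
    """Return (MAE after finetuning the global model, MAE when training from scratch)."""
    grid = [i / 10 for i in range(11)]
    # Pretraining: pooled data from three hypothetical source households.
    source_x = grid * 3
    source_y = [a * x + 1.0 for a in (1.8, 2.0, 2.2) for x in grid]
    w_g, b_g = fit_linear(source_x, source_y)
    # New household: only three observations and a small training budget.
    few_x = [0.0, 0.5, 1.0]
    few_y = [2.1 * x + 1.2 for x in few_x]
    test_y = [2.1 * x + 1.2 for x in grid]
    w_ft, b_ft = fit_linear(few_x, few_y, w=w_g, b=b_g, steps=5)
    w_sc, b_sc = fit_linear(few_x, few_y, steps=5)
    return mae(grid, test_y, w_ft, b_ft), mae(grid, test_y, w_sc, b_sc)
```

Starting from the global weights, a few finetuning steps only need to correct a small household-specific offset, whereas the scratch model has not yet converged after the same budget.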
-
Real-World Implementation of Reinforcement Learning Based Energy Coordination for a Cluster of Households
Authors:
Gargya Gokhale,
Niels Tiben,
Marie-Sophie Verwee,
Manu Lahariya,
Bert Claessens,
Chris Develder
Abstract:
Accounting for 40% of global power consumption, the built environment has received increasing attention as a source of flexibility to support the modern power grid. In that respect, previous research mainly focused on energy management of individual buildings. In contrast, in this paper, we focus on aggregated control of a set of residential buildings to provide grid-supporting services, which should eventually include ancillary services. In particular, we present a real-life pilot study of the effectiveness of reinforcement learning (RL) in coordinating the power consumption of 8 residential buildings to jointly track a target power signal. Our RL approach relies solely on observed data from individual households and does not require any explicit building models or simulators, making it practical to implement and easy to scale. We show the feasibility of our proposed RL-based coordination strategy in a real-world setting. In a 4-week case study, we demonstrate a hierarchical control system that relies on an RL-based ranking system to select the households whose flexible assets are activated, and a real-time PI control-based power dispatch mechanism to control the selected assets. Our results demonstrate satisfactory power tracking and the effectiveness of the RL-based ranks, which are learned in a purely data-driven manner.
Submitted 29 October, 2023;
originally announced October 2023.
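The lower layer of the hierarchy described above, a real-time PI controller that turns the tracking error into a power setpoint, plus a dispatch step over ranked households, can be sketched as follows. The aggregate is modeled as a hypothetical first-order lag, the gains are hand-picked assumptions that keep this toy loop stable, and the greedy fill-by-rank dispatch is an illustrative assumption rather than the paper's mechanism.

```python
class PIController:
    """Discrete PI controller: u_k = kp * e_k + ki * sum(e_j * dt)."""

    def __init__(self, kp, ki, dt=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, target, measured):
        error = target - measured
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def dispatch(power_kw, ranked_capacities_kw):
    """Split a (non-negative) power request over households in rank order,
    filling each one up to its available flexible capacity."""
    allocation, remaining = [], power_kw
    for cap in ranked_capacities_kw:
        share = min(cap, remaining)
        allocation.append(share)
        remaining -= share
    return allocation


def simulate_tracking(target_kw, steps=200, lag=0.5):
    """Track a constant target with a hypothetical first-order-lag aggregate."""
    pi = PIController(kp=0.5, ki=0.2)
    power = 0.0
    for _ in range(steps):
        setpoint = pi.update(target_kw, power)
        power += lag * (setpoint - power)
    return power
```

The integral term is what removes the steady-state error: at equilibrium the error is zero and the setpoint is carried entirely by the accumulated integral.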
-
Career Path Prediction using Resume Representation Learning and Skill-based Matching
Authors:
Jens-Joris Decorte,
Jeroen Van Hautte,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods for career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the so far unexplored textual descriptions that are part of the work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10, respectively, on our dataset. Finally, we show that both approaches are complementary, as a hybrid approach achieves the strongest result with 43.01% recall@10.
Submitted 24 October, 2023;
originally announced October 2023.
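Recall@10, the metric reported above, is the fraction of relevant items (here, the true next occupation) that appear among the top 10 ranked predictions, averaged over test cases. The sketch below computes it and, to illustrate how two complementary rankers could be combined, fuses rankings with reciprocal rank fusion (score = Σ 1/(k0 + rank)). The abstract does not describe how the hybrid model combines the skill-based and text-based models, so the fusion method, `k0 = 60`, and the placeholder labels are assumptions.

```python
def recall_at_k(ranked_predictions, relevant, k=10):
    """Fraction of relevant items that appear among the top-k predictions."""
    top_k = set(ranked_predictions[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(all_predictions, all_relevant, k=10):
    scores = [recall_at_k(p, r, k) for p, r in zip(all_predictions, all_relevant)]
    return sum(scores) / len(scores)


def reciprocal_rank_fusion(rankings, k0=60):
    """Combine several ranked lists; items ranked high by any list rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k0 + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With a single gold occupation per career history, recall@10 is simply the hit rate of the true next step within the top 10 suggestions.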
-
CAW-coref: Conjunction-Aware Word-level Coreference Resolution
Authors:
Karel D'Oosterlinck,
Semere Kiros Bitew,
Brandon Papineau,
Christopher Potts,
Thomas Demeester,
Chris Develder
Abstract:
State-of-the-art coreference resolution systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves the performance on the OntoNotes test set by 0.9% F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code are available at https://github.com/KarelDO/wl-coref.
Submitted 19 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Distractor generation for multiple-choice questions with predictive prompting and large language models
Authors:
Semere Kiros Bitew,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
Large Language Models (LLMs) such as ChatGPT have demonstrated remarkable performance across various tasks and have garnered significant attention from both researchers and practitioners. However, in an educational context, we still observe a performance gap in generating distractors (i.e., plausible yet incorrect answers) with LLMs for multiple-choice questions (MCQs). In this study, we propose a strategy for guiding LLMs such as ChatGPT in generating relevant distractors by prompting them with question items automatically retrieved from a question bank as well-chosen in-context examples. We evaluate our LLM-based solutions using a quantitative assessment on an existing test set, as well as through quality annotations by human experts, i.e., teachers. We found that on average 53% of the generated distractors presented to the teachers were rated as high-quality, i.e., suitable for immediate use as is, outperforming the state-of-the-art model. We also show the gains of our approach in generating high-quality distractors by comparing it with zero-shot ChatGPT and few-shot ChatGPT prompted with static examples.
Submitted 30 July, 2023;
originally announced July 2023.
-
Extreme Multi-Label Skill Extraction Training using Large Language Models
Authors:
Jens-Joris Decorte,
Severine Verlinden,
Jeroen Van Hautte,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of between 15 and 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.
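R-Precision@K is commonly defined in the XMLC literature as the number of relevant labels among the top-K predictions, divided by min(K, R), where R is the number of gold labels for the document, then averaged over documents. The sketch below assumes that standard definition; the paper's exact evaluation script may differ, and the skill names are hypothetical.

```python
from typing import List, Sequence, Set


def r_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Hits in the top-k predictions divided by min(k, number of gold labels)."""
    if not relevant:
        raise ValueError("R-Precision@K is undefined for a document without gold labels")
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / min(k, len(relevant))


def mean_r_precision_at_k(rankings: List[Sequence[str]], gold: List[Set[str]], k: int = 5) -> float:
    """Average the per-document scores over the whole benchmark."""
    scores = [r_precision_at_k(r, g, k) for r, g in zip(rankings, gold)]
    return sum(scores) / len(scores)


# Hypothetical example: two job ads with their ranked skill predictions.
ranked_ad_1 = ["python", "sql", "teamwork", "java", "excel"]
gold_ad_1 = {"python", "java"}          # both gold skills are in the top 5 -> 2 / min(5, 2)
ranked_ad_2 = ["a", "b", "c", "d", "e", "f"]
gold_ad_2 = {"a", "f", "x", "y", "z", "w"}  # only "a" is in the top 5 -> 1 / min(5, 6)
```

Dividing by min(K, R) rather than K means a document with only two gold skills can still reach a perfect score at K = 5.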
Submitted 20 July, 2023;
originally announced July 2023.
-
Learning from Partially Annotated Data: Example-aware Creation of Gap-filling Exercises for Language Learning
Authors:
Semere Kiros Bitew,
Johannes Deleu,
A. Seza Doğruöz,
Chris Develder,
Thomas Demeester
Abstract:
Since performing exercises (including, e.g., practice tests) forms a crucial component of learning, and creating such exercises requires non-trivial effort from the teacher, there is great value in automatic exercise generation in digital tools in education. In this paper, we particularly focus on automatic creation of gap-filling exercises for language learning, specifically grammar exercises. Since providing any annotation in this domain requires human expert effort, we aim to avoid it entirely and explore the task of converting existing texts into new gap-filling exercises, purely based on an example exercise, without explicit instruction or detailed annotation of the intended grammar topics. We contribute (i) a novel neural network architecture specifically designed for the aforementioned gap-filling exercise generation task, and (ii) a real-world benchmark dataset for French grammar. We show that our model for this French grammar gap-filling exercise generation outperforms a competitive baseline classifier by 8 percentage points in F1, achieving an average F1 score of 82%. Our model implementation and the dataset are made publicly available to foster future research, thus offering a standardized evaluation and baseline solution for the proposed partially annotated data prediction task in grammar exercise creation.
Submitted 15 June, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
IDAS: Intent Discovery with Abstractive Summarization
Authors:
Maarten De Raedt,
Fréderic Godin,
Thomas Demeester,
Chris Develder
Abstract:
Intent discovery is the task of inferring latent intents from a set of unlabeled utterances, and is a useful step towards the efficient creation of new conversational agents. We show that recent competitive methods in intent discovery can be outperformed by clustering utterances based on abstractive summaries, i.e., "labels", that retain the core elements while removing non-essential information. We contribute the IDAS approach, which collects a set of descriptive utterance labels by prompting a Large Language Model, starting from a well-chosen seed set of prototypical utterances, to bootstrap an In-Context Learning procedure to generate labels for non-prototypical utterances. The utterances and their resulting noisy labels are then encoded by a frozen pre-trained encoder, and subsequently clustered to recover the latent intents. For the unsupervised task (without any intent labels) IDAS outperforms the state-of-the-art by up to +7.42% in standard cluster metrics for the Banking, StackOverflow, and Transport datasets. For the semi-supervised task (with labels for a subset of intents) IDAS surpasses 2 recent methods on the CLINC benchmark without even using labeled data.
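The final IDAS step (encode utterances and their generated labels with a frozen encoder, then cluster) can be sketched as below. The abstract does not say how utterance and label vectors are fused or which clustering algorithm is used, so the vector-sum fusion and plain k-means here are assumptions, and the small arrays are hypothetical stand-ins for frozen-encoder outputs.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def combine(utt_emb: np.ndarray, label_emb: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: sum of unit-normalized utterance and label vectors, re-normalized."""
    return l2_normalize(l2_normalize(utt_emb) + l2_normalize(label_emb))


def farthest_point_init(x: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: start at the first point, then add the point farthest from all chosen centroids."""
    centroids = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[dist.argmax()])
    return np.array(centroids)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster id (a candidate intent) per utterance."""
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        new_centroids = np.array([
            x[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign


# Hypothetical 3-d embeddings: utterances 0-1 share one intent, 2-3 another.
utterances = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])
labels = np.array([[0.8, 0.2, 0.0], [1.0, 0.0, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]])
intents = kmeans(combine(utterances, labels), k=2)
```

The intuition behind clustering label-augmented vectors is that the short abstractive labels strip non-essential wording, so utterances with the same intent move closer together in embedding space.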
Submitted 31 May, 2023;
originally announced May 2023.
-
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance
Authors:
Karel D'Oosterlinck,
François Remy,
Johannes Deleu,
Thomas Demeester,
Chris Develder,
Klim Zaporojets,
Aneiss Ghodsi,
Simon Ellershaw,
Jack Collins,
Christopher Potts
Abstract:
Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical literature is paramount for public safety, but involves slow and costly manual labor. We set out to improve drug safety monitoring (pharmacovigilance, PV) through the use of Natural Language Processing (NLP). We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical output of drug safety reporting in the U.S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts. The core features of these reports include the reported weight, age, and biological sex of a patient, a set of drugs taken by the patient, the drug dosages, the reactions experienced, and whether the reaction was life-threatening. In this work, we consider the task of predicting the core information of the report given its originating paper. We estimate human performance to be 72.0% F1, whereas our best model achieves 62.3% F1, indicating significant headroom on this task. We also begin to explore ways in which these models could help professional PV reviewers. Our code and data are available: https://github.com/KarelDO/BioDEX.
Submitted 20 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
TempEL: Linking Dynamically Evolving and Newly Emerging Entities
Authors:
Klim Zaporojets,
Lucie-Aimee Kaffee,
Johannes Deleu,
Thomas Demeester,
Chris Develder,
Isabelle Augenstein
Abstract:
In our continuously evolving world, entities change over time and new, previously non-existing or unknown, entities appear. We study how this evolutionary scenario impacts the performance on a well-established entity linking (EL) task. For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities and these target entities' descriptions. By capturing such temporal aspects, our newly introduced TempEL resource contrasts with currently existing entity linking datasets, which are composed of fixed mentions linked to a single static version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA). Indeed, for each of our collected temporal snapshots, TempEL contains links to entities that are continual, i.e., occur in all of the years, as well as completely new entities that appear for the first time at some point. This allows us to quantify the performance of current state-of-the-art EL models for: (i) entities that are subject to changes over time in their Knowledge Base descriptions as well as their mentions' contexts, and (ii) newly created entities that were previously non-existing (e.g., at the time the EL model was trained). Our experimental results show that in terms of temporal performance degradation, (i) continual entities suffer a decrease of up to 3.1% EL accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This highlights the challenge of the introduced TempEL dataset and opens new research prospects in the area of time-evolving entity disambiguation.
Submitted 5 February, 2023;
originally announced February 2023.
-
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education
Authors:
Semere Kiros Bitew,
Amir Hadifar,
Lucas Sterckx,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is to devise relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, by the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average 3 distractors out of the 10 shown to teachers were rated as high-quality distractors. We create a performance benchmark and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test set of 298 educational questions covering multiple subjects & languages and a 77k multilingual pool of distractor vocabulary for future research.
Submitted 13 December, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals
Authors:
Maarten De Raedt,
Fréderic Godin,
Chris Develder,
Thomas Demeester
Abstract:
For text classification tasks, finetuned language models perform remarkably well. Yet, they tend to rely on spurious patterns in training data, thus limiting their performance on out-of-distribution (OOD) test data. Among recent models aiming to avoid this spurious pattern problem, adding extra counterfactual samples to the training data has proven to be very effective. Yet, counterfactual data generation is costly since it relies on human annotation. Thus, we propose a novel solution that only requires annotation of a small fraction (e.g., 1%) of the original training data, and uses automatic generation of extra counterfactuals in an encoding vector space. We demonstrate the effectiveness of our approach in sentiment classification, using IMDb data for training and other sets for OOD tests (i.e., Amazon, SemEval and Yelp). We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals: +3% compared to adding +100% in-distribution training samples, +1.3% compared to alternate counterfactual approaches.
Submitted 21 October, 2022;
originally announced October 2022.
-
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
Authors:
Amir Hadifar,
Semere Kiros Bitew,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
We introduce a high-quality dataset that contains 3,397 samples comprising (i) multiple choice questions, (ii) answers (including distractors), and (iii) their source documents, from the educational domain. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines will be released to support further research in question generation.
Submitted 12 October, 2022;
originally announced October 2022.
-
Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction
Authors:
Jens-Joris Decorte,
Jeroen Van Hautte,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels mentioned either explicitly or merely described implicitly and the lack of finely annotated training corpora. Previous work on skill extraction oversimplifies the task to explicit entity detection, or builds on manually annotated training data that would be infeasible to obtain for a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.
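One plausible reading of "using the ESCO taxonomy to select negative examples from related skills" is to draw negatives from the siblings of a positive skill, i.e., skills sharing the same parent group. The sketch below assumes that reading; the real ESCO hierarchy is richer than the single hypothetical child-to-parent map used here, and the paper's exact strategy may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set


def build_children(parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a child -> parent map into parent -> list of children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for skill, parent in parent_of.items():
        children[parent].append(skill)
    return children


def sibling_negatives(
    positive: str,
    positives: Set[str],
    parent_of: Dict[str, str],
    children: Dict[str, List[str]],
    n: int,
    rng: random.Random,
) -> List[str]:
    """Sample up to n related-but-wrong skills (siblings) as hard negatives for one positive."""
    parent = parent_of.get(positive)
    if parent is None:
        return []
    # Never use a skill that is actually gold for this sentence as a negative.
    candidates = sorted(s for s in children[parent] if s not in positives)
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical two-level taxonomy (not real ESCO identifiers).
parent_of = {
    "Python": "programming",
    "Java": "programming",
    "C++": "programming",
    "SQL": "databases",
    "MongoDB": "databases",
}
children = build_children(parent_of)
negatives = sibling_negatives("Python", {"Python"}, parent_of, children, n=5, rng=random.Random(0))
```

Siblings make good negatives because they are lexically and semantically close to the positive, which forces the model to learn finer distinctions than randomly drawn skills would.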
Submitted 13 September, 2022;
originally announced September 2022.
-
CookDial: A dataset for task-oriented dialogs grounded in procedural documents
Authors:
Yiwei Jiang,
Klim Zaporojets,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with procedural knowledge understanding. The corpus contains 260 human-to-human task-oriented dialogs in which an agent, given a recipe document, guides the user to cook a dish. Dialogs in CookDial exhibit two unique features: (i) procedural alignment between the dialog flow and supporting document; (ii) complex agent decision-making that involves segmenting long sentences, paraphrasing hard instructions and resolving coreference in the dialog context. In addition, we identify three challenging (sub)tasks in the assumed task-oriented dialog system: (1) User Question Understanding, (2) Agent Action Frame Prediction, and (3) Agent Response Generation. For each of these tasks, we develop a neural baseline model, which we evaluate on the CookDial dataset. We publicly release the CookDial dataset, comprising rich annotations of both dialogs and recipe documents, to stimulate further research on domain-specific document-grounded dialog systems.
Submitted 17 June, 2022;
originally announced June 2022.
-
Computationally efficient joint coordination of multiple electric vehicle charging points using reinforcement learning
Authors:
Manu Lahariya,
Nasrin Sadeghianpourhamami,
Chris Develder
Abstract:
A major challenge in today's power grid is to manage the increasing load from electric vehicle (EV) charging. Demand response (DR) solutions aim to exploit flexibility therein, i.e., the ability to shift EV charging in time and thus avoid excessive peaks or achieve better balancing. Whereas the majority of existing research works either focus on control strategies for a single EV charger, or use a multi-step approach (e.g., a first high-level aggregate control decision step, followed by individual EV control decisions), we instead propose a single-step solution that jointly coordinates multiple charging points at once. In this paper, we further refine an initial proposal using reinforcement learning (RL), specifically addressing computational challenges that would limit its deployment in practice. More precisely, we design a new Markov decision process (MDP) formulation of the EV charging coordination process, exhibiting only linear space and time complexity (as opposed to the earlier quadratic space complexity). We thus improve upon the earlier state of the art, demonstrating a 30% reduction of training time in our case study using real-world EV charging session data. Yet, we do not sacrifice the resulting performance in meeting the DR objectives: our new RL solutions still improve the performance of charging demand coordination by 40-50% compared to a business-as-usual policy (that charges EV fully upon arrival) and 20-30% compared to a heuristic policy (that uniformly spreads individual EV charging over time).
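The two reference policies named in the abstract can be sketched as aggregate load profiles. This is a minimal sketch assuming discrete time slots, a single maximum charging power shared by all chargers, and a hypothetical `Session` record; it is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Session:
    arrival: int     # first time slot in which the EV is connected
    departure: int   # first slot after the EV has left (exclusive)
    energy: float    # requested energy in kWh


def business_as_usual(sessions: List[Session], horizon: int, p_max: float, dt: float = 1.0) -> np.ndarray:
    """Charge each EV at full power from arrival until its demand is met (or it leaves)."""
    load = np.zeros(horizon)
    for s in sessions:
        remaining = s.energy
        for t in range(s.arrival, s.departure):
            if remaining <= 0:
                break
            power = min(p_max, remaining / dt)  # last slot may need less than full power
            load[t] += power
            remaining -= power * dt
    return load


def uniform_spread(sessions: List[Session], horizon: int, dt: float = 1.0) -> np.ndarray:
    """Spread each EV's energy evenly over its whole connection time.

    Assumes the resulting constant power never exceeds the charger limit.
    """
    load = np.zeros(horizon)
    for s in sessions:
        slots = s.departure - s.arrival
        load[s.arrival:s.departure] += s.energy / (slots * dt)
    return load


sessions = [Session(arrival=0, departure=4, energy=6.0), Session(arrival=1, departure=3, energy=2.0)]
bau = business_as_usual(sessions, horizon=5, p_max=3.0)   # peak of 5 kW in slot 1
spread = uniform_spread(sessions, horizon=5)               # peak of 2.5 kW
```

Both profiles deliver the same 8 kWh; only the timing differs, which is exactly the flexibility an RL coordinator can exploit.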
Submitted 26 March, 2022;
originally announced March 2022.
-
Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning
Authors:
Manu Lahariya,
Nasrin Sadeghianpourhamami,
Chris Develder
Abstract:
Electric vehicle (EV) charging stations represent a substantial load with significant flexibility. The exploitation of that flexibility in demand response (DR) algorithms becomes increasingly important to manage and balance demand and supply in power grids. Model-free DR based on reinforcement learning (RL) is an attractive approach to balance such EV charging load. We build on previous research on RL, based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations. However, we note that the computationally expensive cost function adopted in the previous research leads to large training times, which limits the feasibility and practicality of the approach. We, therefore, propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer any flexibility. We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data. Specifically, for the case of load flattening, we compare the two approaches in terms of (i) the processing time to learn the RL-based charging policy, as well as (ii) the overall performance of the policy decisions in terms of meeting the target load for unseen test data. The performance is analyzed for different training periods and varying training sample sizes. In addition to the performance results of both RL policies, we provide performance bounds in terms of both (i) an optimal all-knowing strategy, and (ii) a simple heuristic spreading individual EV charging uniformly over time.
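"Charging demand that does not offer any flexibility" can be made concrete as follows: at slot t, an EV has no slack if its remaining energy can only be delivered by charging at maximum power in every remaining slot. The sketch below assumes this definition, discrete slots, and a single maximum power; it illustrates the idea only and is not the paper's cost function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConnectedEV:
    name: str                # hypothetical identifier
    departure: int           # first slot after the EV leaves (exclusive)
    remaining_energy: float  # kWh still to deliver


def inflexible_evs(evs: List[ConnectedEV], t: int, p_max: float, dt: float = 1.0, tol: float = 1e-9) -> List[str]:
    """Return the EVs that must charge at full power in slot t to still meet their demand."""
    must_charge = []
    for ev in evs:
        slots_left = ev.departure - t
        max_deliverable = p_max * dt * slots_left
        # No slack (or already infeasible): every remaining slot is needed at full power.
        if ev.remaining_energy > 0 and ev.remaining_energy >= max_deliverable - tol:
            must_charge.append(ev.name)
    return must_charge


fleet = [
    ConnectedEV("a", departure=4, remaining_energy=6.0),  # 2 slots x 3 kW = 6 kWh: no slack
    ConnectedEV("b", departure=5, remaining_energy=6.0),  # 3 slots x 3 kW = 9 kWh: flexible
    ConnectedEV("c", departure=3, remaining_energy=1.0),  # 1 slot x 3 kW = 3 kWh: flexible
]
forced = inflexible_evs(fleet, t=2, p_max=3.0)
```

Hard-wiring these EVs to charge means the learned policy only has to decide about the flexible remainder, which is one way a cost function can avoid penalizing unmet demand through an expensive term.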
Submitted 3 March, 2022;
originally announced March 2022.
-
Defining a synthetic data generator for realistic electric vehicle charging sessions
Authors:
Manu Lahariya,
Dries Benoit,
Chris Develder
Abstract:
Electric vehicle (EV) charging stations have become prominent in electricity grids in the past years. Analysis of EV charging sessions is useful for flexibility analysis, load balancing, offering incentives to customers, etc. Yet, the limited availability of such EV session data hinders further development in these fields. Addressing this need for publicly available and realistic data, we develop a synthetic data generator (SDG) for EV charging sessions. Our SDG assumes that EV inter-arrival times follow an exponential distribution. Departure times are modeled by defining a conditional probability density function (pdf) for connection times. This pdf for connection time and required energy is fitted by Gaussian mixture models. Since we train our SDG using a large real-world dataset, its output is realistic.
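The generative recipe in the abstract (exponential inter-arrival times, a Gaussian mixture over connection time and required energy) can be sketched with NumPy. All mixture parameters below are hypothetical placeholders rather than values fitted on real data, and, as a simplification, the mixture is not conditioned on arrival time as the paper's conditional pdf is.

```python
import numpy as np


def generate_sessions(n, mean_interarrival_h, weights, means, covs, seed=0):
    """Sample n synthetic EV sessions: arrival time (h), departure time (h), energy (kWh)."""
    rng = np.random.default_rng(seed)
    # Poisson arrival process: i.i.d. exponential gaps between consecutive arrivals.
    gaps = rng.exponential(scale=mean_interarrival_h, size=n)
    arrivals = np.cumsum(gaps)
    # Pick a mixture component per session, then sample (connection time, energy) jointly.
    comps = rng.choice(len(weights), size=n, p=weights)
    samples = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])
    # Gaussians can produce non-physical values, so clip to small positive minima (our choice).
    connection_h = np.clip(samples[:, 0], 0.25, None)
    energy_kwh = np.clip(samples[:, 1], 0.5, None)
    departures = arrivals + connection_h
    return arrivals, departures, energy_kwh


# Hypothetical two-component mixture: short "errand" stays and long "workplace" stays.
weights = [0.6, 0.4]
means = [[2.0, 8.0], [9.0, 20.0]]                     # [hours connected, kWh]
covs = [[[0.5, 0.1], [0.1, 4.0]], [[2.0, 0.5], [0.5, 9.0]]]
arrivals, departures, energy = generate_sessions(100, 0.5, weights, means, covs)
```

Modeling connection time and energy jointly lets the mixture capture their correlation (longer stays tend to need more energy), which independent marginals would lose.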
Submitted 28 February, 2022;
originally announced March 2022.
-
Learning physics-informed simulation models for soft robotic manipulation: A case study with dielectric elastomer actuators
Authors:
Manu Lahariya,
Craig Innes,
Chris Develder,
Subramanian Ramamoorthy
Abstract:
Soft actuators offer a safe, adaptable approach to tasks like gentle grasping and dexterous manipulation. Creating accurate models to control such systems, however, is challenging due to the complex physics of deformable materials. Accurate Finite Element Method (FEM) models incur prohibitive computational complexity for closed-loop use. Using a differentiable simulator is an attractive alternative, but the applicability of such simulators to soft actuators and deformable materials remains underexplored. This paper presents a framework that combines the advantages of both. We learn a differentiable model consisting of a material properties neural network and an analytical dynamics model of the remainder of the manipulation task. This physics-informed model is trained using data generated from FEM, and can be used for closed-loop control and inference. We evaluate our framework on a dielectric elastomer actuator (DEA) coin-pulling task. Using FEM, we simulate the task of using a DEA to pull a coin along a surface with frictional contact, and evaluate the physics-informed model for simulation, control, and inference. Our model attains < 5% simulation error compared to FEM, and we use it as the basis for an MPC controller that requires fewer iterations to converge than model-free actor-critic, PD, and heuristic policies.
Submitted 16 July, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Physics Informed Neural Networks for Control Oriented Thermal Modeling of Buildings
Authors:
Gargya Gokhale,
Bert Claessens,
Chris Develder
Abstract:
This paper presents a data-driven modeling approach for developing control-oriented thermal models of buildings. These models are developed with the objective of reducing energy consumption costs while controlling the indoor temperature of the building within required comfort limits. To combine the interpretability of white/gray box physics models and the expressive power of neural networks, we propose a physics informed neural network approach for this modeling task. Along with measured data and building parameters, we encode the neural networks with the underlying physics that governs the thermal behavior of these buildings. This yields a physics-guided model that aids in modeling the temporal evolution of room temperature and power consumption, as well as the hidden state, i.e., the temperature of the building thermal mass, for subsequent time steps. The main research contributions of this work are: (1) we propose two variants of physics informed neural network architectures for the task of control-oriented thermal modeling of buildings, (2) we show that training these architectures is data-efficient, requiring less training data compared to conventional, non-physics informed neural networks, and (3) we show that these architectures achieve more accurate predictions than conventional neural networks for longer prediction horizons. We test the prediction performance of the proposed architectures using simulated and real-world data to demonstrate (2) and (3) and show that the proposed physics informed neural network architectures can be used for this control-oriented modeling problem.
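A common way to "encode the underlying physics" with a hidden thermal-mass temperature is a lumped resistance-capacitance (RC) model, whose one-step prediction can serve as a physics residual in the training loss. The sketch below assumes a generic two-state RC model with explicit Euler steps and hypothetical parameter values; it is not claimed to be the paper's exact set of equations.

```python
from dataclasses import dataclass


@dataclass
class RCParams:
    c_room: float       # kWh/°C, heat capacity of the indoor air
    c_mass: float       # kWh/°C, heat capacity of the building thermal mass
    r_room_mass: float  # °C/kW, resistance between room air and thermal mass
    r_room_amb: float   # °C/kW, resistance between room air and outdoors
    eta: float          # heat delivered per kW of electric power (e.g. a heat-pump COP)


def step(t_room, t_mass, t_amb, power, p: RCParams, dt=0.25):
    """One explicit-Euler step (dt in hours) of the two-state RC model."""
    q_mass_to_room = (t_mass - t_room) / p.r_room_mass   # kW flowing from mass into room air
    q_amb_to_room = (t_amb - t_room) / p.r_room_amb      # kW flowing from outdoors into room air
    next_room = t_room + dt / p.c_room * (q_mass_to_room + q_amb_to_room + p.eta * power)
    next_mass = t_mass + dt / p.c_mass * (-q_mass_to_room)
    return next_room, next_mass


def physics_residual(pred_room_next, pred_mass_next, t_room, t_mass, t_amb, power, p, dt=0.25):
    """Squared disagreement between a network's prediction and the RC model's next state."""
    ref_room, ref_mass = step(t_room, t_mass, t_amb, power, p, dt)
    return (pred_room_next - ref_room) ** 2 + (pred_mass_next - ref_mass) ** 2


params = RCParams(c_room=1.0, c_mass=10.0, r_room_mass=2.0, r_room_amb=5.0, eta=3.0)
heated = step(20.0, 20.0, 20.0, 1.0, params)  # 1 kW of heating, everything at 20 °C
```

In a physics-informed setup, a term like `physics_residual` would be added to the data-fit loss so that the network's predictions, including the unmeasured mass temperature, stay consistent with energy balance.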
Submitted 21 March, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
JobBERT: Understanding Job Titles through Skills
Authors:
Jens-Joris Decorte,
Jeroen Van Hautte,
Thomas Demeester,
Chris Develder
Abstract:
Job titles form a cornerstone of today's human resources (HR) processes. Within online recruitment, they allow candidates to understand the contents of a vacancy at a glance, while internal HR departments use them to organize and structure many of their processes. As job titles are a compact, convenient, and readily available data source, modeling them with high accuracy can greatly benefit many HR tech applications. In this paper, we propose a neural representation model for job titles, by augmenting a pre-trained language model with co-occurrence information from skill labels extracted from vacancies. Our JobBERT method leads to considerable improvements compared to using generic sentence encoders, for the task of job title normalization, for which we release a new evaluation benchmark.
Submitted 20 September, 2021;
originally announced September 2021.
-
Towards Consistent Document-level Entity Linking: Joint Models for Entity Linking and Coreference Resolution
Authors:
Klim Zaporojets,
Johannes Deleu,
Yiwei Jiang,
Thomas Demeester,
Chris Develder
Abstract:
We consider the task of document-level entity linking (EL), where it is important to make consistent decisions for entity mentions over the full document jointly. We aim to leverage explicit "connections" among mentions within the document itself: we propose to join the EL task with that of coreference resolution (coref). This is complementary to related works that exploit either (i) implicit document information (e.g., latent relations among entity mentions, or general language models) or (ii) connections between the candidate links (e.g., as inferred from the external knowledge base). Specifically, we cluster mentions that are linked via coreference, and enforce a single EL for all of the clustered mentions together. This constraint has the added benefit of increasing coverage, since the EL candidate lists of the clustered mentions are joined. We formulate the coref+EL problem as a structured prediction task over directed trees and use a globally normalized model to solve it. Experimental results on two datasets show a boost of up to +5% F1-score on both coref and EL tasks, compared to their standalone counterparts. For a subset of hard cases, with individual mentions lacking the correct EL in their candidate entity list, we obtain a +50% increase in accuracy.
Submitted 1 July, 2022; v1 submitted 30 August, 2021;
originally announced August 2021.
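For the joint coreference and entity linking paper above, the core constraint is simple. Mentions linked by coreference form a cluster, the cluster's candidate lists are joined, and a single entity is chosen for the whole cluster. The sketch below illustrates only that constraint. Choosing the entity by summing hypothetical per-mention candidate scores is an assumption. The paper's globally normalized model over directed trees is not reproduced here.

```python
from collections import defaultdict


def cluster_mentions(num_mentions, coref_links):
    """Group mention indices into coreference clusters with union-find."""
    parent = list(range(num_mentions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in coref_links:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in range(num_mentions):
        clusters[find(i)].append(i)
    return list(clusters.values())


def link_clusters(candidates, coref_links):
    """Assign one entity per cluster from the union of its candidate lists.

    candidates[i] maps entity id -> (hypothetical) score for mention i. A
    mention with an empty list still gets linked through its cluster, which
    is the coverage benefit of joining candidate lists.
    """
    links = {}
    for cluster in cluster_mentions(len(candidates), coref_links):
        pooled = defaultdict(float)
        for m in cluster:
            for entity, score in candidates[m].items():
                pooled[entity] += score
        best = max(pooled, key=pooled.get) if pooled else None
        for m in cluster:
            links[m] = best
    return links
```

For example, consider a pronoun with no candidates of its own that is coreferent with "Obama". It inherits the link chosen for the whole cluster.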
-
Injecting Knowledge Base Information into End-to-End Joint Entity and Relation Extraction and Coreference Resolution
Authors:
Severine Verlinden,
Klim Zaporojets,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
We consider a joint information extraction (IE) model, solving named entity recognition, coreference resolution and relation extraction jointly over the whole document. In particular, we study how to inject information from a knowledge base (KB) into such an IE model, based on unsupervised entity linking. The KB entity representations used are learned from either (i) hyperlinked text documents (Wikipedia) or (ii) a knowledge graph (Wikidata), and appear complementary in raising IE performance. Representations of the corresponding entity linking (EL) candidates are added to the text span representations of the input document, and we experiment with (i) taking a weighted average of the EL candidate representations based on their prior (in Wikipedia), and (ii) using an attention scheme over the EL candidate list. Results demonstrate an increase of up to 5% F1-score for the evaluated IE tasks on two datasets. Despite the strong performance of the prior-based model, our quantitative and qualitative analysis reveals the advantage of using the attention-based approach.
Submitted 5 July, 2021;
originally announced July 2021.
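The knowledge base injection paper above compares two ways of summarizing EL candidate representations:

- (i) a prior-weighted average;
- (ii) attention over the candidate list.

A minimal sketch of both is given below, using plain lists as vectors. Several choices are assumptions, not the paper's exact design:

- plain dot-product attention;
- element-wise addition to the span representation (concatenation would be another reading of "added");
- the function names.

```python
import math


def weighted_average(vectors, weights):
    """Combine candidate vectors with non-negative weights, normalized to 1."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(dim)]


def prior_combination(cand_vecs, priors):
    """Option (i): weight each EL candidate by its prior (e.g. link frequency)."""
    return weighted_average(cand_vecs, priors)


def attention_combination(span_vec, cand_vecs):
    """Option (ii): softmax over span-candidate dot products (assumed scoring)."""
    scores = [sum(s * c for s, c in zip(span_vec, vec)) for vec in cand_vecs]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - top) for x in scores]
    return weighted_average(cand_vecs, exps)


def augment_span(span_vec, kb_vec):
    """Add the KB-derived vector to the text span representation."""
    return [s + k for s, k in zip(span_vec, kb_vec)]
```

The prior-based variant ignores the context of the span. The attention variant lets the span representation decide which candidate dominates, which matches the contrast the abstract draws between the two approaches.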
-
A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks
Authors:
Amir Hadifar,
Sofie Labat,
Véronique Hoste,
Chris Develder,
Thomas Demeester
Abstract:
In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully, due to the limited availability of and noise in their datasets. While prior research demonstrated the potential of migrating large open-domain pretrained models for domain-specific tasks, the appropriate (pre)training strategies have not yet been rigorously evaluated in such social media customer service settings, especially under multilingual conditions. We address this gap by collecting a multilingual social media corpus containing customer service conversations (865k tweets), comparing various pipelines of pretraining and finetuning approaches, and applying them to 5 different end tasks. We show that pretraining a generic multilingual transformer model on our in-domain dataset, before finetuning on specific end tasks, consistently boosts performance, especially in non-English settings.
Submitted 16 April, 2021;
originally announced April 2021.
-
A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders
Authors:
Maarten De Raedt,
Fréderic Godin,
Pieter Buteneers,
Chris Develder,
Thomas Demeester
Abstract:
Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. For efficient learning, we investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
Submitted 21 September, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
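The geometric mapping paper above does not spell out its exact mapping. One simple geometric transformation in embedding space works as follows. Estimate a direction as the difference between the mean embedding of sentences with property B and the mean embedding of sentences with property A. Then translate new embeddings along that direction while the encoder and decoder stay frozen. This is an illustrative assumption, not necessarily the paper's mapping. In a cross-lingual setting, the direction would be fit on one language and applied to another, relying on a shared multilingual space.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_translation(source_embs, target_embs):
    """Direction from embeddings with property A to those with property B."""
    mu_src = mean_vector(source_embs)
    mu_tgt = mean_vector(target_embs)
    return [t - s for s, t in zip(mu_src, mu_tgt)]


def apply_translation(embedding, direction, scale=1.0):
    """Shift one embedding along the direction; no model weights are tuned."""
    return [e + scale * d for e, d in zip(embedding, direction)]
```

The shifted vector would then be passed to the frozen decoder. The scale parameter, a hypothetical knob, controls how strongly the property is pushed.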
-
DWIE: an entity-centric dataset for multi-task document-level information extraction
Authors:
Klim Zaporojets,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
This paper presents DWIE, the 'Deutsche Welle corpus for Information Extraction', a newly created multi-task dataset that combines four main Information Extraction (IE) annotation subtasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document. This contrasts with currently dominant mention-driven approaches that start from the detection and classification of named entity mentions in individual sentences. Further, DWIE presents two main challenges when building and evaluating IE models for it. First, the use of traditional mention-level evaluation metrics for NER and RE tasks on the entity-centric DWIE dataset can result in measurements dominated by predictions on more frequently mentioned entities. We tackle this issue by proposing a new entity-driven metric that takes into account the number of mentions that compose each of the predicted and ground truth entities. Second, the document-level multi-task annotations require the models to transfer information between entity mentions located in different parts of the document, as well as between different tasks, in a joint learning setting. To realize this, we propose to use graph-based neural message passing techniques between document-level mention spans. Our experiments show an improvement of up to 5.5 F1 percentage points when incorporating neural graph propagation into our joint model. This demonstrates DWIE's potential to stimulate further research in graph neural networks for representation learning in multi-task IE. We make DWIE publicly available at https://github.com/klimzaporojets/DWIE.
Submitted 9 March, 2021; v1 submitted 26 September, 2020;
originally announced September 2020.
-
Solving Arithmetic Word Problems by Scoring Equations with Recursive Neural Networks
Authors:
Klim Zaporojets,
Giannis Bekoulis,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
Solving arithmetic word problems is a cornerstone task in assessing language understanding and reasoning capabilities in NLP systems. Recent works use automatic extraction and ranking of candidate solution equations providing the answer to arithmetic word problems. In this work, we explore novel approaches to score such candidate solution equations using tree-structured recursive neural network (Tree-RNN) configurations. The advantage of this Tree-RNN approach over more established sequential representations is that it can naturally capture the structure of the equations. Our proposed method consists of transforming the mathematical expression of the equation into an expression tree. Further, we encode this tree into a Tree-RNN by using different Tree-LSTM architectures. Experimental results show that our proposed method (i) improves overall performance by more than 3 percentage points in accuracy compared to the previous state of the art, and by over 15 percentage points on a subset of problems that require more complex reasoning, and (ii) outperforms sequential LSTMs by 4 percentage points in accuracy on such more complex problems.
Submitted 9 March, 2021; v1 submitted 11 September, 2020;
originally announced September 2020.
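The arithmetic word problem paper above first turns an equation's expression into an expression tree, then encodes it recursively. The sketch below shows those two structural steps. It uses Python's ast module to build the tree from the expression side of an equation, followed by a generic bottom-up encoder that visits children before their parent. The leaf and composition functions are placeholders passed in by the caller, not a Tree-LSTM cell. Only the four basic binary operators are handled, which is an assumption.

```python
import ast

# Binary operators supported by this sketch (an assumption: + - * / only).
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tree(expr):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    def convert(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return (_OPS[type(node.op)], convert(node.left), convert(node.right))
        if isinstance(node, ast.Constant):
            return node.value  # numeric quantity from the word problem
        if isinstance(node, ast.Name):
            return node.id     # unknown variable, e.g. x
        raise ValueError(f"unsupported node: {ast.dump(node)}")
    return convert(ast.parse(expr, mode="eval").body)


def encode(tree, leaf_fn, compose_fn):
    """Bottom-up recursive encoding: both children are encoded before the parent.

    In a Tree-RNN, leaf_fn would embed a leaf and compose_fn would be a learned
    cell (e.g. a Tree-LSTM) combining the operator with the child states.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return compose_fn(op, encode(left, leaf_fn, compose_fn),
                          encode(right, leaf_fn, compose_fn))
    return leaf_fn(tree)
```

For example, to_tree("(3 + 4) * x") gives ("*", ("+", 3, 4), "x"). Unlike a left-to-right token sequence, this form makes the grouping of (3 + 4) explicit.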
-
Block-wise Dynamic Sparseness
Authors:
Amir Hadifar,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has received significant research attention in recent years. In this paper, we present a new method for dynamic sparseness, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves language modeling perplexities similar to those of the dense baseline, at half the computational cost at inference time.
Submitted 14 January, 2020;
originally announced January 2020.
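The dynamic sparseness paper above skips a pre-defined fraction of weight blocks in a matrix-vector product, with the choice of blocks depending on the input. The sketch below illustrates that mechanism. Its block score, the product of the weight-block norm and the input-segment norm, is a hypothetical stand-in for the paper's learned input-dependent gating. Because no weights are permanently zeroed, a different input can select different blocks, unlike static pruning.

```python
def blockwise_sparse_matvec(W, x, block, keep_fraction):
    """Compute y = W x while skipping the lowest-scoring weight blocks.

    W is a list of rows (n_out x n_in); both sizes must be divisible by
    `block`. The score |W_b| * |x_b| is a hypothetical stand-in for a learned,
    input-dependent gate. Only the top `keep_fraction` of blocks is multiplied.
    """
    n_out, n_in = len(W), len(W[0])
    blocks = []
    for bi in range(0, n_out, block):
        for bj in range(0, n_in, block):
            w_norm = sum(W[i][j] ** 2 for i in range(bi, bi + block)
                         for j in range(bj, bj + block)) ** 0.5
            x_norm = sum(x[j] ** 2 for j in range(bj, bj + block)) ** 0.5
            blocks.append((w_norm * x_norm, bi, bj))
    n_keep = max(1, round(keep_fraction * len(blocks)))
    kept = sorted(blocks, reverse=True)[:n_keep]

    y = [0.0] * n_out
    for _, bi, bj in kept:  # skipped blocks cost no multiplications at all
        for i in range(bi, bi + block):
            y[i] += sum(W[i][j] * x[j] for j in range(bj, bj + block))
    return y
```

With keep_fraction=1.0, the result equals the dense product. With 0.5, half of the block multiplications are skipped, which mirrors the "half the computational cost" setting in the abstract. The scoring here recomputes the norms on every call, so it is not itself cheap. A practical version would precompute the weight-block norms.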
-
Sub-event detection from Twitter streams as a sequence labeling problem
Authors:
Giannis Bekoulis,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. Current approaches to identify sub-events within a given event, such as a goal during a soccer match, essentially do not exploit the sequential nature of social media streams. We address this shortcoming by framing the sub-event detection problem in social media streams as a sequence labeling task and adopting a neural sequence architecture that explicitly accounts for the chronological order of posts. Specifically, we (i) establish a neural baseline that outperforms a graph-based state-of-the-art method for binary sub-event detection (2.7% micro-F1 improvement), and (ii) demonstrate the superiority of a recurrent neural network model on the level of the post sequence for labeled sub-events (2.4% bin-level F1 improvement over non-sequential models).
Submitted 13 March, 2019;
originally announced March 2019.
-
Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
Authors:
Nasrin Sadeghianpourhamami,
Johannes Deleu,
Chris Develder
Abstract:
Initial demand response (DR) studies mainly adopt model predictive control and thus require accurate models of the control problem (e.g., a customer behavior model), which are to a large extent uncertain for the electric vehicle (EV) scenario. Hence, model-free approaches, especially those based on reinforcement learning (RL), are an attractive alternative. In this paper, we propose a new Markov decision process (MDP) formulation in the RL framework to jointly coordinate a set of EV charging stations. State-of-the-art algorithms either focus on a single EV, or control an aggregate of EVs in multiple steps (e.g., making aggregate load decisions in one step, then translating the aggregate decision to individual connected EVs in a second step). In contrast, we propose an RL approach to jointly control the whole set of EVs at once. We contribute a new MDP formulation, with a scalable state representation that is independent of the number of EV charging stations. Further, we use a batch reinforcement learning algorithm, i.e., an instance of fitted Q-iteration, to learn the optimal charging policy. We analyze its performance using simulation experiments based on real-world EV charging data. More specifically, we (i) explore the various settings in training the RL policy (e.g., duration of the period with training data), (ii) compare its performance to an oracle all-knowing benchmark (which provides an upper bound for performance, relying on information that is not available, or at least imperfect, in practice), (iii) analyze performance over time, over the course of a full year, to evaluate possible performance fluctuations (e.g., across different seasons), and (iv) demonstrate the generalization capacity of a learned control policy to larger sets of charging stations.
Submitted 28 November, 2018; v1 submitted 27 September, 2018;
originally announced September 2018.
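The EV charging paper above stresses a state representation that is independent of the number of charging stations. One way to obtain such a state is to aggregate connected EVs into a fixed-size count grid. The grid would be indexed by time slots left until departure and time slots of charging still needed. The sketch below illustrates this; the binning axes, the clipping of values above the grid, and the function name are assumptions made for illustration.

```python
def ev_state_grid(evs, max_slots):
    """Aggregate connected EVs into a fixed-size 2-D count grid.

    Each EV is (slots_until_departure, slots_of_charging_still_needed) as
    non-negative integers; values beyond max_slots - 1 are clipped into the
    last bin (an assumption). The grid shape depends only on max_slots, never
    on how many EVs or charging stations there are.
    """
    grid = [[0] * max_slots for _ in range(max_slots)]
    for until_departure, charge_needed in evs:
        d = min(until_departure, max_slots - 1)
        c = min(charge_needed, max_slots - 1)
        grid[d][c] += 1
    return grid
```

Because a policy trained on this grid sees the same input shape however many stations are connected, it can in principle be applied to larger sets of stations. This relates to point (iv) in the abstract.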
-
Predefined Sparseness in Recurrent Sequence Models
Authors:
Thomas Demeester,
Johannes Deleu,
Fréderic Godin,
Chris Develder
Abstract:
Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters.
Submitted 27 August, 2018;
originally announced August 2018.
-
Adversarial training for multi-context joint entity and relation extraction
Authors:
Giannis Bekoulis,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
Adversarial training (AT) is a regularization method that can be used to improve the robustness of neural network methods by adding small perturbations to the training data. We show how to use AT for the tasks of entity recognition and relation extraction. In particular, we demonstrate that applying AT to a general purpose baseline model for jointly extracting entities and relations improves on the state-of-the-art effectiveness on several datasets in different contexts (i.e., news, biomedical, and real estate data) and for different languages (English and Dutch).
Submitted 14 January, 2019; v1 submitted 21 August, 2018;
originally announced August 2018.
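The adversarial training paper above adds small perturbations to the training data. A common formulation is assumed here; in NLP it is usually applied to continuous inputs such as word embeddings, since words are discrete. It perturbs the input in the direction of the loss gradient, scaled to an L2 norm epsilon, and trains on the clean loss plus the loss on the perturbed input. To keep the gradient hand-derivable, the sketch uses a toy logistic model as a stand-in. This is not the paper's joint extraction architecture.

```python
import math


def logistic_loss(w, x, y):
    """Binary logistic loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def input_gradient(w, x, y):
    """Gradient of logistic_loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))  # equals -y * (1 - sigmoid(margin))
    return [coeff * wi for wi in w]


def adversarial_perturbation(w, x, y, epsilon):
    """Loss-increasing perturbation under an L2 budget: epsilon * g / ||g||."""
    g = input_gradient(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against g = 0
    return [epsilon * gi / norm for gi in g]


def adversarial_objective(w, x, y, epsilon):
    """Clean loss plus loss on the adversarially perturbed input."""
    r = adversarial_perturbation(w, x, y, epsilon)
    x_adv = [xi + ri for xi, ri in zip(x, r)]
    return logistic_loss(w, x, y) + logistic_loss(w, x_adv, y)
```

During training, the perturbation is treated as a constant, and the model parameters are updated to minimize adversarial_objective. The model thus learns to be robust within an epsilon-ball around each input.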
-
Prior Attention for Style-aware Sequence-to-Sequence Models
Authors:
Lucas Sterckx,
Johannes Deleu,
Chris Develder,
Thomas Demeester
Abstract:
We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector. After training an initial attention-based sequence-to-sequence model, we use a variational auto-encoder conditioned on representations of input sequences and a latent code vector space to generate attention matrices. By sampling the code vector from specific regions of this latent space during decoding and imposing prior attention generated from it in the seq2seq model, output can be steered towards having certain attributes. This is demonstrated for the task of sentence simplification, where the latent code vector allows control over output length and lexical simplification, and enables fine-tuning to optimize for different evaluation metrics.
Submitted 25 June, 2018;
originally announced June 2018.
-
Joint entity recognition and relation extraction as a multi-head selection problem
Authors:
Giannis Bekoulis,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
State-of-the-art models for joint entity recognition and relation extraction strongly rely on external natural language processing (NLP) tools such as POS (part-of-speech) taggers and dependency parsers. Thus, the performance of such joint models depends on the quality of the features obtained from these NLP tools. However, these features are not always accurate for various languages and contexts. In this paper, we propose a joint neural model which performs entity recognition and relation extraction simultaneously, without the need for any manually extracted features or the use of any external tool. Specifically, we model the entity recognition task using a CRF (Conditional Random Fields) layer and the relation extraction task as a multi-head selection problem (i.e., potentially identifying multiple relations for each entity). We present an extensive experimental setup to demonstrate the effectiveness of our method using datasets from various contexts (i.e., news, biomedical, real estate) and languages (i.e., English, Dutch). Our model outperforms the previous neural models that use automatically extracted features, while it performs within a reasonable margin of feature-based neural models, or even beats them.
Submitted 17 December, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
An attentive neural architecture for joint segmentation and parsing and its application to real estate ads
Authors:
Giannis Bekoulis,
Johannes Deleu,
Thomas Demeester,
Chris Develder
Abstract:
In processing human-produced text using natural language processing (NLP) techniques, two fundamental subtasks that arise are (i) segmentation of the plain text into meaningful subunits (e.g., entities), and (ii) dependency parsing, to establish relations between subunits. In this paper, we develop a relatively simple and effective neural joint model that performs both segmentation and dependency parsing together, instead of one after the other as in most state-of-the-art works. We focus in particular on the real estate ad setting, aiming to convert an ad to a structured description, which we name property tree, comprising the tasks of (1) identifying important entities of a property (e.g., rooms) from classifieds and (2) structuring them into a tree format. In this work, we propose a new joint model that is able to tackle the two tasks simultaneously and construct the property tree by (i) avoiding the error propagation that would arise from performing the subtasks one after the other in a pipelined fashion, and (ii) exploiting the interactions between the subtasks. For this purpose, we perform an extensive comparative study of the pipeline methods and the new proposed joint model, reporting an improvement of over three percentage points in the overall edge F1 score of the property tree. Also, we propose attention methods to encourage our model to focus on salient tokens during the construction of the property tree. Thus, we experimentally demonstrate the usefulness of attentive neural architectures for the proposed joint model, showcasing a further improvement of two percentage points in edge F1 score for our application.
Submitted 19 March, 2018; v1 submitted 27 September, 2017;
originally announced September 2017.
-
Break it Down for Me: A Study in Automated Lyric Annotation
Authors:
Lucas Sterckx,
Jason Naradowsky,
Bill Byrne,
Thomas Demeester,
Chris Develder
Abstract:
Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation. We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task.
Submitted 11 August, 2017;
originally announced August 2017.
-
Predicting Relevance based on Assessor Disagreement: Analysis and Practical Applications for Search Evaluation
Authors:
Thomas Demeester,
Robin Aly,
Djoerd Hiemstra,
Dong Nguyen,
Chris Develder
Abstract:
Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice, however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the Predicted Relevance Model (PRM), which allows predicting a particular result's relevance for a random user, based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain (nDCG), which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections.
Submitted 23 November, 2015;
originally announced November 2015.
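The Predicted Relevance Model paper above uses predicted relevance for a random user as gains in graded measures such as nDCG. The sketch below illustrates that idea in a simplified form, which is our assumption and not the PRM's exact estimator. It estimates P(another judge finds the result relevant | observed label) from items judged by two assessors, treats the second assessor as a proxy for a random user, and plugs those probabilities into nDCG as data-derived gains.

```python
import math
from collections import Counter


def predicted_relevance(paired_labels):
    """Estimate P(relevant for another judge | observed label).

    paired_labels holds (label_from_assessor_a, binary_label_from_assessor_b)
    for items judged twice; b stands in for a random user (an assumption).
    """
    totals = Counter(a for a, _ in paired_labels)
    relevant = Counter(a for a, b in paired_labels if b == 1)
    return {label: relevant[label] / totals[label] for label in totals}


def dcg(gains):
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_with_predicted_gains(ranked_labels, gain_of_label):
    """nDCG where each label's gain is its predicted relevance probability.

    The ideal ranking is computed over the same retrieved items, which is a
    simplification of a full-collection ideal.
    """
    gains = [gain_of_label[label] for label in ranked_labels]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Unlike heuristic gains such as 0, 1, 3 for three grades, these gains come from observed assessor agreement. This is the motivation the abstract gives.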
-
Knowledge Base Population using Semantic Label Propagation
Authors:
Lucas Sterckx,
Thomas Demeester,
Johannes Deleu,
Chris Develder
Abstract:
A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, a method that constructs training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase in precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances to extend the training set, in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.
Submitted 3 March, 2016; v1 submitted 19 November, 2015;
originally announced November 2015.
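The knowledge base population paper above starts from distant supervision. Any sentence mentioning both arguments of a known fact is labeled as a positive instance of that fact's relation. The sketch below shows that labeling step using the abstract's own 'Barack Obama' / 'US' example. Plain substring matching of entity names is a deliberately naive assumption. The noisy output it produces is what the feature labeling and Semantic Label Propagation steps described in the abstract are meant to clean up and extend.

```python
def distant_supervision(sentences, kb_facts, relation):
    """Label sentences mentioning both arguments of a known fact as positive.

    kb_facts holds (relation, subject, object) triples. Matching entity names
    by substring is a naive assumption; no disambiguation is attempted.
    """
    pairs = [(s, o) for rel, s, o in kb_facts if rel == relation]
    positives = []
    for sentence in sentences:
        for subj, obj in pairs:
            if subj in sentence and obj in sentence:
                positives.append((sentence, subj, obj))
    return positives
```

For example, "Barack Obama addressed the US Congress." is labeled a positive born_in instance even though it does not express that relation. This is exactly the kind of noise the abstract describes.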