Active Learning for Reducing Labeling Effort in Text Classification Tasks

  • Conference paper
Artificial Intelligence and Machine Learning (BNAIC/Benelearn 2021)


Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT\(_{base}\) as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Although the proposed heuristics did not improve the performance of AL, our results show that uncertainty-based AL with BERT\(_{base}\) outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.

  1. For our experiments, this resulted in our n ranging from 20 to 191 for the SST dataset and from 17 to 152 for the KvK dataset (the used q can be found in Sect. 3.5).

  2. Larger values up to 100 were tested, but induced much larger training times without noteworthy performance gains.


We would like to express our thanks and gratitude to the people at Dialogic (Utrecht), and to Nick Jelicic in particular, for the useful advice on the writing style of the paper and the suggested improvements for the source code.

Author information

Corresponding authors

Correspondence to Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering or Lambert Schomaker.


A.1 RET Algorithm Computational Cost Analysis

The number of forward passes required by the RET algorithm depends on two factors:

  1. Basic passes: The forward passes required by the “normal” computation of uncertainty at the beginning of the computation for every query-pool.

  2. RP passes: The forward passes required for intermediate updates, using the redundancy pool \(\mathcal {RP}\).

In this analysis we will assume that the size of the redundancy pool \(|\mathcal {RP}|\) is chosen as a factor \(f > 1\) of the query-pool size q, that is, \(|\mathcal {RP}| = f \times q\). This is a reasonable assumption, considering that making \(|\mathcal {RP}|\) larger than needed incurs unnecessary computational cost, whereas too small a value is expected to diminish the effect of the RET algorithm. Given this assumption, and assuming a fixed total number of examples to label, two factors influence the required number of RP passes:

  • Linearly increasing the query-pool size, and with it the coupled redundancy-pool size, causes a quadratic increase in the number of required forward passes per query-pool round.

  • At the same time, a linearly increased query-pool size also induces a corresponding linear decrease in the number of required query-pool rounds.

We will see that together these two factors yield a net linear contribution of the RP passes, which starts causing a net increase in the total number of passes once the query-pool size exceeds a certain value. Looking at the first factor more precisely, the number of passes over \(\mathcal {RP}\) that needs to be performed per query-pool round can be computed as an arithmetic progression:

$$\begin{aligned} |\mathcal {RP}| + (|\mathcal {RP}| - 1) + (|\mathcal {RP}| - 2) + \ldots + (|\mathcal {RP}| - q) \end{aligned}$$
$$\begin{aligned} = \frac{1}{2} \times (q + 1) \times (|\mathcal {RP}| + |\mathcal {RP}| - q) \end{aligned}$$
$$\begin{aligned} = \frac{1}{2} \times (q + 1) \times ((2f - 1) \times q) \end{aligned}$$
$$\begin{aligned} = \frac{1}{2} \times (q + 1) \times f' \times q \end{aligned}$$
$$\begin{aligned} = \frac{1}{2} \times f' \times (q^2 + q) \end{aligned}$$

Let’s assume we use \(f = 1.5\) (as also used in our experiments), and consequently, \(f' = 2f - 1 = 2\). The number of forward passes over \(\mathcal {RP}\) then becomes exactly \(q^2 + q\).
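The closed form above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, exact fractions are used so that \(|\mathcal {RP}| = 1.5 \times q\) causes no rounding, and it only verifies that the term-by-term sum agrees with \(\frac{1}{2} \times f' \times (q^2 + q)\), which equals \(q^2 + q\) for \(f = 1.5\).

```python
from fractions import Fraction


def rp_passes_by_summation(q: int, f: Fraction = Fraction(3, 2)) -> Fraction:
    """Sum |RP| + (|RP| - 1) + ... + (|RP| - q), with |RP| = f * q."""
    rp_size = f * q
    # k runs from 0 to q inclusive, giving the q + 1 terms of the progression.
    return sum(rp_size - k for k in range(q + 1))


def rp_passes_closed_form(q: int, f: Fraction = Fraction(3, 2)) -> Fraction:
    """Closed form 1/2 * f' * (q^2 + q), with f' = 2f - 1."""
    f_prime = 2 * f - 1
    return Fraction(1, 2) * f_prime * (q * q + q)


for q in (2, 10, 50, 100):
    summed = rp_passes_by_summation(q)
    closed = rp_passes_closed_form(q)
    # For f = 1.5 both values equal q^2 + q.
    print(q, summed, closed, q * q + q)
```

For example, with \(q = 2\) and \(|\mathcal {RP}| = 3\) the progression is \(3 + 2 + 1 = 6 = q^2 + q\).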

The complexity can then be expressed by the following formula:


This can be approximately rewritten as:

$$\begin{aligned} T \times \#\text {Samples} \times \left( \frac{|\text {data}|}{q} + \frac{q^2 + q}{q}\right) \end{aligned}$$
$$\begin{aligned} = T \times \#\text {Samples} \times \left( \frac{|\text {data}|}{q} + q + 1\right) \end{aligned}$$
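The approximate count can be split into its two terms to see how each one scales. The sketch below is illustrative only: the function name and the example values for T, #Samples and |data| are assumptions and are not taken from the experiments; the code simply evaluates \(T \times \#\text {Samples} \times \frac{|\text {data}|}{q}\) (basic passes) and \(T \times \#\text {Samples} \times (q + 1)\) (RP passes, for \(f = 1.5\)).

```python
from typing import Tuple


def approx_forward_passes(
    t: int, num_samples: int, data_size: int, q: int
) -> Tuple[float, float]:
    """Return (basic, rp) pass counts under the approximation
    T * #Samples * (|data| / q + q + 1), which assumes f = 1.5."""
    basic = t * num_samples * data_size / q   # term |data| / q
    rp = t * num_samples * (q + 1)            # term q + 1
    return basic, rp


# Hypothetical values, chosen only for illustration.
basic, rp = approx_forward_passes(t=10, num_samples=2000, data_size=10_000, q=50)
print(f"basic passes: {basic:.0f}, RP passes: {rp}, total: {basic + rp:.0f}")
```

With these assumed values the basic passes still dominate (4,000,000 against 1,020,000), since \(q = 50\) lies below \(\sqrt{10000} = 100\).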

Note that the second term \(q + 1\) only starts dominating the number of forward passes in this formula as soon as:

$$q + 1 \approx q > \frac{|\text {data}|}{q} $$

This is the case when

$$q > \sqrt{|\text {data}|}$$

Until then, the computational gain from fewer basic passes outweighs the cost of more RP passes. In practice though, this point may be reached fairly quickly. For example, assuming we have a data size of 10000 examples, and we use as mentioned \(|\mathcal {RP}| = 1.5 \times q\), then as soon as \(q \ge 100\) the increased computation of the RP passes starts dominating the gains made by fewer basic passes when further increasing the query-pool size, and the net effect is that the total amount of computation increases.
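The example with 10000 data points can be reproduced with a small sweep over q. This is a sketch under the same approximation as above (\(f = 1.5\)); the helper name is hypothetical, and the common factor \(T \times \#\text {Samples}\) is left out because it does not affect where the minimum lies.

```python
def per_sample_cost(data_size: int, q: int) -> float:
    """Bracketed factor of the approximate formula: |data| / q + q + 1."""
    return data_size / q + q + 1


data_size = 10_000

# Query-pool size with the lowest approximate cost over all integer q.
best_q = min(range(1, data_size + 1), key=lambda q: per_sample_cost(data_size, q))
print("cheapest q:", best_q)

# Cost falls until q reaches sqrt(|data|) = 100 and rises afterwards.
for q in (25, 50, 100, 200, 400):
    print(q, per_sample_cost(data_size, q))
```

The sweep places the minimum at \(q = 100\), and the cost at \(q = 200\) is back at the level of \(q = 50\), which illustrates why enlarging the query-pool beyond \(\sqrt{|\text {data}|}\) no longer reduces computation.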

In summary, for the RET algorithm, RP passes contribute to the total amount of forward passes. Furthermore, this contribution increases linearly with the query-pool size and the coupled redundancy-pool size, and starts to dominate the total amount of forward passes once the query-pool size exceeds \(\sqrt{|\text {data}|}\) (for \(f = 1.5\)). This limits the use of RET for decreasing computation by increasing the query-pool size.
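The threshold \(\sqrt{|\text {data}|}\) is specific to \(f = 1.5\). As a sketch that goes beyond the original analysis, keeping \(f' = 2f - 1\) general and applying the same approximation \(q + 1 \approx q\) to the per-round RP cost \(\frac{1}{2} \times f' \times (q^2 + q)\) divided by q gives:

$$\begin{aligned} \frac{1}{2} \times f' \times (q + 1) \approx \frac{1}{2} \times f' \times q > \frac{|\text {data}|}{q} \quad \Longleftrightarrow \quad q > \sqrt{\frac{2 \times |\text {data}|}{f'}} \end{aligned}$$

For \(f' = 2\) this reduces to \(q > \sqrt{|\text {data}|}\), in agreement with the condition derived for \(f = 1.5\). Under this reading, a larger redundancy factor f would lower the query-pool size at which the RP passes start to dominate.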

A.2 Algorithms

[Figures a to e: algorithm listings; images not included.]

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Jacobs, P.F., Maillette de Buy Wenniger, G., Wiering, M., Schomaker, L. (2022). Active Learning for Reducing Labeling Effort in Text Classification Tasks. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_1


  • DOI: https://doi.org/10.1007/978-3-030-93842-0_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93841-3

  • Online ISBN: 978-3-030-93842-0

  • eBook Packages: Computer Science, Computer Science (R0)

