Towards the Terminator Economy: Assessing Job Exposure to AI through LLMs

Emilio Colombo^{1 3}, Fabio Mercorio^{1 2}, Mario Mezzanzanica^{1 2}, Antonio Serino^{1 4}

Abstract

The spread and rapid development of AI-related technologies are influencing many aspects of our daily lives, from social to educational, including the labour market. Many researchers have highlighted the key role AI and technologies play in reshaping jobs and their related tasks, either by automating or enhancing human capabilities in the workplace. Can we estimate if, and to what extent, jobs and related tasks are exposed to the risk of being automated by state-of-the-art AI-related technologies? Our work tackles this question through a data-driven approach: (i) developing a reproducible framework that exploits a battery of open-source Large Language Models to assess current AI and robotics capabilities in performing job-related tasks; (ii) formalising and computing an AI exposure measure by occupation, namely the TEAI (Task Exposure to AI) index. Our results show that about one-third of U.S. employment is highly exposed to AI, primarily in high-skill jobs (i.e., white-collar occupations). This exposure correlates positively with employment and wage growth from 2019 to 2023, indicating a beneficial impact of AI on productivity. The source code and results are publicly available, enabling the whole community to benchmark and track AI and technology capabilities over time.

Introduction

The famous 1984 movie “The Terminator” is set in a dystopian future where intelligent machines, created by a military defence system known as Skynet, become self-aware and perceive humanity as a threat, initiating a war to eliminate humans. Skynet creates advanced humanoid robots called Terminators to hunt down and kill human survivors. The Terminator possesses advanced learning algorithms that enable it to adapt to any environment, making it a formidable antagonist for humans. The debate and concerns about the impact of AI are often conducted against the backdrop of the film’s setting. This paper takes these concerns seriously by developing an AI-centered assessment of the possible exposure of different occupations to artificial intelligence.

Assessing the potential impacts of technology on the labour market is not easy, as there are several potential channels at work. As stressed by Acemoglu and Restrepo (2019), technology has three major effects on labour demand. The first is the productivity effect, which operates through the lower production costs brought about by new technologies. The second is the displacement effect, whereby machines take over tasks previously performed by workers. These two effects operate in opposite directions, and their relative strength depends on whether technology substitutes or complements human labour; in economic jargon, it depends on the elasticity of substitution between tasks. The third effect is the creation of new tasks and activities where labour can be productively employed (the reinstatement effect). Indeed, if we look at history, the reinstatement effect has been a central feature of all technological revolutions, which continuously created new opportunities for labour. For the reinstatement effect to take hold, technology must have a wider impact than its narrow scope, with spillover effects in sectors and areas other than those for which it was designed. In other words, technology must have the features of a general purpose technology, which, according to Lipsey, Carlaw, and Bekar (2005), are pervasiveness across the economy, the ability to generate complementary innovations, and improvement over time.

AI, due to its broad applicability, potential productivity gains, and potential for driving further innovation, has a strong claim to being considered a general-purpose technology. These features of AI, however, create a significant measurement issue, as it is extremely difficult to identify all the channels through which it affects the economy. This paper contributes to this field by developing a methodology for assessing AI exposure with Large Language Models (LLMs), following a very granular approach that analyses exposure for each task that makes up each occupation.

Contribution. Our main contribution is twofold:

1. From a methodological point of view, we design and implement a reproducible framework that relies on Large Language Models (LLMs) to estimate to what extent existing AI and robotics technologies can perform job-related tasks. In a nutshell, instead of assessing AI exposure through external benchmarks such as expert judgment or data on AI patents and innovations, we construct an internal assessment using the LLMs’ own evaluation. To do so, we use O*NET as a reference taxonomy of about 1K occupations and 19K+ job-related tasks. (O*NET, the Occupational Information Network, is a comprehensive database of detailed information on hundreds of standardized and occupation-specific descriptors. It is sponsored by the U.S. Department of Labor/Employment and Training Administration and serves as a resource providing information on the skills, abilities, knowledge, work activities, and interests associated with occupations.)

2. From an economic perspective, we develop an AI exposure measure for individual tasks, reaching a high level of granularity. Then, we show that in the US approximately one-third of employment is highly exposed to AI technologies, most of it in high-skill jobs. Finally, we show that AI exposure is positively associated with employment and wage growth over 2019-2023, suggesting a positive effect of AI technologies on productivity.

To allow the community to compare our results and estimate the advances of LLM capabilities over time, both the code and the enriched O*NET have been made available on GitHub (https://github.com/Crisp-Unimib/Terminator-Economy). The remainder of the paper is structured as follows: section Background and Related Works discusses the related literature; section Building the AI Exposure Index describes the methodology and the construction of the AI index; section Experimental Results presents the results; finally, section Conclusions, limitations and future extensions concludes.

Background and Related Works

AI and jobs.

Since the seminal paper by Autor, Levy, and Murnane (2003), the task approach has proven to be very effective in analyzing the impact of technology on jobs. It divides work activities into tasks, each of which can be performed by humans or by machines. In this way, the distinction between capital and labour tasks is more precise, flexible, and able to shift over time. In fact, capital and machines can substitute for labour in the performance of a particular task while complementing it in the performance of others.

The task approach has been applied to analyse the effects of technology and trade (offshoring) (Acemoglu and Autor 2011), the long-run effects of technology (Consoli et al. 2023), and skill-task interactions (Colombo, Mercorio, and Mezzanzanica 2019).

More recently, this approach has been used to measure occupational exposure to computers and robots. In a seminal paper, Frey and Osborne (2017) estimated that up to 47% of jobs in the US are at risk of automation (see also Nedelkoska and Quintini (2018) for a similar approach). Subsequently, other attempts focused on developing measures of exposure to machine learning and robotics (Brynjolfsson and Mitchell 2017; Acemoglu and Restrepo 2020) and to AI (Felten, Raj, and Seamans 2021; Webb 2023; Eloundou et al. 2023; Pizzinelli et al. 2023).

Overall, these works find that an extensive share of employment is exposed to AI; the specific effect on occupations varies depending on the nuances that the different indicators capture about the effect of technology, i.e., whether they focus on aspects of technology that affect more routine-based activities (Frey and Osborne 2017) or more cognitive elements (Felten, Raj, and Seamans 2021). All these works attempt to quantify AI exposure through an external benchmark, which may be expert judgment or data analysis on patents and innovations. In contrast, our approach is based on an internal assessment, whereby LLM systems are asked to assess the suitability of tasks for AI. This approach has two major advantages. Firstly, it is fully transparent, with outcomes and results being fully disclosed. Secondly, the approach is entirely reproducible. This implies that when subsequent generations of LLMs become available, they can be plugged into our approach to measure the change in task exposure that they imply.

Large Language Models.

LLMs, powerful computational models designed to understand and generate human-like text by leveraging vast amounts of textual data, have taken Natural Language Processing (NLP) by storm, achieving state-of-the-art performance on many tasks (Min et al. 2023). Typically, these models are based on the Transformer architecture (Vaswani et al. 2017), powered by the attention mechanism (Luong, Pham, and Manning 2015; Bahdanau, Cho, and Bengio 2014), and consist of a decoder-only stack. They are initially trained on an autoregressive task (Radford et al. 2018): given a sequence of words $S=(w_{1},w_{2},\dots,w_{n})$, the training objective is to maximize the log-likelihood $\sum_{i=1}^{n}\log P(w_{i}\mid w_{1},w_{2},\dots,w_{i-1};\theta^{T})$, where $\theta^{T}$ are the model parameters. This is the logarithm of the factorised sequence probability $\prod_{i=1}^{n}P(w_{i}\mid w_{1},\dots,w_{i-1};\theta^{T})$, so the model learns to predict the next word in the sequence. After being pre-trained, these models are fine-tuned on examples of several tasks, such as Natural Language Inference (Radford et al. 2018). Thanks to their capability to learn from context, known as in-context learning (Radford et al. 2019), LLMs can accomplish specific tasks with high accuracy (Zhao et al. 2023), exploiting prompt engineering methodologies such as zero-shot (Wei et al. 2021) and few-shot learning (Brown et al. 2020).
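To make the autoregressive objective concrete, the following minimal sketch evaluates the log-likelihood of a short sequence as a sum of conditional log-probabilities. The probability table `TOY_BIGRAM` and the function name are hypothetical, and the toy model conditions only on the previous word, whereas an actual LLM conditions on the whole prefix through its learned parameters.

```python
import math

# Hypothetical toy conditional distribution P(next word | previous word).
# A real LLM conditions on the full prefix and uses learned parameters.
TOY_BIGRAM = {
    "<s>": {"the": 0.5, "a": 0.5},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 1.0},
}


def sequence_log_likelihood(words, model=TOY_BIGRAM, start="<s>"):
    """Sum log P(w_i | w_{i-1}) over the sequence (bigram stand-in for the prefix)."""
    total = 0.0
    previous = start
    for word in words:
        total += math.log(model[previous][word])
        previous = word
    return total


log_lik = sequence_log_likelihood(["the", "cat", "sleeps"])
# Exponentiating the summed log-probabilities recovers the product form.
joint_probability = math.exp(log_lik)
```

Here the sum of the three log-probabilities equals the logarithm of the product 0.5 × 0.5 × 1.0, which is why maximizing the log-likelihood and maximizing the factorised sequence probability are equivalent.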

Building the AI Exposure Index

Our method can be summarised as follows. First, we obtain from O*NET the description of each task associated with each SOC occupation. Second, we apply LLMs to task descriptions to obtain a rating of how well AI technologies can accomplish each task. Third, we aggregate the ratings at the occupation level to obtain an AI occupation score. Finally, we apply our score to US data to assess the extent of AI exposure in the US labour market and the effect of AI on employment and wages. Figure 1 provides a graphical representation of our approach.

Figure 1: Graphical overview of the framework to compute the TEAI Index
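As a rough illustration of the third step, the sketch below groups task-level ratings by occupation and averages them. The unweighted mean is purely an assumption made for illustration; this section does not specify the aggregation rule, and the SOC codes and ratings shown are placeholders rather than results.

```python
from collections import defaultdict
from statistics import mean


def occupation_scores(task_ratings):
    """Group (soc_code, rating) pairs by occupation and average the ratings.

    The unweighted mean is an illustrative assumption, not the paper's formula.
    """
    ratings_by_occupation = defaultdict(list)
    for soc_code, rating in task_ratings:
        ratings_by_occupation[soc_code].append(rating)
    return {soc: mean(values) for soc, values in ratings_by_occupation.items()}


# Placeholder inputs: hypothetical occupation codes and arbitrary ratings.
example_ratings = [("<SOC A>", 4), ("<SOC A>", 2), ("<SOC B>", 5)]
example_scores = occupation_scores(example_ratings)
```

Any weighting scheme (for example by task importance) could replace `mean` without changing the grouping logic.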

Step 1: Compute the AI rate

To obtain the AI rate, we propose a methodology driven by LLMs. To limit the risk posed by the well-known LLM problem of “hallucinations” (Ji et al. 2023), we design a framework involving three different LLMs that aims to identify and limit the false information generated by creating a consensus system among them.

Model choice.

To ensure the reproducibility of this work, we use three of the best open-source models, according to performance benchmarks, available on the Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). To limit computational cost, we use 7-billion-parameter models. The three selected models are Mistral 7B Instruct v0.2 (Jiang et al. 2023), openchat 3.5 0106 (Wang et al. 2023), and orca mini v3 7b (Mukherjee et al. 2023).

Prompt design.

The starting point is the O*NET taxonomy, which identifies 19,281 tasks for 923 SOC occupations (we use O*NET version 28.2, released in February 2024). We formulate a five-shot prompt using the few-shot learning approach (Brown et al. 2020). We use each individual task description assigned to an occupation to ask the models how well, on a scale of 1 to 5, the combination of different AI technologies could perform the input task, and to provide a discursive motivation for the evaluation. The question posed is: “How well an AI system, which can be an LLM, Image Processing System or a Robot, could perform in the task on a scale of 1 to 5 where 1 stands for poor and 5 stands for excellent?” As AI technologies, we consider (i) LLMs for textual data understanding, (ii) Image Processing Systems for elaboration and decision-making based on visual data analysis, and (iii) Robotic systems for physical execution. At the end of the prompt, we provide the model with five examples of this task to obtain more contextual and accurate results.
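A five-shot prompt of this kind could be assembled along the following lines. This is only a sketch: the question text is the one quoted above, but the function `build_prompt`, the layout of the prompt, the instruction line, and the placeholder examples are all assumptions introduced for illustration and are not the prompt actually used.

```python
# Question wording as quoted in the text; everything else below is hypothetical.
QUESTION = (
    "How well an AI system, which can be an LLM, Image Processing System or a Robot, "
    "could perform in the task on a scale of 1 to 5 where 1 stands for poor and "
    "5 stands for excellent?"
)


def build_prompt(task_description, examples):
    """Assemble a five-shot prompt: question, input task, then the examples last."""
    if len(examples) != 5:
        raise ValueError("a five-shot prompt needs exactly five examples")
    parts = [
        QUESTION,
        "Return a list with the rate first and the motivation second.",
        f"Task: {task_description}",
        "Examples:",
    ]
    for example_task, example_rate, example_motivation in examples:
        parts.append(f"Task: {example_task} -> [{example_rate}, {example_motivation!r}]")
    return "\n".join(parts)


# Placeholder examples only; real prompts would use curated task/rating pairs.
PLACEHOLDER_EXAMPLES = [
    (f"<example task {i}>", i, f"<motivation {i}>") for i in range(1, 6)
]
prompt = build_prompt("<O*NET task description>", PLACEHOLDER_EXAMPLES)
```

Keeping the examples at the end mirrors the description above; the exact wording and ordering inside each example are design choices that would need tuning per model.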

To automate the methodology, the models are asked to return the result in a list format, with the rate as the first element and the motivation as the second element.
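Parsing such a list-formatted reply might look like the sketch below. It assumes the model emits a Python-style list literal somewhere in its answer; the function name `parse_model_output` and the validation rules (integer rate between 1 and 5, string motivation) are illustrative assumptions, and malformed replies are simply mapped to `None`.

```python
import ast


def parse_model_output(text):
    """Extract (rate, motivation) from a reply such as "[4, 'reason']".

    Returns None when no well-formed two-element list is found.
    """
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        value = ast.literal_eval(text[start:end + 1])
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(value, list) or len(value) != 2:
        return None
    rate, motivation = value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, int) or not 1 <= rate <= 5:
        return None
    if not isinstance(motivation, str):
        return None
    return rate, motivation


parsed = parse_model_output("Answer: [4, 'Mostly text processing.']")
```

Using `ast.literal_eval` rather than `eval` keeps the parser from executing arbitrary code contained in a model reply.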

Consensus System.

We iterate this process for each task provided by O*NET and for each chosen model, ending up, for each task, with three scores and three natural-language motivations, one from each model. The table in this subsection provides an example of the results after this stage for a selection of occupations and tasks.
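The iteration over tasks and models can be sketched as a nested loop that stores one (score, motivation) pair per model for each task. In this sketch the models are replaced by stub callables with arbitrary fixed ratings, the task identifier is hypothetical, and the score spread is just one simple way to inspect agreement; none of this is the consensus rule itself.

```python
def rate_all_tasks(tasks, models):
    """Query every model on every task; return {task_id: {model_name: (rate, motivation)}}."""
    results = {}
    for task_id, description in tasks.items():
        results[task_id] = {
            name: model(description) for name, model in models.items()
        }
    return results


def make_stub(rate):
    """Hypothetical stand-in for an LLM call that always returns the same rate."""
    return lambda description: (rate, f"<motivation for: {description}>")


# Stub models keyed by the three model names; the ratings are arbitrary placeholders.
stub_models = {
    "mistral-7b-instruct-v0.2": make_stub(4),
    "openchat-3.5-0106": make_stub(4),
    "orca-mini-v3-7b": make_stub(3),
}
tasks = {"task-001": "<O*NET task description>"}

results = rate_all_tasks(tasks, stub_models)
task_scores = [rate for rate, _ in results["task-001"].values()]
# Difference between highest and lowest score as a crude disagreement measure.
score_spread = max(task_scores) - min(task_scores)
```

In practice each stub would be replaced by a function that sends the prompt to the corresponding model and parses its reply.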