
Towards the Terminator Economy: Assessing Job Exposure to AI through LLMs

Emilio Colombo (1,3), Fabio Mercorio (1,2), Mario Mezzanzanica (1,2), Antonio Serino (1,4)
Abstract

The spread and rapid development of AI-related technologies are influencing many aspects of our daily lives, from social to educational, including the labour market. Many researchers have highlighted the key role AI and related technologies play in reshaping jobs and their tasks, either by automating them or by enhancing human capabilities in the workplace. Can we estimate whether, and to what extent, jobs and their tasks are exposed to the risk of being automated by state-of-the-art AI-related technologies? Our work tackles this question through a data-driven approach: (i) developing a reproducible framework that exploits a battery of open-source Large Language Models to assess the current capabilities of AI and robotics in performing job-related tasks; (ii) formalising and computing an AI exposure measure by occupation, the TEAI (Task Exposure to AI) index. Our results show that about one-third of U.S. employment is highly exposed to AI, primarily in high-skill (white-collar) jobs. This exposure correlates positively with employment and wage growth from 2019 to 2023, suggesting a beneficial impact of AI on productivity. The source code and results are publicly available, enabling the whole community to benchmark and track AI and technology capabilities over time.

Introduction

The famous 1984 movie “The Terminator” is set in a dystopian future where intelligent machines, created by a military defence system known as Skynet, become self-aware, perceive humanity as a threat, and initiate a war to eliminate humans. Skynet creates advanced humanoid robots called Terminators to hunt down and kill human survivors. The Terminator possesses advanced learning algorithms that enable it to adapt to any environment, making it a formidable antagonist for humans. Public debate and concerns about the impact of AI are often framed against the backdrop of the film’s setting. This paper takes these concerns seriously by developing an AI-centered assessment of the possible exposure of different occupations to artificial intelligence. Assessing the potential impact of technology on the labour market is not easy, as several channels are at work. As stressed by Acemoglu and Restrepo (2019), technology has three major effects on labour demand. The first is the productivity effect, which operates through the lower production costs brought about by new technologies. The second is the displacement effect, whereby machines and similar technologies replace workers in tasks they previously performed. These two effects operate in opposite directions, and which one prevails depends on whether technology substitutes for or complements human labour; in economic terms, it depends on the elasticity of substitution between tasks. There is also a third effect of technology: the creation of new tasks and activities in which labour can be productively employed (the reinstatement effect). Indeed, history shows that the reinstatement effect has been a central feature of all technological revolutions, which have continuously created new opportunities for labour. For the reinstatement effect to take hold, technology must have an impact wider than its narrow scope, with spillover effects in sectors and areas other than those for which it was designed.
In other words, technology must have the features of a general-purpose technology, which, according to Lipsey, Carlaw, and Bekar (2005), are pervasiveness across the economy, the ability to generate complementary innovations, and improvement over time.

Given its broad applicability, potential productivity gains, and potential to drive further innovation, AI has a strong claim to being considered a general-purpose technology. These same features, however, create a serious measurement problem, as it is extremely difficult to identify all the channels through which AI affects the economy. This paper contributes to this field by developing a methodology for assessing AI exposure with Large Language Models (LLMs), using a very granular approach that analyses exposure for each task that makes up each occupation.

Contribution. Our main contribution is twofold:

  1.

    From a methodological point of view, we design and implement a reproducible framework that uses Large Language Models (LLMs) to estimate the extent to which existing AI and robotics technologies can perform job-related tasks. In a nutshell, instead of assessing AI exposure through external benchmarks such as expert judgment or data on AI patents and innovations, we construct an internal assessment based on the LLMs’ own evaluation. To do so, we use O*NET, the Occupational Information Network, as a reference taxonomy of about 1K occupations and 19K+ job-related tasks. (O*NET is a comprehensive database of detailed information on hundreds of standardized and occupation-specific descriptors, sponsored by the U.S. Department of Labor/Employment and Training Administration; it provides information on the skills, abilities, knowledge, work activities, and interests associated with occupations.)

  2.

    From an economic perspective, we develop an AI exposure measure for individual tasks, reaching a high level of granularity. We then show that in the US approximately 1/3 of employment is highly exposed to AI technologies, mostly in high-skill jobs. Finally, we show that AI exposure is positively associated with employment and wage growth in 2019-2023, suggesting a positive effect of AI technologies on productivity.

To allow the community to compare our results and track the advance of LLM capabilities over time, both the code and the enriched O*NET have been made available on GitHub (https://github.com/Crisp-Unimib/Terminator-Economy). The remainder of the paper is structured as follows: section Background and Related Works discusses the related literature; section Building the AI Exposure Index describes the methodology and the construction of the AI index; section Experimental Results presents the results; finally, section Conclusions, limitations and future extensions concludes.

Background and Related Works

AI and jobs.

Since the seminal paper by Autor, Levy, and Murnane (2003), the task approach has proven very effective in analysing the impact of technology on jobs. It divides work activities into tasks, each of which can be performed by humans or by machines. In this way, the boundary between capital and labour tasks becomes more precise and flexible, and it can shift over time. In fact, capital and machines can substitute for labour in performing some tasks while complementing it in others.

The task approach has been applied to analyse the effect of technology and trade (offshoring) (Acemoglu and Autor 2011), the long-run effect of technology (Consoli et al. 2023), and skill-task interaction (Colombo, Mercorio, and Mezzanzanica 2019).

More recently, this approach has been used to measure occupational exposure to computers and robots. In a seminal paper, Frey and Osborne (2017) estimated that up to 47% of jobs in the US are at risk of automation (see also Nedelkoska and Quintini (2018) for a similar approach). Subsequent work developed measures of exposure to machine learning and robotics (Brynjolfsson and Mitchell 2017; Acemoglu and Restrepo 2020) and to AI (Felten, Raj, and Seamans 2021; Webb 2023; Eloundou et al. 2023; Pizzinelli et al. 2023).

Overall, all these works find that an extensive share of employment is exposed to AI; the specific effect on occupations varies with the nuances that the different indicators capture about the effect of technology, i.e., whether they focus on aspects of technology that affect more routine-based activities (Frey and Osborne 2017) or more cognitive elements (Felten, Raj, and Seamans 2021). All these works quantify AI exposure through an external benchmark, which may be expert judgment or the analysis of data on patents and innovations. In contrast, our approach is based on an internal assessment, whereby LLM systems are asked to assess the suitability of tasks for AI. This approach has two major advantages. First, it is fully transparent, with outcomes and results fully disclosed. Second, it is entirely reproducible: when subsequent generations of LLMs become available, they can be plugged into our approach to measure the change in task exposure they imply.

Large Language Models.

LLMs, powerful computational models designed to understand and generate human-like text by leveraging vast amounts of textual data, have taken Natural Language Processing (NLP) by storm, achieving state-of-the-art performance on many tasks (Min et al. 2023). Typically, these models are based on the Transformer architecture (Vaswani et al. 2017), powered by the attention mechanism (Luong, Pham, and Manning 2015; Bahdanau, Cho, and Bengio 2014), and consist of a decoder-only stack. They are initially trained on an autoregressive task (Radford et al. 2018): given a sequence of words $S=(w_1, w_2, \dots, w_{n-1})$, the training objective is to maximize the log-likelihood $\sum_i \log P(w_i \mid w_1, w_2, \dots, w_{i-1}; \theta^T)$, where $\theta^T$ are the model parameters, so that the model learns to predict the next word in the sequence, with the probability of the sequence factorized as $\prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$. After pre-training, these models are fine-tuned on several downstream tasks, such as Natural Language Inference (Radford et al. 2018). Thanks to their ability to learn from context, known as in-context learning (Radford et al. 2019), LLMs can accomplish specific tasks with high accuracy (Zhao et al. 2023) by exploiting prompt engineering methodologies such as zero-shot (Wei et al. 2021) and few-shot learning (Brown et al. 2020).
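The autoregressive objective can be illustrated with a minimal Python sketch. It assumes a hypothetical `next_token_prob(prefix, token)` callable standing in for a trained model and uses a toy bigram table with made-up probabilities; it only shows how the log-likelihood sums the conditional log-probabilities and how its exponential recovers the product over the sequence.

```python
import math


def sequence_log_likelihood(tokens, next_token_prob):
    """Return sum_i log P(w_i | w_1, ..., w_{i-1}) for a token sequence."""
    total = 0.0
    for i, token in enumerate(tokens):
        prefix = tuple(tokens[:i])  # w_1, ..., w_{i-1}
        total += math.log(next_token_prob(prefix, token))
    return total


# Hypothetical toy "model": a bigram table that only looks at the previous
# token (None marks the start of the sequence). Probabilities are made up.
BIGRAM = {
    (None, "the"): 0.5,
    ("the", "robot"): 0.25,
    ("robot", "works"): 0.5,
}


def toy_prob(prefix, token):
    previous = prefix[-1] if prefix else None
    return BIGRAM.get((previous, token), 1e-6)  # small floor for unseen pairs


ll = sequence_log_likelihood(["the", "robot", "works"], toy_prob)
# exp(ll) is the product of the conditional probabilities, 0.5 * 0.25 * 0.5
print(ll, math.exp(ll))
```

A real LLM replaces the bigram lookup with a forward pass over the whole prefix, but the objective being maximized has the same form.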

Building the AI Exposure Index

Our method can be summarised as follows. First, we obtain from O*NET the description of each task associated with each SOC occupation. Second, we apply LLMs to the task descriptions to obtain a rating of how well AI technologies can accomplish each task. Third, we aggregate the ratings at the occupation level to obtain an AI occupation score. Finally, we apply our score to US data to assess the extent of AI exposure in the US labour market and the effect of AI on employment and wages. Figure 1 provides a graphical representation of our approach.

Figure 1: Graphical overview of the framework to compute the TEAI Index

Step 1: Compute the AI rate

To obtain the AI rate, we propose an LLM-driven methodology. To limit the risk posed by the well-known LLM problem of “hallucinations” (Ji et al. 2023), we design a framework involving three different LLMs that aims to identify and limit the false information generated by building a consensus system among them.

Model choice.

To ensure the reproducibility of this work, we use three of the best open-source models according to the performance benchmarks of the Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). To limit computational cost, we use 7-billion-parameter models. The three selected models are Mistral 7B Instruct v0.2 (Jiang et al. 2023), openchat 3.5 0106 (Wang et al. 2023) and orca mini v3 7b (Mukherjee et al. 2023).

Prompt design.

The starting point is the O*NET taxonomy, which identifies 19,281 tasks for 923 SOC occupations (we use O*NET version 28.2, released in February 2024). We formulate a five-shot prompt following the few-shot learning approach (Brown et al. 2020). For each individual task description assigned to an occupation, we ask the models how well, on a scale of 1 to 5, the combination of different AI technologies could perform the input task, and to provide a discursive motivation for the evaluation. The question reads: “How well an AI system, which can be an LLM, Image Processing System or a Robot, could perform in the task on a scale of 1 to 5 where 1 stands for poor and 5 stands for excellent?” As AI technologies, we consider i) LLMs for textual data understanding, ii) Image Processing Systems for processing and decision-making based on visual data analysis, and iii) Robotic systems for physical execution. At the end of the prompt, we provide the model with five examples of this task to obtain more contextual and accurate results.

To automate the methodology, the models are asked to return the result in a list format, with the rate as the first element and the motivation as the second element.
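Assembling the five-shot prompt and reading back the list-formatted answer can be sketched in Python as follows. Only the question wording comes from the paper; the instruction line, the prompt layout, the example tasks and the helper names `build_prompt` and `parse_answer` are hypothetical. Parsing uses the standard-library `ast.literal_eval`, which evaluates Python literals only, so malformed replies are rejected rather than executed.

```python
import ast

# Question wording as reported in the paper; the instruction line, the layout
# and the example tasks used below are hypothetical.
QUESTION = (
    "How well an AI system, which can be an LLM, Image Processing System or a "
    "Robot, could perform in the task on a scale of 1 to 5 where 1 stands for "
    "poor and 5 stands for excellent?"
)


def build_prompt(task, examples):
    """Assemble a few-shot prompt; the worked examples go at the end."""
    lines = [
        QUESTION,
        f"Task: {task}",
        'Answer as a list: [rate, "motivation"].',
        "Examples:",
    ]
    for ex_task, ex_rate, ex_motivation in examples:
        lines.append(f"Task: {ex_task} -> [{ex_rate}, {ex_motivation!r}]")
    return "\n".join(lines)


def parse_answer(raw):
    """Extract (rate, motivation) from a '[rate, motivation]' reply, or None."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return None
    try:
        rate, motivation = ast.literal_eval(raw[start:end + 1])
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(rate, int) or not 1 <= rate <= 5:
        return None  # outside the 1-5 scale
    return rate, str(motivation)


examples = [("Sort incoming mail.", 4, "Robots and vision systems can sort mail.")] * 5
prompt = build_prompt("Euthanize animals.", examples)
print(parse_answer("[1, 'Requires empathy and ethical judgement.']"))
```

Replies that fail to parse (returning `None`) could be re-queried or logged; the paper does not say how such cases were handled.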

Consensus System.

We iterate this process for each task in O*NET and for each chosen model, ending up with three scores and three natural-language motivations per task. Table 1 provides an example of the results of this stage for a selection of occupations and tasks.

Table 1: Extraction of three occupation profiles and two tasks each from O*NET, enriched as discussed in Step 1 with the extent to which existing AI and robotics technologies can perform job-related tasks according to the Mistral, Orca-mini and Open-Chat LLMs. Key activities are highlighted for readability.
O*NET Occupation | O*NET Task | Mistral | Orca-mini | Open-Chat | Motivation provided by Mistral
Administrative Services Managers | Direct or coordinate the supportive services department of a business, agency, or organization. | 4 | 4 | 3 | Robotics can automate repetitive tasks such as data entry, document scanning, and mail sorting. Image Processing Systems can analyze documents and emails for information extraction and categorization. LLMs can handle complex tasks like scheduling, coordinating resources, and communicating with team members. Combining these AI technologies can significantly improve the efficiency and accuracy of administrative services, allowing managers to focus on more strategic tasks.
... | ... | ... | ... | ... | ...
Administrative Services Managers | Conduct classes to teach procedures to staff. | 2 | 3 | 2 | LLMs could potentially create instructional materials, but they lack the ability to engage and assess staff performance during classes. Image Processing Systems have no role in this task. Robotics could theoretically conduct classes through pre-recorded videos, but the human interaction and customization required for effective training make this a poor solution.
Taxi Drivers | Complete accident reports when necessary. | 3 | 2 | 3 | Image Processing Systems can analyze accident scene images to determine the extent of damage and potential causes, while LLMs can assist in generating reports based on the data. However, the complexity of accident reports, which often require a nuanced understanding of legal and insurance terminology, may limit the effectiveness of AI in this task. Robotics could potentially automate the collection of physical evidence, but their role is limited compared to the other AI technologies.
... | ... | ... | ... | ... | ...
Taxi Drivers | Vacuum and clean interiors and wash and polish exteriors of automobiles. | 1 | 2 | 1 | Robotics are best suited for this task, as they can physically perform the vacuuming, cleaning, and washing functions. LLMs and Image Processing Systems have limited applicability, as they do not possess the physical capabilities required to accomplish the task.
Veterinarians | Euthanize animals. | 1 | 2 | 1 | Euthanizing animals is a sensitive and complex task that requires a high level of empathy, compassion, and expertise. The use of AI technologies, including LLMs, Image Processing Systems, and Robotics, is not suitable for this task. The emotional intelligence and ethical considerations required are beyond the capabilities of current AI technologies. Additionally, the potential risks and ethical concerns associated with using AI to euthanize animals outweigh any potential benefits.
... | ... | ... | ... | ... | ...
Veterinarians | Plan or execute animal nutrition or reproduction programs. | 4 | 3 | 3 | Robotics can automate the feeding and care of animals, while Image Processing Systems can analyze animal health and dietary needs based on visual data. LLMs can research and develop nutrition plans, analyze data from various sources, and provide recommendations based on the latest research. Combining these technologies can lead to more efficient and effective animal nutrition and reproduction programs.

Ratings: 1 = Poor, 2 = Fair, 3 = Average, 4 = Good, 5 = Excellent

As mentioned above, we use three different models to mitigate hallucinations. To construct a single indicator, we take a conservative approach: each task is assigned the rating with the highest frequency among the three models; if the three ratings all differ, we select the lowest.
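The conservative aggregation rule can be written as a short Python function. This is a minimal sketch assuming exactly three ratings per task, as in the paper; the function name is illustrative. With three raters, any rating chosen by at least two models is the unique mode, so the tie-breaking case only arises when all three differ.

```python
from collections import Counter


def aggregate_ratings(ratings):
    """Conservative aggregation of one task's ratings from three models.

    Returns the most frequent rating when at least two models agree;
    if every model gives a different rating, returns the lowest one.
    """
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > 1 else min(ratings)


# Mistral, Orca-mini, Open-Chat ratings for the first task in Table 1
print(aggregate_ratings([4, 4, 3]))
```

Applied to the ratings (4, 4, 3) of the first Administrative Services Managers task in Table 1, the rule returns 4; for (1, 3, 5) it would fall back to 1.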

To assess the agreement between the ratings expressed by the LLMs, we compute a consensus metric (Tastle and Wierman 2007):

$$Cns(a_i) = 1 + \sum_{k=1}^{m} p_k \log_2\left(1 - \frac{\lvert LV_k - \mu_{LV} \rvert}{d_{LV}}\right) \qquad (1)$$

Equation 1 defines the consensus, where $LV_k$ is the k-th observed rating value, $p_k$ its relative frequency, $\mu_{LV}$ the weighted average of the $LV$ ratings using the $p_k$ as weights, and $d_{LV}$ the width of the rating scale adopted (maximum minus minimum rating, i.e. 4 on the 1-5 scale). The logarithmic term measures the impact of the normalised difference between each rating and the weighted average, scaled by $d_{LV}$. The summation runs over the rating values expressed for each individual task.
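Equation 1 can be computed for a single task with a few lines of Python. This minimal sketch treats $d_{LV}$ as the scale width, 5 - 1 = 4, following Tastle and Wierman's definition; with that choice, ratings (4, 4, 3) give about 0.8286, consistent with the 0.828 reported in Table 2 for two-to-one splits between adjacent ratings. The function name and its parameters are illustrative.

```python
import math
from collections import Counter


def consensus(ratings, scale_min=1, scale_max=5):
    """Tastle-Wierman consensus of one task's ordinal ratings.

    1.0 when all raters agree; 0.0 when ratings split evenly between
    the two ends of the scale.
    """
    n = len(ratings)
    width = scale_max - scale_min                                 # d_LV
    freqs = {v: c / n for v, c in Counter(ratings).items()}      # p_k
    mean = sum(v * p for v, p in freqs.items())                  # mu_LV
    return 1 + sum(p * math.log2(1 - abs(v - mean) / width)
                   for v, p in freqs.items())


print(round(consensus([4, 4, 3]), 4))  # 0.8286
```

Because the metric depends only on distances from the weighted mean, every two-to-one split between adjacent ratings (such as (2, 3, 2) or (1, 2, 1)) yields the same value, which is why all rows of Table 2 share one consensus figure.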

Similarly, to estimate the similarity between the motivations provided by the LLMs, we compute the centroid of the semantic cosine similarity (Rahutomo et al. 2012) between the three motivations. The embedding vectors for the centroid computation are obtained using an open-source Transformer model: as for the LLMs, we chose the model according to the Massive Text Embedding Benchmark (MTEB) Leaderboard (https://huggingface.co/spaces/mteb/leaderboard). Since the motivations are in English, we chose UAE-Large-V1 (https://huggingface.co/WhereIsAI/UAE-Large-V1), which offered an excellent compromise between effectiveness and efficiency given its small size.
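A minimal NumPy sketch of the similarity step follows. It reads "centroid of the semantic cosine similarity" as the mean of the pairwise cosine similarities between the three motivation embeddings; that reading is an assumption, and the toy 3-dimensional vectors merely stand in for the embeddings an encoder such as UAE-Large-V1 would produce.

```python
import numpy as np


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of embedding rows."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    sims = X @ X.T                                    # pairwise cosine matrix
    upper = np.triu_indices(len(X), k=1)              # each pair once, no diagonal
    return float(sims[upper].mean())


# Toy vectors standing in for the three motivation embeddings of one task.
motivations = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.7, 0.1, 0.2]]
print(round(mean_pairwise_cosine(motivations), 3))
```

Normalising the rows first turns the matrix product into cosine similarities directly, which keeps the computation vectorised even for larger embedding dimensions.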

Table 2 reports, for the same tasks, the semantic similarity between motivations and the consensus between ratings. Since higher cosine similarity values reflect greater semantic similarity between the LLM motivations, we expect a strong correlation between the consensus metric and the cosine similarity. The heat map in Figure 2 shows that both the cosine similarity and the consensus metric are very high, with an average close to 0.9 in both cases.

Table 2: The same three occupation profiles and two tasks each from O*NET shown in Table 1, enriched with the centroid of semantic similarity and the consensus among the three ratings
O*NET Occupation | O*NET Task | Similarity | Consensus
Administrative Services Managers | Direct or coordinate the supportive services department of a business, agency, or organization. | 0.918 | 0.828
... | ... | ... | ...
Administrative Services Managers | Conduct classes to teach procedures to staff. | 0.935 | 0.828
Taxi Drivers | Complete accident reports when necessary. | 0.948 | 0.828
... | ... | ... | ...
Taxi Drivers | Vacuum and clean interiors and wash and polish exteriors of automobiles. | 0.925 | 0.828
Veterinarians | Euthanize animals. | 0.846 | 0.828
... | ... | ... | ...
Veterinarians | Plan or execute animal nutrition or reproduction programs. | 0.963 | 0.828
Figure 2: Heat map of the cosine similarity between the LLMs’ textual motivations against the consensus measure between their scores. Data are aggregated at the occupation level

On the one hand, this suggests coherence between the LLM-generated ratings and the associated motivations; on the other, it adds robustness to our conservative approach to selecting the score among the different models.

This process results in a single score TE for each task, ranging from 1 to 5 and measuring the extent to which AI can perform that specific task, together with a quantitative indicator of the similarity between the discursive motivations generated by the models.

Step 2: Compute the AI exposure

To compute occupation exposure to AI, we aggregate the TE scores at the occupation level, weighting them by task relevance ($R$), importance ($I$) and frequency ($F$) as measured by O*NET. The weights capture different aspects of the tasks: importance indicates the degree of importance a particular descriptor has for the occupation; relevance is the proportion of job incumbents who rated the task as relevant to their job; and frequency is how often the task is performed within the occupation, from yearly to hourly. Although it provides task descriptions, O*NET does not provide task ratings (relevance, importance and frequency) for 39 occupations; we manually assigned values for them. More specifically, for each task $j$ and occupation $i$, our AI exposure score is computed as follows:

$$TEAI_i = \frac{\sum_{j=1}^{n} TE_{ij} \cdot R_{ij} \cdot I_{ij} \cdot F_{ij}}{\sum_{j=1}^{n} R_{ij} \cdot I_{ij} \cdot F_{ij}} \qquad (2)$$

where $TE_{ij}$ is the task-level metric developed in Step 1 and $n$ is the number of tasks within each occupation. Since the O*NET model uses different scales for relevance (1-100), importance (1-5) and frequency (1-7), each weight is scaled by its maximum so that all three weights are on an equal scale. Finally, the score is normalised to ensure comparability with other similar scores.
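Equation 2 is a weighted mean of task scores, which can be sketched in Python for one occupation. The sketch assumes that "scaled by its maximum" means dividing each weight by the top of its O*NET scale (100, 5 and 7); the final normalisation of the score is not reproduced because the text does not specify it, and the function name and toy inputs are illustrative.

```python
def teai(te, relevance, importance, frequency, scale_max=(100.0, 5.0, 7.0)):
    """Weighted mean of the task scores TE_ij of one occupation (Eq. 2).

    Each task weight is relevance * importance * frequency, with each
    component first divided by the assumed maximum of its O*NET scale.
    """
    r_max, i_max, f_max = scale_max
    weights = [(r / r_max) * (i / i_max) * (f / f_max)
               for r, i, f in zip(relevance, importance, frequency)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all task weights are zero")
    return sum(t * w for t, w in zip(te, weights)) / total


# Two hypothetical tasks: the second is relevant to only half of incumbents.
print(teai(te=[4, 2], relevance=[100, 50], importance=[5, 5], frequency=[7, 7]))
```

In this toy case the first task carries twice the weight of the second, so the occupation score is pulled towards 4, giving 10/3.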

Experimental Results

Benchmarking evaluation

First, we compare our AI index with other existing measures in the literature. Figure 3(a) shows the correlation between the TEAI index and the well-known measure developed by Frey and Osborne (2017), the AI exposure indices by Felten, Raj, and Seamans (2021) (AIOE) and by Webb (2023), and the offshorability index developed by Acemoglu and Autor (2011). The pairwise correlation is always statistically significant at the 5% level. It is highest for the AIOE index, much lower for the Webb AI index and the offshorability index, and negative for the Frey-Osborne index. This suggests that our measure is broadly consistent with existing measures but captures different elements of the relationship between AI and the labour market. The negative correlation with the Frey and Osborne index can be explained by the latter being a measure of exposure to robotisation and computerisation, centred on routine tasks, whereas generative AI is more centred on cognitive, non-routine tasks.
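The significance statement rests on the usual test for a correlation coefficient, $t = r\sqrt{(n-2)/(1-r^2)}$ with $n-2$ degrees of freedom. A minimal pure-Python sketch follows; it assumes (the text does not say) that the reported correlations are Pearson coefficients, and the toy vectors are illustrative rather than actual index values.

```python
import math


def pearson_with_t(x, y):
    """Pearson correlation r and its t statistic with n - 2 degrees of freedom.

    Requires |r| < 1; a perfect correlation makes the t statistic undefined.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Toy scores for five occupations under two hypothetical indices.
r, t = pearson_with_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, t)
```

With close to a thousand occupations, |t| above roughly 1.96 indicates significance at the 5% level; for small samples the critical value can be taken from the Student t distribution, for instance with `scipy.stats.t.ppf`.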

(b) Correlation with different skill intensity measures
Figure 3: Correlation with existing exposure indexes (Fig. 3(a)) and with different skill intensity measures (Fig. 3(b)). Each dot represents a SOC occupation

AI and skills

Next, we explore the relationship between our TEAI index and different skills. Figure 3(b) shows scatterplots comparing the TEAI index with the intensity of different skill types at the occupation level, derived from Acemoglu and Autor (2011). The graph highlights the peculiar nature of AI technologies: exposure is positively correlated with cognitive analytical and interpersonal skills, and negatively correlated with routine manual skills and with non-routine manual skills that require physical adaptability. Surprisingly, the correlation with cognitive routine skills is only weakly positive, while it is positive for non-routine manual skills that require interpersonal adaptability. Since these results are purely descriptive, we add a more robust analysis by extracting from O*NET the detailed skills associated with each occupation. We group skills into four classes: cognitive, social, problem-solving and management, and technical skills. We then develop a skill relevance index for each class at the occupation level by weighting each skill by its level and importance, as provided by O*NET. The skill relevance index is constructed as follows:

$$SR_{cj} = \frac{\sum_{z=1}^{m} S_{zcj} \cdot L_{zcj} \cdot I_{zcj}}{\sum_{z=1}^{m} L_{zcj} \cdot I_{zcj}} \qquad (3)$$

where $z$ indexes the $m$ skills of class $c$ in each occupation $j$, and $L$ and $I$ denote, respectively, the level and importance of each skill in each occupation.

We therefore estimate the following regression:

$$TEAI_i = \alpha_i + \beta S_i + \gamma O_i + \epsilon_i$$

where each observation is a SOC occupation, $TEAI_i$ is our measure of AI exposure, $S$ is a vector of skill relevance indices at the occupation level, and $O$ denotes occupation dummies. We saturate the model using increasingly detailed dummies, up to the fourth digit; therefore, the results are identified from within-group variation. Table 3 shows the results. The TEAI index is positively related to cognitive skills and to problem-solving and management skills; conversely, as expected, it is negatively correlated with social skills. The relationship with technical skills is very weak and does not survive the inclusion of detailed SOC occupation dummies.
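The within-group identification can be illustrated with a minimal NumPy sketch that adds one dummy per SOC group to the skill regressors and solves by least squares with `numpy.linalg.lstsq`. The group codes, skill values and coefficients are simulated for illustration, there is no noise term, and the robust standard errors reported in Table 3 are not computed.

```python
import numpy as np


def ols_with_group_dummies(y, skills, groups):
    """OLS of y on skill regressors plus one dummy per group (no common intercept).

    Returns the skill coefficients; the group intercepts absorb
    between-group differences, so they are identified within groups.
    """
    skills = np.asarray(skills, dtype=float)
    labels, group_idx = np.unique(groups, return_inverse=True)
    dummies = np.zeros((len(group_idx), len(labels)))
    dummies[np.arange(len(group_idx)), group_idx] = 1.0
    X = np.hstack([skills, dummies])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[: skills.shape[1]]


# Simulated data: 3 hypothetical SOC groups, 4 skill classes
# (cognitive, social, problem solving and management, technical).
rng = np.random.default_rng(0)
groups = np.repeat(["11-1", "13-2", "15-1"], 20)
skills = rng.normal(size=(60, 4))
true_beta = np.array([0.5, -0.3, 0.4, 0.0])
group_effect = {"11-1": 1.0, "13-2": 2.0, "15-1": 3.0}
y = skills @ true_beta + np.array([group_effect[g] for g in groups])
beta = ols_with_group_dummies(y, skills, groups)
print(beta.round(3))
```

Dropping the common intercept and keeping a dummy for every group avoids perfect collinearity; an equivalent alternative is to demean each variable within its group before running OLS.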

Table 3: OLS estimates of the TEAI index on measures of skill intensity

Note: Each observation is an occupation. OLS regression with the TEAI index as the dependent variable; the independent variables are skill intensities. All regressions include occupation (SOC) fixed effects at 3, 4 and 5 digits. Robust standard errors in parentheses. *** p < 0.001, ** p < 0.01, * p < 0.05

AI, employment and wages

Finally, we explore the relationship between TEAI and labour market outcomes. We start by analysing the size and characteristics of the workforce exposed to AI technologies. First, we divide the distribution of TEAI scores into three tertiles representing High, Medium and Low AI exposure. Next, we compute the degree of exposure of the US workforce using BLS employment data. Finally, we distinguish between occupation groups and skill groups within each tertile. Figures 4(a) and 4(b) show the results. Overall, in 2023, 34% of US employment is highly exposed to AI technologies, while medium and low exposure account for 32% and 34%, respectively. Our findings do not suggest a polarising effect of AI exposure, as found by Frey and Osborne (2017); on the contrary, AI seems to have a more balanced impact on the labour market. This is because our indicator captures recent advances in AI, such as LLMs, that have affected occupation groups such as management, business, administration and finance, as well as ICT and science, which are intensive in non-routine cognitive tasks. For example, AI technologies are increasingly used to diagnose diseases, write reports and code, or brainstorm ideas in management and business. By contrast, previous studies that focus more on the effect of AI on routine tasks find these non-routine cognitive tasks and occupations to be less exposed to AI.
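The tertile split and the resulting employment shares can be sketched in Python as follows. The sketch assumes that tertiles are cut over occupations by rank (unweighted) and that employment counts are then summed within each tertile; this would explain why the shares (34%, 32%, 34%) are not exactly one-third each. The occupation labels and figures are made up.

```python
def employment_by_tertile(scores, employment):
    """Split occupations into TEAI tertiles by rank; return employment shares.

    scores: {occupation: TEAI score}; employment: {occupation: workers}.
    """
    ranked = sorted(scores, key=scores.get)      # lowest TEAI first
    n = len(ranked)
    labels = ("Low", "Medium", "High")
    totals = dict.fromkeys(labels, 0.0)
    for rank, occ in enumerate(ranked):
        totals[labels[rank * 3 // n]] += employment[occ]
    grand_total = sum(totals.values())
    return {label: totals[label] / grand_total for label in labels}


scores = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
employment = {"A": 10, "B": 20, "C": 30, "D": 10, "E": 20, "F": 10}
print(employment_by_tertile(scores, employment))
```

Cutting tertiles over occupations and then weighting by employment keeps the exposure classes independent of occupation size, so large occupations shift the shares without moving the thresholds.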

Grouping occupations by skill intensity shows that, in the group highly exposed to AI, 88% of employment is in high-skill jobs; in the group with medium exposure, 53% of employment is in medium-skill jobs and 40% in high-skill jobs. In the group with the lowest exposure, 67% are medium-skill jobs and 25% low-skill jobs. Overall, AI exposure disproportionately affects high-skill jobs, which are characterised by the competencies most heavily affected by AI technologies.

Figure 4: Exposure to the TEAI index by SOC group (panel a) and by skill intensity (panel b). US BLS employment, in millions of workers. Each bar represents a tertile of the TEAI score distribution.

Next, we analyse the relationship between AI exposure and workers’ characteristics.

Figure 5 shows that TEAI exposure is higher for workers with a high level of education, in particular graduates and postgraduates. Exposure increases slightly with age, although it varies very little above the age of 30. Men are more exposed than women in all age groups.

Figure 5: AI exposure by workers’ characteristics. Panels: (a) exposure by education; (b) exposure by age and sex.
Panel (a) shows the coefficients of a regression of TEAI exposure (in percentiles) on education categories, with age and sex as covariates. Estimates control for occupation (4-digit), industry (3-digit), state and year fixed effects. ACS weights are used, and robust standard errors are clustered at the industry level. Panel (b) is a binscatter: the x-axis is the average age of workers in each industry-occupation-state cell of the 2018-2022 ACS 5-year sample, and the binscatter is computed with education as a covariate. ACS weights are used.
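Binscatters of this kind are usually built by partialling out the covariates first and then averaging within quantile bins of the x variable (the Frisch-Waugh-Lovell logic). The NumPy sketch below follows that recipe under stated assumptions: covariates such as education enter as numeric columns (for example dummy variables), survey weights are used both in the residualisation and in the bin means, and bins are equal-count on the unweighted distribution. The function name is hypothetical; panel (b) would correspond to running it separately for men and women, with average age as x and TEAI exposure as y.

```python
import numpy as np


def binscatter(x, y, controls, weights, n_bins=10):
    """Weighted binscatter of y on x after partialling out controls.

    Assumption: residualise-then-bin approach. x and y are each regressed
    (WLS) on a constant plus controls; the weighted means of x and y are
    added back so the plotted points stay on the original scale.
    Returns (bin_means_x, bin_means_y).
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, weights))
    Z = np.column_stack(
        [np.ones_like(x), np.asarray(controls, dtype=float).reshape(len(x), -1)]
    )
    sw = np.sqrt(w)

    def residualise(v):
        # WLS of v on Z, solved as OLS on sqrt(weight)-scaled data.
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], v * sw, rcond=None)
        return v - Z @ coef

    rx = residualise(x) + np.average(x, weights=w)
    ry = residualise(y) + np.average(y, weights=w)

    # Equal-count bins on the residualised x (unweighted quantiles).
    edges = np.quantile(rx, np.linspace(0.0, 1.0, n_bins + 1))
    bin_id = np.clip(np.searchsorted(edges, rx, side="right") - 1, 0, n_bins - 1)

    bin_x, bin_y = [], []
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.any():
            bin_x.append(np.average(rx[in_bin], weights=w[in_bin]))
            bin_y.append(np.average(ry[in_bin], weights=w[in_bin]))
    return np.array(bin_x), np.array(bin_y)
```

Because residualisation is linear, a relationship y = 0.5x + (terms in the controls) shows up in the binned points as a slope close to 0.5, whatever the correlation between x and the controls.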

Finally, we assess the relationship between AI exposure, employment, and wages. To capture the medium-term effect of AI flexibly, allowing it to change over the estimation period, we compute log changes in employment and wages over 4-year rolling windows, from 2003-2007 to 2019-2023. We then estimate the following regression:

$$\Delta y_{i,j} = \alpha_{i} + \beta\, TEAI_{i} + \gamma Z_{i,j} + \delta_{i} + \eta_{j} + \epsilon_{i,j} \qquad (4)$$

where $\Delta y_{i,j}$ denotes the 4-year change in log employment or log wages in sector $j$ for occupation $i$. To mitigate possible endogeneity and omitted-variable problems, the controls $Z_{i,j}$ include the initial level of employment and, in the wage regressions, the initial wage and its square. We also include detailed SOC ($\delta_{i}$) and NAICS ($\eta_{j}$) fixed effects and cluster standard errors at the NAICS level. Figure 6 shows that exposure to artificial intelligence (AI) is positively correlated with employment and wage growth. This is consistent with AI technologies complementing labour and enhancing productivity, thereby raising employment and wages in occupations more exposed to AI.
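The dependent variable in equation (4) can be built per occupation-sector cell as below. This sketch rests on two assumptions: windows advance one year at a time, and the 4-year log change is annualised and expressed in percent (100 * Δlog / 4), in line with the annual percentage growth rates described for Figure 6. The function name and the dictionary layout are hypothetical.

```python
import math


def rolling_log_changes(levels, window=4, first_start=2003, last_start=2019):
    """Annualised log changes, in percent, over rolling windows.

    levels: dict mapping year -> employment (or wage) level for one
    occupation-sector cell. Returns {(start, end): growth}, where
    growth = 100 * (ln y_end - ln y_start) / window (an assumed scaling).
    Windows whose endpoints are missing or non-positive are skipped.
    """
    out = {}
    for start in range(first_start, last_start + 1):
        end = start + window
        y0, y1 = levels.get(start), levels.get(end)
        if y0 is None or y1 is None or y0 <= 0 or y1 <= 0:
            continue
        out[(start, end)] = 100.0 * (math.log(y1) - math.log(y0)) / window
    return out
```

With start years 2003 through 2019, a complete series yields 17 windows, the first being 2003-2007 and the last 2019-2023.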

The presence of detailed controls at the industry and occupation level allows us to control for factors on the production side (changes in output across industries), on the demand side (changes in product demand across industries) and on the labour supply side (changes in employment across industries and occupations) that are unrelated to AI technologies and that could affect wages and employment. Moreover, the focus on a relatively short period of time isolates our results from long-term trends within industries and occupations.
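A minimal NumPy version of equation (4), with the fixed effects entered as dummy variables and CR1 cluster-robust standard errors (the small-sample correction Stata applies by default when clustering), might look as follows. It is an illustration, not the authors' estimation code: the function name is hypothetical, the occupation fixed effects must be defined at a coarser level than TEAI itself (for example SOC 4-digit groups, as in the Figure 6 notes) or the TEAI coefficient is not identified, and dummy variables are practical only for moderate numbers of categories.

```python
import numpy as np


def fe_ols_clustered(dy, teai, controls, occ_fe, sector_fe, clusters):
    """OLS of dy on a constant, TEAI, controls and two sets of fixed effects.

    Fixed effects enter as dummies (first category dropped in each set).
    Standard errors are cluster-robust with the CR1 correction.
    Returns (beta_teai, se_teai).
    """
    dy = np.asarray(dy, dtype=float)
    n = dy.shape[0]

    def dummies(labels):
        # One indicator per category except the first, absorbed by the constant.
        labels = np.asarray(labels)
        cats = np.unique(labels)
        return (labels[:, None] == cats[None, 1:]).astype(float)

    X = np.column_stack([
        np.ones(n),
        np.asarray(teai, dtype=float),
        np.asarray(controls, dtype=float).reshape(n, -1),
        dummies(occ_fe),
        dummies(sector_fe),
    ])
    k = X.shape[1]
    xtx_inv = np.linalg.pinv(X.T @ X)
    coef = xtx_inv @ (X.T @ dy)
    resid = dy - X @ coef

    # Sandwich "meat": sum over clusters of the outer product of X_g' u_g.
    clusters = np.asarray(clusters)
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)

    n_groups = len(groups)
    # CR1 small-sample correction: G/(G-1) * (N-1)/(N-K).
    scale = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    vcov = scale * (xtx_inv @ meat @ xtx_inv)
    return float(coef[1]), float(np.sqrt(vcov[1, 1]))
```

Running it once per 4-year window and plotting beta ± 1.96 * se gives intervals of the kind shown in Figure 6 under a normal approximation; with few clusters, a t critical value with G - 1 degrees of freedom is the more conservative choice.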

Therefore, the positive relationship between AI exposure and employment and wage growth should be interpreted as follows: within the same occupation group and sector, occupations more exposed to AI experience stronger employment and wage growth. Our results contrast with those of Acemoglu et al. (2022) and Webb (2023), who find a negative relationship between AI exposure and employment and wages. A potential reconciliation lies, on the one hand, in our different measure of AI exposure, which emphasises more recent advances in AI. On the other hand, our analysis concentrates on changes over the last 20 years, whereas theirs adopts a longer-term perspective, focusing on changes spanning several decades.

Figure 6: TEAI index, employment and wage growth. Panels: (a) TEAI index and employment growth; (b) TEAI index and wage growth.
This figure plots the effect of the TEAI score on employment and wage growth. Estimates are derived from equation (4): rolling regression coefficients and 95% confidence intervals over 4-year windows, starting in 2003-2007 and ending in 2019-2023. The point estimate refers to the TEAI score, and the dependent variables are annual percentage growth rates of employment and wages. The employment regression includes the log of initial-period employment; the wage regression includes the log of initial-period employment, the log of the initial-period wage and its square. All regressions include occupation (SOC 4-digit) and sector (NAICS 3-digit) fixed effects. Robust standard errors are clustered at the NAICS level.

Conclusions, limitations and future extensions

This paper provides a comprehensive assessment of AI exposure for 19,281 tasks across 923 SOC occupations identified by O*NET. Starting from the task descriptions, we assess each task using the LLMs’ own evaluations and then aggregate task scores into an occupation-level score of AI exposure. Our methodology ensures the full reproducibility of results, allowing future assessments of potential performance improvements in new versions of LLMs. Our AI exposure index is positively related to cognitive, problem-solving and management skills, emphasising the role of recent advances in AI that heavily affect management and decision-making tasks; by contrast, it is negatively correlated with social skills, a well-known area of weakness for AI.
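The final aggregation step, from task scores to an occupation score, can be illustrated with a weighted mean. The weighting is an assumption made for illustration: the TEAI construction may aggregate tasks differently, and the tuple layout and function name are hypothetical. Passing a weight of 1.0 for every task gives a plain average.

```python
from collections import defaultdict


def occupation_scores(task_scores):
    """Weighted mean of task-level exposure scores for each occupation.

    task_scores: iterable of (occupation_code, score, weight) tuples.
    Assumption: weighted-mean aggregation; every occupation must have a
    positive total weight, otherwise the division fails.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for occ, score, weight in task_scores:
        weighted_sum[occ] += weight * score
        weight_total[occ] += weight
    return {occ: weighted_sum[occ] / weight_total[occ] for occ in weighted_sum}
```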

Regarding labour market outcomes, we find that AI exposure is positively associated with both employment and wage growth over 2003-2023, suggesting that AI has a positive effect on productivity. Therefore, at least in the medium run, AI appears to have an overall positive impact on the labour market. However, our estimates show that high exposure to artificial intelligence affects about one-third of the American workforce, most of it in high-skill jobs. Whether AI turns out to be an opportunity or a threat for these workers will depend on whether it complements or substitutes human labour. Our measure is relatively agnostic in this regard, as we cannot yet disentangle the substitution effect from the complementarity effect. In other words, high exposure does not necessarily imply that technology fully substitutes for labour; on the contrary, it may complement human activities, raising productivity without displacing labour. Future research will explore this important distinction.

References

  • Acemoglu and Autor (2011) Acemoglu, D.; and Autor, D. 2011. Skills, Tasks and Technologies: Implications for Employment and Earnings. volume 4 of Handbook of Labor Economics, 1043–1171. Elsevier.
  • Acemoglu et al. (2022) Acemoglu, D.; Autor, D.; Hazell, J.; and Restrepo, P. 2022. Artificial Intelligence and Jobs: Evidence from Online Vacancies. Journal of Labor Economics, 40(S1): 293–340.
  • Acemoglu and Restrepo (2019) Acemoglu, D.; and Restrepo, P. 2019. Automation and New Tasks: How Technology Displaces and Reinstates Labor. Journal of Economic Perspectives, 33(2): 3–30.
  • Acemoglu and Restrepo (2020) Acemoglu, D.; and Restrepo, P. 2020. Robots and Jobs: Evidence from US Labor Markets. Journal of Political Economy, 128(6): 2188–2244.
  • Autor, Levy, and Murnane (2003) Autor, D. H.; Levy, F.; and Murnane, R. J. 2003. The Skill Content of Recent Technological Change: An Empirical Exploration. The Quarterly Journal of Economics, 118(4): 1279–1333.
  • Bahdanau, Cho, and Bengio (2014) Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Brown et al. (2020) Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 1877–1901.
  • Brynjolfsson and Mitchell (2017) Brynjolfsson, E.; and Mitchell, T. 2017. What can machine learning do? Workforce implications. Science, 358(6370): 1530–1534.
  • Colombo, Mercorio, and Mezzanzanica (2019) Colombo, E.; Mercorio, F.; and Mezzanzanica, M. 2019. AI meets labor market: Exploring the link between automation and skills. Information Economics and Policy, 47(C): 27–37.
  • Consoli et al. (2023) Consoli, D.; Marin, G.; Rentocchini, F.; and Vona, F. 2023. Routinization, within-occupation task changes and long-run employment dynamics. Research Policy, 52(1): 104658.
  • Eloundou et al. (2023) Eloundou, T.; Manning, S.; Mishkin, P.; and Rock, D. 2023. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv preprint arXiv:2303.10130.
  • Felten, Raj, and Seamans (2021) Felten, E.; Raj, M.; and Seamans, R. 2021. Occupational, industry, and geographic exposure to artificial intelligence: A novel dataset and its potential uses. Strategic Management Journal, 42(12): 2195–2217.
  • Frey and Osborne (2017) Frey, C. B.; and Osborne, M. A. 2017. The future of employment: How susceptible are jobs to computerisation? Technological Forecasting and Social Change, 114: 254–280.
  • Ji et al. (2023) Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y. J.; Madotto, A.; and Fung, P. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12): 1–38.
  • Jiang et al. (2023) Jiang, A. Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D. S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; Lavaud, L. R.; Lachaux, M.-A.; Stock, P.; Scao, T. L.; Lavril, T.; Wang, T.; Lacroix, T.; and Sayed, W. E. 2023. Mistral 7B. arXiv preprint arXiv:2310.06825.
  • Lipsey, Carlaw, and Bekar (2005) Lipsey, R.; Carlaw, K. I.; and Bekar, C. T. 2005. Economic Transformations: General Purpose Technologies and Long-Term Economic Growth. Oxford University Press.
  • Luong, Pham, and Manning (2015) Luong, M.-T.; Pham, H.; and Manning, C. D. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
  • Min et al. (2023) Min, B.; Ross, H.; Sulem, E.; Veyseh, A. P. B.; Nguyen, T. H.; Sainz, O.; Agirre, E.; Heintz, I.; and Roth, D. 2023. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys, 56(2): 1–40.
  • Mukherjee et al. (2023) Mukherjee, S.; Mitra, A.; Jawahar, G.; Agarwal, S.; Palangi, H.; and Awadallah, A. 2023. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. arXiv preprint arXiv:2306.02707.
  • Nedelkoska and Quintini (2018) Nedelkoska, L.; and Quintini, G. 2018. Automation, skills use and training. OECD Social, Employment and Migration Working Papers 202, OECD Publishing.
  • Pizzinelli et al. (2023) Pizzinelli, C.; Panton, A. J.; Tavares, M. M. M.; Cazzaniga, M.; and Li, L. 2023. Labor Market Exposure to AI: Cross-country Differences and Distributional Implications. IMF Working Papers 2023/216, International Monetary Fund.
  • Radford et al. (2018) Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.; et al. 2018. Improving language understanding by generative pre-training.
  • Radford et al. (2019) Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I.; et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8): 9.
  • Rahutomo et al. (2012) Rahutomo, F.; Kitasuka, T.; Aritsugi, M.; et al. 2012. Semantic cosine similarity. In The 7th international student conference on advanced science and technology ICAST, volume 4, 1. University of Seoul South Korea.
  • Tastle and Wierman (2007) Tastle, W. J.; and Wierman, M. J. 2007. Consensus and dissention: A measure of ordinal dispersion. International Journal of Approximate Reasoning, 45(3): 531–545. North American Fuzzy Information Processing Society Annual Conference NAFIPS ’2005.
  • Vaswani et al. (2017) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.
  • Wang et al. (2023) Wang, G.; Cheng, S.; Zhan, X.; Li, X.; Song, S.; and Liu, Y. 2023. OpenChat: Advancing Open-source Language Models with Mixed-Quality Data. arXiv preprint arXiv:2309.11235.
  • Webb (2023) Webb, M. 2023. The Impact of Artificial Intelligence on the Labor Market. Mimeo, Stanford University.
  • Wei et al. (2021) Wei, J.; Bosma, M.; Zhao, V. Y.; Guu, K.; Yu, A. W.; Lester, B.; Du, N.; Dai, A. M.; and Le, Q. V. 2021. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
  • Zhao et al. (2023) Zhao, W. X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.