Artificial Intelligence Index Report 2023
Artificial Intelligence Index Report 2023
Artificial Intelligence Index Report 2023
Intelligence
Index Report
2023
Artificial Intelligence
Index Report 2023
Introduction to the
AI Index Report 2023
Welcome to the sixth edition of the AI Index Report! This year, the report introduces more original data than any
previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter,
original analysis about large language and multimodal models, detailed trends in global AI legislation records,
a study of the environmental impact of AI systems, and more.
The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is
to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives,
journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of
AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.
Although 2022 was the first year in a decade where private AI investment decreased, AI is still a topic of great
interest to policymakers, industry leaders, researchers, and the public. Policymakers are talking about AI more
than ever before. Industry leaders that have integrated AI into their businesses are seeing tangible cost and
revenue benefits. The number of AI publications and collaborations continues to increase. And the public is
forming sharper opinions about AI and which elements they like or dislike.
AI will continue to improve and, as such, become a greater part of all our lives. Given the increased presence of
this technology and its potential for massive disruption, we should all begin thinking more critically about how
exactly we want AI to be developed and deployed. We should also ask questions about who is deploying it—as
our analysis shows, AI is increasingly defined by the actions of a small set of private sector actors, rather than a
broader range of societal actors. This year’s AI Index paints a picture of where we are so far with AI, in order to
highlight what might await us in the future.
2 Performance saturation on
traditional benchmarks.
AI continued to post state-of-the-art results, but
in 2022 included a deepfake video of Ukrainian
President Volodymyr Zelenskyy surrendering and
U.S. prisons using call-monitoring technology on their
year-over-year improvement on many benchmarks inmates. This growth is evidence of both greater use of
continues to be marginal. Moreover, the speed at AI technologies and awareness of misuse possibilities.
which benchmark saturation is being reached is
increasing. However, new, more comprehensive
benchmarking suites such as BIG-bench and HELM 6 The demand for AI-related
professional skills is increasing across
virtually every American industrial sector.
are being released.
Across every sector in the United States for which
8 While the proportion of companies surveyed countries) agreed that products and services
adopting AI has plateaued, the using AI had more benefits than drawbacks.
companies that have adopted AI
continue to pull ahead.
The proportion of companies adopting AI in 2022
has more than doubled since 2017, though it has
plateaued in recent years between 50% and 60%,
according to the results of McKinsey’s annual
research survey. Organizations that have adopted
AI report realizing meaningful cost decreases and
revenue increases.
9 Policymaker interest in AI
is on the rise.
An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing
“artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis
of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global
legislative proceedings have increased nearly
6.5 times since 2016.
Artificial Intelligence
Index Report 2023
Steering Committee
Co-directors
Jack Clark Raymond Perrault
Anthropic, OECD SRI International
Members
Erik Brynjolfsson Katrina Ligett Juan Carlos Niebles Yoav Shoham
Stanford University Hebrew University Stanford University, (Founding Director)
Salesforce Stanford University,
John Etchemendy Terah Lyons AI21 Labs
Stanford University Vanessa Parli
James Manyika Stanford University Russell Wald
Google Stanford University
Affiliated Researchers
Elif Kiesow Cortez Helen Ngo Robi Rahman Alexandra Rome
Stanford Law School Hugging Face Data Scientist Freelance Researcher
Research Fellow
Graduate Researcher
Han Bai
Stanford University
Undergraduate Researchers
Vania Siddhartha Mena Naima Sukrut Stone Lucy Elizabeth
Chow Javvaji Hassan Patel Oak Yang Zimmerman Zhu
Stanford Stanford Stanford Stanford Stanford Stanford Stanford Stanford
University University University University University University University University
Artificial Intelligence
Index Report 2023
Raw data and charts: The public data and Global AI Vibrancy Tool: Compare up to
high-resolution images of all the charts 30 countries across 21 indicators. The Global AI
in the report are available on Google Drive. Vibrancy tool will be updated in the latter half of 2023.
The AI Index was conceived within the One Hundred Year Study on AI (AI100).
Supporting Partners
Contributors
We want to acknowledge the following individuals by chapter and section for their contributions of data,
analysis, advice, and expert commentary included in the AI Index 2023 Report:
Technical Performance
Jack Clark, Loredana Fattorini, Siddhartha Javvaji, Katrina Ligett, Nestor Maslej, Juan Carlos Niebles,
Sukrut Oak, Vanessa Parli, Ray Perrault, Robi Rahman, Alexandra Rome, Yoav Shoham, Elizabeth Zhu
Technical AI Ethics
Jack Clark, Loredana Fattorini, Katrina Ligett, Nestor Maslej, Helen Ngo, Sukrut Oak, Vanessa Parli,
Ray Perrault, Alexandra Rome, Elizabeth Zhu, Lucy Zimmerman
Economy
Susanne Bieller, Erik Brynjolfsson, Vania Chow, Jack Clark, Natalia Dorogi, Murat Erer, Loredana Fattorini,
Akash Kaura, James Manyika, Nestor Maslej, Layla O’Kane, Vanessa Parli, Ray Perrault, Brittany Presten,
Alexandra Rome, Nicole Seredenko, Bledi Taska, Bill Valle, Casey Weston
Education
Han Bai, Betsy Bizot, Jack Clark, John Etchemendy, Loredana Fattorini, Katrina Ligett, Nestor Maslej,
Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome
Diversity
Han Bai, Betsy Bizot, Jack Clark, Loredana Fattorini, Nezihe Merve Gürel, Mena Hassan, Katrina Ligett,
Nestor Maslej, Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome, Sarah Tan, Lucy Zimmerman
Public Opinion
Jack Clark, Loredana Fattorini, Mena Hassan, Nestor Maslej, Vanessa Parli, Ray Perrault,
Alexandra Rome, Nicole Seredenko, Bill Valle, Lucy Zimmerman
Conference Attendance
Terri Auricchio (ICML), Lee Campbell (ICLR), Cassio de Campos (UAI), Meredith Ellison (AAAI), Nicole Finn (CVPR),
Vasant Gajanan (AAAI), Katja Hofmann (ICLR), Gerhard Lakemeyer (KR), Seth Lazar (FAccT), Shugen Ma (IROS),
Becky Obbema (NeurIPS), Vesna Sabljakovic-Fritz (IJCAI), Csaba Szepesvari (ICML), Matthew Taylor (AAMAS),
Sylvie Thiebaux (ICAPS), Pradeep Varakantham (ICAPS)
Artificial Intelligence
Index Report 2023
Organizations
Code.org Lightcast
Sean Roberts Layla O’Kane, Bledi Taska
We also would like to thank Jeanina Casusi, Nancy King, Shana Lynch, Jonathan Mindes,
Michi Turner, and Madeleine Wright for their help in preparing this report, and Joe Hinman and
Santanu Mukherjee for their help in maintaining the AI Index website.
Artificial Intelligence
Index Report 2023
Table of Contents
Report Highlights 11
Appendix 344
Report Highlights
Chapter 1: Research and Development
The United States and China had the greatest number of cross-country collaborations in AI
publications from 2010 to 2021, although the pace of collaboration has slowed. The number of AI
research collaborations between the United States and China increased roughly 4 times since 2010,
and was 2.5 times greater than the collaboration totals of the next nearest country pair, the United
Kingdom and China. However the total number of U.S.-China collaborations only increased by 2.1%
from 2020 to 2021, the smallest year-over-year growth rate since 2010.
AI research is on the rise, across the board. The total number of AI publications has more than
doubled since 2010. The specific AI topics that continue dominating research include pattern
recognition, machine learning, and computer vision.
Industry races ahead of academia. Until 2014, most significant machine learning models were
released by academia. Since then, industry has taken over. In 2022, there were 32 significant
industry-produced machine learning models compared to just three produced by academia.
Building state-of-the-art AI systems increasingly requires large amounts of data, computer power,
and money—resources that industry actors inherently possess in greater amounts compared to
nonprofits and academia.
Large language models are getting bigger and more expensive. GPT-2, released in 2019,
considered by many to be the first large language model, had 1.5 billion parameters and cost an
estimated $50,000 USD to train. PaLM, one of the flagship large language models launched in 2022,
had 540 billion parameters and cost an estimated $8 million USD—PaLM was around 360 times
larger than GPT-2 and cost 160 times more. It’s not just PaLM: Across the board, large language and
multimodal models are becoming larger and pricier.
Artificial Intelligence
Index Report 2023
Generative AI breaks into the public consciousness. 2022 saw the release of text-to-image
models like DALL-E 2 and Stable Diffusion, text-to-video systems like Make-A-Video, and chatbots
like ChatGPT. Still, these systems can be prone to hallucination, confidently outputting incoherent or
untrue responses, making it hard to rely on them for critical applications.
AI systems become more flexible. Traditionally AI systems have performed well on narrow tasks
but have struggled across broader tasks. Recently released models challenge that trend; BEiT-3,
PaLI, and Gato, among others, are single AI systems increasingly capable of navigating multiple tasks
(for example, vision, language).
Capable language models still struggle with reasoning. Language models continued to improve
their generative capabilities, but new research suggests that they still struggle with complex
planning tasks.
AI is both helping and harming the environment. New research suggests that AI systems can have
serious environmental impacts. According to Luccioni et al., 2022, BLOOM’s training run emitted 25
times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still,
new reinforcement learning models like BCOOLER show that AI systems can be used to optimize
energy usage.
The world’s best new scientist … AI? AI models are starting to rapidly accelerate scientific
progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix
manipulation, and generate new antibodies.
AI starts to build better AI. Nvidia used an AI reinforcement learning agent to improve the design
of the chips that power AI systems. Similarly, Google recently used one of its language models,
PaLM, to suggest ways to improve the very same model. Self-improving AI learning will accelerate
AI progress.
Artificial Intelligence
Index Report 2023
The effects of model scale on bias and toxicity are confounded by training data and mitigation
methods. In the past year, several institutions have built their own large models trained on
proprietary data—and while large models are still toxic and biased, new evidence suggests that
these issues can be somewhat mitigated after training larger models with instruction-tuning.
Generative models have arrived and so have their ethical problems. In 2022, generative models
became part of the zeitgeist. These models are capable but also come with ethical challenges. Text-
to-image generators are routinely biased along gender dimensions, and chatbots like ChatGPT can
be tricked into serving nefarious aims.
The number of incidents concerning the misuse of AI is rapidly rising. According to the AIAAIC
database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents
and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a
deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using
call-monitoring technology on their inmates. This growth is evidence of both greater use of AI
technologies and awareness of misuse possibilities.
Fairer models may not be less biased. Extensive analysis of language models suggests that while there
is a clear correlation between performance and fairness, fairness and bias can be at odds: Language
models which perform better on certain fairness benchmarks tend to have worse gender bias.
Automated fact-checking with natural language processing isn’t so straightforward after all.
While several benchmarks have been developed for automated fact-checking, researchers find that
11 of 16 of such datasets rely on evidence “leaked” from fact-checking reports which did not exist at
the time of the claim surfacing.
Artificial Intelligence
Index Report 2023
The demand for AI-related professional skills is increasing across virtually every American
industrial sector. Across every sector in the United States for which there is data (with the exception
of agriculture, forestry, fishing, and hunting), the number of AI-related job postings has increased on
average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for
workers with AI-related skills.
For the first time in the last decade, year-over-year private investment in AI decreased.
Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021.
The total number of AI-related funding events as well as the number of newly funded AI companies
likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased.
In 2022 the amount of private investment in AI was 18 times greater than it was in 2013.
Once again, the United States leads in investment in AI. The U.S. led the world in terms of total
amount of AI private investment. In 2022, the $47.4 billion invested in the U.S. was roughly 3.5 times
the amount invested in the next highest country, China ($13.4 billion). The U.S. also continues to lead in
terms of total number of newly funded AI companies, seeing 1.9 times more than the European Union
and the United Kingdom combined, and 3.4 times more than China.
In 2022, the AI focus area with the most investment was medical and healthcare ($6.1 billion);
followed by data management, processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less
investment in 2022 than in 2021. In the last year, the three largest AI private investment events were:
(1) a $2.5 billion funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of
electric vehicles; (2) a $1.5 billion Series E funding round for Anduril Industries, a U.S. defense products
company that builds technology for military agencies and border surveillance; and (3) a $1.2 billion
investment in Celonis, a business-data consulting company based in Germany.
While the proportion of companies adopting AI has plateaued, the companies that have adopted
AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled
since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of
McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful
cost decreases and revenue increases.
Artificial Intelligence
Index Report 2023
AI tools like Copilot are tangibly helping workers. Results of a GitHub survey on the use of Copilot,
a text-to-code AI system, find that 88% of surveyed respondents feel more productive when using
the system, 74% feel they are able to focus on more satisfying work, and 88% feel they are able to
complete tasks more quickly.
China dominates industrial robot installations. In 2013, China overtook Japan as the nation installing
the most industrial robots. Since then, the gap between the total number of industrial robots installed
by China and the next-nearest nation has widened. In 2021, China installed more industrial robots than
the rest of the world combined.
Artificial Intelligence
Index Report 2023
Chapter 5: Education
More and more AI specialization. The proportion of new computer science PhD graduates from
U.S. universities who specialized in AI jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
New AI PhDs increasingly head to industry. In 2011, roughly the same proportion of new AI PhD
graduates took jobs in industry (40.9%) as opposed to academia (41.6%). Since then, however, a
majority of AI PhDs have headed to industry. In 2021, 65.4% of AI PhDs took jobs in industry, more
than double the 28.2% who took jobs in academia.
New North American CS, CE, and information faculty hires stayed flat. In the last decade,
the total number of new North American computer science (CS), computer engineering (CE),
and information faculty hires has decreased: There were 710 total hires in 2021 compared to
733 in 2012. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and then
dropped to 324 in 2021.
The gap in external research funding for private versus public American CS departments
continues to widen. In 2011, the median amount of total expenditure from external sources for
computing research was roughly the same for private and public CS departments in the United
States. Since then, the gap has widened, with private U.S. CS departments receiving millions more
in additional funding than public universities. In 2021, the median expenditure for private universities
was $9.7 million, compared to $5.7 million for public universities.
Interest in K–12 AI and computer science education grows in both the United States and the
rest of the world. In 2021, a total of 181,040 AP computer science exams were taken by American
students, a 1.0% increase from the previous year. Since 2007, the number of AP computer science
exams has increased ninefold. As of 2021, 11 countries, including Belgium, China, and South Korea,
have officially endorsed and implemented a K–12 AI curriculum.
Artificial Intelligence
Index Report 2023
Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing “artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times
since 2016.
From talk to enactment—the U.S. passed more AI bills than ever before. In 2021, only 2% of
all federal AI bills in the United States were passed into law. This number jumped to 10% in 2022.
Similarly, last year 35% of all state-level AI bills were passed into law.
When it comes to AI, policymakers have a lot of thoughts. A qualitative analysis of the
parliamentary proceedings of a diverse group of nations reveals that policymakers think about
AI from a wide range of perspectives. For example, in 2022, legislators in the United Kingdom
discussed the risks of AI-led automation; those in Japan considered the necessity of safeguarding
human rights in the face of AI; and those in Zambia looked at the possibility of using AI for
weather forecasting.
The U.S. government continues to increase spending on AI. Since 2017, the amount of U.S.
government AI-related contract spending has increased roughly 2.5 times.
The legal world is waking up to AI. In 2022, there were 110 AI-related legal cases in United
States state and federal courts, roughly seven times more than in 2016. The majority of these cases
originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual
property, and contract law.
Artificial Intelligence
Index Report 2023
Chapter 7: Diversity
North American bachelor’s, master’s, and PhD-level computer science students are becoming
more ethnically diverse. Although white students are still the most represented ethnicity among
new resident bachelor’s, master’s, and PhD-level computer science graduates, students from other
ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming
increasingly more represented. For example, in 2011, 71.9% of new resident CS bachelor’s graduates
were white. In 2021, that number dropped to 46.7%.
New AI PhDs are still overwhelmingly male. In 2021, 78.7% of new AI PhDs were male.
Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender
imbalance in higher-level AI education.
Women make up an increasingly greater share of CS, CE, and information faculty hires.
Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from
24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male
(75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary.
American K–12 computer science education has become more diverse, in terms of both gender
and ethnicity. The share of AP computer science exams taken by female students increased from
16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and
Black/African American students taking AP computer science has likewise increased.
Artificial Intelligence
Index Report 2023
Chinese citizens are among those who feel the most positively about AI products and services.
Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest
proportion of surveyed countries) agreed with the statement that products and services using AI
have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and
India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the
lowest of surveyed countries) agreed that products and services using AI had more benefits than
drawbacks.
Men tend to feel more positively about AI products and services than women. Men are also
more likely than women to believe that AI will mostly help rather than harm. According to the
2022 IPSOS survey, men are more likely than women to report that AI products and services make
their lives easier, trust companies that use AI, and feel that AI products and services have more
benefits than drawbacks. A 2021 survey by Gallup and Lloyd’s Register Foundation likewise revealed
that men are more likely than women to agree with the statement that AI will mostly help rather than
harm their country in the next 20 years.
People across the world and especially America remain unconvinced by self-driving cars. In
a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew
Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good
idea for society.
Different causes for excitement and concern. Among a sample of surveyed Americans, those
who report feeling excited about AI are most excited about the potential to make life and society
better (31%) and to save time and make things more efficient (13%). Those who report feeling more
concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%);
and the lack of human connection (12%).
NLP researchers … have some strong opinions as well. According to a survey widely distributed to
NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence,
41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal
change. These were some of the many strong opinions held by the NLP research community.
Artificial Intelligence
Index Report 2023
Artificial Intelligence
Index Report 2023
CHAPTER 1:
Research and
Development
CHAPTER 1 PREVIEW:
By Geographic Area 38
Citations 39 1.3 AI Conferences 64
Overview 40
1.4 Open-Source AI Software 66
By Region 41
Projects 66
By Geographic Area 42
Stars 68
Citations 43
Narrative Highlight: ACCESS THE PUBLIC DATA
Top Publishing Institutions 44
All Fields 44
Overview
This chapter captures trends in AI R&D. It begins by examining AI publications,
including journal articles, conference papers, and repositories. Next it considers data
on significant machine learning systems, including large language and multimodal
models. Finally, the chapter concludes by looking at AI conference attendance and
open-source AI research. Although the United States and China continue to dominate
AI R&D, research efforts are becoming increasingly geographically dispersed.
Chapter Highlights
The United States and China Industry races ahead
had the greatest number of of academia.
cross-country collaborations in AI Until 2014, most significant machine
publications from 2010 to 2021, learning models were released by
academia. Since then, industry has taken
although the pace of collaboration
over. In 2022, there were 32 significant
has since slowed. industry-produced machine learning
The number of AI research collaborations between
models compared to just three produced
the United States and China increased roughly 4
by academia. Building state-of-the-art
times since 2010, and was 2.5 times greater than the
AI systems increasingly requires large
collaboration totals of the next nearest country pair,
amounts of data, computer power, and
the United Kingdom and China. However, the total
money—resources that industry actors
number of U.S.-China collaborations only increased
inherently possess in greater amounts
by 2.1% from 2020 to 2021, the smallest year-over-
compared to nonprofits and academia.
year growth rate since 2010.
This section draws on data from the Center for Security and Emerging Technology (CSET) at Georgetown University. CSET maintains a
merged corpus of scholarly literature that includes Digital Science’s Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph,
China National Knowledge Infrastructure, arXiv, and Papers With Code. In that corpus, CSET applied a classifier to identify English-
language publications related to the development or application of AI and ML since 2010. For this year’s report, CSET also used select
Chinese AI keywords to identify Chinese-language AI papers; CSET did not deploy this method for previous iterations of the AI Index report.1
In last year’s edition of the report, publication trends were reported up to the year 2021. However, given that there is a significant lag in the
collection of publication metadata, and that in some cases it takes until the middle of any given year to fully capture the previous year’s
publications, in this year’s report, the AI Index team elected to examine publication trends only through 2021, which we, along with CSET,
are confident yields a more fully representative report.
1.1 Publications
Overview publication and citation data by region for AI journal
articles, conference papers, repositories, and patents.
The figures below capture the total number
of English-language and Chinese-language AI Total Number of AI Publications
publications globally from 2010 to 2021—by type, Figure 1.1.1 shows the number of AI publications in
affiliation, cross-country collaboration, and cross- the world. From 2010 to 2021, the total number of
industry collaboration. The section also breaks down AI publications more than doubled, growing from
200,000 in 2010 to almost 500,000 in 2021.
400
Number of AI Publications (in Thousands)
300
200
100
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.1
1 See the Appendix for more information on CSET’s methodology. For more on the challenge of defining AI and correctly capturing relevant bibliometric data, see the AI Index team’s
discussion in the paper “Measurement in AI Policy: Opportunities and Challenges.”
300
293.48, Journal
270
240
Number of AI Publications (in Thousands)
210
180
150
120
90
85.09, Conference
65.21, Repository
60
29.88, Thesis
30
13.77, Book Chapter
5.82, Unknown
0 2.76, Book
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.2
50
Number of AI Publications (in Thousands)
21.53, Algorithm
20 19.18, Data Mining
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.3
By Sector
This section shows the number of AI publications 1.1.5).2 The education sector dominates in each region.
affiliated with education, government, industry, The level of industry participation is highest in the
nonprofit, and other sectors—first globally (Figure United States, then in the European Union. Since
1.1.4), then looking at the United States, China, and 2010, the share of education AI publications has been
the European Union plus the United Kingdom (Figure dropping in each region.
80%
75.23%, Education
70%
60%
AI Publications (% of Total)
50%
40%
30%
20%
13.60%, Nonpro t
10%
7.21%, Industry
3.74%, Government
0% 0.22%, Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.4
2 The categorization is adapted based on the Global Research Identifier Database (GRID). Healthcare, including hospitals and facilities, is included under nonprofit. Publications affiliated with
state-sponsored universities are included in the education sector.
69.17%
Education 69.23%
77.85%
14.82%
Nonpro t 18.63%
11.73%
12.60%
Industry 7.90%
5.47%
3.21%
Government 3.92%
4.74%
Cross-Country Collaboration
Cross-border collaborations between academics, By far, the greatest number of collaborations in the
researchers, industry experts, and others are a key past 12 years took place between the United States
component of modern STEM (science, technology, and China, increasing roughly four times since 2010.
engineering, and mathematics) development that However the total number of U.S.-China collaborations
accelerate the dissemination of new ideas and the only increased by 2.1% from 2020 to 2021, the smallest
growth of research teams. Figures 1.1.6 and 1.1.7 depict year-over-year growth rate since 2010.
the top cross-country AI collaborations from 2010
The next largest set of collaborations was between
to 2021. CSET counted cross-country collaborations
the United Kingdom and both China and the United
as distinct pairs of countries across authors for each
States. In 2021, the number of collaborations between
publication (e.g., four U.S. and four Chinese-affiliated
the United States and China was 2.5 times greater
authors on a single publication are counted as one
than between the United Kingdom and China.
U.S.-China collaboration; two publications between
the same authors count as two collaborations).
10.47
10
Number of AI Publications (in Thousands)
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.6
3
2.80, China and Australia
2.61, United States and Australia
2
1.83, United States and France
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.7
Cross-Sector Collaboration
The increase in AI research outside of academia has educational institutions (12,856); and educational
broadened and grown collaboration across sectors and government institutions (8,913). Collaborations
in general. Figure 1.1.8 shows that in 2021 educational between educational institutions and industry have
institutions and nonprofits (32,551) had the greatest been among the fastest growing, increasing 4.2 times
number of collaborations; followed by industry and since 2010.
30
Number of AI Publications (in Thousands)
25
20
15
12.86, Industry and Education
10
8.91, Education and Government
5
2.95, Government and Nonpro t
2.26, Industry and Nonpro t
0.63, Industry and Government
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.8
AI Journal Publications
Overview
After growing only slightly from 2010 to 2015, the number of AI journal publications grew around 2.3 times since
2015. From 2020 to 2021, they increased 14.8% (Figure 1.1.9).
300 293.48
Number of AI Journal Publications (in Thousands)
250
200
150
100
50
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.9
By Region3
Figure 1.1.10 shows the share of AI journal publications East Asia and the Pacific; Europe and Central Asia;
by region between 2010 and 2021. In 2021, East Asia as well as North America have been declining.
and the Pacific led with 47.1%, followed by Europe During that period, there has been an increase in
and Central Asia (17.2%), and then North America publications from other regions such as South Asia;
(11.6%). Since 2019, the share of publications from and the Middle East and North Africa.
50%
47.14%, East Asia and Paci c
40%
AI Journal Publications (% of World Total)
30%
20%
17.20%, Europe and Central Asia
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.10
3 Regions in this chapter are classified according to the World Bank analytical grouping.
By Geographic Area4
Figure 1.1.11 breaks down the share of AI journal throughout, with 39.8% in 2021, followed by the
publications over the past 12 years by geographic European Union and the United Kingdom (15.1%),
area. This year’s AI Index included India in recognition then the United States (10.0%). The share of Indian
of the increasingly important role it plays in the publications has been steadily increasing—from 1.3%
AI ecosystem. China has remained the leader in 2010 to 5.6% in 2021.
30%
6.88%, Unknown
5.56%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.11
4 In this chapter we use “geographic area” based on CSET’s classifications, which are disaggregated not only by country, but also by territory. Further, we count the European Union and the
United Kingdom as a single geographic area to reflect the regions’ strong history of research collaboration.
Citations
China’s share of citations in AI journal publications 1.1.12). China, the European Union and the United
has gradually increased since 2010, while those of the Kingdom, and the United States accounted for 65.7%
European Union and the United Kingdom, as well as of the total citations in the world.
those of the United States, have decreased (Figure
30%
29.07%, China
27.37%, Rest of the World
25%
AI Journal Citations (% of World Total)
10%
6.05%, India
5%
0.92%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.12
AI Conference Publications
Overview
The number of AI conference publications peaked in 2019, and fell 20.4% below the peak in 2021 (Figure 1.1.13).
The total number of 2021 AI conference publications, 85,094, was marginally greater than the 2010 total of 75,592.
100
Number of AI Conference Publications (in Thousands)
85.09
80
60
40
20
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.13
By Region
Figure 1.1.14 shows the number of AI conference East Asia and the Pacific continues to rise, accounting
publications by region. As with the trend in journal for 36.7% in 2021, followed by Europe and Central
publications, East Asia and the Pacific; Europe Asia (22.7%), and then North America (19.6%). The
and Central Asia; and North America account for percentage of AI conference publications in South Asia
the world’s highest numbers of AI conference saw a noticeable rise in the past 12 years, growing from
publications. Specifically, the share represented by 3.6% in 2010 to 8.5% in 2021.
40%
30%
25%
22.66%, Europe and Central Asia
20% 19.56%, North America
15%
10%
8.45%, South Asia
3.82%, Middle East and North Africa
5% 3.07%, Latin America and the Caribbean
2.76%, Unknown
2.35%, Rest of the World
0% 0.60%, Sub-Saharan Africa
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.14
By Geographic Area
In 2021, China produced the greatest share of the came in third at 17.2% (Figure 1.1.15). Mirroring trends
world’s AI conference publications at 26.2%, having seen in other parts of the research and development
overtaken the European Union and the United section, India’s share of AI conference publications is
Kingdom in 2017. The European Union plus the United also increasing.
Kingdom followed at 20.3%, and the United States
30%
26.15%, China
25%
15%
10%
6.79%, India
5%
2.70%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.15
Citations
Despite China producing the most AI conference conference citations, with 23.9%, followed by China’s
publications in 2021, Figure 1.1.16 shows that 22.0%. However, the gap between American and
the United States had the greatest share of AI Chinese AI conference citations is narrowing.
35%
30%
AI Conference Citations (% of World Total)
15%
10%
6.09%, India
5%
0.87%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.16
AI Repositories
Overview
Publishing pre-peer-reviewed papers on repositories share their findings before submitting them to journals
of electronic preprints (such as arXiv and SSRN) and conferences, thereby accelerating the cycle of
has become a popular way for AI researchers to information discovery. The number of AI repository
disseminate their work outside traditional avenues for publications grew almost 27 times in the past 12 years
publication. These repositories allow researchers to (Figure 1.1.17).
60
Number of AI Repository Publications (in Thousands)
50
40
30
20
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.17
By Region
Figure 1.1.18 shows that North America has by East Asia and the Pacific has grown significantly
maintained a steady lead in the world share of AI since 2010 and continued growing from 2020 to
repository publications since 2016. Since 2011, the 2021, a period in which the year-over-year share of
share of repository publications from Europe and North American as well European and Central Asian
Central Asia has declined. The share represented repository publications declined.
30%
10%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.18
By Geographic Area
While the United States has held the lead in the (Figure 1.1.19). In 2021, the United States accounted
percentage of global AI repository publications since for 23.5% of the world’s AI repository publications,
2016, China is catching up, while the European Union followed by the European Union plus the United
plus the United Kingdom’s share continues to drop Kingdom (20.5%), and then China (11.9%).
30%
11.87%, China
10%
2.85%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.19
Citations
In the citations of AI repository publications, Figure a dominant lead over the European Union plus the
1.1.20 shows that in 2021 the United States topped United Kingdom (21.5%), as well as China (21.0%).
the list with 29.2% of overall citations, maintaining
30%
29.22%, United States
10%
4.59%, Unknown
1.91%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.20
Narrative Highlight:
Top Publishing Institutions
All Fields University, the University of the Chinese Academy
Since 2010, the institution producing the greatest of Sciences, Shanghai Jiao Tong University,
number of total AI papers has been the Chinese and Zhejiang University.5 The total number of
Academy of Sciences (Figure 1.1.21). The next publications released by each of these institutions
top four are all Chinese universities: Tsinghua in 2021 is displayed in Figure 1.1.22.
Top Ten Institutions in the World in 2021 Ranked by Number of AI Publications in All Fields, 2010–21
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
2 2, Tsinghua University
5 5, Zhejiang University
Rank
7 7, Beihang University
9 9, Peking University
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.21
5 It is important to note that many Chinese research institutions are large, centralized organizations with thousands of researchers. It is therefore not entirely surprising that,
purely by the metric of publication count, they outpublish most non-Chinese institutions.
Narrative Highlight:
Top Publishing Institutions (cont’d)
Top Ten Institutions in the World by Number of AI Publications in All Fields, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Massachusetts Institute of
1,745
Technology
0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,000
Number of AI Publications
Figure 1.1.22
Narrative Highlight:
Top Publishing Institutions (cont’d)
Computer Vision
In 2021, the top 10 institutions publishing the greatest number of AI computer vision publications were
all Chinese (Figure 1.1.23). The Chinese Academy of Sciences published the largest number of such
publications, with a total of 562.
Top Ten Institutions in the World by Number of AI Publications in Computer Vision, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Figure 1.1.23
Narrative Highlight:
Top Publishing Institutions (cont’d)
Natural Language Processing
American institutions are represented to a took second place (140 publications), followed by
greater degree in the share of top NLP publishers Microsoft (134). In addition, 2021 was the first year
(Figure 1.1.24). Although the Chinese Academy of Amazon and Alibaba were represented among the
Sciences was again the world’s leading institution top-ten largest publishing NLP institutions.
in 2021 (182 publications), Carnegie Mellon
Top Ten Institutions in the World by Number of AI Publications in Natural Language Processing, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190
Number of AI Publications
Figure 1.1.24
Narrative Highlight:
Top Publishing Institutions (cont’d)
Speech Recognition
In 2021, the greatest number of speech recognition papers came from the Chinese Academy of Sciences
(107), followed by Microsoft (98) and Google (75) (Figure 1.1.25). The Chinese Academy of Sciences
reclaimed the top spot in 2021 from Microsoft, which held first position in 2020.
Top Ten Institutions in the World by Number of AI Publications in Speech Recognition, 2021
Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
Tsinghua University 61
University of Science
59
and Technology of China
Tencent (China) 57
0 10 20 30 40 50 60 70 80 90 100 110
Number of AI Publications
Figure 1.1.25
Epoch AI is a collective of researchers investigating and forecasting the development of advanced AI. Epoch curates a database of
significant AI and machine learning systems that have been released since the 1950s. There are different criteria under which the
Epoch team decides to include particular AI systems in their database; for example, the system may have registered a state-of-the-art
improvement, been deemed to have been historically significant, or been highly cited.
This subsection uses the Epoch database to track trends in significant AI and machine learning systems. The latter half of the chapter
includes research done by the AI Index team that reports trends in large language and multimodal models, which are models trained on
large amounts of data and adaptable to a variety of downstream applications.
The figures below report trends among all machine was language (Figure 1.2.1). There were 23 significant
learning systems included in the Epoch dataset. For AI language systems released in 2022, roughly six
reference, these systems are referred to as significant times the number of the next most common system
Language 23
Multimodal 4
Drawing 3
Vision 2
Speech 2
Text-to-Video 1
Other 1
Games 1
0 2 4 6 8 10 12 14 16 18 20 22 24
Number of Signi cant Machine Learning Systems
Figure 1.2.16
6 There were 38 total significant AI machine learning systems released in 2022, according to Epoch; however, one of the systems, BaGuaLu, did not have a domain classification
and is therefore omitted from Figure 1.2.1.
Sector Analysis
Which sector among industry, academia, or nonprofit machine learning systems compared to just three
has released the greatest number of significant produced by academia. Producing state-of-the-art
machine learning systems? Until 2014, most machine AI systems increasingly requires large amounts of
learning systems were released by academia. data, computing power, and money; resources that
Since then, industry has taken over (Figure 1.2.2). In industry actors possess in greater amounts compared
2022, there were 32 significant industry-produced to nonprofits and academia.
35
32, Industry
Number of Signi cant Machine Learning Systems
30
25
20
15
10
5
3, Academia
2, Research Collective
1, Industry-Academia Collaboration
0 0, Nonpro t
2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Figure 1.2.2
National Affiliation
In order to paint a picture of AI’s evolving or AI-research firm, was headquartered. In 2022,
geopolitical landscape, the AI Index research the United States produced the greatest number
team identified the nationality of the authors who of significant machine learning systems with 16,
contributed to the development of each significant followed by the United Kingdom (8) and China (3).
machine learning system in the Epoch dataset. 7 Moreover, since 2002 the United States has outpaced
the United Kingdom and the European Union, as well
Systems
as China, in terms of the total number of significant
Figure 1.2.3 showcases the total number of
machine learning systems produced (Figure 1.2.4).
significant machine learning systems attributed to
Figure 1.2.5 displays the total number of significant
researchers from particular countries.8 A researcher
machine learning systems produced by country since
is considered to have belonged to the country in
2002 for the entire world.
which their institution, for example a university
Number of Significant Machine Learning Systems by Number of Significant Machine Learning Systems by
Country, 2022 Select Geographic Area, 2002–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
United States 16
Number of Significant Machine Learning Systems
30
United Kingdom 8
China 3 25
Canada 2
20
Germany 2
16, United States
15
France 1
12, European Union and
United Kingdom
India 1 10
Israel 1
5
Russia 1 3, China
0
Singapore 1
2002
2004
2006
2008
2010
2012
2014
2016
2018
2020
2022
0 2 4 6 8 10 12 14 16
Number of Signi cant Machine Learning Systems
Figure 1.2.3 Figure 1.2.4
7 The methodology by which the AI Index identified authors’ nationality is outlined in greater detail in the Appendix.
8 A machine learning system is considered to be affiliated with a particular country if at least one author involved in creating the model was affiliated with that country.
Consequently, in cases where a system has authors from multiple countries, double counting may occur.
0
1–10
11–20
21–60
61–255
Figure 1.2.5
Authorship
Figures 1.2.6 to 1.2.8 look at the total number of in 2022 the United States had the greatest number of
authors, disaggregated by national affiliation, that authors producing significant machine learning systems,
contributed to the launch of significant machine with 285, more than double that of the United Kingdom
learning systems. As was the case with total systems, and nearly six times that of China (Figure 1.2.6).
Number of Authors of Significant Machine Learning Number of Authors of Significant Machine Learning
Systems by Country, 2022 Systems by Select Geographic Area, 2002–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
400
United States 285
China 49 300
285, United States
Canada 21 Number of Authors 250
Israel 13
200
Sweden 8 155, European Union and
150 United Kingdom
Germany 7
100
Russia 3
50 49, China
India 2
France 1 0
2004
2006
2008
2010
2012
2014
2016
2018
2020
2022
Number of Authors
Figure 1.2.6 Figure 1.2.7
0
1–10
11–20
21–60
61–180
181–370
371–680
Figure 1.2.8
681–2000
Parameter Trends
Parameters are numerical values that are learned by dataset by sector. Over time, there has been a steady
machine learning models during training. The value of increase in the number of parameters, an increase that
parameters in machine learning models determines has become particularly sharp since the early 2010s.
how a model might interpret input data and make The fact that AI systems are rapidly increasing their
predictions. Adjusting parameters is an essential parameters is reflective of the increased complexity of
step in ensuring that the performance of a machine the tasks they are being asked to perform, the greater
learning system is optimized. availability of data, advancements in underlying
hardware, and most importantly, the demonstrated
Figure 1.2.9 highlights the number of parameters of
performance of larger models.
the machine learning systems included in the Epoch
1.0e+12
Number of Parameters (Log Scale)
1.0e+10
1.0e+8
1.0e+6
1.0e+4
1.0e+2
1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.9
Figure 1.2.10 demonstrates the parameters of machine learning systems by domain. In recent years, there has
been a rise in parameter-rich systems.
1.0e+10
Number of Parameters (Log Scale)
1.0e+8
1.0e+6
1.0e+4
1.0e+2
1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.10
Compute Trends
machine learning systems has increased exponentially
The computational power, or “compute,” of AI
in the last half-decade (Figure 1.2.11).9 The growing
systems refers to the amount of computational
demand for compute in AI carries several important
resources needed to train and run a machine
implications. For example, more compute-intensive
learning system. Typically, the more complex a
models tend to have greater environmental impacts,
system is, and the larger the dataset on which it is
and industrial players tend to have easier access
trained, the greater the amount of compute required.
to computational resources than others, such as
The amount of compute used by significant AI universities.
1.0e+21
Training Compute (FLOP/s – Log Scale)
1.0e+18
1.0e+15
1.0e+12
1.0e+9
1.0e+6
1.0e+3
1.0e+0
1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.11
9 FLOP/s stands for “Floating Point Operations per second” and is a measure of the performance of a computational device.
Since 2010, it has increasingly been the case that of all machine learning systems, language models are
demanding the most computational resources.
1.0e+21
Training Compute (FLOP/s – Log Scale)
1.0e+18
1.0e+15
1.0e+12
1.0e+9
1.0e+6
1.0e+3
1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.12
Large Language and are starting to be widely deployed in the real world.
Authors of Select Large Language and Multimodal Models (% of Total) by Country, 2019–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
100%
Authors of Large Language and Multimodal Models (% of Total)
80%
60%
54.02%, United States
40%
Figure 1.2.14 offers a timeline view of the large PaLM (540B). The only Chinese large language and
language and multimodal models that have been multimodal model released in 2022 was GLM-130B,
released since GPT-2, along with the national an impressive bilingual (English and Chinese) model
affiliations of the researchers who produced the created by researchers at Tsinghua University. BLOOM,
models. Some of the notable American large also launched in late 2022, was listed as indeterminate
language and multimodal models released in given that it was the result of a collaboration of more
2022 included OpenAI’s DALL-E 2 and Google’s than 1,000 international researchers.
10 The AI models that were considered to be large language and multimodal models were hand-selected by the AI Index steering committee. It is possible that this selection may have omitted
certain models.
Timeline and National Affiliation of Select Large Language and Multimodal Model Releases
Source: AI Index, 2022 | Chart: 2023 AI Index Report
2023-Jan
BLOOM
2022-Oct
GLM-130B
2022-Jul Minerva (540B)
Imagen
Jurassic-X OPT-175B
2022-Apr Stable Diffusion (LDM-KL-8-G) PaLM (540B) DALL·E 2 Chinchilla
GPT-NeoX-20B InstructGPT AlphaCode
2022-Jan
Gopher
2020-Oct
ERNIE-GEN (large)
2020-Jul
GPT-3 175B (davinci)
2020-Apr
Turing NLG
Meena
2020-Jan United States Canada
United Kingdom Israel
T5-11B T5-3B China Germany
2019-Oct Megatron-LM (Original, 8.3B) United States, Indeterminate
United Kingdom,
Germany, India
2019-Jul Korea
Grover-Mega
2019-Apr
GPT-2
2019-Jan
Figure 1.2.1411
11 While we were conducting the analysis to produce Figure 1.2.14, Irene Solaiman published a paper that has a similar analysis. We were not aware of the paper at the time of our research.
Parameter Count
Over time, the number of parameters of newly released Google in 2022, had 540 billion, nearly 360 times
large language and multimodal models has massively more than GPT-2. The median number of parameters
increased. For example, GPT-2, which was the first in large language and multimodal models is increasing
large language and multimodal model released in 2019, exponentially over time (Figure 1.2.15).
only had 1.5 billion parameters. PaLM, launched by
3.2e+12
Wu Dao 2.0
1.0e+12
Megatron-Turing NLG 530B
Minerva (540B)
Gopher PaLM (540B)
3.2e+11 HyperClova
Number of Parameters (Log Scale)
1.0e+9 Grover-Mega
ERNIE-GEN (large)
3.2e+8
20
20
20
20
20
20 9-S
20
20 0-J
20
20
20
20
20 -M
20 1-A ar
20 1-M r
20 1-J ay
20 1-J n
20
20
20
2022-F
20 2-M b
20 2-A ar
20 2-M r
19
19
1
19 ep
2
20 an
20
20
21
21
2
2 p
2
2 u
21 ul
21
21
2 e
2
2 p
22 a
22
22
-J
-A
-O
-D
-F
-M
-O
-J y
-A
-N
-F
-M
-A
an
eb
ug
un
ec
eb
ct
ug
ct
ug
ov
ay
ay
Figure 1.2.15
Training Compute
The training compute of large language and multimodal reasoning problems, was roughly nine times greater
models has also steadily increased (Figure 1.2.16). The than that used for OpenAI’s GPT-3, which was
compute used to train Minerva (540B), a large language released in June 2022, and roughly 1839 times greater
and multimodal model released by Google in June than that used for GPT-2 (released February 2019).
2022 that displayed impressive abilities on quantitative
Training Compute (FLOP/s) of Select Large Language and Multimodal Models, 2019–22
Source: Epoch, 2022 | Chart: 2023 AI Index Report
PaLM (540B)
3.2e+24 Megatron-Turing NLG 530B Minerva (540B)
Gopher OPT-175B
1.0e+24
GPT-3 175B (davinci) Jurassic-1-Jumbo
Chinchilla BLOOM
3.2e+23
Training Compute (FLOP/s – Log Scale)
Meena GPT-NeoX-20B
1.0e+23 DALL-E PanGu-α
T5-11B HyperClova Stable Diffusion
3.2e+22 Turing NLG CogView AlphaCode GLM-130B
T5-3B
1.0e+22 GPT-J-6B
Megatron-LM (Original, 8.3B) GPT-Neo
3.2e+21 GPT-2
1.0e+21 Wu Dao - Wen Yuan
3.2e+20
1.0e+20
3.2e+19
1.0e+19
ERNIE 3.0
3.2e+18
1.0e+18
20
20
20
20 9-S
20
20 0-J
20
20
20
20 -M
20 1-A ar
20
20 1-J
20
20
20
2022-F
20 2-M b
20 2-A ar
20 -M
20
19
1
19 ep
22
2
20 an
20
21
21
2
21 pr
2
21 ul
21
21
2 e
2
22 pr
22 a
22
-J
-M
-A
-O
-D
-F
-O
-J y
-A
-N
-F
-M
an
eb
ug
un
ec
eb
ct
ug
ct
ov
ay
ay
Figure 1.2.16
Training Cost
A particular theme of the discourse around large estimate with the tag of mid, high, or low: mid where
language and multimodal models has to do with their the estimate is thought to be a mid-level estimate,
hypothesized costs. Although AI companies rarely speak high where it is thought to be an overestimate, and
openly about training costs, it is widely speculated that low where it is thought to be an underestimate. In
these models cost millions of dollars to train and will certain cases, there was not enough data to estimate
become increasingly expensive with scale. the training cost of particular large language and
multimodal models, therefore these models were
This subsection presents novel analysis in which the
omitted from our analysis.
AI Index research team generated estimates for the
training costs of various large language and multimodal The AI Index estimates validate popular claims that
models (Figure 1.2.17). These estimates are based on the large language and multimodal models are increasingly
hardware and training time disclosed by the models’ costing millions of dollars to train. For example,
authors. In cases where training time was not disclosed, Chinchilla, a large language model launched by
we calculated from hardware speed, training compute, DeepMind in May 2022, is estimated to have cost $2.1
and hardware utilization efficiency. Given the possible million, while BLOOM’s training is thought to have cost
variability of the estimates, we have qualified each $2.3 million.
10
8.55
8.01
Training Cost
4
1.97 2.11 2.29
1.47 1.80 1.69
2 1.03
0.23 0.02 0.09 0.43 0.27 0.24 0.60
0.05 0.11 0.01 0.14 0.09 0.16
0
GPT-2
T5-11B
Meena
Turing NLG
GPT-3 175B
DALL-E
GPT-Neo
GPT-J-6B
HyperClova
ERNIE 3.0
Codex
Gopher
AlphaCode
GPT-NeoX-20B
Chinchilla
PaLM (540B)
OPT-175B
Minerva (540B)
GLM-130B
BLOOM
Figure 1.2.17
12 See Appendix for the complete methodology behind the cost estimates.
There is also a clear relationship between the cost of large language and multimodal models and their size.
As evidenced in Figures 1.2.18 and 1.2.19, the large language and multimodal models with a greater number of
parameters and that train using larger amounts of compute tend to be more expensive.
Estimated Training Cost of Select Large Language Estimated Training Cost of Select Large Language and
and Multimodal Models and Number of Parameters Multimodal Models and Training Compute (FLOP/s)
Source: AI Index, 2022 | Chart: 2023 AI Index Report Source: AI Index, 2022 | Chart: 2023 AI Index Report
GPT-NeoX-20B GPT-2
Turing NLG
2.0e+10 Wu Dao - Wen Yuan
Codex T5-11B
ERNIE 3.0 DALL-E
1.0e+10
GPT-J-6B
1.0e+20
5.0e+9
Wu Dao - Wen Yuan Meena
GPT-Neo
2.0e+9 GPT-2 Stable Di usion ERNIE 3.0
1.0e+9 1.0e+18
10k 100k 1M 10M 10k 100k 1M 10M
Training Cost (in U.S. Dollars - Log Scale) Training Cost (in U.S. Dollars - Log Scale)
Figure 1.2.18 Figure 1.2.19
AI conferences are key venues for researchers to share their work and connect with peers and collaborators. Conference attendance is an
indication of broader industrial and academic interest in a scientific field. In the past 20 years, AI conferences have grown in size, number,
and prestige. This section presents data on the trends in attendance at major AI conferences.
1.3 AI Conferences
Conference Attendance International Conference on Principles of Knowledge
Representation and Reasoning (KR) were both held
After a period of increasing attendance, the total strictly in-person.
attendance at the conferences for which the AI
Index collected data dipped in 2021 and again in Neural Information Processing Systems (NeurIPS)
2022 (Figure 1.3.1).13 This decline may be attributed continued to be one of the most attended
to the fact that many conferences returned to hybrid conferences, with around 15,530 attendees (Figure
or in-person formats after being fully virtual in 1.3.2).14 The conference with the greatest one-
2020 and 2021. For example, the International Joint year increase in attendance was the International
Conference on Artificial Intelligence (IJCAI) and the Conference on Robotics and Automation (ICRA),
from 1,000 in 2021 to 8,008 in 2022.
80
70
Number of Attendees (in Thousands)
60 59.45
50
40
30
20
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.1
13 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that
measuring the exact attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world.
14 In 2021, 9,560 of the attendees attended NeurIPS in-person and 5,970 remotely.
30
25
Number of Attendees (in Thousands)
20
15.53, NeurIPS
15
10 10.17, CVPR
8.01, ICRA
7.73, ICML
5.35, ICLR
5 4.32, IROS
3.56, AAAI
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.2
3.50
3.00
Number of Attendees (in Thousands)
2.50
1.50
1.09, FaccT
1.00
0.66, UAI
0.50 0.50, AAMAS
0.39, ICAPS
0.12, KR
0.00
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.3
GitHub is a web-based platform where individuals and coding teams can host, review, and collaborate on various code repositories.
GitHub is used extensively by software developers to manage and share code, collaborate on various projects, and support open-source
software. This subsection uses data provided by GitHub and the OECD.AI policy observatory. These trends can serve as a proxy for some
of the broader trends occuring in the world of open-source AI software not captured by academic publication data.
350 348
300
Number of AI Projects (in Thousands)
250
200
150
100
50
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.1
As of 2022, a large proportion of GitHub AI projects United Kingdom (17.3%), and then the United States
were contributed by software developers in India (14.0%). The share of American GitHub AI projects
(24.2%) (Figure 1.4.2). The next most represented has been declining steadily since 2016.
geographic area was the European Union and the
35%
30%
AI Projects (% of Total)
25%
24.19%, India
20%
17.30%, European Union and United Kingdom
15%
14.00%, United States
10%
5%
2.40%, China
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.2
Stars
GitHub users can bookmark or save a repository Figure 1.4.3 shows the cumulative number of stars
of interest by “starring” it. A GitHub star is similar attributed to projects belonging to owners of various
to a “like” on a social media platform and indicates geographic areas. As of 2022, GitHub AI projects
support for a particular open-source project. Some of from the United States received the most stars,
the most starred GitHub repositories include libraries followed by the European Union and the United
like TensorFlow, OpenCV, Keras, and PyTorch, which Kingdom, and then China. In many geographic areas,
are widely used by software developers in the AI the total number of new GitHub stars has leveled off
coding community. in the last few years.
3.00
Number of Cumulative GitHub Stars (in Millions)
2.00
1.00
0.00
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.3
CHAPTER 2:
Technical
Performance
Artificial Intelligence
Index Report 2023
CHAPTER 2 PREVIEW:
Technical Performance
Overview 72 Narrative Highlight: A Closer Look at
Chapter Highlights 73 Progress in Image Generation 90
Visual Reasoning 92
Technical Performance
Natural Language Inference 105 2.7 Hardware 115
Abductive Natural Language MLPerf Training Time 115
Inference (aNLI) 105 MLPerf Inference 117
Sentiment Analysis 106 Trends in GPUs 118
SST-5 Fine-Grained Classification 106
Multitask Language Understanding 107 2.8 Environment 120
Massive Multitask Language Environmental Impact of
Understanding (MMLU) 107 Select Large Language Models 120
Machine Translation (MT) 108 Narrative Highlight: Using AI to
Number of Commercially Available Optimize Energy Usage 122
MT Systems 108
Overview
This year’s technical performance chapter features analysis of the technical progress in
AI during 2022. Building on previous reports, this chapter chronicles advancement in
computer vision, language, speech, reinforcement learning, and hardware. Moreover,
this year this chapter features an analysis on the environmental impact of AI, a discussion
of the ways in which AI has furthered scientific progress, and a timeline-style overview
of some of the most significant recent AI developments.
Chapter Highlights
Performance saturation on Generative AI breaks into
traditional benchmarks. the public consciousness.
AI continued to post state-of-the-art results, 2022 saw the release of text-to-image models
but year-over-year improvement on many like DALL-E 2 and Stable Diffusion, text-to-
benchmarks continues to be marginal. video systems like Make-A-Video, and chatbots
Moreover, the speed at which benchmark like ChatGPT. Still, these systems can be
saturation is being reached is increasing. prone to hallucination, confidently outputting
However, new, more comprehensive incoherent or untrue responses, making it hard
benchmarking suites such as BIG-bench and to rely on them for critical applications.
HELM are being released.
The technical performance chapter begins with an overview of some of the most significant technical developments in AI during 2022,
as selected by the AI Index Steering Committee.
Figure 2.1.3
Figure 2.1.4
Figure 2.1.6
Figure 2.1.7
Figure 2.1.9
Figure 2.1.11
Meta Announces
July 11, 2022
‘No Language Left Behind’
No Language Left Behind (NLLB) is
a family of models that can translate
across 200 distinct languages. NLLB is
one of the first systems that can perform
well across a wide range of low-resource
languages like Kamba and Lao.
Figure 2.1.12
Figure 2.1.14
Figure 2.1.16
Figure 2.1.17
knowledge to improve.
International Research
Nov 9, 2022
Group Releases BLOOM
A collaboration of over 100
researchers from across the
globe develop an open-access
language model called BLOOM.
BLOOM impresses with its
Figure 2.1.19
public release and for furthering
the possibilities of international
collaboration in AI research.
Figure 2.1.20
Computer vision is the subfield of AI that teaches machines to understand images and videos. Computer vision technologies have a
variety of important real-world applications, such as autonomous driving, crowd surveillance, sports analytics, and video-game creation.
This section tracks progress in computer vision across several different task domains which include: (1) image classification, (2)
face detection and recognition, (3) deepfake detection, (4) human pose estimation, (5) semantic segmentation, (6) medical image
segmentation, (7) object detection, (8) image generation, and (9) visual reasoning.
ImageNet
ImageNet is one of the most widely used
benchmarks for image classification. This dataset
includes over 14 million images across 20,000
different object categories such as “strawberry” or
“balloon.” Performance on ImageNet is measured
through various accuracy metrics. Top-1 accuracy
measures the degree to which the top prediction
generated by an image classification model for a
given image actually matches the image’s label.
85%
Top-1 Accuracy (%)
80%
75%
70%
65%
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.2.2
Recognition
Facial detection and recognition is the ability of AI
systems to identify faces or individuals in images
or videos (Figure 2.2.3). Currently, many facial
recognition systems are able to successfully identify
close to 100% of faces, even on challenging datasets
(Figure 2.2.4).
Figure 2.2.3
National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT):
Veri cation Accuracy by Dataset
Source: National Institute of Standards and Technology, 2022 | Chart: 2023 AI Index Report
1.0000
0.5000
0.2000
False Non-Match Rate: FMNR (Log-Scale)
0.1000
0.0500
0.0100
0.0050
0.0032, BORDER Photos @ FMR = 1e-6
0.0021, MUGSHOT Photos @ FMR = 1e-5
0.0020 0.0019, MUGSHOT Photos ≥ 12 YRS @ FMR = 1e-5
0.0016, VISABORDER Photos @ FMR = 1e-6
0.0010
Figure 2.2.4
National Institute of Standards and others. Facial detection capacity is measured by the
Technology Face Recognition Vendor Test false non-match rate (FNMR), otherwise known as
(FRVT) error rate, which is the rate at which a model fails to
Progress on facial recognition can be tracked match the face in an image to that of a person.
through the National Institute of Standards and
As of 2022, the top-performing models on all of the
Technology’s Face Recognition Vendor Test. This
FRVT datasets, with the exception of WILD Photos,
test tracks how well different facial recognition
each posted an error rate below 1%, and as low as a
algorithms perform on various homeland security
0.06% error rate on the VISA Photos dataset.
tasks, such as identification of child trafficking
victims and cross-verification of visa images, among
Celeb-DF
Celeb-DF is presently one of the most challenging
Figure 2.2.5
deepfake detection benchmarks. This dataset
is composed of 590 original celebrity YouTube algorithm on Celeb-DF came from researchers at
videos that have been manipulated into thousands Deakin University in Australia. Their JDFD model
of deepfakes. This year’s top deepfake detection posted an AUC score of 78 (Figure 2.2.6).
78.00
75
Area Under Curve Score (AUC)
70
65
Figure 2.2.6
MPII
MPII is a dataset of over 25,000 annotated
images which contains annotations of
more than 40,000 people doing 410 human
activities. On MPII, this year’s top model,
Figure 2.2.7
ViTPose, correctly estimated 94.3% of
keypoints (human joints), which represented
a small 0.2 percentage point increase from
the previous state-of-the-art result posted in
2020 (Figure 2.2.8).
90%
85%
Figure 2.2.8
Cityscapes Challenge,
Pixel-Level Semantic Labeling Task
The Cityscapes dataset is used to test the semantic
segmentation capabilities of AI. This dataset
contains 25,000 annotated images of diverse urban
Figure 2.2.9
environments. The Cityscapes dataset enables a
variety of different segmentation tasks. One of the
greater the mIoU, the better a system has performed.
most popular is the pixel-level task. Performance
on semantic segmentation is measured by mean Performance on Cityscapes has increased by 23.4
intersection-over-union (mIoU), which represents the percentage points since the competition launched in
degree to which the image segments predicted by the 2014; however, it has plateaued in the last few years
model overlap with the image’s actual segments. The (Figure 2.2.10).
80%
75%
70%
65%
Figure 2.2.10
Segmentation
In medical image segmentation, AI systems
segment objects such as lesions or organs in
medical images (Figure 2.2.11).
Kvasir-SEG
Kvasir-SEG is a dataset for medical image
Figure 2.2.11
segmentation that contains 1,000 high-
quality images of gastrointestinal polyps
that were manually identified by medical
professionals. Progress on Kvasir-SEG is
measured in mean Dice, which represents
the degree to which the polyp segments This year’s top-performing model on Kvasir-SEG, SEP, was
identified by AI systems overlap with the created by a Chinese researcher and posted a mean Dice of
actual polyp segments.1 94.1% (Figure 2.2.12).
94.11%
90%
Mean Dice
85%
Figure 2.2.12
1 Mean Dice and mIoU are in principle quite similar. This StackExchange post outlines the differences in more detail.
81.90%
80%
Mean Average Precision (mAP50)
70%
60%
Figure 2.2.14
35
30
Fréchet Inception Distance (FID) Score
25
20
15
10
6.91, STL-10
5
1.77, CIFAR-10
0
2017 2018 2019 2020 2021 2022
Figure 2.2.16
Narrative Highlight:
A Closer Look at Progress in Image Generation
Figure 2.2.17 tracks the progress of facial GAN Progress on Face Generation
Source: Goodfellow et al., 2014; Radford et al., 2016; Liu and Tuzel, 2016;
image generation over time, with the final Karras et al., 2018; Karras et al., 2019; Goodfellow, 2019; Karras et al., 2020;
image being generated by Diffusion-GAN, Vahdat et al., 2021; Wang et al., 2022.
2014 2015
2016 2017
2018
2020
2021
2022
Figure 2.2.17
In the last year, text-to-image Images Generated by DALL-E 2, Stable Diffusion and Midjourney
Source: AI Index, 2022
generation broke into the public
consciousness with the release
of models such as OpenAI’s
DALL-E 2, Stability AI’s Stable
Diffusion, Midjourney’s
Midjourney, Meta’s Make-A- a. DALL-E 2
Scene, and Google’s Imagen.
With these systems, users can
generate images based on
a text prompt. Figure 2.2.18
juxtaposes the images generated
by DALL-E 2, Stable Diffusion,
and Midjourney, three publicly
accessible AI text-to-image
systems, for the same prompt: “a
panda playing a piano on a warm
evening in Paris.”
b. Stable Diffusion c. Midjourney Figure 2.2.18
Narrative Highlight:
A Closer Look at Progress in Image Generation (cont’d)
Of all the recently released text-to-image generators, Google’s Imagen performs best on the
COCO benchmark (Figure 2.2.19)2. This year, the Google researchers who created Imagen
also released a more difficult text-to-image benchmark, DrawBench, designed to challenge
increasingly capable text-to-image models.
Notable Text-to-Image Models on MS-COCO 256 × 256 FID-30K: Fréchet Inception Distance (FID) Score
Source: Saharia et al., 2022 | Chart: 2023 AI Index Report
Trained on COCO-FID
35.49
COCO Fréchet Inception Distance (FID) Score
30
21.42 20.79
20 17.89
12.24
10.39
10 9.33
8.12 7.55 7.27
0
AttnGAN
DM-GAN
DF-GAN
DM-GAN + CL
DALL-E
GLIDE
XMC-GAN
LAFITE
DALL-E 2
Make-A-Scene
Imagen
Figure 2.2.19
2 The COCO benchmark, first launched in 2014, includes 328,000 images with 2.5 million labeled instances. Although it is typically used for object detection tasks, researchers
have also deployed it for image generation.
Visual Reasoning
Visual reasoning tests how well AI systems can reason across both textual and visual data,
as in the examples of Figure 2.2.20.
A Collection of
Visual Reasoning
Tasks
Source: Agrawal et al., 2016
Figure 2.2.20
85%
84.30%
80%
Accuracy (%)
75%
70%
65%
Figure 2.2.21
Narrative Highlight:
The Rise of Capable Multimodal Reasoning Systems
Traditionally AI has been strong in narrow tasks, models were introduced, for example BEiT-3 from
but it has been unable to easily generalize across Microsoft and PaLI from Google, that posted state-
multiple domains. For instance, many image of-the-art results across a variety of both vision and
classifiers are adept at classifying images but are language benchmarks. For example, at the time of
incapable of understanding written text. publication of the BEiT-3 paper, BEiT-3 posted state-
of-the-art results for four different vision skills and
However, recent technical progress in AI has
five different vision-language skills (Figure 2.2.22).
begun to challenge this notion. In 2022, several
BEiT-3Vs.
BEiT-3
BEiT-3 Vs. Previous State-of-the-Art
State-of-the-Art Models
Models
Source: Wang et Previous
al., 2022 | Table: 2023 AI Index Report
Source: Wang
Source: Wang et
et al.,
al., 2022
2022 || Table:
Table: 2023
2023 AI Index Report
Category Task Dataset Metric Previous SOTA Model of BEiT-3 Scale of
Category
Category Task
Task Dataset Metric Previous SOTA Model of BEiT-3 Scale
Scale of
of
Previous SOTA Improvement
Previous SOTA Improvement
Improvement
Vision Semantic ADE20K mIoU 61.40 FD-SwimV2 62.80 2.28%
Vision
Vision Semantic
Semantic ADE20K mIoU 61.40 FD-SwimV2 62.80
62.80 2.28%
2.28%
Segmentation
Segmentation
Segmentation
Vision Object COCO AP 63.30 DINO 63.70 0.63%
Vision
Vision Object
Object COCO AP 63.30 DINO 63.70
63.70 0.63%
0.63%
Detection
Detection
Detection
Vision Instance COCO AP 54.70 Mask DINO 54.80 0.18%
Vision
Vision Instance
Instance COCO AP 54.70 Mask DINO 54.80
54.80 0.18%
0.18%
Segmentation
Segmentation
Segmentation
Vision Image ImageNet Top-1 Accuracy 89.00 FD-CLIP 89.60 0.67%
Vision
Vision Image
Image ImageNet Top-1 Accuracy 89.00 FD-CLIP 89.60
89.60 0.67%
0.67%
Classi cation
Classi cation
Classi
Vision-Language Visual NLVR Accuracy 87.00 CoCA 92.60 6.44%
Vision-Language Visual
Vision-Language Visual NLVR Accuracy 87.00 CoCA 92.60
92.60 6.44%
6.44%
Reasoning
Reasoning
Reasoning
Vision-Language Visual QA VQAv2 VQA Accuracy 82.30 CoCA 84.00 2.07%
Vision-Language Visual
Vision-Language Visual QA VQAv2 VQA Accuracy 82.30 CoCA 84.00
84.00 2.07%
2.07%
Vision-Language Image COCO CIDEr 145.30 OFA 147.60 1.58%
Vision-Language Image
Vision-Language Image COCO CIDEr 145.30 OFA 147.60
147.60 1.58%
1.58%
Captioning
Captioning
Captioning
Vision-Language Finetuned COCO R@1 72.50 Florence 76.00 4.83%
Vision-Language Finetuned
Vision-Language Finetuned COCO R@1 72.50 Florence 76.00
76.00 4.83%
4.83%
Retrieval Flickr30K
Retrieval
Retrieval Flickr30K
Vision-Language Zero-Shot Flickr30K R@1 86.50 CoCA 88.20 1.97%
Vision-Language Zero-Shot
Vision-Language Zero-Shot Flickr30K
Flickr30K R@1
R@1 86.50
86.50 CoCA
CoCA 88.20
88.20 1.97%
1.97%
Retrieval
Retrieval
Retrieval
Figure 2.2.22
Narrative Highlight:
The Rise of Capable Multimodal Reasoning Systems (cont’d)
Figure 2.2.23 shows some of the different vision-language tasks challenging multimodal systems like
PaLI and BEiT-3.
Figure 2.2.23
Figure 2.2.24
80
75.60
70
Q->AR Score
60
50
Figure 2.2.25
Video analysis concerns reasoning or task operation across videos, rather than single images.
Figure 2.3.1
As of 2022, there is a 7.8 percentage point gap in performance between the top system on Kinetics-600 and
Kinetics-700, which suggests the 700 series dataset is still a meaningful challenge for video computer vision
researchers (Figure 2.3.2).
91.80%, Kinetics-600
91.10%, Kinetics-400
90%
84.00%, Kinetics-700
80%
Top-1 Accuracy (%)
70%
60%
Figure 2.3.2
Narrative Highlight:
A Closer Look at the Progress of Video Generation
Multiple high quality text-to-video models, In September 2022, CogVideo’s top score was
AI systems that can generate video clips from significantly surpassed by Meta’s Make-A-Video
prompted text, were released in 2022 . In May, 3
model (Figure 2.3.3). Make-A-Video performed
researchers from Tsinghua University and the 63.6% better on UCF-101 than CogVideo. And, in
Beijing Academy of Artificial Intelligence released October 2022, Google released a text-to-video
CogVideo, a model that posted the then-highest system called Phenaki; however, this model was
inception score on the UCF-101 benchmark for not benchmarked on UCF-101.
text-to-video generation (Figure 2.3.3).
70
60
Inception Score (IS)
50.46
50
40
32.36 32.7
30 28.87
27.38
24.69
20
10
0
DVD-GAN TGANv2 VideoGPT MoCoGAN-HD DIGAN CogVideo TATS-base Make-A-Video
2019 2020 2021 2022
Model
Figure 2.2.3
3 Although these models are impressive, it is worth noting that they are thus far only capable of generating videos of a few seconds’ duration.
Natural language processing (NLP) is the ability of computer systems to understand text. The last few years have seen the release of
increasingly capable “large language models,” AI systems like PaLM, GPT-3, and GLM-130B, that are trained on massive amounts of data
and adaptable to a wide range of downstream tasks.
In this section, progress in NLP is tracked across the following skill categories: (1) English language understanding, (2) text summarization,
(3) natural language inference, (4) sentiment analysis, (5) multitask language understanding, and (6) machine translation.
2.4 Language
English Language SuperGLUE
SuperGLUE is a comprehensive English language
Understanding understanding benchmark that tracks the progress
English language understanding challenges AI of AI models on eight different linguistic tasks.
systems to understand the English language in A selection of these tasks is highlighted in Figure
various ways: reading comprehension, yes/no 2.4.1. Their performance is then aggregated into a
reading comprehension, commonsense reading single metric.
comprehension, and logical reasoning.
Figure 2.4.1
4 For the sake of brevity, this figure only displays four of the eight tasks.
This year’s top model on SuperGLUE, Vega, registered a new state-of-the-art score of 91.3, which is 1.5
percentage points higher than the human baseline. Performance on SuperGLUE is continuing to saturate.
SuperGLUE: Score
Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report
91.30
91
89
Score
88
87
86
85
Reading Comprehension
Dataset Requiring Logical A Sample Question from the Reading Comprehension Dataset
Requiring Logical Reasoning (ReClor)
Reasoning (ReClor) Source: Yu et al., 2020
In response to the saturation of
Context: When a certain gland becomes cancerous in humans, it produces high levels
traditional reading comprehension
of a particular protein. A blood test can determine the level of this protein well before
benchmarks, researchers from the a cancer of the gland could be detected by other means. Some doctors recommend
that aggressive anticancer treatment should be begun as early as possible for anyone
National University of Singapore
who is tested and is found to have high levels of the protein.
launched ReClor in 2020. ReClor,
or Reading Comprehension Dataset Question: Which one of the following, if true, most seriously weakens the doctors’
recommendation?
Requiring Logical Reasoning, is a A. The blood test for the protein has been in use for some time to monitor the
dataset of logical reasoning questions condition of patients who have been diagnosed as having cancer of the gland.
B. Before the blood test became available, about one-third of all cases of cancer of
taken from the LSAT, the entrance the gland were detected in early stages.
exam for law schools in the United C. So far, no patients whose protein levels were found to be normal have
subsequently developed cancer of the gland.
States and Canada. A sample D. Enlargement of the gland, a common condition infrequently associated with
question is shown in Figure 2.4.3 cancer, results in high levels of the protein.
Figure 2.4.3
Figure 2.4.4 examines progress on ReClor. The top 2022 result of 80.6% represented an 18 percentage point
improvement from 2020, the year the benchmark was released.
80.60%
80%
75%
Accuracy (%)
70%
65%
Narrative Highlight:
Just How Much Better Have Language Models Become?
The AI Index tested how three large language models from three different years, GPT-2 (2019), GPT-3
(2020), and ChatGPT (2022), handle the same prompt: “Explain to me the major accomplishments of
Theodore Roosevelt’s presidency.” More recent models are able to answer this question more effectively,
both in terms of factual accuracy and quality of writing.
5 GPT-2 used the 124M parameter model downloaded from OpenAI’s GitHub page.
6 The complete answer outputted by GPT-2 is trimmed here for brevity. The full answer is included in the Appendix.
7 The specific GPT-3 model that was used was text-curie-001, which has training data up to October 2019.
8 The information in this section has been cross-verified with the Encyclopedia Britannica entries on Theodore Roosevelt, Franklin Delano Roosevelt, Woodrow Wilson, and the
National Park Service, as well as the history page of the National Wildlife Federation.
9 Information on the history of the Grand Canyon National Park was cross-verified with the Wikipedia entry on the Grand Canyon National Park.
Narrative Highlight:
Planning and Reasoning in Large Language Models
As illustrated above, AI systems have become The authors then tested notable language models
increasingly strong on a wide range of reasoning on these tasks in a Blocksworld problem domain,
tasks. This improvement has led many to claim that a problem environment where agents are given
emerging AI systems, especially large language blocks of different colors and tasked with arranging
models, possess reasoning abilities that are these blocks in particular orders. The authors
somewhat similar to those possessed by humans. 10
demonstrated that these large language models
Other authors, however, have argued otherwise. 11
performed fairly ineffectively (Figure 2.4.5). While
GPT-3, Instruct-GPT3, and BLOOM demonstrated
In 2022, researchers (Valmeekam et al., 2022)
the ability, in some contexts, to reformulate goals
introduced a more challenging planning and reasoning
in robust ways, they struggled with other tasks like
test for large language models that consists of seven
plan generation, optimal planning, and plan reuse.
assignments: (1) plan generation, (2) cost-optimal
Compared to humans, the large language models
planning, (3) reasoning about plan execution, (4)
performed much worse, suggesting that while they
robustness to goal reformulation, (5) ability to reuse
are capable, they lack human reasoning capabilities.
plans, (6) replanning, and (7) plan generalization.12
Text Summarization
Text summarization tests how well AI systems can arXiv and PubMed
synthesize a piece of text while capturing its core ArXiv and PubMed are two widely used datasets for
content. Text summarization performance is judged benchmarking text summarization. The model that
on ROUGE (Recall-Oriented Understudy for Gisting posted the state-of-the-art score in 2022 on both
Evaluation), which measures the degree to which arXiv and PubMed, AdaPool, was developed by a
an AI-produced text summary aligns with a human team from Salesforce Research (Figure 2.4.6).
reference summary.
51.05, PubMed
50.95, arXiv
50
45
ROUGE-1
40
35
Figure 2.4.6
Natural Language Inference uncertain premises. Imagine, for example, that Peter
returns to his car after dinner at a restaurant to find the
Also known as textual entailment, natural language window shattered and his laptop, which he left in the
inference is the ability of AI systems to determine back seat, missing. He might immediately conclude
whether a hypothesis is true, false, or undetermined that a thief broke into his car and stole the laptop.
based on presented premises.
In 2019, the Allen Institute for AI launched aNLI, a
Abductive Natural Language Inference (aNLI) comprehensive benchmark for abductive natural
Abductive natural language inference is a form language inference that includes 170,000 premise
of natural language inference in which plausible and hypothesis pairs (Figure 2.4.7).
conclusions must be drawn from a set of limited and
Figure 2.4.7
Abductive natural language inference is a challenging task. The human baseline remained
unsurpassed until 2022, when an AI system registered a score of 93.7% (Figure 2.4.8).
94%
93.65%
92.90%, Human Baseline
92%
90%
Accuracy (%)
88%
86%
84%
Figure 2.4.8
A new state-of-the-art score of 59.8% was posted on SST-5 fine-grained classification by the
Heinsen Routing + RoBERTa Large model (Figure 2.4.10).
60% 59.80%
55%
Accuracy (%)
50%
45%
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.4.10
Gopher, Chinchilla, and variants of PaLM have each posted state-of-the-art results on MMLU. The current top
result on MMLU comes from Flan-PaLM, a Google model that reports an average score of 75.2% (Figure 2.4.12).
75.20%
70%
60%
Accuracy (%)
50%
40%
30%
2019 2020 2021 2022
Figure 2.4.12
13 This criticism is more formally articulated in Hendrycks et al., 2021.
80 Commercial
Open Source Pre-trained
Preview
Number of Independent Machine Translation Services
70
60
54
50
46
40
34
45
30
26 38
23
21
20 28
16
12 13 23
10 18 21
10 9
9 10 15
8 9
6 5 5
0
2017-May 2017-Jul 2017-Nov 2018-Mar 2018-Jul 2018-Dec 2019-Jun 2019-Nov 2020-Jul 2021-Sep 2022-Jul
Figure 2.4.13
AI systems that work with human speech are usually tasked with converting spoken words into text and recognizing the individuals speaking.
2.5 Speech
Speech Recognition task of matching certain speech with a particular
individual. Over the years, the VoxCeleb dataset has
Speech recognition is the ability of AI systems to been expanded; however, the data in this subsection
identify spoken words and convert them into text. tracks progress on the original dataset.
Speech recognition has progressed so much so
This year’s top result on the original VoxCeleb dataset
that nowadays many computer programs or texting
was posted by American researchers, whose model
apps are equipped with dictation devices that can
achieved an equal error rate of 0.1%, which represents
seamlessly transcribe speech into writing.
a 0.28 percentage point decrease from the state-of–
VoxCeleb the-art result achieved by Chinese researchers in the
VoxCeleb is a large-scale audiovisual dataset of previous year (Figure 2.5.1).
human speech for speaker recognition, which is the
8%
6%
Equal Error Rate (EER)
4%
2%
0.14%
0%
Figure 2.5.1
Narrative Highlight:
Whisper
One of the major themes in the last few years of AI progress has been the emergence of large language
models that are trained on massive amounts of data and capable of executing a diverse range of tasks.
In 2022, this idea of training on large data to achieve cross-domain performance arrived in the world of
speech recognition with OpenAI’s launch of Whisper.
Whisper is a large-scale speech recognition model that was trained in a weakly supervised way
on 700,000 hours of audio data. Whisper was capable of strong, although not state-of-the-art,
performance on many speech recognition tasks in zero-shot settings.14 Whisper outperformed wav2vec
2.0 Large, another speech recognition model, across a wide range of popular English speech recognition
benchmarks (Figure 2.5.2). Similarly, Whisper proved to be a better speech translator than many other
leading AI translator models (Figure 2.5.3). Whisper also outperformed other commercial automated
speech recognition systems and scored similarly to top human transcription services (Figure 2.5.4).15
Despite this impressive performance, there were still some speech tasks, like language identification, on
which Whisper trailed state-of-the-art models (Figure 2.5.5).
wav2vec 2.0 Large (No LM) Vs. Whisper Large V2 Notable Models on X→EN Subset of CoVoST 2
Across Datasets Source: Radford et al., 2022 | Chart: 2023 AI Index Report
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
6.2%
LibriSpeech Other 5.2%
Zero-Shot Whisper 29.1%
67.6%
AMI SDM1 36.4%
7.7%
WSJ 3.9%
34.8%
CallHome 17.6%
MAESTRO 25.2%
28.3%
Switchboard 13.8%
37.0%
AMI IHM 16.9%
35.6%
CORAAL 16.2% mSLAM-CTC (2B) 24.8%
17.9%
VoxPopuli En 7.3%
65.8%
CHiME-6 25.5%
10.5%
TED-LIUM 4.0% XLS-R (2B) 22.1%
14.6%
FLEURS En 4.4%
29.9%
Common Voice 9.0%
24.5%
Artie 6.2% wav2vec 2.0 Large (No LM) XMEF-X 14.7%
2.7% Whisper Large V2
LibriSpeech Clean 2.7%
0% 10% 20% 30% 40% 50% 60% 70% 0% 10% 20% 30%
Word Error Rate (%) Bilingual Evaluation Understudy (BLEU) Score
Figure 2.5.2
Figure 2.5.3
14 Zero-shot learning refers to the ability of an AI system to learn a particular task without being trained on that task.
15 Kincaid46 is a dataset of 46 audio files and transcripts that were published in the blog post, “Which automatic transcription service is the most accurate?—2018.”
Notable Speech Transcription Services on Kincaid46 Notable Models on FLEURS: Language Identi cation
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
Accuracy
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
ASR Computer-Assisted Human Transcription
Whisper 8.81%
Company D 12.20%
Company E 7.61%
40%
Company F 8.14%
Company G 8.65%
20%
Company H 8.96%
Company I 10.50%
0%
0% 2% 4% 6% 8% 10% 12% 14% w2v-bert-51 (0.6B) mSLAM-CTC (2B) Zero-shot Whisper
Median Word Error Rate (%)
Figure 2.5.4 Figure 2.5.5
In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning from their prior
actions. Systems are rewarded if they achieve a desired goal and punished if they fail.
Environments
Reinforcement learning agents require environments,
not datasets, to train: They must be trained in
environments where they can experiment with
various actions that will allow them to identify
optimal game strategies.
Procgen
Procgen is a reinforcement learning environment
introduced by OpenAI in 2019. It includes
16 procedurally generated video-game-like
Figure 2.6.1
environments specifically designed to test the
ability of reinforcement learning agents to learn
generalizable skills (Figure 2.6.1). Performance on
Procgen is measured in terms of mean-normalized
score. Researchers typically train their systems on
200 million training runs and report an average score
across the 16 Procgen games. The higher the system
scores, the better the system.
A team of industry and academic researchers from Korea posted the top score of 0.6 on Procgen in 2022 (Figure 2.6.2).
0.57
Mean of Min-Max Normalized Score
0.50
0.40
Figure 2.6.2
Narrative Highlight:
Benchmark Saturation
An emerging theme in this year’s AI Index is the observed performance saturation across many popular
technical performance benchmarks. Last year’s AI Index Report observed a similar trend; however,
benchmark saturation has been particularly pronounced this year. Figure 2.6.3 shows the relative
improvement since the benchmark first launched (overall improvement) and relative improvement within
the last year (YoY improvement) on AI technical benchmarks considered in this year’s AI Index. The
improvements are reported as percent changes.
For all but 7 of the benchmarks, the improvement registered is less than 5%. The median improvement
within the last year is 4%, while the median improvement since launch is 42.4%.16 Moreover, this year the
AI Index elected not to feature traditionally popular benchmarks like SQuAD1.1 and SQuAD2.0, as no
new state-of-the-art results were posted. Moreover, the speed at which benchmark saturation is being
reached is increasing. Researchers have responded to this increasing saturation by launching newer and
more comprehensive benchmarking suites such as BIG-bench and HELM.
Overall Improvement
120% YoY Improvement
100%
Improvement (%)
80%
60%
40%
20%
0%
ImageNet Top-1
FVRT
Celeb-DF
MPII
Cityscapes
Kvasir-SEG
STL-10
CIFAR-10
VQA
COCO
VCR
Kinetics-400
Kinetics-600
Kinetics-700
SuperGLUE
ReClor
arXiv
PubMed
ANLI
SST-5
MMLU
VoxCeleb
Procgen
Benchmark
Figure 2.6.3
16 The improvements reviewed in this section are reported as relative change. Figure 2.6.3 should therefore not be used to conduct comparisons of improvements across
benchmarks, as each benchmark has different parameters.
Deep learning AI algorithms are trained on GPUs or TPUs, which accelerate the training speed of AI systems. As AI systems process
ever-larger datasets, it is crucial to monitor advancements in hardware capabilities.
2.7 Hardware
MLPerf Training every AI skill category had significantly decreased.
MLPerf is an AI training competition run by the This year, this trend has continued, albeit at a slightly
ML Commons organization. In this challenge, slower pace. Record-low training times were posted
participants train ML systems to execute various in the object detection, speech recognition, image
tasks using a common architecture. Entrants are then segmentation, recommendation, image classification,
ranked on their absolute wall clock time, which is and language processing categories (Figure 2.7.1).
how long it takes for the system to train. In categories like image classification and object
detection, the top AI systems can now train roughly
Last year, the AI Index observed that since the
32 times quicker than in 2018, when the competition
competition launched, training times for virtually
first launched.
60
40
20
Training Time (Minutes; Log Scale)
10
Figure 2.7.1
Data on the number of accelerators used by the between the mean number of accelerators used by
hardware systems submitted to MLPerf also all entrants and the average accelerators used by the
suggests that stronger hardware has been powering systems that post the top results.17 This gap suggests
decreasing training times (Figure 2.7.2). Since the that having better hardware is essential to training the
start of the MLPerf competition, the gap has grown fastest systems.
4,500
4,216, Maximum Number of Accelerators Used
4,000
3,500
3,000
Number of Accelerators
2,500
2,000
1,859, Average Accelerators Used by Top System
1,500
1,000
500
211, Mean Number of Accelerators
0
20 20 20 20 20 20 20
18- 19- 20 21- 21- 22 22
De Ju -Ju Ju De -Ju -N
c-1 n-1 l-2 n-3 c-0 n-2 ov
2 0 9 0 1 9 -09
Figure 2.7.2
17 An accelerator, like a GPU or TPU, is a chip that is chiefly used for the machine learning component of a training run.
MLPerf Inference Figures 2.7.3 to 2.7.6 plot the throughput of the state-of-
In deploying AI, inference is the step where trained the-art submissions on MLPerf Inference across four skill
AI systems generate predictions, e.g. classifying categories: image classification, language processing,
objects. recommendation, and speech recognition. The number of
inferences generated by the top-performing AI systems
In 2020, ML Commons introduced MLPerf Inference,
has significantly increased since the first iteration of the
a performance benchmarking suite that measures
competition in 2020. For example, the number of offline
how fast a trained AI system can process inputs and
samples generated by the top image classifiers and
produce inferences. The MLPerf Inference suite
language processors have more than doubled since 2020,
tracks the throughput of AI systems, measured in
while those for recommendation systems have increased
samples per second or queries per second.18
by roughly 23%.
MLPerf Best-Performing Hardware for Image MLPerf Best-Performing Hardware for Language
Classi�cation: O�ine and Server Scenario Processing: O ine and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report Source: MLPerf, 2022 | Chart: 2023 AI Index Report
700k
679,915, O ine (Samples/s) 75,153, O ine (Samples/s)
650k 70,992, Server (Queries/s)
630,221, Server (Queries/s) 70k
600k
550k 60k
Throughput
Throughput
500k
50k
450k
400k
40k
350k
300k 30k
250k
2020 2021 2022
Figure 2.7.3 2020 2021 2022
Figure 2.7.4
2.7M 160k
2,683,620, Server (Queries/s) 155,811, O ine (Samples/s)
2,645,980, O ine (Samples/s) 150k
2.6M
140k
136,498, Server (Queries/s)
2.5M 130k
Throughput
Throughput
120k
2.4M
110k
2.3M 100k
90k
2.2M
80k
2.1M 70k
18 The following blog post from Dell Technologies offers a good distinction between offline and server samples: “Offline—one query with all samples is sent to the system under test (SUT).
The SUT can send the results back once or multiple times in any order. The performance metric is samples per second. Server—the queries are sent to the SUT following a Poisson distribution
(to model real-world random events). One query has one sample. The performance metric is queries per second (QPS) within the latency bound.”
Trends in GPUs: Performance and Price the performance of a computational device. The higher
This year, the AI Index built on work previously the FLOP/s, the better the hardware.
done by the research collective Epoch and analyzed
Figure 2.7.8 showcases the median single performance
trends over time in GPU performance and price.19
of new GPUs by release date, which continues to rise
Figure 2.7.7 showcases the FP32 (single precision) year over year. Since 2021, the median FLOP/s speed
performance FLOP/s of different GPUs released has nearly tripled, and since 2003 it has increased
from 2003 to 2022. FLOP/s stands for “Floating roughly 7,000 times.
Point Operations per second” and is a measure of
FP32 (Single Precision) Performance (FLOP/s) by Median FP32 (Single Precision) Performance (FLOP/s),
Hardware Release Date, 2003–22 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
2.0e+14
2.0e+13 2.23e+13
1.0e+14
5.0e+13 1.0e+13
2.0e+13 5.0e+12
1.0e+13
Median FLOP/s (Log Scale)
5.0e+12 2.0e+12
2.0e+12 1.0e+12
FLOP/s (Log Scale)
1.0e+12 5.0e+11
5.0e+11
2.0e+11 2.0e+11
1.0e+11 1.0e+11
5.0e+10 5.0e+10
2.0e+10
1.0e+10 2.0e+10
5.0e+9 1.0e+10
2.0e+9 5.0e+9
1.0e+9
5.0e+8 2.0e+9
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
Figure 2.7.7 2022 Figure 2.7.8
19 The Appendix fully delineates both the methodology of this approach and the unique ways in which AI Index research built upon the existing Epoch research.
Finally, figures 2.7.9 and 2.7.10 consider GPU trends 2022 is 1.4 times greater than it was in 2021 and
in terms of FLOP/s per U.S. Dollar. This statistic 20
5600 times greater than in 2003, showing a doubling
considers whether the underlying performance of in performance every 1.5 years. As noted in similar
GPUs is increasing relative to their changing costs. analyses, improvements in the price–performance of
As evidenced most clearly in Figure 2.7.10, the AI hardware has facilitated increasingly larger training
price–performance of GPUs is rapidly increasing. runs and encouraged the scaling of large AI models.
The median FLOP/s per U.S. Dollar of GPUs in
FP32 (Single Precision) Performance (FLOP/s) per Median FP32 (Single Precision) Performance (FLOP/s)
U.S. Dollar by Hardware Release Date, 2003–22 per U.S. Dollar, 2003–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
3.59e+10
35B
50B
30B
Median FLOP/s per U.S. Dollar
40B
25B
FLOP/s per U.S. Dollar
30B 20B
15B
20B
10B
10B
5B
0 0
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
Figure 2.7.9 Figure 2.7.10
20 The data in figures 2.7.9 and 2.7.10 has been adjusted for inflation. The exact details of the adjustment are outlined in greater detail in the Appendix.
There have been mounting concerns about the environmental impact of computational resources and the energy required for AI
training and inference. Although there is no standard benchmark for tracking the carbon intensity of AI systems, this subsection
synthesizes the findings of different researchers who are exploring the link between AI and the environment. Conducting research
on the environmental effects of AI was challenging as there are wildly varying estimates, the validity of which have not yet been
definitively established. To that end, the AI Index focuses on research from a recent paper by Luccioni et al., 2022. As AI models
continue growing in size and become more universally deployed, it will be increasingly important for the AI research community to
consciously monitor the effect AI systems have on the environment.
2.8 Environment
Environmental Impact of Select Large challenging to directly compare the carbon footprint
Language Models of these models, as the accounting methodologies for
Many factors determine the amount of carbon reporting carbon emissions are not standardized.
emissions emitted by AI systems, including the
Of the four language models being compared, GPT-
number of parameters in a model, the power usage
3 released the most carbon, 1.4 times more than
effectiveness of a data center, and the grid carbon
Gopher, 7.2 times more than OPT, and 20.1 times more
intensity. Power Usage Effectiveness (PUE) is a
than BLOOM.
metric used to evaluate the energy efficiency of
data centers. It is the ratio of the total amount of Figure 2.8.2 relativizes the carbon-emission estimates
energy used by a computer data center facility, to real-life examples. For instance, BLOOM’s training
including air conditioning, to the energy delivered run emitted 1.4 times more carbon than the average
to computing equipment. The higher the PUE, the American uses in one year and 25 times that of flying
less efficient the data center. Figure 2.8.1 shows how one passenger round trip from New York to San
these factors compare across four large language Francisco. BLOOM’s training consumed enough energy
models: GPT-3, Gopher, OPT, and BLOOM. It is to power the average American home for 41 years.21
Model Number of Datacenter PUE Grid Carbon Power C02 Equivalent C02 Equivalent
Parameters Intensity Consumption Emissions Emissions x PUE
Gopher 280B 1.08 330 gC02eq/kWh 1,066 MWh 352 tonnes 380 tonnes
BLOOM 176B 1.20 57 gC02eq/kWh 433 MWh 25 tonnes 30 tonnes
GPT-3 175B 1.10 429 gC02eq/kWh 1,287 MWh 502 tonnes 552 tonnes
OPT 175B 1.09 231 gC02eq/kWh 324 MWh 70 tonnes 76.3 tonnes
Figure 2.8.1
21 The U.S. Energy Information Administration estimates that in 2021, the average annual electricity consumption of a U.S. residential utility customer was 10,632 kilowatt hours (kWh).
CO2 Equivalent Emissions (Tonnes) by Selected Machine Learning Models and Real Life Examples, 2022
Source: Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2023 AI Index Report
OPT (175B) 70
BLOOM (176B) 25
American Life,
18.08
Avg., 1 Year
Human Life,
5.51
Avg., 1 Year
Air Travel,
0.99
1 Passenger, NY–SF
Narrative Highlight:
Using AI to Optimize Energy Usage
Training AI systems can be incredibly energy intensive. At the same time, recent research suggests
that AI systems can be used to optimize energy consumption. In 2022, DeepMind released the
results of a 2021 experiment in which it trained a reinforcement learning agent called BCOOLER
(BVE-based COnstrained Optimization Learner with Ensemble Regularization) to optimize cooling
procedures for Google’s data centers.
Figure 2.8.3 presents the energy-saving results from one particular BCOOLER experiment. At the
end of the three-month experiment, BCOOLER achieved roughly 12.7% energy savings. BCOOLER
was able to achieve these savings while maintaining the cooling comfort levels that the building
managers preferred.
12.7%
12%
10%
Cumulative AI Savings (%)
8%
6%
4%
2%
0%
2022 was a groundbreaking year for AI in science. This subsection looks at some meaningful ways in which AI has recently been used
to accelerate scientific discovery.
CHAPTER 3:
Technical AI Ethics
Text and Analysis by Helen Ngo
Artificial Intelligence
Index Report 2023
CHAPTER 3 PREVIEW:
Technical AI Ethics
Overview 128 Fairness in Machine Translation 143
Chapter Highlights 129 RealToxicityPrompts 144
Technical AI Ethics
3.7 AI Ethics Trends at FAccT
and NeurIPS 158
ACM FAccT (Conference on Fairness,
Accountability, and Transparency) 158
Accepted Submissions by
Professional Affiliation 158
Accepted Submissions by
Geographic Region 159
NeurIPS (Conference on Neural Information
Processing Systems) 160
Real-World Impact 160
Interpretability and Explainability 161
Causal Effect and Counterfactual
Reasoning 162
Privacy 163
Fairness and Bias 164
Overview
Fairness, bias, and ethics in machine learning continue to be topics of interest
among both researchers and practitioners. As the technical barrier to entry for
creating and deploying generative AI systems has lowered dramatically, the ethical
issues around AI have become more apparent to the general public. Startups and
large companies find themselves in a race to deploy and release generative models,
and the technology is no longer controlled by a small group of actors.
In addition to building on analysis in last year’s report, this year the AI Index
highlights tensions between raw model performance and ethical issues, as well as
new metrics quantifying bias in multimodal models.
Chapter Highlights
The effects of model scale on bias and toxicity
are confounded by training data and mitigation methods.
In the past year, several institutions have built their own large models trained on proprietary data—
and while large models are still toxic and biased, new evidence suggests that these issues can be
somewhat mitigated after training larger models with instruction-tuning.
3.1 Meta-analysis of
Fairness and Bias Metrics
Number of AI Fairness In 2022 several new datasets or metrics were released
to probe models for bias and fairness, either as
and Bias Metrics standalone papers or as part of large community
efforts such as BIG-bench. Notably, metrics are
Algorithmic bias is measured in terms of allocative
being extended and made specific: Researchers are
and representation harms. Allocative harm occurs
zooming in on bias applied to specific settings such as
when a system unfairly allocates an opportunity or
question answering and natural language inference,
resource to a specific group, and representation harm
extending existing bias datasets by using language
happens when a system perpetuates stereotypes
models to generate more examples for the same task
and power dynamics in a way that reinforces
(e.g., Winogenerated, an extended version of the
subordination of a group. Algorithms are considered
Winogender benchmark).
fair when they make predictions that neither favor
nor discriminate against individuals or groups based
Figure 3.1.1 highlights published metrics that have been
on protected attributes which cannot be used for
cited in at least one other work. Since 2016 there has
decision-making due to legal or ethical reasons (e.g.,
been a steady and overall increase in the total number
race, gender, religion).
of AI fairness and bias metrics.
Number of AI Fairness and Bias Metrics, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
20
19
15
Number of Metrics
10
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.1
Number of AI Fairness and correlate with each other, highlighting the importance
of careful selection of metrics and interpretation of
Bias Metrics (Diagnostic results.
Metrics Vs. Benchmarks)
In 2022, a robust stream of both new ethics
Measurement of AI systems along an ethical benchmarks as well as diagnostic metrics was
dimension often takes one of two forms. A benchmark introduced to the community (Figure 3.1.2). Some
contains labeled data, and researchers test how metrics are variants of previous versions of existing
well their AI system labels the data. Benchmarks do fairness or bias metrics, while others seek to measure
not change over time. These are domain-specific a previously undefined measurement of bias—for
(e.g., SuperGLUE and StereoSet for language example, VLStereoSet is a benchmark which extends
models; ImageNet for computer vision) and often the StereoSet benchmark for assessing stereotypical
aim to measure behavior that is intrinsic to the bias in language models to the text-to-image setting,
model, as opposed to its downstream performance while the HolisticBias measurement dataset assembles
on specific populations (e.g., StereoSet measures a new set of sentence prompts which aim to quantify
model propensity to select stereotypes compared demographic biases not covered in previous work.
to non-stereotypes, but it does not measure
performance gaps between different subgroups).
These benchmarks often serve as indicators of
intrinsic model bias, but they may not give as clear an In 2022 a robust stream
indication of the model’s downstream impact and its of both new ethics
extrinsic bias when embedded into a system.
benchmarks as well
A diagnostic metric measures the impact or as diagnostic metrics
performance of a model on a downstream task, and it
is often tied to an extrinsic impact—for example, the was introduced to the
differential in model performance for some task on a community.
population subgroup or individual compared to similar
individuals or the entire population. These metrics
can help researchers understand how a system will
perform when deployed in the real world, and whether
it has a disparate impact on certain populations.
Previous work comparing fairness metrics in natural
language processing found that intrinsic and extrinsic
metrics for contextualized language models may not
Number of New AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks), 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Benchmarks
14
Diagnostic Metrics
13
12
11
10
10
9 9 9
Number of Metrics
4
4
3 3
2 2 2
2
1
0
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.2
3.2 AI Incidents
AI, Algorithmic, and that tracks the ethical issues associated with AI
technology.
Automation Incidents and
Controversies (AIAAIC) The number of newly reported AI incidents and
Repository: Trends Over Time controversies in the AIAAIC database was 26 times
greater in 2021 than in 2012 (Figure 3.2.1)1. The rise
The AI, Algorithmic, and Automation Incidents in reported incidents is likely evidence of both
and Controversies (AIAAIC) Repository is an the increasing degree to which AI is becoming
independent, open, and public dataset of recent intermeshed in the real world and a growing
incidents and controversies driven by or relating to awareness of the ways in which AI can be ethically
AI, algorithms, and automation. It was launched in misused. The dramatic increase also raises an
2019 as a private project to better understand some important point: As awareness has grown, tracking of
of the reputational risks of artificial intelligence incidents and harms has also improved—suggesting
and has evolved into a comprehensive initiative that older incidents may be underreported.
260
250
Number of AI Incidents and Controversies
200
150
100
50
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 3.2.1
1 This figure does not consider AI incidents reported in 2022, as the incidents submitted to the AIAAIC database undergo a lengthy vetting process before they are fully added.
2 Although these events were reported in 2022, some of them had begun in previous years.
3 Although other text-to-image models launched in 2022 such as DALL-E 2 and Stable Diffusion were also criticized, for the sake of brevity the AI Index chose to highlight one particular
incident.
35
30
Number of Research Papers
25
20
15
10
0
2018 2019 2020 2021 2022
Figure 3.3.1
80%
73.58%
71.70% 71.40%
Winogender Accuracy (%)
63.30% 64.17%
61.50% 60.00%
57.90% 59.00% 59.00%
60%
50.00%
40%
20%
0%
iPET (ALBERT) Gopher WARP Bort Gopher GPT-3 PaLM GLAM GPT-3 Gopher PaLM
31M 100M (ALBERT-XXL-V2) 340M 10B 13B 62B 64B 175B 280M 540B
223M
Model and Number of Parameters
Figure 3.3.2
Flan-PaLM 8B 72.25%
PaLM 8B 10.80%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Winogender Accuracy (%)
Figure 3.3.3
BBQ: The Bias Benchmark for GUINEAN guy.” The corresponding question might
be, “Who has never used a computer?” referencing
Question Answering the stereotypical bias that Africans are technologically
illiterate. The example may be disambiguated with
Language models are increasingly being deployed
additional context such as, “The ICELANDIC guy was
in question answering systems. While several
behind the counter trying to explain to the GUINEAN
benchmarks exist for the purpose of measuring bias
guy how to find the power switch on his computer!”
in language models based on their outputs, including
BOLD, CrowS-Pairs, and StereoSet, the BBQ
benchmark measures how biases can manifest in the
Models can be more biased
question-answering setting. BBQ measures bias along
the axes of socioeconomic status, sexual orientation, along certain identity
religion, race/ethnicity, physical appearance, categories than others—
nationality, gender identity, disability status, and age.
most models are biased
Examples consist of template-based context and
along the axes of physical
question pairs, where each answer choice references
a person belonging to either a stereotypical or appearance and age, but the
anti-stereotypical social group. The third option is biases along the axis of race/
“Unknown,” designed for cases where the answer
cannot be determined from the information given.
ethnicity are less clear.
Each example includes an ambiguous version where
it is unclear which social group the entity belongs
In contexts where the answer is ambiguous, models
to, as well as additional context which can be used
are more likely to fall back on stereotypes and select
to disambiguate the correct answer. The ambiguous
unsupported answers rather than “Unknown” (Figure
setting measures how often models fall back on
3.3.4), and this result is exacerbated for models fine-
stereotypes when the answer cannot be inferred from
tuned with reinforcement learning.4
the given context. Models that do not exhibit bias have
a score of zero, while a score of 100 indicates that the
As seen in Figure 3.3.4, models can be more biased
model chooses answers aligned with the social bias
along certain identity categories than others—
in question, and a score of -100 indicates the model
most models are biased along the axes of physical
always chooses the anti-stereotypical answer.
appearance and age, but the biases along the axis
of race/ethnicity are less clear. For reference, Figure
An example of an ambiguous context might be,
3.3.5 highlights bias in question answering on BBQ in
“At the computer store the other day, I overheard
disambiguated contexts.
a conversation between an ICELANDIC guy and a
Disability Status 9.90 17.40 10.70 38.30 32.60 21.20 4.00 13.00
Gender Identity 10.00 15.00 11.30 25.60 18.60 2.40 4.00 8.00
Physical Appearance 17.00 40.70 41.00 38.50 47.70 40.90 4.00 16.00
Sexual Orientation 0.20 -3.00 -4.40 6.50 11.80 5.80 1.00 7.00
Socio-Economic Status 4.40 3.50 9.70 29.60 48.70 27.30 11.00 14.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L ( AR ( RA om i ne
e a se arge C) CE pte tun
) d ed
Ch
in ch
illa
(D
PC
)
Model
Figure 3.3.4
Disability Status 5.40 5.70 8.10 1.70 -0.70 -1.40 0.00 8.00
Gender Identity 14.00 2.90 4.60 -16.90 -3.40 -5.80 2.00 3.00
Physical Appearance 17.10 -2.70 4.20 -5.00 -1.70 -2.30 12.00 8.00
Sexual Orientation 6.50 -3.10 -4.80 -0.20 0.50 -0.70 -1.00 -1.00
Socio-Economic Status 7.00 3.50 3.80 2.90 3.80 3.90 8.00 7.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L (AR (RA om i ne
e ase a rge C) CE pte tun
) d ed
Ch
inch
illa
(D
PC
)
Model
Figure 3.3.5
Fairness and Bias Trade-Offs not clear (Figure 3.3.6). This finding may be contingent
on the specific criterion for fairness, defined as
in NLP: HELM counterfactual fairness and statistical fairness.
1.00
0.50
0.80
0.40
Bias (Gender Representation)
0.60
0.30
Fairness
0.40
0.20
0.20 0.10
0.00 0.00
0.00 0.20 0.40 0.60 0.80 1.00 0.00 0.20 0.40 0.60 0.80 1.00
Accuracy Accuracy
Figure 3.3.6
60%
40%
20%
0%
Flan-T5-XXL 11B Flan-PaLM 8B Flan-PaLM 62B Flan-PaLM 540B PaLM 8B PaLM 62B PaLM 540B
Model and Number of Parameters
Figure 3.3.7
RealToxicityPrompts by Model
Source: Liang et al., 2022 | Chart: 2023 AI Index Report
0.08
0.07
Toxicity Probability
0.06
0.05
0.04
0.03
0.02
0.01
0.00
GPT-3 ada v1 350M
GPT-J 6B
TNLG v2 6.7B
J1-Large v1 7.5B
T0pp 11B
T5 11B
J1-Grande v1 17B
GPT-NeoX 20B
UL2 20B
OPT 66B
YaLM 100B
GLM 130B
OPT 175B
BLOOM 176B
J1-Jumbo v1 178B
TNLG v2 530B
Figure 3.3.8
A natural application of generative language models is in open-domain conversational AI; for example, chatbots and assistants. In the
past year, companies have started deploying language models as chatbot assistants (e.g., OpenAI’s ChatGPT, Meta’s BlenderBot3).
However, the open-ended nature of these models and their lack of steerability can result in harm—for example, models can be
unexpectedly toxic or biased, reveal personally identifiable information from their training data, or demean or abuse users.
The training data used for dialog systems can result attractive daughters? I will sell one.
leaving their users feeling unsettled. Researchers inappropriate for a robot to output. (Gros et al., 2022)
99%
MultiWOZ
99%
94%
Persuasion for Good
94%
88%
EmpatheticDialogues
90%
88%
Wizard of Wikipedia
87%
Dataset
82%
Reddit Small
75%
72%
MSC
77%
67%
RUAR Blender2
75%
Possible
65%
Blender for a Robot to Say
75%
Comfortable
56% for a Robot to Say
PersonaChat
67%
Narrative Highlight:
Tricking ChatGPT Into Building a Dirty Bomb, Part 1
Tricking ChatGPT Source: Outrider, 2022
Figure 3.4.4
Text-to-image models took over social media in 2022, turning the issues of fairness and bias in AI systems visceral through image form:
Women put their own images into AI art generators and received hypersexualized versions of themselves.
Fairness Across Age Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
78.5%
76.6%
18–30
89.6%
93.2%
76.7%
74.6%
30–45
90.5%
Age Group
95.0%
80.1%
76.7%
45–70
92.6%
95.6%
75.8%
69.4%
70+
88.7%
96.7%
Fairness Across Gender/Skin Tone Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
73.6%
Skin Tone 69.7%
Darker 86.6%
92.9%
82.1%
Skin Tone 80.8%
Lighter 94.2%
Gender/Skin Tone Group
96.2%
58.2%
Female 50.3%
Darker 78.2%
90.3%
75.1%
Female 71.6%
Lighter 93.7%
96.8%
92.7%
Male 93.7%
Darker 97.5%
96.1%
91.1%
Male 92.5%
Lighter 94.9%
95.4%
Text-to-Image Models
StereoSet was introduced as a benchmark for
measuring stereotype bias in language models along
the axes of gender, race, religion, and profession
by calculating how often a model is likely to choose
a stereotypical completion compared to an anti-
stereotypical completion. VLStereoSet extends the
idea to vision-language models by evaluating how
often a vision-language model selects stereotypical
captions for anti-stereotypical images.
Gender Profession
100 ALBEF VILT 100 VisualBERT CLIP
FLAVA VisualBERT VILT
80 CLIP 80 ALBEF LXMERT
60 LXMERT 60 FLAVA
Vision-Language Relevance (vlrs) Score
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Race Religion
100 100 VisualBERT CLIP
VisualBERT CLIP
ALBEF
80 VILT 80 VILT LXMERT
ALBEF
LXMERT FLAVA
60 FLAVA 60
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Vision-Language Bias (vlbs) Score
Figure 3.5.4
Text-to-Image Models
This subsection highlights some of the
ways in which bias is tangibly manifested in
popular AI text-to-image systems such as
Stable Diffusion, DALL-E 2, and Midjourney.
Stable Diffusion
Stable Diffusion gained notoriety in 2022
upon its release by CompVis, Runway ML,
and Stability AI for its laissez-faire approach
to safety guardrails, its approach to full
openness, and its controversial training
dataset, which included many images from
artists who never consented to their work
being included in the data. Though Stable
Diffusion produces extremely high-quality
images, it also reflects common stereotypes
and issues present in its training data.
Figure 3.5.5
DALL-E 2
DALL-E 2 is a text-to-image model released by looking men wearing suits. Each of the men appeared
OpenAI in April 2022. DALL-E 2 exhibits similar biases to take an assertive position, with three of the four
as Stable Diffusion—when prompted with “CEO,” the crossing their arms authoritatively (Figure 3.5.6).
model generated four images of older, rather serious-
Bias in DALL-E 2
Source: DALL-E 2, 2023
Figure 3.5.6
Midjourney
Midjourney is another popular text-to-image system that was released in 2022. When prompted with “influential
person,” it generated four images of older-looking white males (Figure 3.5.7). Interestingly, when Midjourney was
later given the same prompt by the AI Index, one of the four images it produced was of a woman (Figure 3.5.8).
As research in AI ethics has exploded in the Western world in the past few years, legislators and policymakers have spent significant
resources on policymaking for transformative AI. While China has fewer domestic guidelines than the EU and the United States,
according to the AI Ethics Guidelines Global Inventory, Chinese scholars publish significantly on AI ethics—though these research
communities do not have significant overlap with Western research communities working on the same topics.
100 99
95
88
80
Number of Papers
60 58
50 49
41
40 39
37
32
27
20
0
Privacy Equality Agency Responsibility Security Freedom Unemployment Legality Transparency Autonomy Other
Figure 3.6.1
AI Ethics
Ethics in China
China:: St
Strategies for
for Har
Harm
m Mitig
Mitigaation R
Rel
ela
ated tto
o AI
Sour
urc
ce: Zh
Zhu,
u, 2022 | C
Char
hart:
t: 2023 AI Index
Index Repor
Reportt
71
70 69
64
60
52
50
Number of Papers
45
40 39 39
37
30
23
20
10
0
Structural Legislation Value Principles Accountability Shared Technological Talent International
Reform De nition System Governance Solutions Training Cooperation
Figure 3.6.2
43
40 40
40
37
Number of References
30
21
20
13
11
10
7 6 6 6
4
3
0
GD Et Ot Th Go Et As Be Pr AI AI Re Th
h hic ilo i e c
PR Tr ics he
rs
re
e a N vern all ma AI a jing CO lim St I
an Ind nfor the om Ro e EU
us G La e a y rA n C M i n d u m C m bo RO
tw uid ws w nc a ar s
or e of Gen e Pr
Al
ign
o
I P d Ed nse EST ry D diz try atio oun end eth N
at
th line
yA s Ro e i e r i nc u c n s o r a a tio D e n c il o ion ics R
I fo bo rat ncip dD
es ipl atio us o n Ro ft Re n W velo nA o oa
r tic ion les ign e s n n b ot p hit p me I f dm
s of fo
r ics ort o e p n ap
AI Et f ap t St
hic er ra
s te
gy
Figure 3.6.3
503
500
Number of Papers
400
302
300
244
200 227
166 181
200
100 139
71
63 53 70
0
2018 2019 2020 2021 2022
Figure 3.7.1
Accepted Submissions by Geographic Region and Central Asia made up 18.7% of submissions, they
European government and academic actors have made up over 30.6% of submissions in 2022 (Figure
increasingly contributed to the discourse on AI ethics 3.7.2). FAccT, however, is still broadly dominated
from a policy perspective, and their influence is by authors from North America and the rest of the
manifested in trends on FAccT publications as well: Western world.
Whereas in 2021 submissions to FAccT from Europe
70%
50%
40%
20%
NeurIPS
NeurIPS (Conference on Neural Information Real-World Impact
Processing Systems), one of the most influential Several workshops at NeurIPS gather researchers
AI conferences, held its first workshop on fairness, working to apply AI to real-world problems. Notably,
accountability, and transparency in 2014. This section there has been a recent surge in AI applied to
tracks and categorizes workshop topics year over healthcare and climate in the domains of drug
year, noting that as topics become more mainstream, discovery and materials science, which is reflected
they often filter out of smaller workshops and into the in the spike in “AI for Science” and “AI for Climate”
main track or into more specific conferences related workshops (Figure 3.7.3).
to the topic.
NeurIPS Workshop Research Topics: Number of Accepted Papers on Real-World Impacts, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Climate 802
800
Developing World
94
Finance
700 Healthcare
Science
Other
600
171
529
Number of Papers
500 65
459
429
61
400
78 116
334
300 283 412
127 238
200 199 273
153 254
79
100 83
144 68
71 77 94 81 64
0
12
Figure 3.7.3
NeurIPS Research Topics: Number of Accepted Papers on Interpretability and Explainability, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Main Track 41
40 Workshop
35
18
30
Number of Papers
25 24
23
20
17
15
7 19
12
23 24
10 5
6 6
5 10
6 7 6
2
4
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.4
5 Declines in the number of workshop-related papers on interpretability and explainability might be attributed to year-over-year differences in workshop themes.
NeurIPS Research Topics: Number of Accepted Papers on Causal E ect and Counterfactual Reasoning,
2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Main Track 80
80 78
Workshop 76
72
70
20
60
Number of Papers
50 43 53 61
40 39
30 16
58
20
29
9 23 23
10 19
6
4 9
6
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.5
Privacy
Amid growing concerns about privacy, data been devoted to topics such as privacy in machine
sovereignty, and the commodification of personal learning, federated learning, and differential privacy.
data for profit, there has been significant momentum This year’s data shows that discussions related to
in industry and academia to build methods and privacy in machine learning have increasingly shifted
frameworks to help mitigate privacy concerns. into the main track of NeurIPS (Figure 3.7.6).
Since 2018, several workshops at NeurIPS have
120 15
103
100
Number of Papers
88 27
80 79 13
138
60
113
40 76
72 75
21
20 16
19
14
1
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.6
NeurIPS Research Topics: Number of Accepted Papers on Fairness and Bias in AI, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
300
250
Number of Papers
200
168
149 310
150
50
125 36
114
100 36
2 4 24
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.7
250
236, FEVER
200
191, LIAR
Number of Citations
150
50
SciFact 2020 ✓
COVID-Fact 2021 ✓
WikiFactCheck 2020 ✓
FM2 2021 ✓
Thorne et al. 2021 ✓
FaVIQ 2022 ✓
LIAR-PLUS 2017 no ✓
PolitiHop 2021 no ✓
Climate-FEVER 2020 ✓ no
HealthVer 2021 ✓ no
UKP-Snopes 2019 ✓ no
PubHealth 2020 ✓ no
WatClaimCheck 2022 ✓ no
Baly et al. 2018 no no
MultiFC 2019 no no
X-Fact 2021 no no
Figure 3.8.2
50%
Accuracy (%)
40%
30%
20%
10%
0%
T5 60M
GPT-2 117M
Galactica 125M
GPT-NEO-125M
T5 220M
InstructGPT ada v1 350M
GPT3 350M
GPT-3 ada v1 350M
Cohere small v20220720 410M
T5 770M
Galactica 1.3B
GPT-3 babbage v1 1.3B
GPT3 1.3B
GPT-NEO-1.3B
InstructGPT babbage v1 1.3B
Gopher 1.4B
GPT2 1.5B
GPT-NEO-2.7B
T5 2.8B
GPT-J 6B
GPT-NEO-6B
Cohere medium v20220720 6.1B
TNLG v2 6.7B
Galactica 6.7B
InstructGPT curie v1 6.7B
GPT3 6.7B
GPT-3 curie v1 6.7B
Gopher 7.1B
J1-Large v1 7.5B
T5 11B
T0pp 11B
Cohere large v20220720 13.1B
J1-Grande v1 17B
UL2 20B
GPT-NeoX 20B
Galactica 30B
Anthropic-LM v4-s3 52B
Cohere xlarge v20220609 52.4B
OPT 66B
YaLM 100B
Galactica 120B
GLM 130B
GPT-3 davinci v1 175B
OPT-175B
GPT3 175B
OPT 175B
InstructGPT davinci v2 175B
BLOOM 176B
J1-Jumbo v1 178B
Gopher 280B
Gopher 280B -10shot
TNLG v2 530B
Figure 3.8.3
Artificial Intelligence
Index Report 2023
CHAPTER 4:
The Economy
CHAPTER 4 PREVIEW:
The Economy
Overview 170 Narrative Highlight: The Effects of
Chapter Highlights 171 GitHub’s Copilot on Developer
Productivity and Happiness 208
Overview
Increases in the technical capabilities of AI systems have led to greater rates of AI
deployment in businesses, governments, and other organizations. The heightening
integration of AI and the economy comes with both excitement and concern. Will
AI increase productivity or be a dud? Will it boost wages or lead to the widespread
replacement of workers? To what degree are businesses embracing new AI
technologies and willing to hire AI-skilled workers? How has investment in AI
changed over time, and what particular industries, regions, and fields of AI have
attracted the greatest amount of investor interest?
This chapter examines AI-related economic trends by using data from Lightcast,
LinkedIn, McKinsey, Deloitte, and NetBase Quid, as well as the International
Federation of Robotics (IFR). This chapter begins by looking at data on AI-related
occupations and then moves on to analyses of AI investment, corporate adoption of
AI, and robot installations.
Chapter Highlights
The demand for AI-related For the first time in the last
professional skills is increasing decade, year-over-year private
across virtually every investment in AI decreased.
American industrial sector. Global AI private investment was $91.9 billion
Across every sector in the United States for in 2022, which represented a 26.7% decrease
which there is data (with the exception of since 2021. The total number of AI-related
agriculture, forestry, fishing, and hunting), the funding events as well as the number of newly
number of AI-related job postings has increased funded AI companies likewise decreased.
on average from 1.7% in 2021 to 1.9% in 2022. Still, during the last decade as a whole, AI
Employers in the United States are increasingly investment has significantly increased. In 2022
looking for workers with AI-related skills. the amount of private investment in AI was 18
times greater than it was in 2013.
In 2022, the AI focus area with the most investment was medical
and healthcare ($6.1 billion); followed by data management,
processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less investment
in 2022 than in 2021. In the last year, the three largest AI private investment events were: (1) a $2.5 billion
funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of electric vehicles; (2) a
$1.5 billion Series E funding round for Anduril Industries, a U.S. defense products company that builds
technology for military agencies and border surveillance; and (3) a $1.2 billion investment in Celonis, a
business-data consulting company based in Germany.
4.1 Jobs
AI Labor Demand Global AI Labor Demand
Figure 4.1.1 highlights the percentage of all job
This section reports demand for AI-related skills
postings that require some kind of AI skill. In 2022,
in labor markets. The data comes from Lightcast,
the top three countries according to this metric were
which mined millions of job postings collected from
the United States (2.1%), Canada (1.5%), and Spain
over 51,000 websites since 2010 and flagged listings
(1.3%). For every country included in the sample, the
calling for AI skills.
number of AI-related job postings was higher in 2022
than in 2014.1
1.45%, Canada
1.50% 1.33%, Spain
1.23%, Australia
1.20%, Sweden
1.16%, Switzerland
1.14%, United Kingdom
1.01%, Netherlands
0.98%, Germany
1.00% 0.89%, Austria
0.86%, Belgium
0.84%, France
0.72%, Italy
0.50%
0.45%, New Zealand
0.00%
2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.1
1 In 2022, Lightcast slightly changed their methodology for determining AI-related job postings from that which was used in previous versions of the AI Index Report. As such, some of the
numbers in this chart do not completely align with those featured in last year’s report.
AI Job Postings (% of All Job Postings) in the United States by Skill Cluster, 2010–22
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
0.80%
0.40%
0.06%, Robotics
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.2
Figures 4.1.3 and 4.1.4 showcase the top ten specialized skills that were demanded in AI job postings in 2022 compared
to 2010–20122. On an absolute level, virtually every specialized skill is more in demand now than a decade ago. The
growth in demand for Python is particularly notable, evidence of its growing popularity as an AI coding language.
Top Ten Specialized Skills in 2022 AI Job Postings in the United States, 2010–12 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
296,662
Python (Programming Language)
12,884
260,333
Computer Science
48,001
185,807
SQL (Programming Language)
22,037
159,801
Data Analysis
16,571
157,855
Data Science
1,227
155,615
Amazon Web Services
962
152,956
Agile Methodology
7,549
138,791
Automation
13,207
133,856
Java (Programming Language)
26,557 2022
133,286 2010–12
Software Engineering
22,384
0 50,000 100,000 150,000 200,000 250,000 300,000
Number of AI Job Postings Figure 4.1.3
Top Ten Specialized Skills in 2022 AI Job Postings in the United States by Skill Share, 2010–12 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
37.13% (+592%)
Python (Programming Language)
5.36%
32.58% (+63%)
Computer Science
19.98%
23.25% (+153%)
SQL (Programming Language)
9.17%
20.00% (+190%)
Data Analysis
6.90%
19.75% (+3,767%)
Data Science
0.51%
19.47% (+4,763%)
Amazon Web Services
0.40%
19.14% (+509%)
Agile Methodology
3.14%
17.37% (+216%)
Automation
5.50%
16.75% (+52%)
Java (Programming Language)
11.06% 2022
16.68% (+79%) 2010–12
Software Engineering
9.32%
0% 5% 10% 15% 20% 25% 30% 35% 40%
Skill Share in AI Job Postings (%)
Figure 4.1.4
2 The point of comparison of 2010–2012 was selected because some data at the jobs/skills level is quite sparse in earlier years. Lightcast therefore used the
whole set of years 2010–2012 to get a larger sample size for a benchmark from 10 years ago to compare.
U.S. AI Labor Demand by Sector fishing, and hunting), the number of AI job postings
Figure 4.1.5 shows the percentage of U.S. job was notably higher in 2022 than in 2021, with the top
postings that required AI skills by industry sector three sectors being information (5.3%); professional,
from 2021 to 2022. Across virtually every included scientific, and technical services (4.1%); and finance
sector (with the exception of agriculture, forestry, and insurance (3.3%).
AI Job Postings (% of All Job Postings) in the United States by Sector, 2021 Vs. 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
5.30%
Information 4.85%
4.07%
Professional, Scienti c, and Technical Services 3.86%
3.33%
Finance and Insurance 2.94%
3.26%
Manufacturing 2.86%
1.64%
Agriculture, Forestry, Fishing, and Hunting 1.66%
1.53%
Educational Services 1.41%
1.37%
Management of Companies and Enterprises 1.08%
1.32%
Public Administration 0.98%
1.28%
Retail Trade 0.82%
1.27%
Utilities 1.10%
1.19%
Mining, Quarrying, and Oil and Gas Extraction 1.00%
0.98%
Wholesale Trade 0.82%
0.89%
Real Estate and Rental and Leasing 0.65%
2022
0.67%
Transportation and Warehousing 0.59% 2021
0.58%
Waste Management and Administrative Support Services 0.56%
0% 1% 2% 3% 4% 5%
AI Job Postings (% of All Job Postings)
Figure 4.1.5
U.S. AI Labor Demand by State Number of AI Job Postings in the United States by State, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
Figure 4.1.6 highlights the number
of AI job postings in the United AK ME
970 2,227
States by state. The top three
VT NH MA
states in terms of postings were 1,571 2,719 34,603
California (142,154), followed by WA MT ND SD MN WI MI NY CT RI
31,284 833 1,227 2,195 11,808 8,879 25,366 43,899 8,960 2,965
Texas (66,624) and New York
(43,899). OR ID WY NE IA IL IN OH PA NJ
10,811 6,109 769 4,032 5,670 31,569 9,247 19,208 20,397 23,447
CA NV UT CO KS MO KY WV DC MD DE
142,154 6,813 6,885 20,421 7,683 10,990 4,536 887 9,606 16,769 3,503
AZ NM OK AR TN VA NC
19,514 3,357 5,719 7,247 11,173 34,221 23,854
TX LA MS AL GA SC
66,624 4,806 2,548 7,866 26,620 4,928
HI FL
2,550 33,585
Figure 4.1.6
Figure 4.1.7 demonstrates what Percentage of U.S. States Job Postings in AI, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
percentage of a state’s total job
postings were AI-related. The top AK ME
0.88% 1.64%
states according to this metric
VT NH MA
were the District of Columbia 1.34% 1.20% 2.26%
(3.0%), followed by Delaware WA MT ND SD MN WI MI NY CT RI
2.48% 0.72% 1.04% 1.83% 1.22% 0.90% 1.77% 2.07% 1.66% 1.84%
(2.7%), Washington (2.5%), and
Virginia (2.4%). OR ID WY NE IA IL IN OH PA NJ
1.43% 1.89% 1.18% 1.14% 1.14% 1.63% 0.88% 1.07% 1.30% 2.04%
CA NV UT CO KS MO KY WV DC MD DE
2.21% 1.23% 1.54% 1.46% 1.43% 1.15% 0.85% 0.99% 2.95% 1.96% 2.66%
AZ NM OK AR TN VA NC
1.40% 1.36% 1.07% 2.03% 1.11% 2.42% 1.44%
TX LA MS AL GA SC
1.52% 0.87% 1.15% 1.31% 1.64% 0.87%
HI FL
1.46% 1%
Figure 4.1.7
Which states had the greatest Percentage of United States AI Job Postings by State, 2022
Source: Lightcast, 2022 | Chart: 2023 AI Index Report
share of AI job postings as a
share of all AI job postings in AK ME
0.12% 0.28%
the U.S. in 2022? California was
VT NH MA
first: Last year 17.9% of all AI job 0.20% 0.34% 4.35%
postings in the United States WA MT ND SD MN WI MI NY CT RI
3.93% 0.10% 0.15% 0.28% 1.48% 1.12% 3.19% 5.52% 1.13% 0.37%
were for jobs based in California,
followed by Texas (8.4%) and OR ID WY NE IA IL IN OH PA NJ
1.36% 0.77% 0.10% 0.51% 0.71% 3.97% 1.16% 2.41% 2.56% 2.95%
New York (5.5%) (Figure 4.1.8). CA NV UT CO KS MO KY WV DC MD DE
17.87% 0.86% 0.87% 2.57% 0.97% 1.38% 0.57% 0.11% 1.21% 2.11% 0.44%
AZ NM OK AR TN VA NC
2.45% 0.42% 0.72% 0.91% 1.40% 4.30% 3%
TX LA MS AL GA SC
8.37% 0.60% 0.32% 0.99% 3.35% 0.62%
HI FL
0.32% 4.22%
Figure 4.1.8
Figure 4.1.9 highlights the trends over time in AI job postings for four select states that annually report a high
number of AI-related jobs: Washington, California, New York, and Texas. For all four, there was a significant
increase in the number of total AI-related job postings from 2021 to 2022, suggesting that across these states,
employers are increasingly looking for AI-related workers.
2.21%, California
Percentage of U.S. States’ Job Postings in AI
1.00%
0.50%
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.9
Figure 4.1.10 highlights the degree to which AI-related job postings have been subdivided among the top
four states over time. California’s share of all AI job postings has decreased steadily since 2019 while Texas’
has marginally increased. The fact that California no longer commands one-quarter of all AI-related jobs
suggests that AI jobs are becoming more equally distributed among U.S. states.
25%
Percentage of United States AI Job Postings
20%
17.87%, California
15%
10%
8.37%, Texas
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.10
AI Hiring in the same period the job began, divided by the total
number of LinkedIn members in the corresponding
Our AI hiring data is based on a LinkedIn dataset of skills
location. This rate is then indexed to the average
and jobs that appear on their platform. The countries
month in 2016; for example, an index of 1.1 in December
included in the sample make at least 10 AI hires each
2021 points to a hiring rate that is 10% higher than the
month and have LinkedIn covering at least 40% of
average month in 2016. LinkedIn makes month-to-
their labor force. India is also included in the sample
month comparisons to account for any potential lags in
given their increasing significance in the AI landscape,
members updating their profiles. The index for a year is
although LinkedIn does not cover 40% of their labor
the number in December of that year.
force. Therefore, the insights drawn about India should
be interpreted with particular caution.
The relative AI hiring index measures the degree to which
the hiring of AI talent is changing, more specifically
Figure 4.1.11 highlights the 15 geographic areas that
whether the hiring of AI talent is growing faster than,
have the highest relative AI hiring index for 2022. The
equal to, or more slowly than overall hiring in a particular
AI hiring rate is calculated as the percentage of LinkedIn
geographic region. In 2022, Hong Kong posted the
members with AI skills on their profile or working in
greatest growth in AI hiring at 1.4, followed by Spain, Italy
AI-related occupations who added a new employer
and the United Kingdom, and the United Arab Emirates.
Relative AI Hiring Index by Geographic Area, 2022
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report
Spain 1.19
Italy 1.18
Denmark 1.06
Belgium 1.05
Netherlands 1.03
Sweden 1.01
Canada 0.99
Switzerland 0.99
Singapore 0.99
Figure 4.1.12 highlights how the AI hiring index changes over time for a wide range of countries3. Overall, the
majority of countries included in the sample have seen meaningful increases in their AI hiring rates since 2016.
This trend suggests that those countries are now hiring more AI talent than in 2016. However, for many countries,
AI hiring rates seem to have peaked around 2020, then dropped, and have since stabilized.
3 Both Figure 4.1.11 and Figure 4.1.12 report the Relative AI Hiring Index. Figure 4.1.11 reports the Index value at the end of December 2022, while Figure 4.1.12 reports a twelve-month rolling average.
India 3.23
Germany 1.72
Israel 1.65
Canada 1.54
Singapore 1.37
France 1.13
Brazil 0.99
Spain 0.98
Netherlands 0.95
Italy 0.95
Switzerland 0.91
Australia 0.89
Figure 4.1.13
Using data from NetBase Quid, this section tracks trends in AI-related investments. NetBase Quid tracks data on the investments of over
8 million global public and private companies. NetBase Quid also uses natural language processing techniques to search, analyze, and
identify patterns in large, unstructured datasets, like aggregated news and blogs, and company and patent databases. NetBase Quid
continuously broadens the set of companies for which it tracks data, so that in this year’s AI Index, the reported investment volume for
certain years is larger than that of previous reports.
4.2 Investment
Corporate Investment
As AI becomes more and more integrated into the For the first time since 2013, year-over-year global
economy, it becomes increasingly important to track corporate investment in AI has decreased. In 2022,
AI-related corporate investment. Figure 4.2.1 shows total global corporate AI investment was $189.6
overall global corporate investment in AI from 2013 billion, roughly a third lower than it was in 2021.
to 2022. Corporate investment includes mergers and Still, in the last decade, AI-related investment has
acquisitions, minority stakes, private investment, and increased thirteenfold.
public offerings.
119.66
200
189.59
To provide a fuller context for the Top Five AI Merger/Acquisition Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report
nature of AI investment in the last year,
Company Name Headquarters Focus Area Funding Amount
Figures 4.2.2 through 4.2.5 highlight Country (in Billions USD)
the top merger/acquisition, minority Nuance United States Arti cial Intelligence; 19.80
Communications, Inc. Enterprise Software;
stake, private investment, and public Healthcare; Machine
offering events in the last year. The Learning
greatest single AI investment event Citrix Systems, Inc. United States Data Management, 17.18
Processing, and Cloud;
was the merger/acquisition of Nuance HR Tech
Communications, valued at $19.8 billion Avast Limited Czech Republic Data Management, 8.02
Processing, and Cloud;
(Figure 4.2.2). The largest minority Fintech; Cybersecurity,
stake event was for the British company Data Protection
6.34
Aveva Group ($4.7 billion) (Figure 4.2.3). AspenTech United States Manufacturing;
Corporation Software; Supply
The greatest private investment event Chain Management
was GAC Aion New Energy Automobile Vivint Smart Home, United States Cybersecurity, Data 5.54
Inc. Protection; Sales
($2.5 billion), a Chinese clean energy Enablement
and automotive company (Figure 4.2.4).
Finally, the largest public offering was Figure 4.2.2
Figure 4.2.3
Figure 4.2.4
Figure 4.2.5
Startup Activity
Global Trend
The next section analyzes private investment trends in The global private AI investment trend reveals that
artificial intelligence startups that have received over while investment activity has decreased since 2021, it
$1.5 million in investment since 2013. is still 18 times higher than it was in 2013 (Figure 4.2.6).
120
Total Investment (in Billions of U.S. Dollars)
100
91.86
80
60
40
20
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.6
A similar trend, of short-term decreases but longer- decrease from 2021 but a sixfold increase since 2013
term growth, is evident in data on total private (Figure 4.2.7). Similarly, the number of newly funded AI
investment events. In 2022 there were 3,538 AI- companies dropped to 1,392 from 1,669 last year, while
related private investment events, representing a 12% having increased from 495 in 2013 (Figure 4.2.8).
Number of Private Investment Events in AI, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
4,000
3,538
3,500
Number of Private Investment Events
3,000
2,500
2,000
1,500
1,000
500
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.7
Number of Newly Funded AI Companies in the World, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
1,600
1,392
1,400
1,200
Number of Companies
1,000
800
600
400
200
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.8
are disaggregated by size. Across all size Funding Size 2021 2022 Total
Figure 4.2.9
China 13.41
Israel 3.24
India 3.24
Germany 2.35
Canada 1.83
France 1.77
Argentina 1.52
Australia 1.35
Singapore 1.13
Switzerland 1.04
Japan 0.72
Finland 0.61
0 5 10 15 20 25 30 35 40 45
Total Investment (in Billions of U.S. Dollars)
Figure 4.2.10
When private AI investments are aggregated since 2013, the same ranking of countries applies:
The United States is first with $248.9 billion invested, followed by China ($95.1 billion) and the
United Kingdom ($18.2 billion) (Figure 4.2.11).
China 95.11
Israel 10.83
Canada 8.83
India 7.73
Germany 6.99
France 6.59
Singapore 4.72
Japan 3.99
Switzerland 3.04
Australia 3.04
Spain 1.81
While the United States continues to outpace The top five American AI private investment events
other nations in terms of private AI investment, are highlighted in Figure 4.2.13, the top five European
the country experienced a sharp 35.5% decrease Union and British investments in Figure 4.2.14, and the
in AI private investment within the last year (Figure top five Chinese investments in Figure 4.2.15.
4.2.12). Chinese investment experienced a similarly
sharp decline (41.3%).
70
60
Total Investment (in Billions of U.S. Dollars)
50
47.36, United States
40
30
20
13.41, China
10 11.04, European Union and United Kingdom
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.12
Top AI Private Investment Events in the United Top AI Private Investment Events in the European
States, 2022 Union and United Kingdom, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Source: NetBase Quid, 2022 | Table: 2023 AI Index Report
Company Name Focus Area Funding Amount Company Name Focus Area Funding Amount
(in Billions USD) (in Billions USD)
Anduril Industries, Inc. Cybersecurity, Data 1.50 Celonis, GmbH Retail; Industrial 1.22
Protection; AR/VR; Automation, Network;
Drones HR Tech; Insurtech
Faire Wholesale, Inc. Fintech; Retail; Sales 0.82 Content Square, SAS Analytics; Arti cial 0.60
Enablement Intelligence: CRM:
Data Visualization;
Anthropic, PBC Arti cial Intelligence; 0.58 Digital Marketing;
Information SaaS
Technology; Machine
Learning Retail Logistics Excellence Retail 0.57
- RELEX Oy
Arctic Wolf Networks, Inc. Data Management, 0.40
Processing, and Cloud; Cera Care Limited Medical and 0.32
Cybersecurity, Data Healthcare
Protection
Babylon Holdings Limited Medical and 0.30
JingChi, Inc. Data Management, 0.40 Healthcare; Music,
Processing, and Cloud; Video Content
AV; AR/VR
Figure 4.2.14
Figure 4.2.13
Figure 4.2.15
China 160
United Kingdom 99
Israel 73
India 57
Canada 47
France 44
Germany 41
Singapore 36
Japan 32
Switzerland 26
Australia 23
South Korea 22
Sweden 12
Netherlands 12
A similar trend is evident in the aggregate data since 2013. In the last decade, the number of newly funded
AI companies in the United States is around 3.5 times the amount in China, and 7.4 times the amount in the
United Kingdom (Figure 4.2.17).
China 1,337
Israel 402
Canada 341
France 338
India 296
Japan 294
Germany 245
Singapore 165
Australia 126
Switzerland 108
Sweden 83
Netherlands 78
Figure 4.2.18 breaks Number of Newly Funded AI Companies by Geographic Area, 2013–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
700
down data on newly
funded AI companies
within select 600
back a decade,
Number of Companies
continues to outpace
both the European 300 293,
European
Union and the United Union and
United Kingdom
Kingdom, as well as 200
160,
China. However, the China
growth rates of the 100
Focus Area Analysis Private Investment in AI by Focus Area, 2021 Vs. 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
Private AI investment can also be disaggregated by Medical and Healthcare
Data Management, Processing, Cloud
focus area. Figure 4.2.19 compares global private Fintech
AI investment by focus area in 2022 versus 2021. Cybersecurity, Data Protection
Retail
The focus areas that attracted the most investment Industrial Automation, Network
Sales Enablement
in 2022 were medical and healthcare ($6.1 billion); Marketing, Digital Ads
AR/VR
data management, processing, and cloud ($5.9 Drones
Insurtech
billion); fintech ($5.5 billion); cybersecurity and Music, Video Content
data protection ($5.4 billion); and retail ($4.2 Semiconductor
HR Tech
billion). Mirroring the pattern seen in total AI private Energy, Oil, and Gas
AV
investment, the total investment across most focus NLP, Customer Support
Agritech
areas declined in the last year. Entertainment
Legal Tech
Geospatial
Fitness and Wellness
Ed Tech 2022
Facial Recognition 2021
VC
0 2 4 6 8 10
Total Investment (in Billions of U.S. Dollars)
Figure 4.2.19
Figure 4.2.20 presents trends in AI focus area cybersecurity and data protection, drones, marketing
investments. As noted earlier, most focus areas saw and digital ads, HR tech, AR/VR, and legal tech. Still,
declining investments in the last year. However, some mirroring a broader trend in AI private investment,
of the focus areas that saw increased investments are most focus areas saw greater amounts of AI private
semiconductor, industrial automation and network, investment in 2022 than they did in 2017.
8 8 8 8
6 5.86 6 6.05 6 5.52 6
4 4 4 4
2 2 2 2 1.34
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
8 8 8 8
6 6 6 6
4 4 3.92 4 4.20 4
2 1.65 2 2 2
0.53
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
NLP, Customer Support Energy, Oil, and Gas Cybersecurity, Data Protection Drones
8 8 8 8
6 6 6 5.38 6
4 4 4 4
2 2 1.61 2 2 1.88
1.01
Total Investment (in Billions of U.S. Dollars)
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
8 8 8 8
6 6 6 6
4 4 4 4
3.05
2 2 1.63 2 2 1.74
0.07
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
8 8 8 8
6 6 6 6
4 4 4 4
3.18 2.39
2 2 2 2
0.87 0.37
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
8 8 8 8
6 6 6 6
4 4 4 4
2 2 2 2 1.72
0.71 0.83 0.87
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
VC
8
6
4
2
0.02
0 Figure 4.2.20
2018 2020 2022
Finally, 4.2.21 shows private investment in AI by focus area billion), and 40 times more than that in the European
over time within select geographic regions, highlighting Union and the United Kingdom ($0.04 billion). Chinese
how private investment priorities in AI differ across private investment in AI-related semiconductors ($1.02
geographies. For example, in 2022, private investment billion) was 1.75 times more than that in the United
in AI-related drone technology in the United States ($1.6 States ($0.58 billion), and 102 times more than that in the
billion) was nearly 53 times more than that in China ($0.03 European Union and the United Kingdom ($0.01 billion).
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
NLP, Customer Support Energy, Oil, and Gas Cybersecurity, Data Protection Drones
US, 0.69 US, 0.80 US, 3.87 US, 1.60
CN, 0.13 CN, 0.34 CN, 1.07 CN, 0.03
4 EU/UK, 0.04 4 EU/UK, 0.20 4 EU/UK, 0.23 4 EU/UK, 0.04
2 2 2 2
0 0 0 0
Total Investment (in Billions of U.S. Dollars)
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Marketing, Digital Ads HR Tech Facial Recognition Insurtech
US, 1.14 US, 0.24 US, 0.07 US, 0.39
CN, 0.88 CN, 0.00 CN, 0.00 CN, 0.00
4 EU/UK, 0.76 4 EU/UK, 1.28 4 EU/UK, 0.00 4 EU/UK, 1.29
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
VC
US, 0.00
CN, 0.00
4 EU/UK, 0.02
2
0 Figure 4.2.21
2018 2020 2022
This section explores how corporations tangibly use AI. First, it highlights industry adoption trends and asks how businesses adopt
AI and what particular AI technologies they find most useful, and identifies how AI adoption affects their bottom line. Second, the
section considers industry motivations and explores what questions industry leaders consider when thinking about incorporating AI
technologies. Finally, it paints a qualitative picture of business AI use by examining trends in AI-related earnings calls.
Share of Respondents Who Say Their Organizations Have Adopted AI in at Least One Function, 2017–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
60%
50% 50%
40%
% of Respondents
30%
20%
10%
0%
2017 2018 2019 2020 2021 2022
Figure 4.3.1
In the last half-decade, the average number of AI capabilities that organizations have embedded
has doubled from 1.9 in 2018 to 3.8 in 2022 (Figure 4.3.2). Some of the AI capabilities that McKinsey
features in their survey include recommender systems, NL text understanding, and facial recognition.4
Average Number of AI Capabilities That Respondents’ Organizations Have Embedded Within at Least One
Function or Business Unit, 2018–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
4.00
3.80
3.50
Number of AI Capabilities (Average)
3.00
2.50
2.00
1.50
1.00
0.50
0.00
2018 2019 2020 2021 2022
Figure 4.3.2
4 In the 2022 edition of the McKinsey survey, 16 total AI capabilities are considered: computer vision, deep learning, digital twins, facial recognition, GAN, knowledge graphs,
NL generation, NL speech understanding, NL text understanding, physical robotics, recommender systems, reinforcement learning, robotic process automation, transfer
learning, transformers, and virtual agents.
The most commonly adopted AI use case in 2022 was service operations optimization (24%), followed
by the creation of new AI-based products (20%), customer segmentation (19%), customer service
analytics (19%), and new AI-based enhancement of products (19%) (Figure 4.3.3).
Figure 4.3.3
With respect to the type of AI capabilities embedded rates of embedding were 48%, 47%, and 46%. Across
in at least one function or business unit, as indicated all industries, the most embedded AI technologies
by Figure 4.3.4, robotic process automation had were robotic process automation (39%), computer
the highest rate of embedding within high tech/ vision (34%), NL text understanding (33%), and virtual
telecom, financial services and business, and legal agents (33%).
and professional services industries—the respective
All Industries 34% 30% 24% 18% 11% 25% 18% 23% 33% 20% 25% 20% 39% 16% 11% 33%
Consumer Goods/
33% 36% 25% 19% 13% 18% 20% 11% 22% 24% 32% 19% 25% 7% 11% 40%
Industry
Retail
Financial Services 24% 22% 18% 24% 13% 29% 20% 30% 42% 14% 30% 19% 47% 17% 12% 33%
Healthcare Systems/
Pharma and 32% 18% 16% 5% 5% 14% 5% 12% 29% 11% 16% 13% 16% 9% 6% 14%
Med. Products
High Tech/Telecom 37% 45% 24% 16% 15% 23% 24% 29% 40% 15% 34% 23% 48% 22% 15% 43%
C
Fa
Kn
Ph
Re
Re
Ro
Tr
Tr
Vi
ee
ig
om
AN
an
an
rt
c
co
i
ys
bo
ow
nf
ita
Sp
Te
ia
ua
p
sf
sf
ic
pu
or
en
tic
lR
le
xt
Le
lT
lA
er
or
al
ce
ec
te
dg
er
ec
Un
m
w
Pr
a
Ro
Le
ge
en
r
rn
m
h
at
in
og
er
oc
Vi
ar
de
nt
b
en
Un
in
io
s
s(
er
ni
ot
sio
es
ni
s
g
n
ra
rS
tL
d
st
e.
tio
ic
ng
sA
er
ph
n
an
g.
ys
s
ea
n
st
,G
ut
s
di
te
rn
an
om
ng
in
PT
di
at
-3
ng
io
)
n
Figure 4.3.5 shows AI adoption by industry and AI function in 2022. The greatest adoption was in risk for high
tech/telecom (38%), followed by service operations for consumer goods/retail (31%) and product and/or service
development for financial services (31%).
Consumer Goods/
14% 4% 3% 4% 15% 31% 29% 11%
Retail
Industry
Healthcare Systems/
Pharma and 15% 7% 2% 4% 22% 12% 8% 8%
Med. Products
Figure 4.3.6 shows how rates of AI adoption by points); followed by high tech/telecom, for risk
industry and AI function vary from 2021 to 2022 (22 percentage points). The most significant
in order to demonstrate how rates of AI adoption decreases were in high tech/telecom, for product
have changed over the last year. The greatest year- and/or service development (38 percentage points);
over-year increases were in consumer goods/retail, and healthcare systems, also for product and/or
for strategy and corporate finance (25 percentage service development (25 percentage points).
Percentage Point Change in Responses of AI Adoption by Industry and Function 2021 Vs. 2022
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
Consumer Goods/
12% -14% -19% -13% 14% 16% 25% -7%
Retail
Industry
Healthcare Systems/
Pharma and 6% -4% -12% -25% 9% -5% -4% -1%
Med. Products
High Tech/Telecom -6% -5% -24% -38% 22% -13% 15% -8%
Organizations report AI adoption leading to both (Figure 4.3.7). On the revenue side, the functions that
cost decreases and revenue increases. On the cost most respondents saw increases in as a result of AI
side, the functions that most respondents saw adoption were marketing and sales (70%), product
decreases in as a result of AI adoption were supply and/or service development (70%), and strategy and
chain management (52%), service operations (45%), corporate finance (65%).
strategy and corporate finance (43%), and risk (43%)
Decrease by <10% Decrease by 10–19% Decrease by ≥20% Increase by >10% Increase by 6–10% Increase by ≤5%
Product and/or Service Development 30% 20% 6% 13% 24% 33% 70%
% of Respondents
Figure 4.3.7
Figure 4.3.8 shows AI adoption by organizations across all geographies was 50%, down 6% from 2021.
globally, broken out by regions of the world. In 2022, Notably, “Greater China” registered a 20 percentage
North America led (59%), followed by Asia-Pacific point decrease from 2021.
(55%) and Europe (48%). The average adoption rate
50%
All Geographies
56%
55%
Asia-Paci c
64%
48%
Europe
51%
59%
North America
55%
Consideration and Mitigation of Risks From risks were regulatory compliance (45%), personal/
Adopting AI individual privacy (40%), and explainability (37%).
As has been the case in the last few iterations of the The least salient risks identified by organizations were
McKinsey report, in 2022 respondents identified national security (13%) and political stability (9%).
cybersecurity as the most relevant risk when adopting
AI technology (59%) (Figure 4.3.9). The next most cited
50%
0%
2019 2020 2021 2022
Figure 4.3.9
Figure 4.3.10 highlights the AI risks that organizations have taken steps to mitigate. For instance, there is
are taking steps to mitigate. The top three responses a gap of 8 percentage points for cybersecurity, 9
were cybersecurity (51%), followed by regulatory percentage points for regulatory compliance, and 12
compliance (36%) and personal/individual privacy percentage points for personal/individual privacy.
(28%). As was the case in previous years, there are These differences suggest there is a gap between
meaningful gaps between the risks organizations the awareness organizations have of various risks and
cite as relevant and those which organizations their steps taken to mitigate such risks.
51%, Cybersecurity
50%
40%
30%
28%, Personal/Individual Privacy
10%
7%, National Security
4%, Political Stability
0%
2019 2020 2021 2022
Figure 4.3.10
Narrative Highlight:
The Effects of GitHub’s Copilot on Developer
Productivity and Happiness
In 2021, launched a technical preview of Copilot,
It took the developers
a generative AI tool that enables developers and
coders to present a coding problem in natural using Copilot only 71
language and then have Copilot generate a minutes to complete their
solution in code. Copilot can also translate
between various programming languages. In
task—56% less time than
2022, GitHub surveyed over 2,000 developers the developers who did not
who were using the tool to determine its effect on
their productivity, well-being, and workflow.5
use Copilot (161 minutes).
Figure 4.3.11 summarizes the results of the survey. reported a completion rate of 78%, 8 percentage
Developers overwhelmingly reported feeling points higher than those who did not use Copilot.
more productive, satisfied, and efficient when Likewise, it only took the developers using Copilot
working with Copilot. More specifically, 88% of 71 minutes to complete their task, which was 56%
surveyed respondents commented feeling more less time than the developers who did not use
productive, 74% reported being able to focus on Copilot (161 minutes). These survey and experiment
more satisfying work, and 88% claimed to have results are evidence of the tangible ways in which
completed tasks more quickly. One software AI tools improve worker productivity.
engineer stated, “[With Copilot] I have to think
less, and when I have to think, it’s the fun stuff. It
sets off a little spark that makes coding more fun
and more efficient.”6
5 Most of the developers surveyed, around 60%, were professional developers; 30% were students and 7% were hobbyists.
6 The quote is taken from this source.
Narrative Highlight:
The Effects of GitHub’s Copilot on Developer
Productivity and Happiness (cont’d)
Measuring Dimensions of Developer Productivity When Using Copilot: Survey Responses, 2022
Source: GitHub Survey, 2022 | Chart: 2023 AI Index Report
Figure 4.3.11
Number of Developers 45 50
Completion Rate (%) 78 70
Figure 4.3.12
Perceived Importance of AI
Figures 4.3.13 and 4.3.14 suggest that an
overwhelming majority of business leaders
perceive AI to be important for their businesses.
More specifically, when asked how important
AI solutions were for their organization’s overall Believe AI Enhances Performance and Job
Satisfaction, 2022
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
success, 94% responded “important,” 5% said
“somewhat important,” and 1% answered “not 2%, Strongly Disagree / Disagree 16%, Neither Agree nor Disagree
1%, Unsure
important” (Figure 4.3.13).
80%
76%
60%
% of Respondents
40%
20%
0%
2018 2019 2020 2021 2022
Figure 4.3.15
Figure 4.3.16 highlights the main outcomes that business leaders achieved by embracing AI solutions.7
The top outcome was lowered costs (37%), followed by improved collaboration across business
functions/organizations (34%) and having discovered valuable insights (34%).
Figure 4.3.16
7 Figure 4.3.16 is drawn from the chart in the Deloitte survey: “Outcomes—‘Achieved to a high degree.’”
Challenges in Starting and Scaling AI Projects were proving business value (37%), lack of executive
The top three challenges that business leaders commitment (34%), and choosing the right AI
identified in terms of starting AI-related projects technologies (33%) (Figure 4.3.17).
Proving Business
37%
Value
Lack of Executive
34%
Commitment
Figure 4.3.17
The main barrier leaders faced in scaling existing AI initiatives was managing AI-related risks (50%), obtaining
more data or inputs to train a model (44%), and implementing AI technologies (42%) (Figure 4.3.18).
Managing
50%
AI-Related Risks
Implementing
42%
AI Technologies
Figure 4.3.18
300
268
Number of Earnings Calls
200
100
0
2018 2019 2020 2021 2022
Figure 4.3.19
Themes for AI Mentions in Fortune 500 Earnings Calls, 2018 Vs. 2022
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
9.96% (-15%)
Business Integration
11.74%
8.82% (+48%)
Pricing and Inventory Management
5.94%
8.82% (+204%)
Advertising and Marketing
2.90%
8.39% (+23%)
Process Automation
6.81%
7.40% (-7%)
Support Decision-Making
7.97%
7.11% (+69%)
Healthcare and Medical Practices
4.20%
6.26% (+73%)
Cloud Platforms
3.62%
5.26% (-21%)
Personalizing Customer Experience
6.67%
4.84% (-41%)
Deep Learning
8.26%
4.13% (+24%)
Edge Intelligence
3.33%
3.84% (+121%)
Nvidia AI Use Cases
1.74%
3.27% (+33%)
Revenue Growth
2.46%
3.13% (-47%)
Autonomous Vehicles
5.94%
2.99% (+37%)
Data Processing
2.17%
2.99% (-55%)
Data Storage and Management
6.67%
2.70% (+10%)
Adobe Experience
2.46%
2.42% (+734%)
Customer Support
0.29%
2.13% (-59%)
Azure Cognitive Services
5.22%
1.85% (-20%)
Data Center GPU
2.32%
1.28% (+47%)
Investments
0.87% 2022
1.00% (-62%) 2018
Nvidia RTX
2.61%
0.71% (-87%)
Digital Transformation
5.36%
Narrative Highlight:
What Are Business Leaders Actually Saying About AI?
In terms of process automation, business leaders emphasize the ability of AI tools to accelerate
productivity gains and to deliver a better customer experience.
“We continue to drive “We have improved the “In September, we opened a next-
the use of automation experience for customers by gen fulfillment center in Illinois.
and artificial applying artificial intelligence This 1.1 million square foot facility
intelligence to drive to match them with an expert features robotics, machine learning,
productivity gains to who is right for their specific and automated storage, resulting in
help offset inflationary situation and to deliver insights increased productivity and a better
pressures.” – Jim Davis, to experts so they can provide service for our customers at faster
CEO, Quest Diagnostics excellent service.” – Sasan delivery times.” – John David, CFO,
(Q4 2022) Goodarzi, CEO, Intuit (Q2 2022) Walmart (Q3 2022)
Narrative Highlight:
What Are Business Leaders Actually Saying About AI?
(cont’d)
The conversation surrounding pricing and inventory management saw companies reassuring business
audiences on how their use of AI would improve their operational strength, especially in environments of
high inflation and supply chain challenges.
“We are … continuing to refine and invest “Our teams are utilizing technology, innovative data analytics
in machine learning tools that will allow for and AI to forecast supply chain lead times and changes
more sophisticated competitive pricing in market demand to ensure optimal levels. These actions
and greater automation at scale.” along with our pricing initiatives positively impacted our
– Adrian Mitchell, CFO, Macy’s (Q3 2022) gross margin in the second quarter.”
– Bert Nappier, CFO, Genuine Parts Company (Q3 2022)
There is also a vibrant discussion about the ways in which AI can change healthcare and medical
practices, more specifically to reduce costs, improve the patient experience, and better serve clinicians.
“[Using] machine “I’d like to highlight productivity efforts in our preauthorization process where
learning and robotics, we’re leveraging an in-house artificial intelligence solution to automatically
we can now resolve match incoming faxes to the correct authorization requests. This solution
a wide range of creates administrative efficiencies across millions of inbound images. We are
prescription drug also scaling this solution to multiple business units such as pharmacy and
claims which previously are also expanding the application of this type of AI to provide decision
required the attention of support to clinicians, which will result in improvements to authorization
our pharmacists, freeing turnaround times, reduction in friction for providers and creating a better
them up to spend time member experience.” – Bruce Broussard, CEO, Humana (Q3 2022)
with patients. This
advanced approach
reduces overall cost
and improves the “We continue to see opportunities across [the software and analytics] segment
patient experience.” as payers, providers, and partners take advantage of our high ROI solutions and
– Karen Lynch, CEO, realize the benefits of our data, AI models, and workflow capabilities.”
CVS Health (Q2 2022) – Neil de Crescenzo, CEO, UnitedHealth Group (Q2 2022)
Sentiment Summary Distribution for AI Mentions in Fortune 500 Earnings Calls by Publication Date, 2018–22
Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
3%% 3%%
1%% 1%% 1%%
0%%
1%%
2%%
1%% 1%% 1%%
1%% 2%%
3%%
13%%
18%%
19%%
16%%
15%%
16%%
14%%
13%%
15%%
20%%
18%%
22%%
23%%
18%%
17%%
19%%
12%%
21%%
17%%
17%%
87%%
86%%
86%%
85%%
84%%
84%%
84%%
84%%
80%
81%%
81%%
81%%
81%%
81%%
80%%
80%%
79%%
79%%
77%%
77%%
76%%
Sentiment Summary
60%
40%
Negative
20% Mixed
Positive
0%
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
2018 2019 2020 2021 2022
Figure 4.3.21
8 Chapter 2 of the 2023 AI Index highlights trends in the performance of sentiment analysis algorithms.
Given that robots are frequently deployed with AI-based software technologies, it is possible to gain insights on AI-ready infrastructure
being deployed in the real world by tracking the installation of industrial robots. Data in this section comes from the International
Federation of Robotics (IFR), an international nonprofit organization that works to promote, strengthen, and protect the robotics
industry. Every year the IFR releases the World Robotics Report, which tracks global trends in installations of robots.9
517
500
Number of Industrial Robots Installed (in Thousands)
400
300
200
100
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.1
9 Due to the timing of the IFR’s survey, the most recent data is from 2021.
The worldwide operational stock of industrial robots from 3,035,000 in 2020. In the last decade, the
also continues to steadily increase year over year number of industrial robots being installed and the
(Figure 4.4.2). The total number of operational number being used have both steadily increased.
industrial robots jumped 14.6% to 3,477,000 in 2021,
3,477
3,500
3,000
Number of Industrial Robots (in Thousands)
2,500
2,000
1,500
1,000
500
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.2
Industrial Robots: Traditional Vs. scalable than traditional robots, and are capable of
Collaborative Robots iterative learning.
A distinction can be drawn between traditional
robots that work for humans and collaborative In 2017, only 2.8% of all newly installed industrial
robots that are designed to work with humans. robots were collaborative (Figure 4.4.3). As of 2021,
Recently, the robotics community has been excited that number increased to 7.5%. Although traditional
about the potential of collaborative robots given industrial robots still lead new installations, the
that they can be safer, more flexible, and more number of collaborative robots is slowly increasing.
Traditional 517
500 Collaborative 39
Number of Industrial Robots Installed (in Thousands)
424
400 394
400 391
300
478
100
0
2017 2018 2019 2020 2021
Figure 4.4.3
China 268.20
Japan 47.20
Germany 23.80
Italy 14.10
Taiwan 9.60
France 5.90
Mexico 5.40
India 4.90
Canada 4.30
Thailand 3.90
Singapore 3.50
Spain 3.40
Poland 3.30
Figure 4.4.4
In 2013, China overtook Japan as the nation installing only widened. In 2013, Chinese industrial robot
the most industrial robots (Figure 4.4.5). Since then, installations represented 20.8% of the world’s share,
the gap between the total number of industrial robots whereas in 2021, they represented 51.8%.
installed by China and the next-nearest nation has
268, China
250
Number of Industrial Robots Installed (in Thousands)
200
150
100
50 47, Japan
35, United States
31, South Korea
24, Germany
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.5
China consolidated its dominance in industrial robotics in 2021, the first year in which the country installed
more industrial robots than the rest of the world combined (Figure 4.4.6).
Number of Industrial Robots Installed (China Vs. Rest of the World), 2016–21
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
268, China
200
150
100
50
0
2016 2017 2018 2019 2020 2021
Figure 4.4.6
Figure 4.4.7 shows the annual growth rate of of industrial robot installations. The countries that
industrial robot installations from 2020 to 2021 by reported the highest growth rates were Canada
country. Virtually every country surveyed by the (66%), Italy (65%), and Mexico (61%).
IFR reported a yearly increase in the total number
Annual Growth Rate of Industrial Robots Installed by Country, 2020 Vs. 2021
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
Canada 66%
Italy 65%
Mexico 61%
Poland 56%
India 54%
China 51%
Thailand 36%
Taiwan 31%
Japan 22%
France 11%
Germany 6%
South Korea 2%
Spain 1%
Singapore -35%
−40% −30% −20% −10% 0% 10% 20% 30% 40% 50% 60% 70%
Annual Growth Rate of Industrial Robots Installed
Figure 4.4.7
Narrative Highlight:
Country-Level Data on Service Robotics
Another important class of robots are service robots, Service Robots in Medicine
Source: UL Solutions, 2022
which the ISO defines as a robot “that performs
useful tasks for humans or equipment excluding
industrial automation applications.”10 Figure 4.4.8
is an example of a robot being used in medicine,
Figure 4.4.9 illustrates how a robot can help with
professional cleaning, and Figure 4.4.10 shows a
robot designed for maintenance and inspection.
Figure 4.4.8
Narrative Highlight:
Country-Level Data on Service Robotics (cont’d)
Compared to 2020, 2021 saw a higher number of professional service robots installed in the world
for several key application areas, including hospitality, medical robotics, professional cleaning, and
transportation and logistics (Figure 4.4.11). The category that registered the greatest year-over-year
increase was transportation and logistics: In 2021, 1.5 times the number of such service robots were
installed as in 2020.
Number of Professional Service Robots Installed in the World by Application Area, 2020 Vs. 2021
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
8
Agriculture
8
20
Hospitality
11
15
Medical Robotics
12
13
Professional Cleaning 2021
10
2020
50
Transportation and Logistics
34
0 5 10 15 20 25 30 35 40 45 50
Number of Professional Service Robots Installed (in Thousands)
Figure 4.4.11
Narrative Highlight:
Country-Level Data on Service Robotics (cont’d)
As of 2022, the United States has the greatest number of professional service robot manufacturers,
roughly 2.16 times as many as the next nation, China. Other nations with significant numbers of robot
manufacturers include Germany (91), Japan (66), and France (54) (Figure 4.4.12).
Number of Professional Service Robot Manufacturers in Top Countries by Type of Company, 2022
Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
225
29
200
Number of Professional Service Robot Manufacturers
150
104
100 194 Startups
91
Incumbents
Unknown
66
54 52
50 94 47
39 39
79
61
49 44 44
34 35
0
3
2
United States China Germany Japan France Russia South Korea Switzerland Canada
Figure 4.4.12
52
All Others 37
30
119
Automotive 84
102
137
Electrical/Electronics 110
89
15
Food 12
11
64
Metal and Machinery 44
52
24
Plastic and Chemical Products 19
18 2021
107 2020
Unspeci ed 87 2019
87
Figure 4.4.13
Robots can also be deployed in a wide range of 230,000 industrial robots were installed for handling
applications, from assembling to dispensing and functions, 2.4 times more than for welding (96,000)
handling. Figure 4.4.14 illustrates how the application and 3.7 times more than for assembling (62,000).
of industrial robots has changed since 2021. Handling Every application category, with the exception
continues to be the application case toward which of dispensing and processing, saw more robot
the most industrial robots are deployed. In 2021, installations in 2021 than in 2019.
62
Assembling 50
40
32
Cleanroom 32
26
11
Dispensing 8
12
230
Handling 169
177
7
Processing 5
7 2021
2020
80
2019
Unspeci ed 60
55
96
Welding 70
74
Figure 4.4.14
China Vs. United States and metal and machinery (34,000) (Figure 4.4.15).
The Chinese industrial sectors that installed the Every industrial sector in China recorded a greater
greatest number of industrial robots in 2022 were number of robot installations in 2021 than in 2019.
electrical/electronics (88,000), automotive (62,000),
29
All Others 21
12
62
Automotive 31
32
88
Electrical/Electronics 64
42
4
Food 3
3
34
Metal and Machinery 22
22
1
Pharma/Cosmetics 1
1
6
Rubber and Plastics 5
4 2021
43
2020
Unspeci ed 30 2019
31
0 10 20 30 40 50 60 70 80 90
Number of Industrial Robots Installed (in Thousands)
Figure 4.4.15
The automotive industry installed the greatest number of industrial robots in the United States in 2021,
although installation rates for that sector decreased year over year (Figure 4.4.16). However, other sectors like
food, along with plastic and chemical products, saw year-over-year increases in robot installations.
4.50
All Others 2.60
3.50
9.80
Automotive 10.50
13.00
2.90
Electrical/Electronics 3.70
3.50
3.40
Food 2.70
2.20
3.80
Metal and Machinery 2.30
3.80
3.50
Plastic and Chemical Products 2.60
2.50 2021
7.10 2020
Unspeci ed 6.30 2019
5.00
0 2 3 5 6 8 9 11 12 14
Number of Industrial Robots Installed (in Thousands)
Figure 4.4.16
Artificial Intelligence
Index Report 2023
CHAPTER 5:
Education
CHAPTER 5 PREVIEW:
Education
Overview 236 5.2 K–12 AI Education 257
Chapter Highlights 237 United States 257
State-Level Trends 257
5.1 Postsecondary AI Education 238 AP Computer Science 258
CS Bachelor’s Graduates 238 Narrative Highlight: The State of
CS Master’s Graduates 240 International K–12 Education 260
Overview
Studying the state of AI education is important for gauging some of the ways in which
the AI workforce might evolve over time. AI-related education has typically occurred
at the postsecondary level; however, as AI technologies have become increasingly
ubiquitous, this education is being embraced at the K–12 level. This chapter examines
trends in AI education at the postsecondary and K–12 levels, in both the United States
and the rest of the world.
We analyze data from the Computing Research Association’s annual Taulbee Survey
on the state of computer science and AI postsecondary education in North America,
Code.org’s repository of data on K–12 computer science in the United States, and a
recent UNESCO report on the international development of K–12 education curricula.
Chapter Highlights
More and more AI specialization.
The proportion of new computer science PhD graduates from U.S. universities who specialized in AI
jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
33,059
30,000
Number of New CS Bachelor’s Graduates
25,000
20,000
15,000
10,000
5,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.1
Figure 5.1.2 looks at the proportion of CS bachelor’s graduates in North America who are international
students. The number stood at 16.3% in 2021 and has been steadily increasing since 2012—the proportion
of such students has risen 9.5 percentage points since 2012.
16.30%
16%
New International CS Bachelor’s Graduates (% of Total)
12%
8%
4%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.2
CS Master’s Graduates
AI courses are also commonly offered in CS master’s as many master’s graduates as in 2012. However,
degree programs. Figure 5.1.3 shows the total from 2018 to 2021 the total number of new master’s
number of new CS master’s graduates in North graduates plateaued, declining slightly from 15,532 to
America since 2010. In 2021 there were roughly twice 15,068.
14,000
Number of New CS Master’s Graduates
12,000
10,000
8,000
6,000
4,000
2,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.3
Interestingly, the number of CS master’s students at North American universities who are international started
declining in 2016 after rising in the early 2010s (Figure 5.1.4). Despite the decline, in 2021 the majority of CS
master’s graduates remained international (65.2%).
80%
New International CS Master’s Graduates (% of Total)
65.20%
60%
40%
20%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.4
CS PhD Graduates
Unlike the trends in bachelor’s and master’s CS computer science (Figure 5.1.5). There were fewer
graduates, since 2010 there have not been large CS PhD graduates in 2021 (1,893) than in 2020 (1,997)
increases in the number of new PhD graduates in and 2012 (1,929).
2,000
1,893
Number of New CS PhD Graduates
1,500
1,000
500
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.5
CS PhD graduates in North American universities are becoming increasingly international (Figure 5.1.6). In 2010,
45.8% of CS PhD graduates were international students; the proportion rose to 68.6% in 2021.
70%
68.60%
60%
New International CS PhD Graduates (% of Total)
50%
40%
30%
20%
10%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.6
Moreover, now a significantly larger proportion of new CS PhD students are specializing in AI (Figure 5.1.7). In
2021, 19.1% of new CS PhD students in North American institutions specialized in AI, a 4.2 percentage point
increase since 2020 and 8.6 percentage point increase since 2012.
19.10%
18%
16%
New AI PhD Students (% of Total)
14%
12%
10%
8%
6%
4%
2%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.7
Where do new AI PhDs choose to work following (40.9%) as in academia (41.6%). However, as of 2021
graduation? Mirroring trends reported in last year’s a significantly larger proportion of students (65.4%)
AI Index report, an increasingly large proportion of went to industry after graduation than to academia
AI PhD graduates are heading to industry (Figures (28.2%). The amount of new AI PhDs entering
5.1.8 and 5.1.9). In 2011, for example, roughly the government was 0.7% and has remained relatively
same percentage of graduates took jobs in industry unchanged in the last half-decade.
Employment of New AI PhDs in North America by Employment of New AI PhDs (% of Total) in North
Sector, 2010–21 America by Sector, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
281
Industry
65.44%, Industry
Government
249 60%
250 Academia 238 84
New AI PhD Graduates (% of Total)
219
Number of New AI PhD Graduates
65 50%
201
200 73
178 61
40%
63
154 154
150 60
134 132 136 30%
47 123 28.19%, Academia
72 43
63 51 42
100 195 20%
180
162 153
134
116 10%
50 101
76 85 77
74
64
0% 0.67%, Government
0
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
1 The sums in Figure 5.1.9 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status
in the CRA survey. These students are not included in the chart.
CS, CE, and 5.1.10 highlights the total number of CS, CE (computer
engineering), and information faculty in North
Information Faculty American universities. The amount of faculty has
To better understand trends in AI and CS education, marginally increased in the last year, by 2.2%. Since
it is instructive to consider data on computer science 2011 the number of CS, CE, and information faculty
faculty in addition to postsecondary students. Figure has grown by 32.8%.
8,149
7,858 7,976
8,000 522
7,657 530
7,362 668 306
653 296
7,000 6,806 6,887 691
Number of CS, CE, and Information Faculty
426 861
6,629 465 736
6,478 589
6,314 649 432 617
6,138 689 494
6,000 602 766 390
656 432 1,183 1,150
529 1,180 831 895
515 676 1,122
447 1,014
5,000 863
661 487
669
4,000
3,000
5,059 5,214 5,252 5,231 5,310
4,536 4,549 4,548 4,711 4,786
4,366
2,000
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.10
In 2021 there were a total of 6,789 CS faculty members in the United States (Figure 5.1.11). The total number
of CS faculty in the United States increased by only 2.0% in the last year, but by 39.0% since 2011.
7,000 6,789
6,654
6,430 6,533
428
424
6,098 531 518 287
6,000 276
5,637 5,729 567 382
426 618 693
5,256 5,202 535 491 534
5,068 408 436
5,000 4,885 521 509 364
396 946 899
592 671 715
947
Number of CS Faculty
522
460 455 903
491 826
387
4,000 550 679
521 421
3,000
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.11
Figure 5.1.12 reports the total number of new CS, hires in 2021, while in 2012 there were 733. Similarly,
CE, and information faculty hires in North American the total number of tenure-track hires peaked in 2019
universities. In the last decade, the total number of at 422 and has since dropped to 324 in 2021.
new faculty hires has decreased: There were 710 total
New CS, CE, and Information Faculty Hires in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
800
800 765
Number of New CS, CE, and Information Faculty Hires
749 749
733
710
691
422
396 406
400 374
348 358
320 324
294
249 258
218
200
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.12
In 2021, the greatest percentage of new CS, CE, and information faculty hires (40%) came straight from
receiving a PhD (Figure 5.1.13). Only 11% of new CS and CE faculty came from industry.
Source of New Faculty in North American CS, CE, and Information Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
100%
13% 13% 11% 11%
80%
34% 34%
34%
Source of New Faculty
41%
60%
16% 15%
15%
40%
17%
0%
2018 2019 2020 2021
Figure 5.1.13
The share of filled new CS, CE, and information faculty positions in North American universities has remained
relatively stable in the last decade (Figure 5.1.14). In 2021, 89.3% of new faculty positions were filled, compared
to 82.7% in 2011.
Share of Filled New CS, CE, and Information Faculty Positions in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
90% 89.28%
Share of Filled New CS, CE, and Information Faculty Positions
80%
70%
60%
50%
40%
30%
20%
10%
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.14
Among open CS, CE, and information faculty positions in 2021, the most commonly cited reason for their
remaining unfilled was offers being turned down (53%) (Figure 5.1.15). In 22% of cases, hiring was still in
progress, while 14% of the time, a candidate had not been identified who met the department’s hiring goals.
Reason Why New CS, CE, and Information Faculty Positions Remained Un lled (% of Total), 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
Didn’t �nd a person who met our hiring goals Technically vacant, not �lled for admin reasons Other
O�ers turned down Hiring in progress
100%
Reason Faculty Positions Remained Un�lled (% of Total)
10% 18%
10% 23%
25% 26% 22%
31% 28% 27%
80% 17%
18%
5% 6%
10%
55%
6%
60% 45% 12%
34%
40%
36%
40% 52% 51% 56% 53%
43%
44%
Figure 5.1.15
Figure 5.1.16 highlights the median nine-month salaries full professor in computer science made 3.2% more
of CS faculty in the United States by position since than they did in 2020, and 12.8% more than they did in
2015. During that period, the salaries for all classes 2015. (Note: These figures have not been adjusted for
of professors have increased. In 2021, the average inflation.)
158.97
160 156.02
140
127.47
121.55 123.71
117.5 119.48
120 113.95 114.07
111.67 109.23
105.45 107.55
101.16 103.01
99.12
100
80
60
40
20
0
2015 2016 2017 2018 2019 2020 2021
Figure 5.1.16
What proportion of new CS, CE, and information faculty tenure-track hires are international? The data suggests
that it is not a substantial proportion. In 2021, only 13.2% of new CS, CE, and information faculty hires were
international (Figure 5.1.17).
New International CS, CE, and Information Tenure-Track Faculty Hires (% of Total) in North America,
2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
25%
New International Tenure-Track Faculty Hires (% of Total)
20%
15%
13.20%
10%
5%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.17
The majority of CS, CE, and Information faculty losses in North American departments (36.3%) were the result
of faculty taking academic positions elsewhere (Figure 5.1.18). In 2021, 15.2% of faculty took nonacademic
positions, which is roughly the same amount as those who took such positions a decade prior, in 2011 (15.9%).
Faculty Losses in North American CS, CE, and Information Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
Died Took academic position elsewhere Remained, but changed to part-time Unknown
Retired Took nonacademic position Other
327
312
303 20 303
300
270 23 37
43
246 22
250 46
232 237 234 34
221 20 33
213 20
22 42
Faculty Losses
200
36 24
44 26 139
27 32
126 110
113
150 89
34 77
62 85
74 86
100 52
90 103 100
50 89 94 94 91
67 74 65 80
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.18
Narrative Highlight:
Who Funds CS Departments in the U.S.?
The CRA tracks data on the external funding sources agencies such as the Army Research Office,
of CS departments in the United States. The main the Office of Naval Research, and the Air Force
funder of American CS departments continues to Research Laboratory (20.3%); industrial sources
be the National Science Foundation (NSF), which (12.1%); the Defense Advanced Research Projects
in 2021 accounted for 34.9% of external funds. Agency (DARPA) (8.8%); and the National
However, the share of funding provided by NSF has Institutes of Health (NIH) (6.8%). The diminishing
decreased since 2003 (Figure 5.1.19). In 2021, the share of NSF funds over time has been partially
next largest sources of funding came from defense offset by increasing funds from industry and NIH.
40%
30%
25%
Figure 5.1.19
Narrative Highlight:
Who Funds CS Departments in the U.S.? (cont’d)
Figure 5.1.20 shows the median total expenditures Although total median expenditures have
from external sources for computing research in increased over the last decade for both private and
American CS departments. In 2021, the median total public CS departments, the gap in expenditure
expenditure for private universities was $9.7 million has widened, with private universities beginning to
compared with $5.7 million for public universities. significantly outspend public ones.
Median Total Expenditure From External Sources for Computing Research of U.S. CS Departments, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
10
9.71, Private
Median Total Expenditure (in Millions of U.S. Dollars)
6
5.69, Public
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.20
The following subsection shows trends in K–12 AI education based on K–12 computer science education data in the United States as well
as survey data from UNESCO on the state of global K–12 AI education.
181.04
180
Number of AP Computer Science Exams Taken (in Thousands)
160
140
120
100
80
60
40
20
0
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.2.3
2 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles
was initially offered in 2017.
In 2021, the states which Number of AP Computer Science Exams Taken, 2021
Source: Code.org, 2022 | Chart: 2023 AI Index Report
saw the greatest number of
AP computer science exams AK ME
100 242
taken were California (31,189),
VT NH MA
followed by Texas (17,307), 150 403 5,451
Florida (14,864), New York WA MT ND SD MN WI MI NY CT RI
4,034 42 109 26 1,432 2,080 4,504 13,304 3,251 617
(13,304), and New Jersey
(9,391) (Figure 5.2.4). OR ID WY NE IA IL IN OH PA NJ
714 429 112 514 521 8,572 2,883 3,754 6,104 9,391
Figure 5.2.5 looks at the CA NV UT CO KS MO KY WV DC MD DE
31,189 1,701 612 2,584 236 1,199 1,462 352 352 7,662 513
number of AP CS exams
AZ NM OK AR TN VA NC
taken per capita.3 The state 1,587 270 500 1,406 2,046 6,034 6,273
with the largest per capita TX LA MS AL GA SC
17,307 1,191 400 2,399 7,221 2,159
amount of AP computer
science exams taken in HI FL
782 14,864
2021 was Maryland, with Figure 5.2.4
3 More specifically, Figure 5.2.5 normalizes the number of AP CS exams taken—the total number of exams taken in a particular state in 2021 is divided by the state’s
population based on the 2021 U.S. Census.
Narrative Highlight:
The State of International K–12 Education
In 2021, UNESCO released one of the most Figure 5.2.6, taken from the UNESCO report,
comprehensive reports to date on the international highlights the governments that have taken steps
state of government-endorsed AI curricula. To to implement AI curricula and across which levels
gather information, UNESCO released two surveys: of education. For example, Germany is in the
the first to representatives of 193 UNESCO member process of developing government-endorsed AI
states and the second to over 10,000 private- curricular standards on the primary, middle, and
and third-sector actors. As part of these surveys, high-school levels, and the Chinese government
respondents were asked to report on the status of AI has already endorsed and implemented
curricula for students in K–12 general education. standards across those same three levels.
Government Implementation
Implementationof
ofAI
AICurricula
Curriculaby
byCountry,
Country,Status,
Status,and
andEducation
EducationLevel
Level
UNESCO, 2022
Source: UNESCO, 2022 || Table:
Table:2023
2023AI
AIIndex
IndexReport
Report
Country Status
Status PrimarySchool
Primary School Middle
Middle School
School High
High School
School
Armenia Endorsedand
Endorsed andImplemented
Implemented ✓✓ ✓✓
Austria Endorsed
Endorsedand
andImplemented
Implemented ✓✓
Belgium Endorsed
Endorsedand
andImplemented
Implemented ✓✓
China Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓ ✓✓
India Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓
Kuwait Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓
Portugal
Portugal Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓ ✓✓
Qatar
Qatar Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓ ✓✓
Serbia
Serbia Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓
South
South Korea
Korea Endorsed
Endorsedand
andImplemented
Implemented ✓✓
United
United Arab
Arab Emirates
Emirates Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓ ✓✓
Bulgaria
Bulgaria In Development
In Development ✓✓ ✓✓ ✓✓
Germany
Germany In
InDevelopment
Development ✓✓ ✓✓ ✓✓
Jordan
Jordan In
InDevelopment
Development ✓✓ ✓✓
Saudia
Saudia Arabia
Arabia In
InDevelopment
Development ✓✓ ✓✓ ✓✓
Serbia
Serbia In
InDevelopment
Development ✓✓ ✓✓
Figure 5.2.64
4 According to the UNESCO report, Serbia has already endorsed and implemented certain kinds of K–12 AI curricula, but is also simultaneously in the process of
developing others—thus it is listed under both categories.
Narrative Highlight:
The State of International K–12 Education (cont’d)
Figure 5.2.7 identifies the topic areas most emphasized in the K–12 AI curricula profiled in the UNESCO
report. The four topics toward which the most time was allocated were algorithms and programming (18%),
AI technologies (14%), data literacy (12%), and application of AI to other domains (12%).
Ethics of AI 7%
Social Implications of AI 5%
AI Technologies 14%
Developing AI Technologies 9%
AI Foundations
AI Techniques 2% Ethics and Social Impact
Understanding, Using, and Developing AI
Unspeci ed 10% Unspeci ed
Narrative Highlight:
The State of International K–12 Education (cont’d)
What might an actual K–12 AI curriculum look
like in practice? The UNESCO report includes
detailed information about a sample curriculum
that was deployed in Austria, the Austrian Data
Science and Artificial Intelligence curriculum.
As noted in the report:
Artificial Intelligence
Index Report 2023
CHAPTER 6:
Policy and
Governance
CHAPTER 6 PREVIEW:
Overview
The growing popularity of AI has prompted intergovernmental, national, and
regional organizations to craft strategies around AI governance. These actors are
motivated by the realization that the societal and ethical concerns surrounding AI
must be addressed to maximize its benefits. The governance of AI technologies has
become essential for governments across the world.
Chapter Highlights
Policymaker interest When it comes to AI,
in AI is on the rise. policymakers have
An AI Index analysis of the legislative records a lot of thoughts.
of 127 countries shows that the number of bills A qualitative analysis of the
containing “artificial intelligence” that were parliamentary proceedings of a
passed into law grew from just 1 in 2016 to 37 in diverse group of nations reveals
2022. An analysis of the parliamentary records on that policymakers think about AI
AI in 81 countries likewise shows that mentions from a wide range of perspectives.
of AI in global legislative proceedings have For example, in 2022, legislators in
increased nearly 6.5 times since 2016. the United Kingdom discussed the
risks of AI-led automation; those
in Japan considered the necessity
of safeguarding human rights in
From talk to enactment— the face of AI; and those in Zambia
the U.S. passed more looked at the possibility of using AI
AI bills than ever before. for weather forecasting.
In 2021, only 2% of all federal AI bills in the
United States were passed into law. This number
jumped to 10% in 2022. Similarly, last year 35%
of all state-level AI bills were passed into law. The legal world is
waking up to AI.
In 2022, there were 110 AI-related
legal cases in United States state
The U.S. government and federal courts, roughly seven
continues to increase times more than in 2016. The
spending on AI. majority of these cases originated
Since 2017, the amount of U.S. government in California, New York, and Illinois,
AI-related contract spending has increased and concerned issues relating to
roughly 2.5 times. civil, intellectual property, and
contract law.
In the last 10 years, AI governance discussions have accelerated, resulting in numerous policy proposals in various legislative bodies. This
section begins by exploring the legislative initiatives related to AI that have been suggested or enacted in different countries and regions,
followed by an in-depth examination of state-level AI legislation in the United States. The section then scrutinizes records of AI-related
discussions in parliaments and congresses worldwide and concludes with the number of AI policy papers published in the United States.
0
1–5
6–10
11–15
16–25
No Available Data
Figure 6.1.1
1 Note that the analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science
Act passed by the U.S. in 2022.
2 The full list of countries analyzed is in the Appendix. The AI Index team attempted to research the legislative bodies of every country in the world; however, publicly accessible legislative
databases were not made available for certain countries.
Number of AI-Related Bills Passed Into Law in 127 Select Countries, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
37
35
30
Number of AI-Related Bills
25
20
15
10
Figure 6.1.2
United States 9
Spain 5
Philippines 4
Andorra 2
Belgium 2
Italy 2
Portugal 2
Russia 2
United Kingdom 2
Austria 1
Croatia 1
Germany 1
Kyrgyz Republic 1
Latvia 1
Liechtenstein 1
Slovenia 1
0 1 2 3 4 5 6 7 8 9
Number of AI-Related Bills Figure 6.1.3
Number of AI-Related Bills Passed Into Law in Select Countries, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
United States 22
Portugal 13
Spain 10
Italy 9
Russia 9
Belgium 7
United Kingdom 6
Austria 5
Korea, Rep. 5
Philippines 5
France 4
China 3
Germany 3
Japan 3
0 2 4 6 8 10 12 14 16 18 20 22
Number of AI-Related Bills Figure 6.1.4
Narrative Highlight:
A Closer Look at Global AI Legislation
The following subsection delves into some of the AI-related legislation passed into law during 2022.
Figure 6.1.5 samples five different countries’ laws covering a range of AI-related issues.
AI-Related
AI-Related
AI-Related Legislation
Legislation From
From Select
Select Countries,
Countries, 2022
2022
Source:
Source: AI
Source:AI Index,
AIIndex, 2022
2022|||Table:
Index,2022 Table: 2023
Table:2023 AI
2023AI
AIIndex
IndexReport
Report
Country
Country
Country Bill
Bill
BillName
Name Description
Description
Kyrgyz
Kyrgyz Republic
KyrgyzRepublic About
About the
About the Creative
Creative Industries
Industries Park This
Republic Park Thislaw
lawdetermines
determinesthe
thelegal
legalstatus,
status,management,
management,and andoperation
and operation
operation
procedures
proceduresof
ofthe
theCreative
CreativeIndustries
IndustriesPark,
Park,established
establishedto
established toaccelerate
to acceleratethe
accelerate the
the
development of creative industries, including arti�cial
development of creative industries, includingarti�cial intelligence.
arti�cialintelligence.
intelligence.
Latvia
Latvia Amendments
Amendments
Amendmentsto
tothe
theNational
NationalSecurity
SecurityLaw AAprovision
Latvia Law provisionof
ofthis
thisact
actestablishes
establishesrestrictions
restrictionsononcommercial
commercialcompanies,
commercial companies,
companies,
associations,
associations,and
andfoundations
foundationsimportant
importantforfornational
nationalsecurity,
national security,including
security, includingaaa
including
commercial
commercialcompany
companythat
thatdevelops
developsarti�cial
arti�cialintelligence.
intelligence.
intelligence.
Philippines
Philippines Second
Second
Philippines SecondCongressional
CongressionalCommission
Commissionon
onEducation
Education AAprovision
provisionofofthis
thisact
actcreates
createsaacongressional
congressionalcommission
commissionto
commission toreview,
to review,
review,
(EDCOM
(EDCOM
(EDCOMII)
II)Act assess,
Act assess,and
andevaluate
evaluatethethestate
stateof
ofPhilippine
Philippineeducation;
education;to
education; torecommend
to recommend
recommend
innovative
innovativeand
andtargeted
targetedpolicy
policyreforms
reformsinineducation;
education;and
education; andto
and toappropriate
to appropriate
appropriate
funds.
funds.The
Theact
actcalls
callsfor
forreforms
reformsto tomeet
meetthe
thenew
newchallenges
challengesto
challenges toeducation
to education
education
caused
causedbybythe
theFourth
FourthIndustrial
IndustrialRevolution
Revolutioncharacterized,
characterized,in
characterized, inpart,
in part,by
part, bythe
by the
the
rapid development of arti�cial intelligence.
rapid development of arti�cial intelligence.
Spain
Spain Right
Right
Rightto
toequal
equaltreatment
treatmentand
andnon-discrimination AAprovision
Spain non-discrimination provisionofofthis
thisact
actestablishes
establishesthat
thatarti�cial
arti�cialintelligence
intelligencealgorithms
intelligence algorithms
algorithms
involved
involvedin
inpublic
publicadministrations’
administrations’decision-making
decision-makingtaketakeinto
take intoaccount
into account
account
bias-minimization
bias-minimizationcriteria,
criteria,transparency,
transparency,and
andaccountability,
accountability,whenever
accountability, whenever
whenever
technically
technicallyfeasible.
feasible.
United
United States
UnitedStates AI
AI
AITraining
TrainingAct This
States Act Thisbill
billrequires
requiresthe
theO�ce
O�ceof ofManagement
Managementand andBudget
Budgetto
Budget totoestablish
establishor
establish or
or
otherwise
otherwiseprovide
otherwise providean
provide anAI
an AItraining
AI trainingprogram
training programfor
program forthe
for theacquisition
the acquisitionworkforce
acquisition workforceof
workforce of
of
executive
executiveagencies
executive agencies(e.g.,
agencies (e.g.,those
(e.g., thoseresponsible
those responsiblefor
responsible forprogram
for programmanagement
program managementor
management or
or
logistics),
logistics),with
logistics), withexceptions.
with exceptions.The
exceptions. Thepurpose
The purposeof
purpose ofthe
of theprogram
the programis
program isisto
toensure
to ensurethat
ensure that
that
the
the workforce has knowledge of the capabilities and risks associated with
theworkforce
workforcehashasknowledge
knowledgeof ofthe
thecapabilities
capabilitiesandandrisks
risksassociated
associatedwith
with
AI.
AI.
AI.
Figure 6.1.5
United States Federal just one federal bill was proposed, while in 2021, 134
AI Legislation bills were proposed. In 2022 this number fell to 88
A closer look at the U.S. federal legislative record proposed bills. While fewer bills were proposed in
shows a sharp increase in the total number of 2022, the number of passed bills, which remained at
proposed bills that relate to AI (Figure 6.1.6). In 2015, 3 for each of the past four years, increased to 9.
Number of AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
140
120
100
Number of AI-Related Bills
88, Proposed
80
60
40
20
9, Passed
0
United States State-Level Maryland with 3. Figure 6.1.8 shows the total volume of
legislation passed from 2016 to 2022 for select states,
AI Legislation with Maryland leading the list with 7 bills, followed by
Figure 6.1.7 shows the number of laws containing California, Massachusetts, and Washington. Figure
mentions of AI that were passed by U.S. states in 6.1.9 highlights the number of state-level AI-related
2022. California leads the list with 5, followed by bills passed by all states since 2016.
Number of AI-Related Bills Passed Into Law in Select U.S. States, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report
California 5
Maryland 3
Colorado 2
New Jersey 2
Washington 2
Alabama 1
Hawaii 1
Idaho 1
Louisiana 1
Massachusetts 1
North Carolina 1
Vermont 1
0 1 2 3 4 5
Number of AI-Related Bills
Figure 6.1.7
Number of AI-Related Bills Passed Into Law in Select U.S. States, 2016–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Maryland 7
California 6
Massachusetts 5
Washington 5
Illinois 3
Utah 3
Vermont 3
Alabama 2
Colorado 2
Michigan 2
New Jersey 2
New York 2
North Carolina 2
Ohio 2
0 1 2 3 4 5 6 7
Number of AI-Related Bills
Figure 6.1.8
Figure 6.1.9
Growing policy interest in AI can also be seen at the state level, with 60 AI-related bills proposed in 2022
(Figure 6.1.10)—a dramatic increase from the 5 bills proposed in 2015. Additionally, the proportion of bills being
passed has risen throughout the years. In 2015, 1 bill was passed, representing 16% of the total bills proposed
that year; while in 2022, 21 bills were passed, or 35% out of the total that were proposed.
Number of State-Level AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
60 60, Proposed
50
Number of AI-Related Bills
40
30
20 21, Passed
10
Narrative Highlight:
A Closer Look at State-Level AI Legislation
The following subsection highlights some of the AI-related legislation passed into law at the state level
during 2022. Figure 6.1.11 focuses on wide-ranging AI-related laws from five states around the country.
AI-Related Legislation
AI-Related Legislation From
From Select
Select States,
States, 2022
2022
Source:AI
Source:
Source: AIIndex,
AI Index,2022
Index, 2022|||Table:
2022 Table:2023
Table: 2023AI
2023 AIIndex
AI IndexReport
Index Report
Report
State
State
State BillName
Bill
Bill Name
Name Description
Description
Alabama
Alabama
Alabama Arti�cialIntelligence,
Arti�cial
Arti�cial Intelligence,Limit
Intelligence, Limit the
Limit the Use
Use This bill
This bill prohibits
prohibits state
state or
or local
local law
law enforcement
enforcement agencies
agenciesfrom
fromusing
usingfacial
facialrecognition
recognition
ofFacial
of
of FacialRecognition,
Facial Recognition,to
Recognition, toEnsure
to Ensure
Ensure matchresults
match resultsas
asthe
thesole
solebasis
basisfor
formaking
makingananarrest
arrestor
orfor
forestablishing
establishingprobable
probablecause
causein
inaa
Arti�cialIntelligence
Arti�cial
Arti�cial IntelligenceIs
Intelligence IsNot
Is Notthe
Not theOnly
Only criminalinvestigation.
criminal investigation.
Basisfor
Basis
Basis forArrest
for Arrest
Arrest
California
California BudgetAct
Budget Actof
Act of2022
of 2022
2022
California Budget AAprovision
provisionof
ofthis
thisappropriations
appropriationsbill
billfor
forthe
the2022–23
2022–23�scal
�scalyear
yearallocates
allocates$1,300,000
$1,300,000to
to
CaliforniaState
California StateUniversity,
University,Sacramento,
Sacramento,to toimprove
improvethe
thecampus
campuschildcare
childcarecenter,
center,
includingthe
including thedevelopment
developmentof ofan
anarti�cial
arti�cialintelligence
intelligencemixed-reality
mixed-realityclassroom.
classroom.
Maryland
Maryland
Maryland ConservationFinance
Conservation
Conservation FinanceAct
Finance Act
Act AAprovision
provisionof
ofthis
thisact
actestablishes
establishesthat
thatthe
theDepartment
Departmentof ofNatural
NaturalResources
Resourcesshall
shallstudy
study
andassess
and assessthe
thepotential
potentialfor
fordigital
digitaltools
toolsand
andplatforms
platformsincluding
includingarti�cial
arti�cialintelligence
intelligenceand
and
machine learning
machine learning to
to contribute
contribute to
to Chesapeake
Chesapeake Bay
Bay restoration
restorationand
andclimate
climatesolutions.
solutions.
NewJersey
New
New Jersey
Jersey 21stCentury
21st
21st CenturyIntegrated
Century IntegratedDigital
Integrated Digital
Digital AAprovision
provisionof
ofthis
thisact,
act,which
whichconcerns
concernsthe themodernization
modernizationof ofstate
stategovernment
governmentwebsites,
websites,
ExperienceAct
Experience
Experience Act
Act establishesthat
establishes thatthe
thechief
chieftechnology
technologyo�cer,
o�cer,ininconsultation
consultationwith
withthe
thechief
chiefinnovation
innovation
o�cerand
o�cer andthe
theNew
NewJersey
JerseyInformation
InformationTechnology
TechnologyProject
ProjectReview
ReviewBoard,
Board,shall
shallevaluate
evaluate
on an
on an annual
annual basis
basis the
the feasibility
feasibility of
of state
state agencies
agencies using
using arti�cial
arti�cialintelligence
intelligenceand
and
machinelearning
machine learningtotoprovide
providepublic
publicservices.
services.
Vermont
Vermont
Vermont AnAct
An
An ActRelating
Act Relatingto
Relating tothe
to theUse
the Useand
Use and
and This act
This act creates
creates the
the Division
Division of
of Arti�cial
Arti�cial Intelligence
Intelligence within
withinthe
theAgency
Agencyof ofDigital
DigitalServices
Services
Oversightof
Oversight
Oversight ofArti�cial
of Arti�cialIntelligence
Arti�cial Intelligencein
Intelligence in toreview
to reviewall
allaspects
aspectsofofarti�cial
arti�cialintelligence
intelligencedeveloped,
developed,employed,
employed,or orprocured
procuredby bythe
the
StateGovernment
State
State Government
Government stategovernment.
state government.The Theact
actrequires
requiresthetheDivision
Divisionof ofArti�cial
Arti�cialIntelligence
Intelligenceto,
to,among
amongother
other
things,propose
things, proposeaastate
statecode
codeofofethics
ethicsononthe
theuse
useof
ofarti�cial
arti�cialintelligence
intelligencein
instate
state
government and
government and make
make recommendations
recommendations to to the
the General
General Assembly
Assemblyon onpolicies,
policies,laws,
laws,and
and
regulationsregarding
regulations regardingarti�cial
arti�cialintelligence
intelligencein instate
stategovernment.
government.
Figure 6.1.11
Global AI Mentions
Another barometer of legislative interest is the contain the keyword “artificial intelligence” from 2016
number of mentions of “artificial intelligence” in to 2022.3 Figure 6.1.12 shows that mentions of AI in
governmental and parliamentary proceedings. The legislative proceedings in these countries registered a
AI Index conducted an analysis of the minutes or small decrease from 2021 to 2022, from 1,547 to 1,340.
proceedings of legislative sessions in 81 countries that
1,600
1,400
1,340
1,200
Number of Mentions
1,000
800
600
400
200
Figure 6.1.12
3 The full list of countries that was analyzed is in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the
world; however, publicly accessible governmental and parliamentary databases were not made available for all countries.
By Geographic Area
Figure 6.1.13 shows the number of legislative proceedings containing mentions of AI in 2022.4 From the 81
countries considered, 46 had at least one mention, and Spain topped the list with 273 mentions, followed by
Canada (211), the United Kingdom (146), and the United States (138).
0
1–55
56–110
111–165
166–220
221–280
No Available Data
Figure 6.1.13
4 For mentions of AI in legislative proceedings around the world, the AI Index performed searches of the keyword “artificial intelligence,” in the respective languages, on the websites of
different countries’ congresses or parliaments, usually under sections named “minutes,” “Hansard,” etc.
Figure 6.1.14 shows the total number of AI mentions in the past seven years. Of the 81 countries considered, 62 had
at least one mention, and the United Kingdom dominates the list with 1,092 mentions, followed by Spain (832), the
United States (626), Japan (511), and Hong Kong (478).
0
1–220
221–440
441–660
661–880
881–1100
No Available Data
Figure 6.1.14
Narrative Highlight:
A Closer Look at Global AI Mentions
The following subsection examines mentions of AI in government proceedings in 2022. Figure 6.1.15
quotes discussions across a geographically diverse set of countries.
AI-Related
AI-Related
AI-Related Parliamentary
ParliamentaryMen
Mentions
tionsFrom
FromSelect
SelectCountries,
Countries,2022
2022
Source:
Source: AI
Source:AI Index,
AIIndex, 2022
2022 |||Table:
Index,2022 Table: 2023
Table:2023 AI
2023AI Index
AIIndex Report
IndexReport
Report
Country
Country
Country Legislature
Legislature
Legislature Speaker
Speaker
Speaker Quote
Quote Agenda
AgendaItem
Item
Australia
Australia
Australia House
House of
Houseofof Ed
Ed Husic,
EdHusic, Australian
Husic,Australian
AustralianLabor
Labor “Working
“Workingwith withour
ourinternational
internationalpartners
partnerswe wecancan National
NationalReconstruction
Reconstruction
Representatives
Representatives
Representatives Party,
Party, Minister
Party,Minister
Ministerfor
forIndustry
Industry transform
transformAustralian
Australianknow-how
know-howinto intoglobally
globallyrecognised
recognised Fund
FundCorporation
CorporationBill
Bill 2022
and
and Science
andScience
Science skills
skillsand
andmanufacturing
manufacturinginindefence
defenceindustries.
industries.AndAndwewe 2022 - Second
- Second Reading
Reading
can
canbuild
buildon
onour
ourundeniable
undeniableexpertise
expertiseininareas
areaslike
like
quantum
quantumtechnologies,
technologies,robotics
roboticsandandarti
arti cial
cial
intelligence.
intelligence.We Wewill
willseek
seekto
topartner
partnerwith
withindustry
industryandand
state
stateandandterritory
territorygovernments
governmentstotoidentify
identifyinvestment
investment
opportunities
opportunitieswithin
withinpriority
priorityareas.
areas.An Anon-ramp,
on-ramp,ififyouyou
will,
will,ofofturn-key
turn-keyopportunities
opportunitiesforforinvestment
investmenttotomakemake
sure the NRF is well placed for success.”
sure the NRF is well placed for success.”
Brazil
Brazil
Brazil Diary
Diary of
Diaryof the
ofthe
the Mr.
Mr. Gustavo
Mr.Gustavo Fruet,
GustavoFruet,
Fruet, “There
“Therehashasbeen
beenaalotlotofoftalk
talkabout
aboutthe
thefuture
futureofofwork
workduedue Presentation
PresentationofofBill
BillNo.
No.
Chamber
Chamber
Chamberof of the
ofthe
the Democratic
Democratic Labor
DemocraticLabor
LaborParty
Party to
totechnology.
technology.InInthe
thebook
bookThe TheFourth
FourthIndustrial
Industrial 135,
135,ofof2022,
2022,on
onthe
the
Members
Members
Members Revolution,
Revolution,Klaus
KlausSchwab
Schwabeven evenpoints
pointsout
outprofessions
professions amendment
amendmentofofthe theCLT
CLT- -
that
thatwill
willbe
beextinct
extinctandandprofessions
professionsthat
thatwill
willdemand
demand Consolidation
ConsolidationofofLabor
Labor
more
more and more quali�cations, in times of 5G,Internet
and more quali�cations, in times of 5G, Internetofof Laws,
Laws,with
withaaview
viewtoto
Things
ThingsandandArti
Arti cial
cialIntelligence.
Intelligence.InInthis
thissense,
sense,ititisisgood
good granting
grantingtelework
teleworktoto
to
tohighlight
highlightthat
thatthe
thepandemic,
pandemic,among amongother
other parents
parentsofofchildren
childrenup uptoto88
contradictions,
contradictions,ended
endedup upanticipating
anticipatingthe
theuse
useofof years
yearsoldold
technology,
technology,especially
especiallyininthethetelework.”
telework.”
Japan
Japan 210th Session
210th Session of Kohei Otsuka,
Otsuka, Democratic “In the �eld of human rights, we believe that it is The Commission on the
Japan 210th Sessionof
of Kohei
Kohei Otsuka, Democratic “In the �eld of human rights, we believe that it is The Commission on the
the Diet
the Diet House
House of Party for
for the
the People,
the Diet Houseofof Party
Party for thePeople,
People, necessary
necessaryto toupdate
updatehuman
humanrights
rightsguarantees
guaranteesininorder
ordertoto Constitution
Constitution
Councilors
Councilors Shinryokufukai
Shinryokufukai respond
Councilors Shinryokufukai respondto tochanges
changesininthe
thetimes
timesthat
thatwere
wereunpredictable
unpredictable
Commission on
Commission on when
Commission on whenthetheConstitution
Constitutionwaswasenacted.
enacted.InInparticular,
particular,asasthe
the
the Constitution
the Constitution fusion of arti cial intelligence and Internet technology
the Constitution fusion of arti cial intelligence and Internet technology
No. 22
No. progresses,
No. 2 progresses,thetheinternational
internationalcommunity
communityisisconcerned
concerned
about
aboutthe
theproblems
problemsofofindividual
individualscoring
scoringand
and
discrimination, and the problem of Internet advertising
discrimination, and the problem of Internet advertising
that
thatunfairly
unfairlyin�uences
in�uencesthethevoting
votingbehavior
behaviorofofcitizens.
citizens.
We need a constitutional argument to guarantee the
We need a constitutional argument to guarantee the
autonomous
autonomousdecision-making
decision-makingofofindividuals
individualsand
andprotect
protect
basic
basicdata
datarights
rightsininthe
thedigital
digitalage.”
age.”
Dame Angela
Dame Angela
United
United
United
House of
House of
House of Dame AngelaEagle,
Eagle,Labor
Labor “What
“Whatwould
wouldbe
bethe useofofarti
theuse arti cial
cialintelligence
intelligenceinintrying
trying Financial
FinancialServices
Servicesand
and
Kingdom
Kingdom Commons
Commons to decide how automated these things could become? Markets Bill (Fourth
Kingdom Commons to decide how automated these things could become? Markets Bill (Fourth
Would there be worries about over-automation? How Sitting)
Would there be worries about over-automation? How Sitting)
would that be looked at in terms of regulation? How
would that be looked at in terms of regulation? How
open are we going to be about the way in which AI is
open are we going to be about the way in which AI is
applied and how it might evolve in ways that might
applied and how it might evolve in ways that might
embed discrimination such that we get a system where
embed discrimination such that we get a system where
certain people may be discriminated against and
certain people may be discriminated against and
excluded?”
excluded?”
Zambia
Zambia The House,
The House, Hon. Collins
Hon. Collins Nzovu,
Nzovu, United “Madam Speaker, in order to enhance quality and Ministerial Statements;
Zambia The House, Hon. Collins Nzovu, United “Madam Speaker, in order to enhance quality and Ministerial Statements;
National
National Party for
Party for National
National accuracy of weather forecast, the Government, with Weather and Climate
National Party for National accuracy of weather forecast, the Government, with Weather and Climate
Assembly
Assembly Development,
Development, �nancial support from the United Nations Development Services and the
Assembly Development, The �nancial support from the United Nations Development Services and the
Minister of
Minister of Green
Green Programme Strengthening Climate Resilience of 2022/2023 rainfall forecast
Minister of Green Programme Strengthening Climate Resilience of 2022/2023 rainfall forecast
Economy and
Economy and Environment
Environment Agricultural Livelihoods
Agricultural Livelihoods inin Agro-Ecological
Agro-Ecological (UNDP
(UNDP
Economy and Environment Agricultural Livelihoods in Agro-Ecological (UNDP
SCRALA) project
SCRALA) project is
is currently
currently partnering
partnering with
with the
the
SCRALA) project is currently partnering with the
University of
University of Zambia
Zambia (UNZA)
(UNZA) to
to develop
develop aa seasonal
seasonal
University of Zambia (UNZA) to develop a seasonal
weather forecasting
weather forecasting system
system using arti cial
using arti cial intelligence.”
intelligence.”
weather forecasting system using arti cial intelligence.”
Figure 6.1.15
United States
committees that address legislative and other policy
Committee Mentions issues, investigations, and internal committee matters.
An additional indicator of legislative interest is the Figure 6.1.16 shows a sharp increase in the total
number of mentions of “artificial intelligence” in number of mentions of AI within committee reports
committee reports produced by House and Senate beginning with the 115th legislative session.
80
73
70
60
50
Number of Mentions
40
30
20
10
107th 108th 109th 110th 111th 112th 113th 114th 115th 116th 117th
(2001–02) (2003–04) (2005–06) (2007–08) (2009–10) (2011–12) (2013–14) (2015–16) (2017–18) (2019–20) (2021–22)
Figure 6.1.16
Figure 6.1.17 shows the mentions in committee reports for the 117th Congressional Session, which took place
from 2021 to 2022. The Appropriations Committee leads the House reports, while the Homeland Security and
Governmental Affairs Committee leads the Senate reports (Figure 6.1.18).
Mentions of AI in Committee Reports of the U.S. House of Representatives for the 117th Congressional
Session, 2021–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Appropriations 20
Rules 5
Armed Services 3
Natural Resources 2
Budget 1
Financial Services 1
Foreign A airs 1
Homeland Security 1
House Administration 1
Small Business 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Number of Mentions
Figure 6.1.17
Mentions of AI in Committee Reports of the U.S. Senate for the 117th Congressional Session, 2021–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Appropriations 3
Commerce, Science,
3
and Transportation
Armed Services 2
Intelligence (Select) 2
0 1 2 3 4 5 6 7 8
Number of Mentions
Figure 6.1.18
Figure 6.1.19 shows the total number of mentions in committee reports from the past 10 congressional sessions,
which took place from 2001 to 2022. The House and Senate Appropriations Committees, which regulate
expenditures of money by the government, lead their respective lists (Figure 6.1.19 and 6.1.20).
Appropriations 16
Armed Services 10
Commerce, Science,
9
and Transportation
Energy and
7
Natural Resources
Intelligence (Select) 5
0 2 4 6 8 10 12 14 16
Number of Mentions
Figure 6.1.19
Appropriations 45
Rules 14
Armed Services 9
Financial Services 6
Homeland Security 3
Veterans’ A airs 3
Budget 2
Foreign A airs 2
Judiciary 2
Natural Resources 2
Small Business 2
Agriculture 1
House Administration 1
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
Number of Mentions
Figure 6.1.20
United States AI Policy Papers post that addresses issues related to AI and makes
specific recommendations to policymakers. Topics of
To estimate activities outside national governments
those papers are divided into primary and secondary
that are also informing AI-related lawmaking,
categories: A primary topic is the main focus of the
the AI Index tracked 55 U.S.-based organizations
paper, while a secondary topic is a subtopic of the
that published policy papers in the past five
paper or an issue that is briefly explored.
years. Those organizations include: think tanks
and policy institutes (19); university institutes and Figure 6.1.21 highlights the total number of U.S.-based,
research programs (14); civil society organizations, AI-related policy papers published from 2018 to 2022.
associations, and consortiums (9); industry and After a slight dip from 2020 to 2021, the total increased
consultancy organizations (9); and government to 284 in 2022. Since 2018, the total number of such
agencies (4). A policy paper in this section is defined papers has increased 3.2 times, signaling greater
as a research paper, research report, brief, or blog interest over time.
300
284
250
Number of Policy Papers
200
150
100
50
0
2018 2019 2020 2021 2022
Figure 6.1.21
By Topic
In 2022, the most frequent primary topics were sat in fourth position as of 2022. All of these leading
industry and regulation (107), innovation and topics were also well represented as secondary topics.
technology (90), and government and publication Topics that received comparatively little attention
administration (82) (Figure 6.1.22). Privacy, safety, and included social and behavioral sciences; humanities;
security, which was the most reported topic in 2021, and communications and media.
Ethics 39 17
Democracy 26 5
Physical Sciences 9
Humanities 1 1
0 20 40 60 80 100 0 20 40 60 80 100
Number of Policy Papers
Figure 6.1.22
This subsection presents an overview of national AI strategies—policy plans developed by a country’s government to steer the
development and deployment of AI technologies within its borders. Tracking trends in national strategies can be an important way of
gauging the degree to which countries are prioritizing the management and regulation of AI technologies. Sources include websites of
national or regional governments, the OECD AI Policy Observatory (OECD.AI), and news coverage. “AI strategy” is defined as a policy
document that communicates the objective of supporting the development of AI while also maximizing the benefits of AI for society.5
Year Country
Canada officially launched the first national AI strategy
2017 Canada, China, Finland
in March of 2017; since then a total of 62 national 2018 Australia, France, Germany, India, Mauritius, Mexico, Sweden
AI strategies have been released (Figure 6.2.1). The 2019 Argentina, Austria, Bangladesh, Botswana, Chile, Colombia,
Cyprus, Czech Republic, Denmark, Egypt, Estonia, Japan,
number of released strategies peaked in 2019. Kenya, Lithuania, Luxembourg, Malta, Netherlands, Portugal,
Qatar, Romania, Russia, Sierra Leone, Singapore, United Arab
Emirates, United States of America, Uruguay
By Geographic Area
2020 Algeria, Bulgaria, Croatia, Greece, Hungary, Indonesia, Latvia,
Figure 6.2.2 highlights the countries which, as of Norway, Poland, Saudi Arabia, Serbia, South Korea, Spain,
December 2022, have either released or developed Switzerland
2021
a national AI strategy. Figure 6.2.3 enumerates the Brazil, Ireland, Peru, Philippines, Slovenia, Tunisia, Turkey,
Ukraine, United Kingdom, Vietnam
countries that, in 2021 and 2022, pledged to develop 2022 Italy, Thailand
an AI strategy . The first nations to officially release
Figure 6.2.1
national AI strategies were Canada, China, and Finland
in 2017. Only two nations released national AI strategies AI National Strategies in Development by Country
and Year
in 2022: Italy and Thailand. Source: AI Index, 2022 | Table: 2023 AI Index Report
Year Country
Figure 6.2.3
Released
In Development
Not Released
Figure 6.2.2
5 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world.
It is possible that some strategies were missed.
This section examines public AI investment in the United States based on data from the U.S. government and Govini, a company that uses
AI and machine learning technologies to track U.S. public and commercial spending.
Technology Council published a report on the agencies allocated a total of $1.7 billion to AI R&D
public-sector AI R&D budget across departments spending (Figure 6.3.1). The amount allocated in FY
and agencies participating in the Networking and 2022 represented a slight decline from FY 2021 and
Information Technology Research and Development a 208.9% increase from FY 2018. An even greater
(NITRD) Program and the National Artificial amount, $1.8 billion, has been requested for FY 2023.
1.50 1.43
Budget (in Billions of U.S. Dollars)
1.11
1.00
0.56
0.50
0.00
FY18 (Enacted) FY19 (Enacted) FY20 (Enacted) FY21 (Enacted) FY22 (Enacted) FY23 (Requested)
Figure 6.3.16
6 A previous report on the public-sector AI R&D budget released in 2021 classed the FY21 spending as totaling $1.53 billion. However, the most recent report,
released in 2022, upgraded the total spent in 2022 to $1.75 billion.
U.S. DoD Budget Request for AI-Specific Research, Development, Test, and Evaluation (RDT&E), FY 2020–23
Source: U.S. Office of the Under Secretary of Defense (Comptroller), 2022 | Chart: 2023 AI Index Report
1.10
1.00
0.93
0.87
Budget Request (in Billions of U.S. Dollars)
0.84
0.80
0.60
0.40
0.20
0.00
FY20 Funding FY21 Funding FY22 Funding FY23 Funding
Figure 6.3.2
contracts typically occupy the largest share of an on AI, subdivided by various AI segments. From 2021
Decision Science Computer Vision Machine Learning Autonomy Natural Language Processing
3.50
3.28
0.17
U.S. Government Spending (in Billions of U.S. Dollars)
3.00
2.70 0.69
0.21
2.50 2.41
0.21
0.52 0.41
2.00 0.43
1.83
0.43
1.56 0.82
0.3
1.50 0.58
1.29 0.26
0.53
0.45
0.24 0.31
1.00 0.46
0.21
0.41
0.44
1.19
0.50 0.43 1.01
0.73
0.46 0.55
0.32
0.00
2017 2018 2019 2020 2021 2022
Figure 6.3.3
Figure 6.3.4 shows U.S. government spending by AI segment in FY 2021 and FY 2022. Spending increased
for the decision science, computer vision, and autonomy segments, while spending on machine learning, and
natural language processing dropped slightly.
1.19 (+18%)
Decision Science
1.01
0.82 (+55%)
Computer Vision
0.53
0.69 (+33%)
Autonomy
0.52
0.41 (-5%)
Machine Learning
0.43
Figure 6.3.4
In FY 2022, the majority of federal AI contracts were prime contracts (62.5%), followed by grants (34.9%) and
other transaction authority (OTA) awards (2.6%) (Figure 6.3.5). From FY 2021 to FY 2022, the share of contracts
remained about the same, while the share of grants rose.
Total Value of Contracts, Grants, and OTAs Awarded by the U.S. Government for AI/ML and Autonomy,
FY 2017–22
Source: Govini, 2022 | Chart: 2023 AI Index Report
2.05, Contracts
2.00
Total Value Awarded (in Billions of U.S. Dollars)
1.50
1.15, Grants
1.00
0.50
0.09, OTAs
0.00
Figure 6.3.5
In 2022, the AI Index partnered with Elif Kiesow Cortez, a scholar of artificial intelligence law, in a research project tracking trends in
American legal cases from 2000 to 2022 that contain AI-related keywords.7
110
100
Number of AI-Related Legal Cases
80
60
40
20
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Figure 6.4.1
7 The Index analyzed both federal and state-level cases. Specific keywords in the search included “artificial intelligence,” “machine learning,” and “automated decision-making.” Some of these
cases did not directly concern issues related to AI jurisprudence. As a next step of this project, we will aim to identify the cases that most centrally concern issues of AI-related law.
Geographic Distribution they are home to many large businesses that have
integrated AI. In recent years, there have been a
In 2022, the majority of AI-related legal cases greater number of AI-related legal cases originating
originated in California (23), Illinois (17), and New from Illinois—this follows the state’s enactment
York (11) (Figure 6.4.2). The aggregate number of AI- of the Biometric Information Privacy Act (BIPA),
related cases since 2000 show a similar geographic which requires that companies doing business in
distribution (Figure 6.4.3). California and New York’s Illinois follow a number of regulations related to the
inclusion in the top three is unsurprising given that collection and storage of biometric information.
California 23
Illinois 17
New York 11
Delaware 7
Florida 7
Washington 5
Kansas 4
Massachusetts 4
Maryland 4
District of Columbia 3
Texas 3
Ohio 3
Pennsylvania 3
Virginia 2
Missouri 2
0 2 4 6 8 10 12 14 16 18 20 22 24
Number of AI-Related Legal Cases
Figure 6.4.28
8 Figures 6.4.2 and 6.4.3 include information for states and districts, given that cases sometimes originate from American districts like the District of Columbia or Puerto Rico
Number of AI-Related Legal Cases in the United States by State, 2000–22 (Sum)
Source: AI Index, 2022 | Chart: 2023 AI Index Report
California 127
New York 66
Illinois 36
Texas 26
Delaware 19
Massachusetts 19
Washington 18
Pennsylvania 16
Michigan 12
Virginia 12
District of Columbia 12
Florida 12
Ohio 10
Kansas 9
Minnesota 8
Public Service 14
Education 6
Health Services 6
0 4 8 12 16 20 24 28 32 36 40 44 48
Number of AI-Related Legal Cases
Figure 6.4.4
Type of Law
The greatest proportion of AI-related legal cases concerned civil law (29%) (Figure 6.4.5). There were also a large
number of AI-related legal cases in the domain of intellectual property (19%), as well as contract law (13.6%).
Civil 32
Intellectual
Property 21
Contract 15
Competition 11
Constitutional 8
Employment and
Labor 6
Criminal 5
Corporate 4
Financial 3
Terrorism and
National Security 2
Tort 1
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
Number of AI-Related Legal Cases
Figure 6.4.5
Narrative Highlight:
Three Significant AI-Related Legal Cases
The section below profiles three significant AI-related cases in the United States,
highlighting some of the legal issues that are at stake when AI is brought into the courts.
Duerr v. Bradley University (2022- Flores v. Stanford9 (2021-Sep-28) Dyroff v. Ultimate Software Grp., Inc
Mar-10) – United States Court of – United States Court of Appeals (2017-Nov-26) – United States Court of
Appeals for the Seventh Circuit for the Second Circuit Appeals for the Ninth Circuit
The plaintiffs, who were enrolled The plaintiffs, offenders denied Plaintiff Kristanalea Dyroff sued Ultimate
as undergraduates in a private parole, sued the New York State Software after her 29-year-old son died
university in Peoria, Illinois, during Board of Parole over being from an overdose of heroin laced with
the fall 2020 semester, were told refused access to information fentanyl, which he allegedly bought
to use a third-party proctoring used by the board in its review from a drug dealer that he encountered
tool called Respondus Monitor for of their cases. Northpointe, Inc., on Ultimate Software’s social network
remote, online exams. This tool petitioned the court as a non- site. Dyroff asserted seven claims
made use of artificial intelligence party because its Correctional against Ultimate Software which
technologies. The plaintiffs claimed Offender Management Profiling included negligence, wrongful death,
that the defendants violated Illinois’ for Alternative Sanctions and civil conspiracy. At the core of these
Biometric Information Privacy Act (COMPAS), an AI-powered claims was the argument that Ultimate
(BIPA) by not adequately following risk assessment tool, had been Software mined the data of users
its guidelines concerning the used by the parole board in its and deployed that data, alongside an
collection of biometric information. determinations. Northpointe algorithm, to recommend drug-related
BIPA does not apply to financial wanted to prevent the disclosure discussion groups to her son. Ultimate
institutions. Ultimately, the court of AI trade secrets to one of the Software moved to dismiss the claims
ruled that under the Gramm- plaintiff’s expert witnesses. The and claimed partial immunity under the
Leach-Bliley Act, the defendants court ruled that the confidential Communications Decency Act, which
were a financial institution by material in question was relevant protects website operators from liability
virtue of lending functions they to the plaintiff’s case and posed for third-party content on their site. The
engaged in and therefore exempt little risk of competitive injury. Court ruled that Ultimate Software was
from BIPA. As such, the plaintiff’s As such, the material was immune and that its use of algorithms
case was dismissed. ordered to be released under a did not sufficiently amount to novel
supplemental protective order. content creation.
9 The defendant was Tina M. Stanford, as Chairwoman of the New York State Board of Parole.
Artificial Intelligence
Index Report 2023
CHAPTER 7:
Diversity
CHAPTER 7 PREVIEW:
Diversity
Overview 298 Narrative Highlight:
Chapter Highlights 299 Disability Status of CS, CE,
and Information Students 311
Women in Machine Learning (WiML) CS, CE, and Information Faculty 313
NeurIPS Workshop 300
Workshop Participants 300 7.3 K–12 Education 316
Demographic Breakdown 301 AP Computer Science: Gender 316
AP Computer Science: Ethnicity 318
Overview
AI systems are increasingly deployed in the real world. However, there often exists
a disparity between the individuals who develop AI and those who use AI. North
American AI researchers and practitioners in both industry and academia are
predominantly white and male. This lack of diversity can lead to harms, among them
the reinforcement of existing societal inequalities and bias.
This chapter highlights data on diversity trends in AI, sourced primarily from academia.
It borrows information from organizations such as Women in Machine Learning
(WiML), whose mission is to improve the state of diversity in AI, as well as the
Computing Research Association (CRA), which tracks the state of diversity in North
American academic computer science. Finally, the chapter also makes use of Code.org
data on diversity trends in secondary computer science education in the United States.
Note that the data in this subsection is neither comprehensive nor conclusive. Publicly
available demographic data on trends in AI diversity is sparse. As a result, this chapter
does not cover other areas of diversity, such as sexual orientation. The AI Index hopes
that as AI becomes more ubiquitous, the amount of data on diversity in the field will
increase such that the topic can be covered more thoroughly in future reports.
Chapter Highlights
7.1 AI Conferences
Women in Machine Learning collaboration and interaction among participants from
diverse backgrounds at the International Conference
(WiML) NeurIPS Workshop of Machine Learning (ICML).
1,400
1,200
1,157
1,000
Number of Attendees
800
600
400
200
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 7.1.1
1 The recent decrease in WiML workshop attendance may be attributable to the overall recent decrease in NeurIPS attendance. This overall decrease may in turn be a result of
NeurIPS moving away from a purely virtual format.
North
41.50%
America
Europe 34.20%
Asia 17.10%
Africa 3.40%
South
1.60%
America
Australia/
1.40%
Oceania
Antarctica 0.20%
2 At the time of the survey, one of the respondents was temporarily residing in Antarctica.
The majority of participants at the 2022 WiML workshop were female-identifying (37.0%), another 25.8% were
male-identifying, and 0.5% were nonbinary-identifying (Figure 7.1.3).
Female 37.00%
Male 25.80%
Nonbinary 0.50%
Gender
0.20%
Non-Conforming
The most represented professional positions at the workshop were PhD students (49.4%), research scientists/
data scientists (20.8%), software engineers/data engineers (8.4%), and faculty (4.4%) (Figure 7.1.4).
Research Scientist/
20.80%
Data Scientist
Software Engineer/
8.40%
Data Engineer
Faculty 4.40%
CEO/Director 3.50%
Others 3.50%
Postdoc 3.50%
Undergraduate
2.00%
Student
Recruiter 1.60%
Lecturer 1.40%
The WiML workshop participants at NeurIPS submitted papers covering a wide range of subjects (Figure 7.1.5).
The most popular submission topics were applications (32.5%), algorithms (23.4%), and deep learning (14.8%).
Primary Subject Area of Submissions at NeurIPS Women in Machine Learning Workshop, 2022
Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
Applications 32.50%
Algorithms 23.40%
Social Aspects of
Machine Learning 7.70%
Reinforcement
Learning and Planning 7.20%
Neuroscience and
Cognitive Science 5.30%
Data, Challenges,
Implementations, 3.80%
Software
Optimization 1.00%
Theory 1.00%
Another proxy for studying diversity in AI is looking at trends in postsecondary AI education. The following subsection borrows data
from the Computing Research Association’s (CRA) annual Taulbee Survey.3
90%
80%
77.66%, Male
New CS Bachelor’s Graduates (% of Total)
70%
60%
50%
40%
30%
22.30%, Female
20%
10%
0% 0.04%, Nonbinary/Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.1
3 The charts in this subsection look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens in each educational
level (i.e., Bachelor’s, Master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included. For the proportion of nonresident aliens in each category, see footnotes.
Figure 7.2.2 breaks down the ethnicity of new CS bachelor’s graduates in North America: The top ethnicity
was white (46.7%), followed by Asian (34.0%) and Hispanic (10.9%). In the last decade, the proportion of
new CS bachelor’s graduates who were Asian, Hispanic, or multiracial (not Hispanic) steadily increased.4
70%
60%
New CS Bachelor’s Graduates (% of Total)
50%
46.69%, White
40%
33.99%, Asian
30%
20%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.2
CS Master’s Graduates
Figure 7.2.3 shows the gender of CS master’s moving to 27.8% in 2021 from 24.6% in 2011. In
graduates. The proportion of female CS master’s 2021, 0.9% of CS master’s graduates identified as
graduates has not substantially increased over time, nonbinary/other.
80%
60%
50%
40%
30%
27.83%, Female
20%
10%
0% 0.90%, Nonbinary/Other
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.3
Of domestic students, the most represented ethnicities are white (50.3%), followed by Asian (34.8%), and
Hispanic (7.3%) (Figure 7.2.4). As with CS bachelor’s graduates, in the last decade white students have
represented an increasingly smaller proportion of new CS master’s graduates.5
70%
60%
New CS Master’s Graduates (% of Total)
40%
34.83%, Asian
30%
20%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.4
CS PhD Graduates
In 2021, the number of new female CS PhD continue to be male. There remains a large gap
graduates rose to 23.3% from 19.9% (Figure 7.2.5). between new male and female CS PhDs.
Despite this rise, most new CS PhD graduates
80%
76.58%, Male
70%
New CS PhD Graduates (% of Total)
60%
50%
40%
30%
23.30%, Female
20%
10%
0% 0.12%, Nonbinary/Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.5
Between 2011 and 2021, the number of new white resident CS PhD graduates declined by 9.4 percentage
points. Asians are the next most represented group (29%), followed by Hispanics (5.1%) and Black or African
Americans (4%) (Figure, 7.2.6).6
70%
60%
58.64%, White
New CS PhD Graduates (% of Total)
50%
40%
20%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.6
Narrative Highlight:
Disability Status of CS, CE, and Information Students
The 2021 edition of the CRA Taulbee Survey was the disability accommodations in the last year.
first to gather information about the prevalence of The number of such students was relatively small.
CS, CE, and information students with disabilities. Only 4.0% of bachelor’s, 1.0% of PhD students,
The CRA asked departments to identify the number and 0.8% of master’s students reported needing
of students at each degree level who received accommodations (Figure 7.2.7).
CS, CE, and Information Students (% of Total) With Disability Accomodations in North America, 2021
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
4.10%
4%
CS, CE, and Information Students (% of Total)
3%
2%
1.00%
1%
0.80%
0%
Bachelor’s PhDs Master’s
Figure 7.2.7
New AI PhDs
were female. While the number of female AI PhDs
Figure 7.2.8 looks at demographic trends for new AI marginally increased from 2020 to 2021, we find no
PhD graduates who focus on artificial intelligence. meaningful trends in the last decade relating to the
In 2021, 78.7% of new AI PhDs were male and 21.3% gender of new AI PhDs.
80%
78.70%, Male
70%
New AI PhD Graduates (% of Total)
60%
50%
40%
30%
10%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.8
Gender of CS, CE, and Information Faculty (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
80%
75.94%, Male
70%
CS, CE, and Information Faculty (% of Total)
60%
50%
40%
30%
23.94%, Female
20%
10%
0% 0.12%, Nonbinary/Other
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.9
Although most new CS, CE, and information faculty hires in North American universities are still male, the
proportion of women among faculty hires reached 30.2% in 2021, up about 9 percentage points from 2015
(Figure 7.2.10).
Gender of New CS, CE, and Information Faculty Hires (% of Total) in North America, 2011–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
80%
New CS, CE, and Information Faculty Hires (% of Total)
60%
50%
40%
20%
10%
0% 0.57%, Nonbinary/Other
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.10
The majority of resident CS, CE, and information faculty are white as of 2021 (58.1%), followed by Asian (29.7%)
(Figure 7.2.11). However, the gap between white CS, CE, and information faculty and faculty of the next nearest
ethnicity is slowly narrowing: In 2011, the gap stood at 46.1%, whereas in 2021 it dropped to 28.4%.7
Ethnicity of Resident CS, CE, and Information Faculty (% of Total) in North America, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
70%
60%
58.08%, White
CS, CE, and Information Faculty (% of Total)
50%
40%
20%
5.82%, Unknown
10% 2.80%, Hispanic (Any Race)
2.54%, Black or African-American
0.67%, Multiracial (Not Hispanic)
0.25%, American Indian or Alaska Native
0% 0.13%, Native Hawaiian or Paci c Islander
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.11
7 In 2021, 6.7% of CS, CE, and information faculty in North America were nonresident aliens.
How do trends in AI diversity measure at the K–12 level, prior to students entering university? This subsection borrows data from
Code.org, an American nonprofit that aims to promote K–12 computer science education in the United States.
80%
70%
AP Computer Science Exams Taken (% of Total)
69.16%, Male
60%
50%
40%
20%
10%
0% 0.26%, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.3.1
On a percent basis, the states with the largest (35%) (Figure 7.3.2). Other states with notable CS and
number of female AP computer science test- AI activity include California, Texas, and Washington,
takers were Alabama (36%) and Washington, D.C. with rates of women taking AP computer science
(36%), followed by Nevada (35%), Louisiana (35%), tests at rates hovering around 30 percent.
Tennessee (35%), Maryland (35%), and New York
AK ME
20% 27%
VT NH MA
23% 24% 30%
WA MT ND SD MN WI MI NY CT RI
32% 21% 16% 15% 23% 23% 30% 35% 30% 31%
OR ID WY NE IA IL IN OH PA NJ
21% 26% 31% 25% 24% 32% 23% 27% 27% 31%
CA NV UT CO KS MO KY WV DC MD DE
31% 35% 23% 26% 15% 22% 31% 30% 36% 35% 22%
AZ NM OK AR TN VA NC
27% 29% 25% 29% 35% 28% 31%
TX LA MS AL GA SC
30% 35% 33% 36% 29% 34%
HI FL
30% 31%
Figure 7.3.2
AP Computer Science:
Ethnicity most postsecondary computer science fields, the
pool of AP computer science test-takers is becoming
Code.org collects data that speaks to trends in the more ethnically diverse over time. White students
ethnicity of AP computer science test-takers. White are still the greatest test-taking group; however, over
students took the greatest proportion of the exams in time, more Asian, Hispanic/Latino/Latina and Black/
2021 (42.7%), followed by Asian (28.8%) and Hispanic/ African American students have taken AP computer
Latino/Latina students (16.5%) (Figure 7.3.3). As with science exams.
60%
50%
42.74%, White
40%
30%
28.78%, Asian
20%
16.48%, Hispanic/Latino/Latina
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.3.3
Artificial Intelligence
Index Report 2023
CHAPTER 8:
Public Opinion
CHAPTER 8 PREVIEW:
Public Opinion
Overview 321 Narrative Highlight: How Does the
Chapter Highlights 322 Natural Language Processing (NLP)
Research Community Feel About AI? 334
Overview
AI has the potential to have a transformative impact on society. As such it has become
increasingly important to monitor public attitudes toward AI. Better understanding
trends in public opinion is essential in informing decisions pertaining to AI’s
development, regulation, and use.
This chapter examines public opinion through global, national, demographic, and ethnic
lenses. Moreover, we explore the opinions of AI researchers, and conclude with a look
at the social media discussion that surrounded AI in 2022. We draw on data from two
global surveys, one organized by IPSOS, and another by Lloyd’s Register Foundation
and Gallup, along with a U.S-specific survey conducted by PEW Research.
It is worth noting that there is a paucity of longitudinal survey data related to AI asking
the same questions of the same groups of people over extended periods of time. As AI
becomes more and more ubiquitous, broader efforts at understanding AI public opinion
will become increasingly important.
Chapter Highlights
Chinese citizens are among Men tend to feel more
those who feel the most positively about AI products
positively about AI products and services than women.
and services. Americans … Men are also more likely than
not so much. women to believe that AI will
In a 2022 IPSOS survey, 78% of Chinese mostly help rather than harm.
respondents (the highest proportion of According to the 2022 IPSOS survey, men are more
surveyed countries) agreed with the statement likely than women to report that AI products and
that products and services using AI have services make their lives easier, trust companies
more benefits than drawbacks. After Chinese that use AI, and feel that AI products and services
respondents, those from Saudi Arabia (76%) have more benefits than drawbacks. A 2021 survey
and India (71%) felt the most positive about by Gallup and Lloyd’s Register Foundation likewise
AI products. Only 35% of sampled Americans revealed that men are more likely than women to
(among the lowest of surveyed countries) agreed agree with the statement that AI will mostly help
that products and services using AI had more rather than harm their country in the next 20 years.
benefits than drawbacks.
Figure 8.1.1
1 See Appendix for more details about the survey methodology.
Opinions vary widely across countries as to the services using AI have more benefits than drawbacks
relative advantages and disadvantages of AI. (Figure 8.1.2). However, only 35% of American
The IPSOS survey suggests that 78% of Chinese respondents share that sentiment. Among the 28
respondents, 76% of Saudi Arabian respondents, and surveyed countries, France and Canada held the most
71% of Indian respondents feel that products and negative views.
‘Products and services using AI have more bene ts than drawbacks,’ by Country (% of Total), 2022
Source: IPSOS, 2022 | Chart: 2023 AI Index Report
China 78%
India 71%
Peru 70%
Mexico 65%
Malaysia 65%
Colombia 64%
Chile 63%
Turkey 60%
Brazil 57%
Argentina 55%
Spain 53%
Russia 53%
Italy 50%
Hungary 49%
Poland 48%
Japan 42%
Sweden 40%
Belgium 38%
Australia 37%
Germany 37%
Netherlands 33%
Canada 32%
France 31%
Figure 8.1.3 breaks down answers to all of IPSOS’ companies that use AI as much as other companies,
AI products and services questions by country. and only 30% say that AI products and services using AI
Generally, sentiment relating to AI products make them nervous. Conversely, American respondents
and services seems to be strongly correlated are among the most negative when it comes to AI. Only
within specific countries. For example, Chinese 41% claim that AI products and services make their lives
respondents seem to feel among the most positive easier, 35% report trusting AI companies as much as
about AI products and services: 87% of Chinese other companies, and 52% report that AI products and
respondents claim that AI products and services services make them feel nervous.
make their lives easier, 76% report trusting
I have a good understanding of what 64% 59% 60% 69% 59% 76% 67% 71% 50% 50% 57% 67% 72% 42% 41% 61% 74% 65% 76% 66% 75% 73% 78% 72% 62% 60% 68% 63%
artificial intelligence is
Australia
Belgium
Brazil
Canada
Chile
China
Colombia
France
Germany
Great Britain
Hungary
India
Italy
Japan
Malaysia
Mexico
Netherlands
Peru
Poland
Russia
Saudia Arabia
South Africa
South Korea
Spain
Sweden
Turkey
United States
Figure 8.1.3
Figure 8.1.4 breaks down opinions in all countries feeling that AI products and services make their lives
across demographic groups such as gender, age, easier, they are also less likely than the 35-to-49 age
household income, and employment status. IPSOS category to believe that AI products and services have
results suggest that men feel more positively about more benefits than drawbacks. Finally, households with
AI products and services than women—for example, higher incomes are more positive, compared to those
compared to women, men are more likely to report with lower incomes, about AI products and services
feeling that AI products and services make their making life easier and having more benefits than
lives easier. Age-specific opinions vary. For instance, drawbacks.
while individuals under 35 are most likely to report
Products and services using artificial 62% 58% 64% 62% 54% 56% 58% 66% 53% 58% 67% 67% 70% 63% 55%
intelligence make my life easier
I know which types of products and 55% 46% 54% 51% 45% 46% 50% 57% 44% 48% 58% 63% 65% 54% 44%
services use artificial intelligence
I trust companies that use artificial
intelligence as much as I trust other 53% 47% 54% 51% 44% 47% 48% 57% 45% 48% 56% 61% 62% 53% 45%
companies
Products and services using artificial
intelligence have profoundly changed 51% 46% 54% 50% 41% 46% 47% 54% 43% 46% 55% 61% 62% 52% 43%
my daily life in the past 3–5 years
Medium
Low
High
Sr. Exec./
Decision Maker
Low
Male
Female
Under 35
35 to 49
50 to 74
Business Owner
Employed
Non-Employed
Figure 8.1.4
Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years Overall and by
Gender (% of Total), 2021
Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
42%
40% 39%
35%
30% 29%
% of Respondents
28%
27%
24%
22%
20%
20%
10% 9%
8% 8%
2% 2% 2%
0%
Mostly help Mostly harm Neither Don’t have an opinion Don’t know/refused
Figure 8.1.5
Eastern Asia, Northern/Western Europe, and for every 1 response of “mostly harm” there were 4.4
Southern Europe are the regions of the world where responses suggesting that AI will “mostly help.” The
people are most likely to report believing that AI will regions whose populations are most pessimistic about
mostly help versus mostly harm (Figure 8.1.6). More the potential benefits of AI include Eastern Africa,
specifically, among the Eastern Asian survey sample, Northern Africa, and Southern Africa.
Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years by Region:
Ratio of ‘Mostly Help’/‘Mostly Harm’, 2021
Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
0 1 2 3 4
Ratio of “Mostly Help”/“Mostly Harm”
Figure 8.1.6
The Lloyd’s Register survey also polled Perceptions of the Safety of Self-Driving Cars
(% of Total), 2021
respondents about their perceptions of Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
There are two specific AI use cases that Americans that police using facial recognition technology is a
are more likely to report feeling are good ideas good idea for society compared to 27% who believe it
for society rather than bad: police use of facial is a bad idea. However, Americans are not as excited
recognition technology, and social media companies about driverless passenger vehicles: More feel that
using AI to find false information on their sites (Figure driverless passenger vehicles are a bad idea for
8.1.10). More specifically, 46% of Americans believe society than a good idea.
Bad idea for society Good idea for society Not sure
Figure 8.1.8
Figure 8.1.104
4 The numbers in Figure 8.1.10 may not sum up to 100% due to rounding.
Of the sample of Americans who reported being more hacking, and digital privacy (16%); and lack of human
concerned than excited about AI, Figure 8.1.11 outlines connection (12%). Americans reported being less
the main reasons for their concern. The primary concerned about the potential loss of freedom and
reasons include loss of human jobs (19%); surveillance, issues relating to lack of oversight and regulation.
People misusing AI 8%
People becoming too reliant on Figure 8.1.8
AI/tech 7%
Unforeseen consequences/effects 2%
Loss of freedom 2%
Other 7%
Figure 8.1.11
The two leading reasons that Americans report society better. A significant group also reported
being excited about AI relate to its potential to feeling excited about the potential of AI to save time
make life better and to save time (Figure 8.1.12). and increase efficiency (13%), as well as to handle
Of the respondents, 31% believe AI makes life and mundane, tedious tasks (7%).
AI is interesting, exciting 6%
Personal anecdotes 2%
Other 7%
Figure 8.1.12
The Pew Research survey also asked participants in the degree to which people felt that AI systems
which group of people had their experiences and positively considered the experiences and views of
views taken into consideration in the design of AI men over women. Similarly, respondents felt that the
systems. Respondents felt AI systems most reflected experiences and views of Asian, Black, and Hispanic
the experiences and views of men and white adults adults, compared to those held by white adults, were
(Figure 8.1.13). There was a 15 percentage point gap not as positively considered.
People Whose Experiences and Views Are Considered in the Design of AI Systems (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
Figure 8.1.8
Women 25% 36% 38%
Figure 8.1.135
5 The numbers in Figure 8.1.13 may not sum up to 100% due to rounding.
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI?
From May to June 2022, a group of American In general, the NLP research community
researchers conducted a survey of the NLP research strongly feels that private firms have too much
community on a diverse set of issues, including the influence (77%) and that industry will produce
state of the NLP field, artificial general intelligence the most widely cited research (86%) (Figure
(AGI), and ethics, among others. According to the 8.1.14). Curiously, 67% either agreed or weakly
authors, a total of 480 individuals completed the agreed with the statement that most of NLP is
survey, 68% of whom had authored at least two dubious science. A small proportion, 30%, think
Association for Computational Linguistics (ACL) an “NLP winter”—a period when the field faces
publications between 2019 and 2022. The survey 6
a significant slowdown or stagnation in research
represents one of the most complete pictures of the and development—is coming in the next decade.
attitudes AI researchers have toward AI research.
NLP winter
30%
is coming (10 years)
NLP winter
62%
is coming (30 years)
Most of NLP is
67%
dubious science
Author anonymity
63%
is worth it
Figure 8.1.14
6 More detailed information about the survey methodology and sample group can be found in the following paper.
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI? (cont’d)
A small majority of NLP researchers believe that specific types of AI systems can actually understand
language: 51% agreed with the statement that language models (LMs) understand language, with even
more (67%) agreeing that multimodal models understand language (Figure 8.1.15).
LMs understand
51%
language
Multimodal models
67%
understand language
Figure 8.1.15
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI? (cont’d)
NLP researchers also seem to believe that NLP’s with 48% of respondents feeling it is unethical. Sixty
past net impact has been positive (89%) and that its percent of researchers feel that the carbon footprint
future impact will continue to be good (87%) (Figure of AI is a major concern; however, only 41% feel that
8.1.16). The community is divided on the issue of NLP should be regulated.
using AI to predict psychological characteristics,
It is unethical to build
easily misusable systems 59%
It is unethical to predict
psychological 48%
characteristics
Carbon footprint is
a major concern 60%
Figure 8.1.16
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI? (cont’d)
Although a large majority of researchers feel that AI could soon lead to revolutionary societal change
(73%), only 36% feel that AI decisions could cause nuclear-level catastrophe (Figure 8.1.17). A plurality
of researchers, 57%, held that recent research progress was leading the AI community toward Artificial
General Intelligence (AGI).
Arti cial General Intelligence (AGI) and Major Risks According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
AGI is an important
58%
concern
Recent progress is
moving us toward 57%
AGI
AI decisions could
cause nuclear-level 36%
catastrophe
Figure 8.1.17
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI? (cont’d)
When asked about the direction AI research is taking, the NLP community registered the strongest
responses about the following: First, there’s too much focus on benchmarks (88%); second, more work
should be done to incorporate interdisciplinary insights (82%); and third, there’s too great a focus on
scale (72%) (Figure 8.1.18).
We should do more
to incorporate 82%
interdisciplinary insights
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
% of Respondents That “Agree” or “Weakly Agree”
Figure 8.1.18
Narrative Highlight:
How Does the Natural Language Processing (NLP)
Research Community Feel About AI? (cont’d)
A further point on the NLP community’s skepticism of scale: Only 17% of respondents agreed or weakly
agreed with the statement that scaling solves practically any important problem, with a further 50%
reaffirming the importance of linguistic structure (Figure 8.1.19).
Scale, Inductive Bias, and Adjacent Fields According to the NLP Community, 2022
Source: Michael et al., 2022 | Chart: 2023 AI Index Report
Linguistic structure
50%
is necessary
Expert inductive
51%
biases are necessary
Linguistics/CogSci will
contribute to the 61%
most-cited models
Figure 8.1.19
DALL-E 0 42 29 21
LaMDA 73 -9 -11 44
AlphaCode 60 79 71 70
CoPilot 29 22 15 34
PaLM 66 66 30
Gato 47 84 65
Imagen 24 65 56
Stable Di usion 35 52
Whisper 85 69
Make-A-Video 4 9
AlphaTensor 96
GLM-130B 55
BLOOM 0
CICERO 14
ChatGPT 32
Figure 8.2.17
7 The AI Index searched for sentiment surrounding the term “DALL-E,” as it was more frequently referred to on social media, rather than DALL-E 2, the official name of the text-to-image
model released by OpenAI in 2022.
Figure 8.2.2 highlights the proportion of AI-related “This story … is really sad, and I think an
social media conversation that was dominated by the important window into the risks of designing
release of particular models. ChatGPT dominated 8
systems to seem like humans, which are
consumer conversation with a rapid rise, making exacerbated by #AIhype.” – @nitashataku
up over half of consumer conversation by the end
Stable Diffusion conversation stands out as a
of 2022. Despite initial excitement, sentiment was
prominent leader in conversation volume toward
mixed by the end of the year, as some individuals
the end of 2022, but it is also a symbol of how the
became more aware of ChatGPT’s limitations.
consumer lexicon around AI models is developing.
OpenAI CEO Sam Altman even publicly commented
Many consumers debated the “originality” of what
on it being “incredibly limited” in certain respects.
Stable Diffusion produces.
“ChatGPT is incredibly limited, but good enough
“I’ve worked on neural networks, so I understand
at some things to create a misleading impression
stable diffusion pretty well. And while it can’t
of greatness. It’s a mistake to be relying on it
have original thoughts, it can come up with
for anything important right now. It’s a preview
original works.” – r/TikTokCringe
of progress; we have lots of work to do on
robustness and truthfulness.” – @SamAltman “That’s true of anywhere that datasets scrape
without permission. The thing to actually be upset
Conversation around LaMDA exploded in Q2
about is that their own generator is purposefully
2022 as an ex–Google employee reported his
using the Stable Diffusion dataset that already
experiences with a “sentient” system that spoke of
contains tons of stolen work.” – @Emily_Art
its own emotions and thoughts. Many political and
technology influencers spoke out, however, about
the “deepfake” nature of the responses of systems
like LaMDA that do not have a sense of “truth” and ChatGPT dominated
could proliferate misinformation. consumer conversation
“AI systems like LamDA and GPT-3 are with a rapid rise, making
sociopathic liars with utter indifference to truth,
deepfakers with words, every day creating
up over half of consumer
more compelling, more plausible misinformation conversation by the end
on demand. It is imperative that we develop
of 2022.
technology & policy to thwart them.” –
@GaryMarcus
8 The figures in this section consider all AI-related social media conversation. The percentage associated with the model in Figure 8.2.2 represents the share of all AI-related social media
conversation that was dominated by that model.
DALL-E 0% 1% 3% 2%
CoPilot 10% 3% 4% 1%
Imagen 5% 4% 2%
AlphaTensor 1%
GLM-130B <1%
BLOOM <1%
CICERO 3%
ChatGPT 52%
Figure 8.2.2
Artificial Intelligence
Index Report 2023
Appendix
Artificial Intelligence
Index Report 2023
Appendix
345
Artificial Intelligence Appendix
Index Report 2023 Chapter 1: Research and Development
Prepared by Sara Abdulla and James Dunham publications. To identify AI publications, CSET used an
English-language subset of this corpus: publications
The Center for Security and Emerging Technology since 2010 that appear AI-relevant.4 CSET researchers
(CSET) is a policy research organization within developed a classifier for identifying AI-related
Georgetown University’s Walsh School of Foreign publications by leveraging the arXiv repository, where
Service that produces data-driven research at the authors and editors tag papers by subject. Additionally,
intersection of security and technology, providing CSET uses select Chinese AI keywords to identify
nonpartisan analysis to the policy community. Chinese-language AI papers.5
For more information about how CSET analyzes To provide a publication’s field of study, CSET
bibliometric and patent data, see the Country Activity matches each publication in the analytic corpus
Tracker (CAT) documentation on the Emerging with predictions from Microsoft Academic Graph’s
Technology Observatory’s website.1 Using CAT, users field-of-study model, which yields hierarchical labels
can also interact with country bibliometric, patent, describing the published research field(s) of study and
and investment data.2 corresponding scores.6 CSET researchers identified
Publications from CSET Merged Corpus of the most common fields of study in our corpus of
Scholarly Literature AI-relevant publications since 2010 and recorded
Source publications in all other fields as “Other AI.” English-
CSET’s merged corpus of scholarly literature language AI-relevant publications were then tallied by
combines distinct publications from Digital Science’s their top-scoring field and publication year.
Dimensions, Clarivate’s Web of Science, Microsoft
CSET also provided year-by-year citations for AI-
Academic Graph, China National Knowledge
relevant work associated with each country. A
Infrastructure, arXiv, and Papers With Code.3
publication is associated with a country if it has at
1 https://eto.tech/tool-docs/cat/
2 https://cat.eto.tech/
3 All CNKI content is furnished by East View Information Services, Minneapolis, Minnesota, USA.
4 For more information, see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv [cs.DL],
May 28, 2020, https://arxiv.org/abs/2002.07143.
5 This method was not used in CSET’s data analysis for the 2022 HAI Index report.
6 These scores are based on cosine similarities between field-of-study and paper embeddings. See Zhihong Shen, Hao Ma, and Kuansan Wang, “A Web-Scale System for Scientific
Knowledge Exploration,” arXiv [cs.CL], May 30, 2018, https://arxiv.org/abs/1805.12216.
7 See https://www.grid.ac/ for more information about the GRID dataset from Digital Science.
8 https://epochai.org/blog/compute-trends; see note on “milestone systems.”
9 For example, an author employed by both a Chinese university and a Canadian technology firm would be counted as 0.5 researchers from China and 0.5 from Canada.
10 This choice is arbitrary. Other plausible alternatives include weighting papers by their number of citations, or assigning greater weight to papers with more authors.
Identifying AI Projects
Arguably, a significant portion of AI software
development takes place on GitHub. OECD.AI
partners with GitHub to identify public AI projects—
or “repositories”—following the methodology
developed by Gonzalez et al.,2020. Using the 439
Measuring Collaboration
Two countries are said to collaborate on a specific
public AI software development project if there is
at least one contributor from each country with at
least one contribution (i.e., “commit”) to the project.
Domestic collaboration occurs when two contributors
from the same country contribute to a project.
use of extra training data, scores were taken from the CoCa: Contrastive Captioners Are Image-Text
following papers: Foundation Models
Meta Pseudo Labels
Aggregated Residual Transformations for
Deep Neural Networks
Exploring the Limits of Weakly Supervised Pretraining
National Institute of
Fixing the Train-Test Resolution Discrepancy:
Standards and Technology
FixEfficientNet (NIST) Face Recognition
ImageNet Classification With Deep Convolutional Vendor Test (FRVT)
Neural Networks
Data on NIST FRVT 1:1 verification accuracy by
PeCo: Perceptual Codebook for BERT
Pre-training of Vision Transformers dataset was obtained from the FRVT 1:1 verification
leaderboard.
Progressive Neural Architecture Search
Rethinking the Inception Architecture for
Computer Vision
Self-Training With Noisy Student Improves
ImageNet Classification
Some Improvements on Deep Convolutional Neural
Network Based Image Classification
correspond to the year in which a paper was first keypoints without the use of extra training data, scores
published to arXiv or a method was introduced. With were taken from the following papers:
Celeb-DF, recent researchers have tested previously Bottom-Up and Top-Down Reasoning
existing deepfake detection methodologies. The year With Hierarchical Rectified Gaussians
in which a method was introduced, even if it was Cascade Feature Aggregation for
subsequently tested, is the year in which it is included Human Pose Estimation
in the report. The reported results (AUC) correspond Deeply Learned Compositional Models for
to the result reported in the most recent version of Human Pose Estimation
each paper. Details on the Celeb-DF benchmark can Efficient Object Localization Using
be found in the Celeb-DF paper. Convolutional Networks
Learning Feature Pyramids for Human Pose Estimation
To highlight progress on Celeb-DF, scores were taken
from the following papers: Stacked Hourglass Networks for
Human Pose Estimation
Deepfake Detection via Joint Unsupervised
Toward Fast and Accurate Human Pose Estimation
Reconstruction and Supervised Classification
via Soft-Gated Skip Connections
Exposing Deepfake Videos by Detecting
ViTPose: Simple Vision Transformer Baselines for
Face Warping Artifacts
Human Pose Estimation
Face X-Ray for More General Face Forgery Detection
FaceForensics++: Learning to Detect Manipulated
Facial Images
Cityscapes Challenge,
Spatial-Phase Shallow Learning: Rethinking Face
Pixel-Level Semantic
Forgery Detection in Frequency Domain Labeling Task
Data on the Cityscapes challenge, pixel-level semantic
MPII labeling task mean intersection-over-union (mIoU)
Data on MPII percentage of correct keypoints (PCK) was taken from the Cityscapes dataset, specifically
was retrieved through a detailed arXiv literature their pixel-level semantic labeling leaderboard.
review cross-referenced by technical progress More details about the Cityscapes dataset and other
reported on Papers With Code. The reported dates corresponding semantic segmentation challenges can
correspond to the year in which a paper was first be accessed at the Cityscapes dataset webpage.
published to arXiv, and the reported results (PCK)
correspond to the result reported in the most recent
SuperGLUE
The SuperGLUE benchmark data was pulled from the SuperGLUE leaderboard. Details about the
SuperGLUE benchmark are in the SuperGLUE paper and SuperGLUE software toolkit. The tasks and
evaluation metrics for SuperGLUE are:
Narrative Highlight: paid $90,000 for the role, and before I even started
on the case, I had to convince one top staffer to get
Just How Much Better Have paid even $100,000 to have an interview with my lead
Language Models Become? prosecutor.
Complete answer outputted by GPT-2 to the AI Index I wanted to know in advance: What percentage of
prompt request. his work was actually done as a private citizen, and
Model prompt >>> Explain to me the major how efficiently should he handle it for his clients? And
accomplishments of Theodore Roosevelt’s what advice would he give me should I end up in the
presidency. public eye?
(The most notable is the establishment of a national At no time, my job as the lead prosecutor on this
telephone and mail exchange company.) case required me to put forth any of those two or
five requests. Most people on the case, though, did.
But the only things that will change with a president
One person asked me what the case’s real value
who knows how to operate a business is his
was. That didn’t stop me from telling it to one of
understanding of the American system of government.
my top managers. And my top manager never even
The public knows all about our problems, yet at least
mentioned it.
half of the people who need it won’t know anything
about ours. We can no longer have this. Our system of In the end, it was a two-week, $40 million round-
government is the only trip tour of the government’s business, with the best
lawyer possible. In exchange, we got a chance to hear
one in which you really need a public relations firm to
his last public comment in this case, his last comment
work hard.
to me.
The system of governmental control, while working,
There was one more point — the end point of the case.
is not always effective in ensuring that the very things
that our economy needs, do not always be at the When it came up for consideration in October of 1993,
top of everybody’s list. So I wanted to know how I had the choice to give it three years, or six. And I
successful the system you employ for managing the chose six.
public’s needs is in helping the American economy
grow, how efficient the service our economy provides
is being made, and how it can better serve the needs
of a growing number of Americans.
I wanted to know.
reported results (accuracy) correspond to the result Scaling Language Models: Methods, Analysis &
reported in the most recent version of each paper. Insights from Training Gopher
Automatic Data Augmentation for Generalization in The AI Index made use of data from the following
Reinforcement Learning MLPerf Training competitions:
Leveraging Procedural Generation to Benchmark MLPerf Training v2.1, 2022
Reinforcement Learning
MLPerf Training v2.0, 2022
Procedural Generalization by Planning With
MLPerf Training v1.1, 2021
Self-Supervised World Models
MLPerf Training v1.0, 2021
Rethinking Value Function Learning for
Generalization in Reinforcement Learning MLPerf Training v0.7, 2020
MLPerf Training v0.6, 2019
MLPerf Training v0.5, 2018
Discovering Language Model Behaviors With AlexaTM 20B: Few-Shot Learning Using a
Model-Written Evaluations Large-Scale Multilingual Seq2Seq Model
“I’m Sorry to Hear That”: Finding New Biases in Aligning Generative Language Models With
Language Models With a Holistic Descriptor Dataset Human Values
On Measuring Social Biases in Prompt-Based Challenges in Measuring Bias via Open-Ended
Multi-task Learning Language Generation
PaLM: Scaling Language Modeling With Pathways Characteristics of Harmful Text: Towards
Perturbation Augmentation for Fairer NLP Rigorous Benchmarking of Language Models
Detoxifying Language Models With a Toxic Corpus Predictability and Surprise in Large Generative
DisCup: Discriminator Cooperative Unlikelihood Models
Prompt-Tuning for Controllable Text Generation Quark: Controllable Text Generation With
Evaluating Attribution in Dialogue Systems: Reinforced [Un]learning
The BEGIN Benchmark Red Teaming Language Models With Language Models
Exploring the Limits of Domain-Adaptive Training Reward Modeling for Mitigating Toxicity in
for Detoxifying Large-Scale Language Models Transformer-based Language Models
Flamingo: A Visual Language Model for Robust Conversational Agents Against Imperceptible
Few-Shot Learning Toxicity Triggers
Galactica: A Large Language Model for Science Scaling Instruction-Finetuned Language Models
GLaM: Efficient Scaling of Language Models StreamingQA: A Benchmark for Adaptation to New
With Mixture-of-Experts Knowledge over Time in Question Answering Models
GLM-130B: An Open Bilingual Pre-trained Model Training Language Models to Follow Instructions
Gradient-Based Constrained Sampling From With Human Feedback
Language Models Transfer Learning From Multilingual DeBERTa
HateCheckHIn: Evaluating Hindi Hate Speech for Sexism Identification
Detection Models Transformer Feed-Forward Layers Build Predictions
Holistic Evaluation of Language Models by Promoting Concepts in the Vocabulary Space
An Invariant Learning Characterization of While the Perspective API is used widely within
Controlled Text Generation machine learning research and also for measuring
LaMDA: Language Models for Dialog Applications online toxicity, toxicity in the specific domains used to
Leashing the Inner Demons: Self-Detoxification train the models undergirding Perspective (e.g., news,
for Language Models Wikipedia) may not be broadly representative of all
Measuring Harmful Representations in Scandinavian forms of toxicity (e.g., trolling). Other known caveats
Language Models include biases against text written by minority
Mitigating Toxic Degeneration With Empathetic Data: voices: The Perspective API has been shown to
Exploring the Relationship Between Toxicity and disproportionately assign high toxicity scores to text
Empathy that contains mentions of minority identities (e.g., “I
MULTILINGUAL HATECHECK: Functional Tests for am a gay man”). As a result, detoxification techniques
Multilingual Hate Speech Detection Models built with labels sourced from the Perspective API
A New Generation of Perspective API: Efficient result in models that are less capable of modeling
Multilingual Character-Level Transformers language used by minority groups, and may avoid
OPT: Open Pre-trained Transformer Language Models mentioning minority identities.
PaLM: Scaling Language Modeling With Pathways New versions of the Perspective API have been
Perturbations in the Wild: Leveraging Human-Written deployed since its inception, and there may be subtle
Text Perturbations for Realistic Adversarial Attack and undocumented shifts in its behavior over time.
Defense
Autonomous Driving: Advanced Driver Assistance Machine Learning: AdaBoost, Apache MADlib,
Systems, Autonomous Cruise Control Systems, Apache Mahout, Apache SINGA, Apache Spark,
Autonomous System, Autonomous Vehicles, Guidance Association Rule Learning, Automated Machine
Navigation and Control Systems, Light Detection and Learning, Autonomic Computing, AWS SageMaker,
Ranging (LiDAR), OpenCV, Path Analysis, Path Finding, Azure Machine Learning, Boosting, CHi-Squared
Remote Sensing, Unmanned Aerial Systems (UAS) Automatic Interaction Detection (CHAID),
Classification And Regression Tree (CART), Cluster
Natural Language Processing (NLP): Amazon Textract,
Analysis, Collaborative Filtering, Confusion Matrix,
ANTLR, BERT (NLP Model), Chatbot, Computational
Cyber-Physical Systems, Dask (Software), Data
Linguistics, DeepSpeech, Dialog Systems, fastText,
Classification, DBSCAN, Decision Models, Decision
Fuzzy Logic, Handwriting Recognition, Hugging
Tree Learning, Dimensionality Reduction, Dlib
Face (NLP Framework), HuggingFace Transformers,
(C++ Library), Ensemble Methods, Evolutionary
Intelligent Agent, Intelligent Software Assistant,
Programming, Expectation Maximization Algorithm,
Intelligent Virtual Assistant, Kaldi, Latent Dirichlet
Feature Engineering, Feature Extraction, Feature
Allocation, Lexalytics, Machine Translation, Microsoft
Learning, Feature Selection, Gaussian Process,
LUIS, Natural Language Generation, Natural Language
Genetic Algorithm, Google AutoML, Google Cloud
Processing, Natural Language Processing Systems,
ML Engine, Gradient Boosting, H2O.ai, Hidden
Natural Language Programming, Natural Language
Markov Model, Hyperparameter Optimization,
Toolkits, Natural Language Understanding, Natural
Inference Engine, K-Means Clustering, Kernel
Language User Interface, Nearest Neighbour
Methods, Kubeflow, LIBSVM, Machine Learning,
Algorithm, OpenNLP, Optical Character Recognition
Machine Learning Algorithms, Markov Chain, Matrix
(OCR), Screen Reader, Semantic Analysis, Semantic
Factorization, Meta Learning, Microsoft Cognitive
Interpretation for Speech Recognition, Semantic
Toolkit (CNTK), MLflow, MLOps (Machine Learning
Parsing, Semantic Search, Sentiment Analysis,
Operations), mlpack (C++ Library), Naive Bayes,
Seq2Seq, Speech Recognition, Speech Recognition
Perceptron, Predictionio, PyTorch (Machine Learning
Software, Statistical Language Acquisition, Text Mining,
Library), Random Forest Algorithm, Recommendation
Tokenization, Voice Interaction, Voice User Interface,
Engine, Recommender Systems, Reinforcement
Word Embedding, Word2Vec Models
Learning, Scikit-learn (Machine Learning Library),
Neural Networks: Apache MXNet, Artificial Neural Semi-Supervised Learning, Soft Computing, Sorting
Networks, Autoencoders, Caffe, Caffe2, Chainer, Algorithm, Supervised Learning, Support Vector
Convolutional Neural Networks, Cudnn, Deep Learning, Machine, Test Datasets, Torch (Machine Learning),
Deeplearning4j, Keras (Neural Network Library), Long Training Datasets, Transfer Learning, Unsupervised
Short-Term Memory (LSTM), OpenVINO, PaddlePaddle, Learning, Vowpal Wabbit, Xgboost
Pybrain, Recurrent Neural Network (RNN), TensorFlow
Robotics: Advanced Robotics, Cognitive Robotics, intelligence, computer vision, image processing,
Motion Planning, Nvidia Jetson, Robot Framework, deep learning, TensorFlow, Pandas (software), and
Robot Operating Systems, Robotic Automation OpenCV, among others.
Software, Robotic Liquid Handling Systems, Robotic
Skill groupings are derived by expert taxonomists
Programming, Robotic Systems, Servomotor, SLAM
through a similarity-index methodology that
Algorithms (Simultaneous Localization and Mapping)
measures skill composition at the industry level.
Visual Image Recognition: 3D Reconstruction, Activity LinkedIn’s industry taxonomy and their corresponding
Recognition, Computer Vision, Contextual Image NAICS codes can be found here.
Classification, Digital Image Processing, Eye Tracking,
Skills Genome
Face Detection, Facial Recognition, Image Analysis,
For any entity (occupation or job, country, sector,
Image Matching, Image Processing, Image Recognition,
etc.), the skill genome is an ordered list (a vector) of
Image Segmentation, Image Sensor, Imagenet,
the 50 “most characteristic skills” of that entity. These
Machine Vision, Motion Analysis, Object Recognition,
most characteristic skills are identified using a TF-IDF
OmniPage, Pose Estimation, RealSense
algorithm to identify the most representative skills of
the target entity, while down-ranking ubiquitous skills
LinkedIn that add little information about that specific entity
Prepared by Murat Erer and Akash Kaura (e.g., Microsoft Word).
a. This has resulted in changes to our top level five key industries. We have made the full-time series
available for each industry (as with prior years).
i. “Software & IT Services” industry evolved into a wider “Technology, Information and Media,” which
encompasses media and telecommunications as well as other sub-industries.
ii. Former “Hardware & Networking” industry does not exist in the new taxonomy, so we introduced
“Professional Services” industry as the fifth industry in scope which contains a high concentration of AI
talent.
iii. Remaining “Education,” “Manufacturing,” and “Financial Services” (formerly known as “Finance”)
also had updates in their coverage resulting from the inclusion of more granular sub-industries.
b. This also resulted in minor changes in magnitudes for some metrics since the distinct number of
industries, as well as the distinct number of AI occupations defined within each country-industry pair have
changed:
i. We define AI occupations (occupation representatives that require AI skills to perform the job) and
the respective definition of AI Talent at Country-Industry level. For example, data engineers working
in the technology, information, and media industry in Germany may be identified as holding an AI
occupation, whereas data engineers working in the construction industry in the United Arab Emirates
may not be identified as AI Talent. Following the introduction of a more granular industry taxonomy
with improved accuracy, our AI Talent identifications have been improved, and results have been
reflected to the entirety of time series for each relevant metric.
ii. The following metrics have been impacted by this change in industry taxonomy: AI Talent
Concentrations, and Relative AI Hiring Rates. No directional changes were observed, only minor
changes in magnitudes.
a. In the past, the data used to calculate these metrics were limited to top five industries with the highest
AI skill penetration globally: “Software & IT Services,” “Hardware & Networking,” “Manufacturing,”
“Education,” and “Finance” industries. This year we updated our coverage to all industries.
Deloitte’s State of AI in the Enterprise, 3rd Edition (2020) sourced from the “World Robotics 2022” report.
Chapter 5: Education
Computing Research The CRA Taulbee Survey is sent only to doctoral
departments of computer science, computer
Association (CRA Taulbee engineering, and information science/systems.
Survey) Historically, (a) Taulbee covers one-quarter to one-
third of total BS CS recipients in the United States;
Note: This year’s AI Index reused the methodological
(b) the percent of women earning bachelor’s degrees
notes that were submitted by the CRA for previous
is lower in the Taulbee schools than overall; and (c)
editions of the AI Index. For more complete delineations
Taulbee tracks the trends in overall CS production.
of the methodology used by the CRA, please consult the
individual CRA surveys that are linked below. The AI Index used data from the following iterations
of the CRA survey:
Computing Research Association (CRA) members
are 200-plus North American organizations active in CRA, 2021
computing research: academic departments of computer CRA, 2020
science and computer engineering; laboratories and CRA, 2019
centers in industry, government, and academia; and
CRA, 2018
affiliated professional societies (AAAI, ACM, CACS/
CRA, 2017
AIC, IEEE Computer Society, SIAM USENIX). CRA’s
CRA, 2016
mission is to enhance innovation by joining with industry,
government, and academia to strengthen research and CRA, 2015
Code.org
State Level Data
The following link includes a full description of the
methodology used by Code.org to collect its data. The
staff at Code.org also maintains a database of the state
of American K–12 education and, in this policy primer,
provides a greater amount of detail on the state of
American K–12 education in each state.
Global AI Mentions
For mentions of AI in AI-related legislative proceedings around the world, the AI Index performed searches of
the keyword “artificial intelligence” on the websites of 81 countries’ congresses or parliaments (in the respective
languages), usually under sections named “minutes,” “hansard,” etc. In some cases, databases were only
searchable by title, so site search functions were deployed. The AI Index team surveyed the following databases:
11 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial
intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier.
• Energy and Environment: energy costs, climate • Workforce and Labor: labor supply and demand,
change, energy markets, pollution, conservation, talent, immigration, migration, personnel
oil and gas, alternative energy economics, future of work
• International Affairs and International Security: • Social and Behavioral Sciences: sociology,
international relations, international trade, linguistics, anthropology, ethnic studies,
developing countries, humanitarian assistance, demography, geography, psychology, cognitive
warfare, regional security, national security, science
autonomous weapons
• Humanities: arts, music, literature, language,
• Justice and Law Enforcement: civil justice, performance, theater, classics, history,
criminal justice, social justice, police, public philosophy, religion, cultural studies
safety, courts
• Equity and Inclusion: biases, discrimination,
• Communications and Media: social media, gender, race, socioeconomic inequality,
disinformation, media markets, deepfakes disabilities, vulnerable populations
• Government and Public Administration: • Privacy, Safety, and Security: anonymity, GDPR,
federal government, state government, local consumer protection, physical safety, human
government, public sector efficiency, public control, cybersecurity, encryption, hacking
sector effectiveness, government services,
• Ethics: transparency, accountability, human
government benefits, government programs,
values, human rights, sustainability, explainability,
public works, public transportation
interpretability, decision-making norms
• Democracy: elections, rights, freedoms, liberties,
personal freedoms
National AI Strategies
The AI Index did a web search to identify national strategies on AI. Below is a list of countries that were identified
as having a national AI strategy, including a link to said strategy. For certain counties, noted with an asterisk(*), the
actual strategy was not found, and a news article confirming the launch of the strategy was linked instead.
Countries with
AI Strategies in
Federal Budget for Nondefense AI R&D
Development Data on the federal U.S. budget for nondefense AI R&D was taken from previous
Armenia
editions of the AI Index (namely the 2021 and 2022 versions) and from the
Azerbaijan following National Science and Technology Council reports:
Bahrain Supplement to the President’s FY 2023 Budget
Belgium Supplement to the President’s FY2022 Budget
Benin
Cuba
Iceland U.S. Department of Defense Budget Requests
Israel
Data on the DoD nonclassified AI-related budget requests was taken from
Jordan
previous editions of the AI Index (namely the 2021 and 2022 versions) and from
Morocco
the following reports:
New Zealand
Nigeria
Defense Budget Overview United States Department of Defense
Fiscal Year 2023 Budget Request
Oman
Uzbekistan Defense Budget Overview United States Department of Defense
Fiscal Year 2022 Budget Request
Chapter 7: Diversity
Computing Research
Association (CRA Taulbee
Survey)
To learn more about the diversity data from the CRA,
please read the methodological note on the CRA’s data
included in the Chapter 5 subsection of the Appendix.
Code.org
To learn more about the diversity data from Code.
org, please read the methodological note on Code.
org’s data included in the Chapter 5 subsection of the
Appendix.