Artificial Intelligence Index Report 2023

Artificial
Intelligence
Index Report
2023
Artificial Intelligence
Index Report 2023
Introduction to the
AI Index Report 2023
Welcome to the sixth edition of the AI Index Report! This year, the report introduces more original data than any
previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter,
original analysis about large language and multimodal models, detailed trends in global AI legislation records,
a study of the environmental impact of AI systems, and more.
The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is
to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives,
journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of
AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.
From the Co-Directors

AI has moved into its era of deployment; throughout 2022 and the beginning of 2023, new large-scale AI models
have been released every month. These models, such as ChatGPT, Stable Diffusion, Whisper, and DALL-E 2, are
capable of an increasingly broad range of tasks, from text manipulation and analysis, to image generation, to
unprecedentedly good speech recognition. These systems demonstrate capabilities in question answering and the
generation of text, image, and code unimagined a decade ago, and they outperform the state of the art on many
benchmarks, old and new. However, they are prone to hallucination, routinely biased, and can be tricked into
serving nefarious aims, highlighting the complicated ethical challenges associated with their deployment.
Although 2022 was the first year in a decade where private AI investment decreased, AI is still a topic of great
interest to policymakers, industry leaders, researchers, and the public. Policymakers are talking about AI more
than ever before. Industry leaders that have integrated AI into their businesses are seeing tangible cost and
revenue benefits. The number of AI publications and collaborations continues to increase. And the public is
forming sharper opinions about AI and which elements they like or dislike.
AI will continue to improve and, as such, become a greater part of all our lives. Given the increased presence of
this technology and its potential for massive disruption, we should all begin thinking more critically about how
exactly we want AI to be developed and deployed. We should also ask questions about who is deploying it—as
our analysis shows, AI is increasingly defined by the actions of a small set of private sector actors, rather than a
broader range of societal actors. This year’s AI Index paints a picture of where we are so far with AI, in order to
highlight what might await us in the future.
Jack Clark and Ray Perrault

Index Report 2023
Top Ten Takeaways
1 Industry races ahead of academia.

Until 2014, most significant machine learning 4 The world’s best new scientist … AI?
AI models are starting to rapidly accelerate
models were released by academia. Since then, scientific progress and in 2022 were used to aid
industry has taken over. In 2022, there were 32 hydrogen fusion, improve the efficiency of matrix
significant industry-produced machine learning manipulation, and generate new antibodies.
models compared to just three produced by
academia. Building state-of-the-art AI systems
increasingly requires large amounts of data, computer 5 The number of incidents concerning
the misuse of AI is rapidly rising.
According to the AIAAIC database, which tracks
power, and money—resources that industry actors
inherently possess in greater amounts compared to incidents related to the ethical misuse of AI, the
nonprofits and academia. number of AI incidents and controversies has
increased 26 times since 2012. Some notable incidents
2 Performance saturation on
traditional benchmarks.
AI continued to post state-of-the-art results, but
in 2022 included a deepfake video of Ukrainian
President Volodymyr Zelenskyy surrendering and
U.S. prisons using call-monitoring technology on their
year-over-year improvement on many benchmarks inmates. This growth is evidence of both greater use of
continues to be marginal. Moreover, the speed at AI technologies and awareness of misuse possibilities.
which benchmark saturation is being reached is
increasing. However, new, more comprehensive
benchmarking suites such as BIG-bench and HELM 6 The demand for AI-related
professional skills is increasing across
virtually every American industrial sector.
are being released.
Across every sector in the United States for which
3 AI is both helping and

harming the environment.
New research suggests that AI systems can have
there is data (with the exception of agriculture,
forestry, fishing, and hunting), the number of AI-
related job postings has increased on average from
serious environmental impacts. According to 1.7% in 2021 to 1.9% in 2022. Employers in the United
Luccioni et al., 2022, BLOOM’s training run
States are increasingly looking for workers with AI-
emitted 25 times more carbon than a single air
related skills.
traveler on a one-way trip from New York to
San Francisco. Still, new reinforcement learning
models like BCOOLER show that AI systems
can be used to optimize energy usage.
Index Report 2023
Top Ten Takeaways (cont’d)
7 For the first time in the last decade,

year-over-year private investment
in AI decreased.
10 Chinese citizens are among those
who feel the most positively about
AI products and services. Americans …
Global AI private investment was $91.9 billion in not so much.
2022, which represented a 26.7% decrease since In a 2022 IPSOS survey, 78% of Chinese respondents
2021. The total number of AI-related funding events (the highest proportion of surveyed countries) agreed
as well as the number of newly funded AI companies with the statement that products and services using
likewise decreased. Still, during the last decade as a AI have more benefits than drawbacks. After Chinese
whole, AI investment has significantly increased. In respondents, those from Saudi Arabia (76%) and India
2022 the amount of private investment in AI was 18 (71%) felt the most positive about AI products. Only
times greater than it was in 2013.
35% of sampled Americans (among the lowest of
8 While the proportion of companies surveyed countries) agreed that products and services
adopting AI has plateaued, the using AI had more benefits than drawbacks.
companies that have adopted AI
continue to pull ahead.
The proportion of companies adopting AI in 2022
has more than doubled since 2017, though it has
plateaued in recent years between 50% and 60%,
according to the results of McKinsey’s annual
research survey. Organizations that have adopted
AI report realizing meaningful cost decreases and
revenue increases.
9 Policymaker interest in AI
is on the rise.
An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing
“artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis
of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global
legislative proceedings have increased nearly
6.5 times since 2016.
Index Report 2023
Steering Committee
Co-directors
Jack Clark Raymond Perrault
Anthropic, OECD SRI International
Members
Erik Brynjolfsson Katrina Ligett Juan Carlos Niebles Yoav Shoham
Stanford University Hebrew University Stanford University, (Founding Director)
Salesforce Stanford University,
John Etchemendy Terah Lyons AI21 Labs
Stanford University Vanessa Parli
James Manyika Stanford University Russell Wald
Google Stanford University
Staff and Researchers

Research Manager and Editor in Chief Research Associate
Nestor Maslej Loredana Fattorini
Stanford University Stanford University
Affiliated Researchers
Elif Kiesow Cortez Helen Ngo Robi Rahman Alexandra Rome
Stanford Law School Hugging Face Data Scientist Freelance Researcher
Research Fellow
Graduate Researcher
Han Bai
Stanford University
Undergraduate Researchers
Vania Siddhartha Mena Naima Sukrut Stone Lucy Elizabeth
Chow Javvaji Hassan Patel Oak Yang Zimmerman Zhu
Stanford Stanford Stanford Stanford Stanford Stanford Stanford Stanford
University University University University University University University University
Index Report 2023
How to Cite This Report

Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons,
James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark,
and Raymond Perrault, “The AI Index 2023 Annual Report,” AI Index Steering Committee,
Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
The AI Index 2023 Annual Report by Stanford University is licensed under

Attribution-NoDerivatives 4.0 International.
Public Data and Tools

The AI Index 2023 Report is supplemented by raw data and an interactive tool.
We invite each reader to use the data and the tool in a way most relevant to their work and interests.
Raw data and charts: The public data and Global AI Vibrancy Tool: Compare up to
high-resolution images of all the charts 30 countries across 21 indicators. The Global AI
in the report are available on Google Drive. Vibrancy tool will be updated in the latter half of 2023.
AI Index and Stanford HAI

The AI Index is an independent initiative at the
Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The AI Index was conceived within the One Hundred Year Study on AI (AI100).
We welcome feedback and new ideas for next year.

Contact us at AI-Index-Report@stanford.edu.
Index Report 2023
Supporting Partners
Analytics and Research Partners

Index Report 2023
Contributors
We want to acknowledge the following individuals by chapter and section for their contributions of data,
analysis, advice, and expert commentary included in the AI Index 2023 Report:
Research and Development

Sara Abdulla, Catherine Aiken, Luis Aranda, Peter Cihon, Jack Clark, Loredana Fattorini, Nestor Maslej,
Besher Massri, Vanessa Parli, Naima Patel, Ray Perrault, Robi Rahman, Alexandra Rome, Kevin Xu
Technical Performance
Jack Clark, Loredana Fattorini, Siddhartha Javvaji, Katrina Ligett, Nestor Maslej, Juan Carlos Niebles,
Sukrut Oak, Vanessa Parli, Ray Perrault, Robi Rahman, Alexandra Rome, Yoav Shoham, Elizabeth Zhu
Technical AI Ethics
Jack Clark, Loredana Fattorini, Katrina Ligett, Nestor Maslej, Helen Ngo, Sukrut Oak, Vanessa Parli,
Ray Perrault, Alexandra Rome, Elizabeth Zhu, Lucy Zimmerman
Economy
Susanne Bieller, Erik Brynjolfsson, Vania Chow, Jack Clark, Natalia Dorogi, Murat Erer, Loredana Fattorini,
Akash Kaura, James Manyika, Nestor Maslej, Layla O’Kane, Vanessa Parli, Ray Perrault, Brittany Presten,
Alexandra Rome, Nicole Seredenko, Bledi Taska, Bill Valle, Casey Weston
Education
Han Bai, Betsy Bizot, Jack Clark, John Etchemendy, Loredana Fattorini, Katrina Ligett, Nestor Maslej,
Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome
Policy and Governance

Meghan Anand, Han Bai, Vania Chow, Jack Clark, Elif Kiesow Cortez, Rebecca DeCrescenzo, Loredana Fattorini,
Taehwa Hong, Joe Hsu, Kai Kato, Terah Lyons, Nestor Maslej, Alistair Murray, Vanessa Parli, Ray Perrault, Alexandra Rome,
Sarah Smedley, Russell Wald, Brian Williams, Catherina Xu, Stone Yang, Katie Yoon, Daniel Zhang
Diversity
Han Bai, Betsy Bizot, Jack Clark, Loredana Fattorini, Nezihe Merve Gürel, Mena Hassan, Katrina Ligett,
Nestor Maslej, Vanessa Parli, Ray Perrault, Sean Roberts, Alexandra Rome, Sarah Tan, Lucy Zimmerman
Public Opinion
Jack Clark, Loredana Fattorini, Mena Hassan, Nestor Maslej, Vanessa Parli, Ray Perrault,
Alexandra Rome, Nicole Seredenko, Bill Valle, Lucy Zimmerman
Conference Attendance
Terri Auricchio (ICML), Lee Campbell (ICLR), Cassio de Campos (UAI), Meredith Ellison (AAAI), Nicole Finn (CVPR),
Vasant Gajanan (AAAI), Katja Hofmann (ICLR), Gerhard Lakemeyer (KR), Seth Lazar (FAccT), Shugen Ma (IROS),
Becky Obbema (NeurIPS), Vesna Sabljakovic-Fritz (IJCAI), Csaba Szepesvari (ICML), Matthew Taylor (AAMAS),
Sylvie Thiebaux (ICAPS), Pradeep Varakantham (ICAPS)
Index Report 2023
We thank the following organizations and individuals who provided

data for inclusion in the AI Index 2023 Report:
Organizations
Code.org Lightcast
Sean Roberts Layla O’Kane, Bledi Taska
Center for Security and LinkedIn

Emerging Technology, Murat Erer, Akash Kaura,
Georgetown University Casey Weston
Sara Abdulla, Catherine Aiken
McKinsey & Company
Computing Research Natalia Dorogi, Brittany Presten
Association
Betsy Bizot NetBase Quid
Nicole Seredenko, Bill Valle
GitHub
Peter Cihon, Kevin Xu OECD.AI Policy Observatory
Luis Aranda, Besher Massri
Govini
Rebecca DeCrescenzo, Women in Machine Learning
Joe Hsu, Sarah Smedley Nezihe Merve Gürel, Sarah Tan
We also would like to thank Jeanina Casusi, Nancy King, Shana Lynch, Jonathan Mindes,
Michi Turner, and Madeleine Wright for their help in preparing this report, and Joe Hinman and
Santanu Mukherjee for their help in maintaining the AI Index website.
Index Report 2023
Table of Contents
Report Highlights 11
Chapter 1 Research and Development 20
Chapter 2 Technical Performance 69
Chapter 3 Technical AI Ethics 125
Chapter 4 The Economy 168
Chapter 5 Education 234
Chapter 6 Policy and Governance 263
Chapter 7 Diversity 296
Chapter 8 Public Opinion 319
Appendix 344
ACCESS THE PUBLIC DATA

Index Report 2023
Report Highlights
Chapter 1: Research and Development
The United States and China had the greatest number of cross-country collaborations in AI
publications from 2010 to 2021, although the pace of collaboration has slowed. The number of AI
research collaborations between the United States and China increased roughly 4 times since 2010,
and was 2.5 times greater than the collaboration totals of the next nearest country pair, the United
Kingdom and China. However the total number of U.S.-China collaborations only increased by 2.1%
from 2020 to 2021, the smallest year-over-year growth rate since 2010.
AI research is on the rise, across the board. The total number of AI publications has more than
doubled since 2010. The specific AI topics that continue dominating research include pattern
recognition, machine learning, and computer vision.
China continues to lead in total AI journal, conference, and repository publications.

The United States is still ahead in terms of AI conference and repository citations, but those
leads are slowly eroding. Still, the majority of the world’s large language and multimodal models
(54% in 2022) are produced by American institutions.
Industry races ahead of academia. Until 2014, most significant machine learning models were
released by academia. Since then, industry has taken over. In 2022, there were 32 significant
industry-produced machine learning models compared to just three produced by academia.
Building state-of-the-art AI systems increasingly requires large amounts of data, computer power,
and money—resources that industry actors inherently possess in greater amounts compared to
nonprofits and academia.
Large language models are getting bigger and more expensive. GPT-2, released in 2019,
considered by many to be the first large language model, had 1.5 billion parameters and cost an
estimated $50,000 USD to train. PaLM, one of the flagship large language models launched in 2022,
had 540 billion parameters and cost an estimated $8 million USD—PaLM was around 360 times
larger than GPT-2 and cost 160 times more. It’s not just PaLM: Across the board, large language and
multimodal models are becoming larger and pricier.
Index Report 2023
Chapter 2: Technical Performance
Performance saturation on traditional benchmarks. AI continued to post state-of-the-art results,

but year-over-year improvement on many benchmarks continues to be marginal. Moreover,
the speed at which benchmark saturation is being reached is increasing. However, new, more
comprehensive benchmarking suites such as BIG-bench and HELM are being released.
Generative AI breaks into the public consciousness. 2022 saw the release of text-to-image
models like DALL-E 2 and Stable Diffusion, text-to-video systems like Make-A-Video, and chatbots
like ChatGPT. Still, these systems can be prone to hallucination, confidently outputting incoherent or
untrue responses, making it hard to rely on them for critical applications.
AI systems become more flexible. Traditionally AI systems have performed well on narrow tasks
but have struggled across broader tasks. Recently released models challenge that trend; BEiT-3,
PaLI, and Gato, among others, are single AI systems increasingly capable of navigating multiple tasks
(for example, vision, language).
Capable language models still struggle with reasoning. Language models continued to improve
their generative capabilities, but new research suggests that they still struggle with complex
planning tasks.
AI is both helping and harming the environment. New research suggests that AI systems can have
serious environmental impacts. According to Luccioni et al., 2022, BLOOM’s training run emitted 25
times more carbon than a single air traveler on a one-way trip from New York to San Francisco. Still,
new reinforcement learning models like BCOOLER show that AI systems can be used to optimize
energy usage.
The world’s best new scientist … AI? AI models are starting to rapidly accelerate scientific
progress and in 2022 were used to aid hydrogen fusion, improve the efficiency of matrix
manipulation, and generate new antibodies.
AI starts to build better AI. Nvidia used an AI reinforcement learning agent to improve the design
of the chips that power AI systems. Similarly, Google recently used one of its language models,
PaLM, to suggest ways to improve the very same model. Self-improving AI learning will accelerate
AI progress.
Index Report 2023
Chapter 3: Technical AI Ethics
The effects of model scale on bias and toxicity are confounded by training data and mitigation
methods. In the past year, several institutions have built their own large models trained on
proprietary data—and while large models are still toxic and biased, new evidence suggests that
these issues can be somewhat mitigated after training larger models with instruction-tuning.
Generative models have arrived and so have their ethical problems. In 2022, generative models
became part of the zeitgeist. These models are capable but also come with ethical challenges. Text-
to-image generators are routinely biased along gender dimensions, and chatbots like ChatGPT can
be tricked into serving nefarious aims.
The number of incidents concerning the misuse of AI is rapidly rising. According to the AIAAIC
database, which tracks incidents related to the ethical misuse of AI, the number of AI incidents
and controversies has increased 26 times since 2012. Some notable incidents in 2022 included a
deepfake video of Ukrainian President Volodymyr Zelenskyy surrendering and U.S. prisons using
call-monitoring technology on their inmates. This growth is evidence of both greater use of AI
technologies and awareness of misuse possibilities.
Fairer models may not be less biased. Extensive analysis of language models suggests that while there
is a clear correlation between performance and fairness, fairness and bias can be at odds: Language
models which perform better on certain fairness benchmarks tend to have worse gender bias.
Interest in AI ethics continues to skyrocket. The number of accepted submissions to FAccT, a

leading AI ethics conference, has more than doubled since 2021 and increased by a factor of 10 since
2018. 2022 also saw more submissions than ever from industry actors.
Automated fact-checking with natural language processing isn’t so straightforward after all.
While several benchmarks have been developed for automated fact-checking, researchers find that
11 of 16 of such datasets rely on evidence “leaked” from fact-checking reports which did not exist at
the time of the claim surfacing.
Index Report 2023
Chapter 4: The Economy
The demand for AI-related professional skills is increasing across virtually every American
industrial sector. Across every sector in the United States for which there is data (with the exception
of agriculture, forestry, fishing, and hunting), the number of AI-related job postings has increased on
average from 1.7% in 2021 to 1.9% in 2022. Employers in the United States are increasingly looking for
workers with AI-related skills.
For the first time in the last decade, year-over-year private investment in AI decreased.
Global AI private investment was $91.9 billion in 2022, which represented a 26.7% decrease since 2021.
The total number of AI-related funding events as well as the number of newly funded AI companies
likewise decreased. Still, during the last decade as a whole, AI investment has significantly increased.
In 2022 the amount of private investment in AI was 18 times greater than it was in 2013.
Once again, the United States leads in investment in AI. The U.S. led the world in terms of total
amount of AI private investment. In 2022, the $47.4 billion invested in the U.S. was roughly 3.5 times
the amount invested in the next highest country, China ($13.4 billion). The U.S. also continues to lead in
terms of total number of newly funded AI companies, seeing 1.9 times more than the European Union
and the United Kingdom combined, and 3.4 times more than China.
In 2022, the AI focus area with the most investment was medical and healthcare ($6.1 billion);
followed by data management, processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less
investment in 2022 than in 2021. In the last year, the three largest AI private investment events were:
(1) a $2.5 billion funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of
electric vehicles; (2) a $1.5 billion Series E funding round for Anduril Industries, a U.S. defense products
company that builds technology for military agencies and border surveillance; and (3) a $1.2 billion
investment in Celonis, a business-data consulting company based in Germany.
While the proportion of companies adopting AI has plateaued, the companies that have adopted
AI continue to pull ahead. The proportion of companies adopting AI in 2022 has more than doubled
since 2017, though it has plateaued in recent years between 50% and 60%, according to the results of
McKinsey’s annual research survey. Organizations that have adopted AI report realizing meaningful
cost decreases and revenue increases.
Index Report 2023
Chapter 4: The Economy (cont’d)

AI is being deployed by businesses in multifaceted ways. The AI capabilities most likely to have
been embedded in businesses include robotic process automation (39%), computer vision (34%), NL
text understanding (33%), and virtual agents (33%). Moreover, the most commonly adopted AI use
case in 2022 was service operations optimization (24%), followed by the creation of new AI-based
products (20%), customer segmentation (19%), customer service analytics (19%), and new AI-based
enhancement of products (19%).
AI tools like Copilot are tangibly helping workers. Results of a GitHub survey on the use of Copilot,
a text-to-code AI system, find that 88% of surveyed respondents feel more productive when using
the system, 74% feel they are able to focus on more satisfying work, and 88% feel they are able to
complete tasks more quickly.
China dominates industrial robot installations. In 2013, China overtook Japan as the nation installing
the most industrial robots. Since then, the gap between the total number of industrial robots installed
by China and the next-nearest nation has widened. In 2021, China installed more industrial robots than
the rest of the world combined.
Index Report 2023
Chapter 5: Education
More and more AI specialization. The proportion of new computer science PhD graduates from
U.S. universities who specialized in AI jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
New AI PhDs increasingly head to industry. In 2011, roughly the same proportion of new AI PhD
graduates took jobs in industry (40.9%) as opposed to academia (41.6%). Since then, however, a
majority of AI PhDs have headed to industry. In 2021, 65.4% of AI PhDs took jobs in industry, more
than double the 28.2% who took jobs in academia.
New North American CS, CE, and information faculty hires stayed flat. In the last decade,
the total number of new North American computer science (CS), computer engineering (CE),
and information faculty hires has decreased: There were 710 total hires in 2021 compared to
733 in 2012. Similarly, the total number of tenure-track hires peaked in 2019 at 422 and then
dropped to 324 in 2021.
The gap in external research funding for private versus public American CS departments
continues to widen. In 2011, the median amount of total expenditure from external sources for
computing research was roughly the same for private and public CS departments in the United
States. Since then, the gap has widened, with private U.S. CS departments receiving millions more
in additional funding than public universities. In 2021, the median expenditure for private universities
was $9.7 million, compared to $5.7 million for public universities.
Interest in K–12 AI and computer science education grows in both the United States and the
rest of the world. In 2021, a total of 181,040 AP computer science exams were taken by American
students, a 1.0% increase from the previous year. Since 2007, the number of AP computer science
exams has increased ninefold. As of 2021, 11 countries, including Belgium, China, and South Korea,
have officially endorsed and implemented a K–12 AI curriculum.
Index Report 2023
Chapter 6: Policy and Governance
Policymaker interest in AI is on the rise. An AI Index analysis of the legislative records of 127
countries shows that the number of bills containing “artificial intelligence” that were passed into law
grew from just 1 in 2016 to 37 in 2022. An analysis of the parliamentary records on AI in 81 countries
likewise shows that mentions of AI in global legislative proceedings have increased nearly 6.5 times
since 2016.
From talk to enactment—the U.S. passed more AI bills than ever before. In 2021, only 2% of
all federal AI bills in the United States were passed into law. This number jumped to 10% in 2022.
Similarly, last year 35% of all state-level AI bills were passed into law.
When it comes to AI, policymakers have a lot of thoughts. A qualitative analysis of the
parliamentary proceedings of a diverse group of nations reveals that policymakers think about
AI from a wide range of perspectives. For example, in 2022, legislators in the United Kingdom
discussed the risks of AI-led automation; those in Japan considered the necessity of safeguarding
human rights in the face of AI; and those in Zambia looked at the possibility of using AI for
weather forecasting.
The U.S. government continues to increase spending on AI. Since 2017, the amount of U.S.
government AI-related contract spending has increased roughly 2.5 times.
The legal world is waking up to AI. In 2022, there were 110 AI-related legal cases in United
States state and federal courts, roughly seven times more than in 2016. The majority of these cases
originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual
property, and contract law.
Index Report 2023
Chapter 7: Diversity
North American bachelor’s, master’s, and PhD-level computer science students are becoming
more ethnically diverse. Although white students are still the most represented ethnicity among
new resident bachelor’s, master’s, and PhD-level computer science graduates, students from other
ethnic backgrounds (for example, Asian, Hispanic, and Black or African American) are becoming
increasingly more represented. For example, in 2011, 71.9% of new resident CS bachelor’s graduates
were white. In 2021, that number dropped to 46.7%.
New AI PhDs are still overwhelmingly male. In 2021, 78.7% of new AI PhDs were male.
Only 21.3% were female, a 3.2 percentage point increase from 2011. There continues to be a gender
imbalance in higher-level AI education.
Women make up an increasingly greater share of CS, CE, and information faculty hires.
Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from
24.9% to 30.2%. Still, most CS, CE, and information faculty in North American universities are male
(75.9%). As of 2021, only 0.1% of CS, CE, and information faculty identify as nonbinary.
American K–12 computer science education has become more diverse, in terms of both gender
and ethnicity. The share of AP computer science exams taken by female students increased from
16.8% in 2007 to 30.6% in 2021. Year over year, the share of Asian, Hispanic/Latino/Latina, and
Black/African American students taking AP computer science has likewise increased.
Index Report 2023
Chapter 8: Public Opinion
Chinese citizens are among those who feel the most positively about AI products and services.
Americans … not so much. In a 2022 IPSOS survey, 78% of Chinese respondents (the highest
proportion of surveyed countries) agreed with the statement that products and services using AI
have more benefits than drawbacks. After Chinese respondents, those from Saudi Arabia (76%) and
India (71%) felt the most positive about AI products. Only 35% of sampled Americans (among the
lowest of surveyed countries) agreed that products and services using AI had more benefits than
drawbacks.
Men tend to feel more positively about AI products and services than women. Men are also
more likely than women to believe that AI will mostly help rather than harm. According to the
2022 IPSOS survey, men are more likely than women to report that AI products and services make
their lives easier, trust companies that use AI, and feel that AI products and services have more
benefits than drawbacks. A 2021 survey by Gallup and Lloyd’s Register Foundation likewise revealed
that men are more likely than women to agree with the statement that AI will mostly help rather than
harm their country in the next 20 years.
People across the world and especially America remain unconvinced by self-driving cars. In
a global survey, only 27% of respondents reported feeling safe in a self-driving car. Similarly, Pew
Research suggests that only 26% of Americans feel that driverless passenger vehicles are a good
idea for society.
Different causes for excitement and concern. Among a sample of surveyed Americans, those
who report feeling excited about AI are most excited about the potential to make life and society
better (31%) and to save time and make things more efficient (13%). Those who report feeling more
concerned worry about the loss of human jobs (19%); surveillance, hacking, and digital privacy (16%);
and the lack of human connection (12%).
NLP researchers … have some strong opinions as well. According to a survey widely distributed to
NLP researchers, 77% either agreed or weakly agreed that private AI firms have too much influence,
41% said that NLP should be regulated, and 73% felt that AI could soon lead to revolutionary societal
change. These were some of the many strong opinions held by the NLP research community.
Index Report 2023
Index Report 2023
CHAPTER 1:
Research and
Development
Table of Contents Chapter 1 Preview 20

Index Report 2023
CHAPTER 1 PREVIEW:
Research and Development

Overview 22 Computer Vision 46
Chapter Highlights 23 Natural Language Processing 47
Speech Recognition 48
1.1 Publications 24
Overview 24 1.2 Trends in Significant
Total Number of AI Publications 24 Machine Learning Systems 49
By Type of Publication 25 General Machine Learning Systems 49
By Field of Study 26 System Types 49
By Sector 27 Sector Analysis 50
Cross-Country Collaboration 29 National Affiliation 51
Cross-Sector Collaboration 31 Systems 51
AI Journal Publications 32 Authorship 53
Overview 32 Parameter Trends 54
By Region 33 Compute Trends 56
By Geographic Area 34 Large Language and Multimodal Models 58
Citations 35 National Affiliation 58
AI Conference Publications 36 Parameter Count 60
Overview 36 Training Compute 61
By Region 37 Training Cost 62
By Geographic Area 38
Citations 39 1.3 AI Conferences 64
AI Repositories 40 Conference Attendance 64
Overview 40
1.4 Open-Source AI Software 66
By Region 41
Projects 66
Stars 68
Citations 43
Narrative Highlight: ACCESS THE PUBLIC DATA
Top Publishing Institutions 44
All Fields 44

Artificial Intelligence Chapter 1: Research and Development
Index Report 2023
Overview
This chapter captures trends in AI R&D. It begins by examining AI publications,
including journal articles, conference papers, and repositories. Next it considers data
on significant machine learning systems, including large language and multimodal
models. Finally, the chapter concludes by looking at AI conference attendance and
open-source AI research. Although the United States and China continue to dominate
AI R&D, research efforts are becoming increasingly geographically dispersed.

Index Report 2023
Chapter Highlights
The United States and China Industry races ahead
had the greatest number of of academia.
cross-country collaborations in AI Until 2014, most significant machine
publications from 2010 to 2021, learning models were released by
academia. Since then, industry has taken
although the pace of collaboration
over. In 2022, there were 32 significant
has since slowed. industry-produced machine learning
The number of AI research collaborations between
models compared to just three produced
the United States and China increased roughly 4
by academia. Building state-of-the-art
times since 2010, and was 2.5 times greater than the
AI systems increasingly requires large
collaboration totals of the next nearest country pair,
amounts of data, computer power, and
the United Kingdom and China. However, the total
money—resources that industry actors
number of U.S.-China collaborations only increased
inherently possess in greater amounts
by 2.1% from 2020 to 2021, the smallest year-over-
compared to nonprofits and academia.
year growth rate since 2010.
AI research is on the rise, across Large language models

the board. The total number of AI publications are getting bigger and
has more than doubled since 2010. The specific AI
more expensive.
topics that continue to dominate research include
GPT-2, released in 2019, considered
pattern recognition, machine learning,
by many to be the first large language
and computer vision.
model, had 1.5 billion parameters and
cost an estimated $50,000 USD to
train. PaLM, one of the flagship large
China continues to lead in total language models launched in 2022,
AI journal, conference, and had 540 billion parameters and cost an
repository publications. estimated $8 million USD—PaLM was
The United States is still ahead in terms of AI around 360 times larger than GPT-2 and
conference and repository citations, but those leads cost 160 times more. It’s not just PaLM:
are slowly eroding. Still, the majority of the world’s Across the board, large language and
large language and multimodal models (54% in 2022) multimodal models are becoming larger
are produced by American institutions. and pricier.

Index Report 2023 1.1 Publications
This section draws on data from the Center for Security and Emerging Technology (CSET) at Georgetown University. CSET maintains a
merged corpus of scholarly literature that includes Digital Science’s Dimensions, Clarivate’s Web of Science, Microsoft Academic Graph,
China National Knowledge Infrastructure, arXiv, and Papers With Code. In that corpus, CSET applied a classifier to identify English-
language publications related to the development or application of AI and ML since 2010. For this year’s report, CSET also used select
Chinese AI keywords to identify Chinese-language AI papers; CSET did not deploy this method for previous iterations of the AI Index report.1
In last year’s edition of the report, publication trends were reported up to the year 2021. However, given that there is a significant lag in the
collection of publication metadata, and that in some cases it takes until the middle of any given year to fully capture the previous year’s
publications, in this year’s report, the AI Index team elected to examine publication trends only through 2021, which we, along with CSET,
are confident yields a more fully representative report.
1.1 Publications
Overview publication and citation data by region for AI journal
articles, conference papers, repositories, and patents.
The figures below capture the total number
of English-language and Chinese-language AI Total Number of AI Publications
publications globally from 2010 to 2021—by type, Figure 1.1.1 shows the number of AI publications in
affiliation, cross-country collaboration, and cross- the world. From 2010 to 2021, the total number of
industry collaboration. The section also breaks down AI publications more than doubled, growing from
200,000 in 2010 to almost 500,000 in 2021.
Number of AI Publications in the World, 2010–21

Source: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report
496.01
500
400
Number of AI Publications (in Thousands)
300
200
100
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.1
1 See the Appendix for more information on CSET’s methodology. For more on the challenge of defining AI and correctly capturing relevant bibliometric data, see the AI Index team’s
discussion in the paper “Measurement in AI Policy: Opportunities and Challenges.”

By Type of Publication book chapters, theses, and unknown document types

Figure 1.1.2 shows the types of AI publications released made up the remaining 10% of publications. While
globally over time. In 2021, 60% of all published AI journal and repository publications have grown 3
documents were journal articles, 17% were conference and 26.6 times, respectively, in the past 12 years, the
papers, and 13% were repository submissions. Books, number of conference papers has declined since 2019.
Number of AI Publications by Type, 2010–21

300
293.48, Journal
270
240
210
180
150
120
90
85.09, Conference
65.21, Repository
60
29.88, Thesis
30
13.77, Book Chapter
5.82, Unknown
0 2.76, Book
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.2

By Field of Study roughly doubled while the number of machine learning

Figure 1.1.3 shows that publications in pattern papers has roughly quadrupled. Following those two
recognition and machine learning have experienced topic areas, in 2021, the next most published AI fields
the sharpest growth in the last half decade. Since of study were computer vision (30,075), algorithm
2015, the number of pattern recognition papers has (21,527), and data mining (19,181).
Number of AI Publications by Field of Study (Excluding Other AI), 2010–21

60 59.36, Pattern Recognition
50
42.55, Machine Learning

40
30 30.07, Computer Vision
21.53, Algorithm
20 19.18, Data Mining
14.99, Natural Language Processing

11.57, Control Theory
10 10.37, Human–Computer Interaction
6.74, Linguistics
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.3

By Sector
This section shows the number of AI publications 1.1.5).2 The education sector dominates in each region.
affiliated with education, government, industry, The level of industry participation is highest in the
nonprofit, and other sectors—first globally (Figure United States, then in the European Union. Since
1.1.4), then looking at the United States, China, and 2010, the share of education AI publications has been
the European Union plus the United Kingdom (Figure dropping in each region.
AI Publications (% of Total) by Sector, 2010–21

80%
75.23%, Education
70%
60%
AI Publications (% of Total)
50%
40%
30%
20%
13.60%, Nonpro t
10%
7.21%, Industry
3.74%, Government
0% 0.22%, Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.4
2 The categorization is adapted based on the Global Research Identifier Database (GRID). Healthcare, including hospitals and facilities, is included under nonprofit. Publications affiliated with
state-sponsored universities are included in the education sector.

AI Publications (% of Total) by Sector and Geographic Area, 2021

69.17%
Education 69.23%
77.85%
14.82%
Nonpro t 18.63%
11.73%
12.60%
Industry 7.90%
5.47%
3.21%
Government 3.92%
4.74%
0.20% United States
Other 0.33% European Union and United Kingdom

China
0.20%
0% 10% 20% 30% 40% 50% 60% 70% 80%

AI Publications (% of Total)
Figure 1.1.5

Cross-Country Collaboration
Cross-border collaborations between academics, By far, the greatest number of collaborations in the
researchers, industry experts, and others are a key past 12 years took place between the United States
component of modern STEM (science, technology, and China, increasing roughly four times since 2010.
engineering, and mathematics) development that However the total number of U.S.-China collaborations
accelerate the dissemination of new ideas and the only increased by 2.1% from 2020 to 2021, the smallest
growth of research teams. Figures 1.1.6 and 1.1.7 depict year-over-year growth rate since 2010.
the top cross-country AI collaborations from 2010
The next largest set of collaborations was between
to 2021. CSET counted cross-country collaborations
the United Kingdom and both China and the United
as distinct pairs of countries across authors for each
States. In 2021, the number of collaborations between
publication (e.g., four U.S. and four Chinese-affiliated
the United States and China was 2.5 times greater
authors on a single publication are counted as one
than between the United Kingdom and China.
U.S.-China collaboration; two publications between
the same authors count as two collaborations).
United States and China Collaborations in AI Publications, 2010–21

10.47
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.6

Cross-Country Collaborations in AI Publications (Excluding U.S. and China), 2010–21

4.13, United Kingdom and China

4 4.04, United States and United Kingdom
3.42, United States and Germany
3
2.80, China and Australia
2.61, United States and Australia
2
1.83, United States and France
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.7

Cross-Sector Collaboration
The increase in AI research outside of academia has educational institutions (12,856); and educational
broadened and grown collaboration across sectors and government institutions (8,913). Collaborations
in general. Figure 1.1.8 shows that in 2021 educational between educational institutions and industry have
institutions and nonprofits (32,551) had the greatest been among the fastest growing, increasing 4.2 times
number of collaborations; followed by industry and since 2010.
Cross-Sector Collaborations in AI Publications, 2010–21

32.55, Education and Nonpro t
30
25
20
15
12.86, Industry and Education
10
8.91, Education and Government
5
2.95, Government and Nonpro t
2.26, Industry and Nonpro t
0.63, Industry and Government
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.8

AI Journal Publications
Overview
After growing only slightly from 2010 to 2015, the number of AI journal publications grew around 2.3 times since
2015. From 2020 to 2021, they increased 14.8% (Figure 1.1.9).
Number of AI Journal Publications, 2010–21

300 293.48
Number of AI Journal Publications (in Thousands)
250
200
150
100
50
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.9

By Region3
Figure 1.1.10 shows the share of AI journal publications East Asia and the Pacific; Europe and Central Asia;
by region between 2010 and 2021. In 2021, East Asia as well as North America have been declining.
and the Pacific led with 47.1%, followed by Europe During that period, there has been an increase in
and Central Asia (17.2%), and then North America publications from other regions such as South Asia;
(11.6%). Since 2019, the share of publications from and the Middle East and North Africa.
AI Journal Publications (% of World Total) by Region, 2010–21

50%
47.14%, East Asia and Paci c
40%
AI Journal Publications (% of World Total)
30%
20%
17.20%, Europe and Central Asia
11.61%, North America

10% 6.93%, Unknown
6.75%, South Asia
4.64%, Middle East and North Africa
2.66%, Latin America and the Caribbean
2.30%, Rest of the World
0% 0.77%, Sub-Saharan Africa
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.10
3 Regions in this chapter are classified according to the World Bank analytical grouping.

By Geographic Area4
Figure 1.1.11 breaks down the share of AI journal throughout, with 39.8% in 2021, followed by the
publications over the past 12 years by geographic European Union and the United Kingdom (15.1%),
area. This year’s AI Index included India in recognition then the United States (10.0%). The share of Indian
of the increasingly important role it plays in the publications has been steadily increasing—from 1.3%
AI ecosystem. China has remained the leader in 2010 to 5.6% in 2021.
AI Journal Publications (% of World Total) by Geographic Area, 2010–21

40% 39.78%, China

AI Journal Publications (% of World Total)
30%

20%
15.05%, European Union and United Kingdom
10% 10.03%, United States
6.88%, Unknown
5.56%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.11
4 In this chapter we use “geographic area” based on CSET’s classifications, which are disaggregated not only by country, but also by territory. Further, we count the European Union and the
United Kingdom as a single geographic area to reflect the regions’ strong history of research collaboration.

Citations
China’s share of citations in AI journal publications 1.1.12). China, the European Union and the United
has gradually increased since 2010, while those of the Kingdom, and the United States accounted for 65.7%
European Union and the United Kingdom, as well as of the total citations in the world.
those of the United States, have decreased (Figure
AI Journal Citations (% of World Total) by Geographic Area, 2010–21

30%
29.07%, China
25%
AI Journal Citations (% of World Total)

20%
15% 15.08%, United States
10%
6.05%, India
5%
0.92%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.12

AI Conference Publications
Overview
The number of AI conference publications peaked in 2019, and fell 20.4% below the peak in 2021 (Figure 1.1.13).
The total number of 2021 AI conference publications, 85,094, was marginally greater than the 2010 total of 75,592.
Number of AI Conference Publications, 2010–21

100
Number of AI Conference Publications (in Thousands)
85.09
80
60
40
20
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.13

By Region
Figure 1.1.14 shows the number of AI conference East Asia and the Pacific continues to rise, accounting
publications by region. As with the trend in journal for 36.7% in 2021, followed by Europe and Central
publications, East Asia and the Pacific; Europe Asia (22.7%), and then North America (19.6%). The
and Central Asia; and North America account for percentage of AI conference publications in South Asia
the world’s highest numbers of AI conference saw a noticeable rise in the past 12 years, growing from
publications. Specifically, the share represented by 3.6% in 2010 to 8.5% in 2021.
AI Conference Publications (% of World Total) by Region, 2010–21

40%

35%
AI Conference Publications (% of World Total)
30%
25%
20% 19.56%, North America
15%
10%
8.45%, South Asia
5% 3.07%, Latin America and the Caribbean
2.76%, Unknown
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.14

By Geographic Area
In 2021, China produced the greatest share of the came in third at 17.2% (Figure 1.1.15). Mirroring trends
world’s AI conference publications at 26.2%, having seen in other parts of the research and development
overtaken the European Union and the United section, India’s share of AI conference publications is
Kingdom in 2017. The European Union plus the United also increasing.
Kingdom followed at 20.3%, and the United States
AI Conference Publications (% of World Total) by Geographic Area, 2010–21

30%

AI Conference Publications (% of World Total)
26.15%, China
25%
20% 20.29%, European Union and United Kingdom
17.23%, United States
15%
10%
6.79%, India
5%
2.70%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.15

Citations
Despite China producing the most AI conference conference citations, with 23.9%, followed by China’s
publications in 2021, Figure 1.1.16 shows that 22.0%. However, the gap between American and
the United States had the greatest share of AI Chinese AI conference citations is narrowing.
AI Conference Citations (% of World Total) by Geographic Area, 2010–21

35%
30%
AI Conference Citations (% of World Total)

25%
22.02%, China
20%
15%
10%
6.09%, India
5%
0.87%, Unknown
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.16

AI Repositories
Overview
Publishing pre-peer-reviewed papers on repositories share their findings before submitting them to journals
of electronic preprints (such as arXiv and SSRN) and conferences, thereby accelerating the cycle of
has become a popular way for AI researchers to information discovery. The number of AI repository
disseminate their work outside traditional avenues for publications grew almost 27 times in the past 12 years
publication. These repositories allow researchers to (Figure 1.1.17).
Number of AI Repository Publications, 2010–21

65.21
60
Number of AI Repository Publications (in Thousands)
50
40
30
20
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.17

By Region
Figure 1.1.18 shows that North America has by East Asia and the Pacific has grown significantly
maintained a steady lead in the world share of AI since 2010 and continued growing from 2020 to
repository publications since 2016. Since 2011, the 2021, a period in which the year-over-year share of
share of repository publications from Europe and North American as well European and Central Asian
Central Asia has declined. The share represented repository publications declined.
AI Repository Publications (% of World Total) by Region, 2010–21

AI Repository Publications (% of World Total)
30%

23.99%, Unknown

20%
10%
3.41%, South Asia

2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.18

By Geographic Area
While the United States has held the lead in the (Figure 1.1.19). In 2021, the United States accounted
percentage of global AI repository publications since for 23.5% of the world’s AI repository publications,
2016, China is catching up, while the European Union followed by the European Union plus the United
plus the United Kingdom’s share continues to drop Kingdom (20.5%), and then China (11.9%).
AI Repository Publications (% of World Total) by Geographic Area, 2010–21

AI Repository Publications (% of World Total)
30%

23.18%, Unknown
20% 20.54%, European Union and United Kingdom
11.87%, China
10%
2.85%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.19

Citations
In the citations of AI repository publications, Figure a dominant lead over the European Union plus the
1.1.20 shows that in 2021 the United States topped United Kingdom (21.5%), as well as China (21.0%).
the list with 29.2% of overall citations, maintaining
AI Repository Citations (% of World Total) by Geographic Area, 2010–21

40%
AI Repository Citations (% of World Total)
30%

20.98%, China
20%
10%
4.59%, Unknown
1.91%, India
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.20

Narrative Highlight:
Top Publishing Institutions
All Fields University, the University of the Chinese Academy
Since 2010, the institution producing the greatest of Sciences, Shanghai Jiao Tong University,
number of total AI papers has been the Chinese and Zhejiang University.5 The total number of
Academy of Sciences (Figure 1.1.21). The next publications released by each of these institutions
top four are all Chinese universities: Tsinghua in 2021 is displayed in Figure 1.1.22.
Top Ten Institutions in the World in 2021 Ranked by Number of AI Publications in All Fields, 2010–21
1 1, Chinese Academy of Sciences
2 2, Tsinghua University
3 3, University of Chinese Academy of Sciences
4 4, Shanghai Jiao Tong University
5 5, Zhejiang University
Rank
6 6, Harbin Institute of Technology
7 7, Beihang University
8 8, University of Electronic Science and Technology of China
9 9, Peking University
10 10, Massachusetts Institute of Technology
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 1.1.21
5 It is important to note that many Chinese research institutions are large, centralized organizations with thousands of researchers. It is therefore not entirely surprising that,
purely by the metric of publication count, they outpublish most non-Chinese institutions.

Top Publishing Institutions (cont’d)
Top Ten Institutions in the World by Number of AI Publications in All Fields, 2021
Chinese Academy of Sciences 5,099
Tsinghua University 3,373
University of Chinese Academy

2,904
of Sciences
Shanghai Jiao Tong University 2,703
Zhejiang University 2,590
Harbin Institute of Technology 2,016
Beihang University 1,970
University of Electronic Science

1,951
and Technology of China
Peking University 1,893
Massachusetts Institute of
1,745
Technology
0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,000
Number of AI Publications
Figure 1.1.22

Computer Vision
In 2021, the top 10 institutions publishing the greatest number of AI computer vision publications were
all Chinese (Figure 1.1.23). The Chinese Academy of Sciences published the largest number of such
publications, with a total of 562.
Top Ten Institutions in the World by Number of AI Publications in Computer Vision, 2021
Chinese Academy of Sciences 562
Shanghai Jiao Tong University 316

314
of Sciences
Tsinghua University 296
Zhejiang University 289
Beihang University 247
Wuhan University 231
Beijing Institute of Technology 229
Harbin Institute of Technology 210
Tianjin University 182
0 100 200 300 400 500

Figure 1.1.23

Natural Language Processing
American institutions are represented to a took second place (140 publications), followed by
greater degree in the share of top NLP publishers Microsoft (134). In addition, 2021 was the first year
(Figure 1.1.24). Although the Chinese Academy of Amazon and Alibaba were represented among the
Sciences was again the world’s leading institution top-ten largest publishing NLP institutions.
in 2021 (182 publications), Carnegie Mellon
Top Ten Institutions in the World by Number of AI Publications in Natural Language Processing, 2021
Carnegie Mellon University 140
Microsoft (United States) 134
Carnegie Mellon University

116
Australia
Google (United States) 116
Peking University 113

112
of Sciences
Alibaba Group (China) 100
Amazon (United States) 98
0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190
Figure 1.1.24

Speech Recognition
In 2021, the greatest number of speech recognition papers came from the Chinese Academy of Sciences
(107), followed by Microsoft (98) and Google (75) (Figure 1.1.25). The Chinese Academy of Sciences
reclaimed the top spot in 2021 from Microsoft, which held first position in 2020.
Top Ten Institutions in the World by Number of AI Publications in Speech Recognition, 2021
Microsoft (United States) 98
Google (United States) 75

66
of Sciences
University of Science
59
and Technology of China
Carnegie Mellon University 57
Tencent (China) 57
Chinese University of Hong Kong 55
Amazon (United States) 54
0 10 20 30 40 50 60 70 80 90 100 110
Figure 1.1.25

Index Report 2023 1.2 Trends in Significant Machine Learning Systems
Epoch AI is a collective of researchers investigating and forecasting the development of advanced AI. Epoch curates a database of
significant AI and machine learning systems that have been released since the 1950s. There are different criteria under which the
Epoch team decides to include particular AI systems in their database; for example, the system may have registered a state-of-the-art
improvement, been deemed to have been historically significant, or been highly cited.
This subsection uses the Epoch database to track trends in significant AI and machine learning systems. The latter half of the chapter
includes research done by the AI Index team that reports trends in large language and multimodal models, which are models trained on
large amounts of data and adaptable to a variety of downstream applications.
1.2 Trends in Significant

Machine Learning Systems
General Machine System Types
Among the significant AI machine learning systems
Learning Systems released in 2022, the most common class of system
The figures below report trends among all machine was language (Figure 1.2.1). There were 23 significant
learning systems included in the Epoch dataset. For AI language systems released in 2022, roughly six
reference, these systems are referred to as significant times the number of the next most common system
machine learning systems throughout the subsection. type, multimodal systems.
Number of Significant Machine Learning Systems by Domain, 2022

Source: Epoch, 2022 | Chart: 2023 AI Index Report
Language 23
Multimodal 4
Drawing 3
Vision 2
Speech 2
Text-to-Video 1
Other 1
Games 1
0 2 4 6 8 10 12 14 16 18 20 22 24
Number of Signi cant Machine Learning Systems
Figure 1.2.16
6 There were 38 total significant AI machine learning systems released in 2022, according to Epoch; however, one of the systems, BaGuaLu, did not have a domain classification
and is therefore omitted from Figure 1.2.1.

Sector Analysis
Which sector among industry, academia, or nonprofit machine learning systems compared to just three
has released the greatest number of significant produced by academia. Producing state-of-the-art
machine learning systems? Until 2014, most machine AI systems increasingly requires large amounts of
learning systems were released by academia. data, computing power, and money; resources that
Since then, industry has taken over (Figure 1.2.2). In industry actors possess in greater amounts compared
2022, there were 32 significant industry-produced to nonprofits and academia.
Number of Significant Machine Learning Systems by Sector, 2002–22

35
32, Industry
30
25
20
15
10
5
3, Academia
2, Research Collective
1, Industry-Academia Collaboration
0 0, Nonpro t
2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Figure 1.2.2

National Affiliation
In order to paint a picture of AI’s evolving or AI-research firm, was headquartered. In 2022,
geopolitical landscape, the AI Index research the United States produced the greatest number
team identified the nationality of the authors who of significant machine learning systems with 16,
contributed to the development of each significant followed by the United Kingdom (8) and China (3).
machine learning system in the Epoch dataset. 7 Moreover, since 2002 the United States has outpaced
the United Kingdom and the European Union, as well
Systems
as China, in terms of the total number of significant
Figure 1.2.3 showcases the total number of
machine learning systems produced (Figure 1.2.4).
significant machine learning systems attributed to
Figure 1.2.5 displays the total number of significant
researchers from particular countries.8 A researcher
machine learning systems produced by country since
is considered to have belonged to the country in
2002 for the entire world.
which their institution, for example a university
Number of Significant Machine Learning Systems by Number of Significant Machine Learning Systems by
Country, 2022 Select Geographic Area, 2002–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
United States 16
Number of Significant Machine Learning Systems
30
United Kingdom 8
China 3 25
Canada 2
20
Germany 2
16, United States
15
France 1
12, European Union and
United Kingdom
India 1 10
Israel 1
5
Russia 1 3, China
0
Singapore 1
2002
2004
2006
2008
2010
2012
2014
2016
2018
2020
2022
0 2 4 6 8 10 12 14 16
Figure 1.2.3 Figure 1.2.4
7 The methodology by which the AI Index identified authors’ nationality is outlined in greater detail in the Appendix.
8 A machine learning system is considered to be affiliated with a particular country if at least one author involved in creating the model was affiliated with that country.
Consequently, in cases where a system has authors from multiple countries, double counting may occur.

Number of Significant Machine Learning Systems by Country, 2002–22 (Sum)

Source: AI Index, 2022 | Chart: 2023 AI Index Report
0
1–10
11–20
21–60
61–255
Figure 1.2.5

Authorship
Figures 1.2.6 to 1.2.8 look at the total number of in 2022 the United States had the greatest number of
authors, disaggregated by national affiliation, that authors producing significant machine learning systems,
contributed to the launch of significant machine with 285, more than double that of the United Kingdom
learning systems. As was the case with total systems, and nearly six times that of China (Figure 1.2.6).
Number of Authors of Significant Machine Learning Number of Authors of Significant Machine Learning
Systems by Country, 2022 Systems by Select Geographic Area, 2002–22
400
United States 285
United Kingdom 139 350
China 49 300
285, United States
Canada 21 Number of Authors 250
Israel 13
200
Sweden 8 155, European Union and
150 United Kingdom
Germany 7
100
Russia 3
50 49, China
India 2
France 1 0
0 50 100 150 200 250 300

2002
2004
2006
2008
2010
2012
2014
2016
2018
2020
2022
Number of Authors
Number of Authors of Significant Machine Learning Systems by Country, 2002–22 (Sum)

0
1–10
11–20
21–60
61–180
181–370
371–680
Figure 1.2.8
681–2000

Parameter Trends
Parameters are numerical values that are learned by dataset by sector. Over time, there has been a steady
machine learning models during training. The value of increase in the number of parameters, an increase that
parameters in machine learning models determines has become particularly sharp since the early 2010s.
how a model might interpret input data and make The fact that AI systems are rapidly increasing their
predictions. Adjusting parameters is an essential parameters is reflective of the increased complexity of
step in ensuring that the performance of a machine the tasks they are being asked to perform, the greater
learning system is optimized. availability of data, advancements in underlying
hardware, and most importantly, the demonstrated
Figure 1.2.9 highlights the number of parameters of
performance of larger models.
the machine learning systems included in the Epoch
Number of Parameters of Significant Machine Learning Systems by Sector, 1950–2022

1.0e+14
Academia Industry Industry-Academia Collaboration Nonpro t Research Collective
1.0e+12
Number of Parameters (Log Scale)
1.0e+10
1.0e+8
1.0e+6
1.0e+4
1.0e+2
1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.9

Figure 1.2.10 demonstrates the parameters of machine learning systems by domain. In recent years, there has
been a rise in parameter-rich systems.
Number of Parameters of Significant Machine Learning Systems by Domain, 1950–2022

Language Vision Games

1.0e+12
1.0e+10
1.0e+8
1.0e+6
1.0e+4
1.0e+2
1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.10

Compute Trends
machine learning systems has increased exponentially
The computational power, or “compute,” of AI
in the last half-decade (Figure 1.2.11).9 The growing
systems refers to the amount of computational
demand for compute in AI carries several important
resources needed to train and run a machine
implications. For example, more compute-intensive
learning system. Typically, the more complex a
models tend to have greater environmental impacts,
system is, and the larger the dataset on which it is
and industrial players tend to have easier access
trained, the greater the amount of compute required.
to computational resources than others, such as
The amount of compute used by significant AI universities.
Training Compute (FLOP/s) of Significant Machine Learning Systems by Sector, 1950–2022

Academia Industry Industry-Academia Collaboration Nonpro t Research Collective

1.0e+24
1.0e+21
Training Compute (FLOP/s – Log Scale)
1.0e+18
1.0e+15
1.0e+12
1.0e+9
1.0e+6
1.0e+3
1.0e+0
1950 1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.11
9 FLOP/s stands for “Floating Point Operations per second” and is a measure of the performance of a computational device.

Since 2010, it has increasingly been the case that of all machine learning systems, language models are
demanding the most computational resources.
Training Compute (FLOP/s) of Significant Machine Learning Systems by Domain, 1950–2022

Language Vision Games

1.0e+24
1.0e+21
1.0e+18
1.0e+15
1.0e+12
1.0e+9
1.0e+6
1.0e+3
1954 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 2006 2010 2014 2018 2022
Figure 1.2.12

Large Language and are starting to be widely deployed in the real world.
Multimodal Models National Affiliation

This year the AI Index conducted an analysis of the
Large language and multimodal models, sometimes national affiliation of the authors responsible for
called foundation models, are an emerging and releasing new large language and multimodal models.10
increasingly popular type of AI model that is trained The majority of these researchers were from American
on huge amounts of data and adaptable to a variety institutions (54.2%) (Figure 1.2.13). In 2022, for the first
of downstream applications. Large language and time, researchers from Canada, Germany, and India
multimodal models like ChatGPT, DALL-E 2, and Make- contributed to the development of large language and
A-Video have demonstrated impressive capabilities and multimodal models.
Authors of Select Large Language and Multimodal Models (% of Total) by Country, 2019–22
Source: Epoch and AI Index, 2022 | Chart: 2023 AI Index Report
100%
Authors of Large Language and Multimodal Models (% of Total)
80%
60%
40%
21.88%, United Kingdom

20% 8.04%, China
6.25%, Canada
5.80%, Israel
3.12%, Germany
0.89%, India
0% 0.00%, Korea
2019 2020 2021 2022

Figure 1.2.13
Figure 1.2.14 offers a timeline view of the large PaLM (540B). The only Chinese large language and
language and multimodal models that have been multimodal model released in 2022 was GLM-130B,
released since GPT-2, along with the national an impressive bilingual (English and Chinese) model
affiliations of the researchers who produced the created by researchers at Tsinghua University. BLOOM,
models. Some of the notable American large also launched in late 2022, was listed as indeterminate
language and multimodal models released in given that it was the result of a collaboration of more
2022 included OpenAI’s DALL-E 2 and Google’s than 1,000 international researchers.
10 The AI models that were considered to be large language and multimodal models were hand-selected by the AI Index steering committee. It is possible that this selection may have omitted
certain models.

Timeline and National Affiliation of Select Large Language and Multimodal Model Releases
2023-Jan
BLOOM
2022-Oct
GLM-130B
2022-Jul Minerva (540B)
Imagen
Jurassic-X OPT-175B
2022-Apr Stable Diffusion (LDM-KL-8-G) PaLM (540B) DALL·E 2 Chinchilla
GPT-NeoX-20B InstructGPT AlphaCode
2022-Jan
Gopher
2021-Oct Megatron-Turing NLG 530B

Jurassic-1-Jumbo
2021-Jul Codex ERNIE 3.0
HyperClova Wu Dao 2.0 CogView
PanGu-alpha GPT-J-6B
2021-Apr GPT-Neo
Wu Dao - Wen Yuan
2021-Jan DALL-E
2020-Oct
ERNIE-GEN (large)
2020-Jul
GPT-3 175B (davinci)
2020-Apr
Turing NLG
Meena
2020-Jan United States Canada
United Kingdom Israel
T5-11B T5-3B China Germany
2019-Oct Megatron-LM (Original, 8.3B) United States, Indeterminate
United Kingdom,
Germany, India
2019-Jul Korea
Grover-Mega
2019-Apr
GPT-2
2019-Jan
Figure 1.2.1411
11 While we were conducting the analysis to produce Figure 1.2.14, Irene Solaiman published a paper that has a similar analysis. We were not aware of the paper at the time of our research.

Parameter Count
Over time, the number of parameters of newly released Google in 2022, had 540 billion, nearly 360 times
large language and multimodal models has massively more than GPT-2. The median number of parameters
increased. For example, GPT-2, which was the first in large language and multimodal models is increasing
large language and multimodal model released in 2019, exponentially over time (Figure 1.2.15).
only had 1.5 billion parameters. PaLM, launched by
Number of Parameters of Select Large Language and Multimodal Models, 2019–22

3.2e+12
Wu Dao 2.0
1.0e+12
Megatron-Turing NLG 530B
Minerva (540B)
Gopher PaLM (540B)
3.2e+11 HyperClova
GPT-3 175B (davinci) OPT-175B BLOOM

Jurassic-1-Jumbo
PanGu-α
1.0e+11 GLM-130B
Chinchilla
3.2e+10 GPT-NeoX-20B
Turing NLG
T5-11B DALL-E Codex
1.0e+10 GPT-J-6B ERNIE 3.0 Jurassic-X
Megatron-LM (Original, 8.3B) DALL·E 2
T5-3B Meena CogView
3.2e+9 Wu Dao - Wen Yuan
GPT-2 GPT-Neo Stable Di usion (LDM-KL-8-G)
1.0e+9 Grover-Mega
ERNIE-GEN (large)
3.2e+8
20
20
20
20
20
20 9-S
20
20 0-J
20
20
20
20
20 -M
20 1-A ar
20 1-M r
20 1-J ay
20 1-J n
20
20
20
2022-F
20 2-M b
20 2-A ar
20 2-M r
19
19
1
19 ep
2
20 an
20
20
21
21
2
2 p
2
2 u
21 ul
21
21
2 e
2
2 p
22 a
22
22
-J
-A
-O
-D
-F
-M
-O
-J y
-A
-N
-F
-M
-A
an
eb
ug
un
ec
eb
ct
ug
ct
ug
ov
ay
ay
Figure 1.2.15

Training Compute
The training compute of large language and multimodal reasoning problems, was roughly nine times greater
models has also steadily increased (Figure 1.2.16). The than that used for OpenAI’s GPT-3, which was
compute used to train Minerva (540B), a large language released in June 2022, and roughly 1839 times greater
and multimodal model released by Google in June than that used for GPT-2 (released February 2019).
2022 that displayed impressive abilities on quantitative
Training Compute (FLOP/s) of Select Large Language and Multimodal Models, 2019–22
PaLM (540B)
3.2e+24 Megatron-Turing NLG 530B Minerva (540B)
Gopher OPT-175B
1.0e+24
GPT-3 175B (davinci) Jurassic-1-Jumbo
Chinchilla BLOOM
3.2e+23
Meena GPT-NeoX-20B
1.0e+23 DALL-E PanGu-α
T5-11B HyperClova Stable Diffusion
3.2e+22 Turing NLG CogView AlphaCode GLM-130B
T5-3B
1.0e+22 GPT-J-6B
Megatron-LM (Original, 8.3B) GPT-Neo
3.2e+21 GPT-2
3.2e+20
1.0e+20
3.2e+19
1.0e+19
ERNIE 3.0
3.2e+18
1.0e+18
20
20
20
20 9-S
20
20 0-J
20
20
20
20 -M
20 1-A ar
20
20 1-J
20
20
20
2022-F
20 2-M b
20 2-A ar
20 -M
20
19
1
19 ep
22
2
20 an
20
21
21
2
21 pr
2
21 ul
21
21
2 e
2
22 pr
22 a
22
-J
-M
-A
-O
-D
-F
-O
-J y
-A
-N
-F
-M
an
eb
ug
un
ec
eb
ct
ug
ct
ov
ay
ay
Figure 1.2.16

Training Cost
A particular theme of the discourse around large estimate with the tag of mid, high, or low: mid where
language and multimodal models has to do with their the estimate is thought to be a mid-level estimate,
hypothesized costs. Although AI companies rarely speak high where it is thought to be an overestimate, and
openly about training costs, it is widely speculated that low where it is thought to be an underestimate. In
these models cost millions of dollars to train and will certain cases, there was not enough data to estimate
become increasingly expensive with scale. the training cost of particular large language and
multimodal models, therefore these models were
This subsection presents novel analysis in which the
omitted from our analysis.
AI Index research team generated estimates for the
training costs of various large language and multimodal The AI Index estimates validate popular claims that
models (Figure 1.2.17). These estimates are based on the large language and multimodal models are increasingly
hardware and training time disclosed by the models’ costing millions of dollars to train. For example,
authors. In cases where training time was not disclosed, Chinchilla, a large language model launched by
we calculated from hardware speed, training compute, DeepMind in May 2022, is estimated to have cost $2.1
and hardware utilization efficiency. Given the possible million, while BLOOM’s training is thought to have cost
variability of the estimates, we have qualified each $2.3 million.
Estimated Training Cost of Select Large Language and Multimodal Models

Mid High Low

12 11.35
(in Millions of U.S. Dollars)
10
8.55
8.01
Training Cost
4
1.97 2.11 2.29
1.47 1.80 1.69
2 1.03
0.23 0.02 0.09 0.43 0.27 0.24 0.60
0.05 0.11 0.01 0.14 0.09 0.16
0
GPT-2
T5-11B
Meena
Turing NLG
GPT-3 175B
DALL-E
Wu Dao - Wen Yuan
GPT-Neo
GPT-J-6B
HyperClova
ERNIE 3.0
Codex
Megatron-Turing NLG 530B
Gopher
AlphaCode
GPT-NeoX-20B
Chinchilla
PaLM (540B)
Stable Di usion (LDM-KL-8-G)
OPT-175B
Minerva (540B)
GLM-130B
BLOOM
2019 2020 2021 2022
Figure 1.2.17
12 See Appendix for the complete methodology behind the cost estimates.

There is also a clear relationship between the cost of large language and multimodal models and their size.
As evidenced in Figures 1.2.18 and 1.2.19, the large language and multimodal models with a greater number of
parameters and that train using larger amounts of compute tend to be more expensive.
Estimated Training Cost of Select Large Language Estimated Training Cost of Select Large Language and
and Multimodal Models and Number of Parameters Multimodal Models and Training Compute (FLOP/s)
Source: AI Index, 2022 | Chart: 2023 AI Index Report Source: AI Index, 2022 | Chart: 2023 AI Index Report
Minerva (540B) PaLM (540B) Minerva (540B)

PaLM (540B)
5.0e+11 Megatron-Turing NLG 530B Megatron-Turing NLG 530B
OPT-175B
Gopher 1.0e+24
Chinchilla Gopher
HyperClova

OPT-175B
2.0e+11
GLM-130B BLOOM GPT-NeoX-20B Meena BLOOM

GPT-3 175B
GLM-130B Stable Diffusion
1.0e+11 Chinchilla AlphaCode T5-11B
DALL-E
Turing NLG GPT-J-6B
5.0e+10 AlphaCode 1.0e+22 GPT-Neo
GPT-NeoX-20B GPT-2
Turing NLG
Codex T5-11B
ERNIE 3.0 DALL-E
1.0e+10
GPT-J-6B
1.0e+20
5.0e+9
Wu Dao - Wen Yuan Meena
GPT-Neo
2.0e+9 GPT-2 Stable Di usion ERNIE 3.0
1.0e+9 1.0e+18
10k 100k 1M 10M 10k 100k 1M 10M
Training Cost (in U.S. Dollars - Log Scale) Training Cost (in U.S. Dollars - Log Scale)

Index Report 2023 1.3 AI Conferences
AI conferences are key venues for researchers to share their work and connect with peers and collaborators. Conference attendance is an
indication of broader industrial and academic interest in a scientific field. In the past 20 years, AI conferences have grown in size, number,
and prestige. This section presents data on the trends in attendance at major AI conferences.
1.3 AI Conferences
Conference Attendance International Conference on Principles of Knowledge
Representation and Reasoning (KR) were both held
After a period of increasing attendance, the total strictly in-person.
attendance at the conferences for which the AI
Index collected data dipped in 2021 and again in Neural Information Processing Systems (NeurIPS)
2022 (Figure 1.3.1).13 This decline may be attributed continued to be one of the most attended
to the fact that many conferences returned to hybrid conferences, with around 15,530 attendees (Figure
or in-person formats after being fully virtual in 1.3.2).14 The conference with the greatest one-
2020 and 2021. For example, the International Joint year increase in attendance was the International
Conference on Artificial Intelligence (IJCAI) and the Conference on Robotics and Automation (ICRA),
from 1,000 in 2021 to 8,008 in 2022.
Number of Attendees at Select AI Conferences, 2010–22

90
80
70
Number of Attendees (in Thousands)
60 59.45
50
40
30
20
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.1
13 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that
measuring the exact attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world.
14 In 2021, 9,560 of the attendees attended NeurIPS in-person and 5,970 remotely.

Attendance at Large Conferences, 2010–22

30
25
20
15.53, NeurIPS
15
10 10.17, CVPR
8.01, ICRA
7.73, ICML
5.35, ICLR
5 4.32, IROS
3.56, AAAI
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.2
Attendance at Small Conferences, 2010–22

3.50
3.00
2.50
2.00 2.01, IJCAI
1.50
1.09, FaccT
1.00
0.66, UAI
0.50 0.50, AAMAS
0.39, ICAPS
0.12, KR
0.00
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.3.3

Index Report 2023 1.4 Open-Source AI Software
GitHub is a web-based platform where individuals and coding teams can host, review, and collaborate on various code repositories.
GitHub is used extensively by software developers to manage and share code, collaborate on various projects, and support open-source
software. This subsection uses data provided by GitHub and the OECD.AI policy observatory. These trends can serve as a proxy for some
of the broader trends occuring in the world of open-source AI software not captured by academic publication data.
1.4 Open-Source AI Software

Projects
A GitHub project is a collection of files that software project. Since 2011, the total number of
can include the source code, documentation, AI-related GitHub projects has steadily increased,
configuration files, and images that constitute a growing from 1,536 in 2011 to 347,934 in 2022.
Number of GitHub AI Projects, 2011–22

Source: GitHub, 2022; OECD.AI, 2022 | Chart: 2023 AI Index Report
350 348
300
Number of AI Projects (in Thousands)
250
200
150
100
50
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.1

As of 2022, a large proportion of GitHub AI projects United Kingdom (17.3%), and then the United States
were contributed by software developers in India (14.0%). The share of American GitHub AI projects
(24.2%) (Figure 1.4.2). The next most represented has been declining steadily since 2016.
geographic area was the European Union and the
GitHub AI Projects (% Total) by Geographic Area, 2011–22


40%
35%
30%
AI Projects (% of Total)
25%
24.19%, India
20%
15%
10%
5%
2.40%, China
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.2

Stars
GitHub users can bookmark or save a repository Figure 1.4.3 shows the cumulative number of stars
of interest by “starring” it. A GitHub star is similar attributed to projects belonging to owners of various
to a “like” on a social media platform and indicates geographic areas. As of 2022, GitHub AI projects
support for a particular open-source project. Some of from the United States received the most stars,
the most starred GitHub repositories include libraries followed by the European Union and the United
like TensorFlow, OpenCV, Keras, and PyTorch, which Kingdom, and then China. In many geographic areas,
are widely used by software developers in the AI the total number of new GitHub stars has leveled off
coding community. in the last few years.
Number of GitHub Stars by Geographic Area, 2011–22

3.50 3.44, United States
3.00
Number of Cumulative GitHub Stars (in Millions)
2.69, Rest of the World

2.50
2.34, European Union and United Kingdom
2.00
1.50 1.53, China
1.00
0.50 0.46, India
0.00
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.4.3

Index Report 2023
CHAPTER 2:
Technical
Performance
Index Report 2023
CHAPTER 2 PREVIEW:
Overview 72 Narrative Highlight: A Closer Look at
Chapter Highlights 73 Progress in Image Generation 90
Visual Reasoning 92
2.1 What’s New in 2022: A Timeline 74 Visual Question Answering (VQA)

Challenge 92
Narrative Highlight: The Rise of Capable
2.2 Computer Vision—Image 81
Multimodal Reasoning Systems 93
Image Classification 81
Visual Commonsense Reasoning (VCR) 95
ImageNet 81
Face Detection and Recognition 82
2.3 Computer Vision—Video 96
National Institute of Standards and
Activity Recognition 96
Technology Face Recognition
Vendor Test (FRVT) 83 Kinetics-400, Kinetics-600, Kinetics-700 96
Deepfake Detection 84 Narrative Highlight: A Closer Look

at the Progress of Video Generation 98
Celeb-DF 84
Human Pose Estimation 85
2.4 Language 99
MPII 85
English Language Understanding 99
Semantic Segmentation 86
SuperGLUE 99
Cityscapes Challenge, Pixel-Level
Semantic Labeling Task 86 Reading Comprehension Dataset
Requiring Logical Reasoning (ReClor) 100
Medical Image Segmentation 87
Narrative Highlight: Just How Much
Kvasir-SEG 87
Better Have Language Models Become? 102
Object Detection 88
Narrative Highlight: Planning and
Common Objects in Context (COCO) 88 Reasoning in Large Language Models 103
Image Generation 89 Text Summarization 104
CIFAR-10 and STL-10 89 arXiv and PubMed 104

Index Report 2023
CHAPTER 2 PREVIEW (CONT’D):
Natural Language Inference 105 2.7 Hardware 115
Abductive Natural Language MLPerf Training Time 115
Inference (aNLI) 105 MLPerf Inference 117
Sentiment Analysis 106 Trends in GPUs 118
SST-5 Fine-Grained Classification 106
Multitask Language Understanding 107 2.8 Environment 120
Massive Multitask Language Environmental Impact of
Understanding (MMLU) 107 Select Large Language Models 120
Machine Translation (MT) 108 Narrative Highlight: Using AI to
Number of Commercially Available Optimize Energy Usage 122
MT Systems 108
2.9 AI for Science 123

2.5 Speech 109 Accelerating Fusion Science Through
Speech Recognition 109 Learned Plasma Control 123
VoxCeleb 109 Discovering Novel Algorithms for
Narrative Highlight: Matrix Manipulation With AlphaTensor 123
Whisper 110 Designing Arithmetic Circuits With
Deep Reinforcement Learning 124
2.6 Reinforcement Learning 112 Unlocking de Novo Antibody Design

With Generative AI 124
Reinforcement Learning Environments 112
Procgen 112
Benchmark Saturation 114

Artificial Intelligence Chapter 2: Technical Performance
Index Report 2023
Overview
This year’s technical performance chapter features analysis of the technical progress in
AI during 2022. Building on previous reports, this chapter chronicles advancement in
computer vision, language, speech, reinforcement learning, and hardware. Moreover,
this year this chapter features an analysis on the environmental impact of AI, a discussion
of the ways in which AI has furthered scientific progress, and a timeline-style overview
of some of the most significant recent AI developments.

Index Report 2023
Chapter Highlights
Performance saturation on Generative AI breaks into
traditional benchmarks. the public consciousness.
AI continued to post state-of-the-art results, 2022 saw the release of text-to-image models
but year-over-year improvement on many like DALL-E 2 and Stable Diffusion, text-to-
benchmarks continues to be marginal. video systems like Make-A-Video, and chatbots
Moreover, the speed at which benchmark like ChatGPT. Still, these systems can be
saturation is being reached is increasing. prone to hallucination, confidently outputting
However, new, more comprehensive incoherent or untrue responses, making it hard
benchmarking suites such as BIG-bench and to rely on them for critical applications.
HELM are being released.
Capable language models

still struggle with reasoning.
AI systems become Language models continued to improve
more flexible. their generative capabilities, but new
Traditionally AI systems have performed well research suggests that they still struggle
on narrow tasks but have struggled across with complex planning tasks.
broader tasks. Recently released models
challenge that trend; BEiT-3, PaLI, and
Gato, among others, are single AI systems
The world’s best new scientist
increasingly capable of navigating multiple
tasks (for example, vision, language).
… AI? AI models are starting to rapidly
accelerate scientific progress and in 2022
were used to aid hydrogen fusion, improve
the efficiency of matrix manipulation, and
generate new antibodies.
AI is both helping and harming
the environment.
New research suggests that AI systems can
have serious environmental impacts. According AI starts to build better AI.
to Luccioni et al., 2022, BLOOM’s training run Nvidia used an AI reinforcement learning agent
emitted 25 times more carbon than a single air to improve the design of the chips that power AI
traveler on a one-way trip from New York to systems. Similarly, Google recently used one of
San Francisco. Still, new reinforcement learning its language models, PaLM, to suggest ways to
models like BCOOLER show that AI systems improve the very same model. Self-improving AI
can be used to optimize energy usage. learning will accelerate AI progress.

Index Report 2023 2.1 What’s New in 2022: A Timeline
The technical performance chapter begins with an overview of some of the most significant technical developments in AI during 2022,
as selected by the AI Index Steering Committee.
2.1 What’s New in 2022: A Timeline

DeepMind Releases AlphaCode
Feb. 2, 2022
AlphaCode, an AI system that writes computer programs
at a competitive level, achieves a rank within the top 54%
of participants in a human programming competition. This
represents an improvement on the more complex problem-
solving tasks with which AI has traditionally struggled.
Figure 2.1.1
DeepMind Trains Reinforcement Learning Agent to

Feb. 16, 2022
Control Nuclear Fusion Plasma in a Tokamak
Nuclear fusion is a potential source of clean, limitless
energy, but producing such energy in tokamaks is difficult
due to a lack of experimental data. DeepMind simulated
optimal tokamak management, an example of how AI can
accelerate science and combat climate change.
Figure 2.1.2
IndicNLG Benchmarks Natural Language Generation for Indic Languages

March 10, 2022
An international research collective launches IndicNLG, a collection of datasets for benchmarking
natural language generation for 11 Indic languages. The creation of IndicNLG increases the
potential for AI systems to generate language in more diverse, non-English linguistic settings.
Figure 2.1.3

Meta AI Releases Make-A-Scene

March 24, 2022
Make-A-Scene is a text-to-image AI model that
enables users to generate images through text.
Make-A-Scene is one of many text-to-image
models released in 2022.
Figure 2.1.4
Google Releases PaLM

April 5, 2022
Google’s AI team trains one of the world’s
largest language models, PaLM. Made up
of 540 billion parameters, PaLM reinforces
the belief that researchers can improve
performance on large language models by
simply training them on more data. Figure 2.1.5
OpenAI Releases DALL-E 2

April 13, 2022
DALL-E 2, a text-to-image AI system that can create
realistic art and images from textual descriptions, is
released to the public, igniting a generative AI craze.
Figure 2.1.6
DeepMind Launches Gato

May 12, 2022
Gato is a new reinforcement learning agent
capable of doing a wide range of tasks such
as robotic manipulation, game playing, image
captioning, and natural language generation.
The release of such models suggests that AI
systems are becoming better at generalization.
Figure 2.1.7

Google Releases Imagen

May 23, 2022
Imagen is a text-to-image diffusion model capable
of producing images with a high degree of
photorealism. Imagen’s launch also comes with
the release of DrawBench, a challenging new
benchmark for text-to-image systems.
Figure 2.1.8
442 Authors Across 132 Institutions Team Up to Launch BIG-bench

June 9, 2022
In order to better challenge increasingly capable large language models, a team of 442 authors
across 132 institutions launch the Beyond the Imitation Game benchmark (BIG-bench). The
benchmark consists of 204 tasks ranging from linguistics, childhood development, math,
common-sense reasoning, biology, physics, social bias, and software development.
Figure 2.1.9
GitHub Makes Copilot Available as

June 21, 2022 a Subscription-Based Service for
Individual Developers
Copilot is a generative AI system capable
of turning natural language prompts
into coding suggestions across multiple
languages. Similar systems include OpenAI’s
Codex and Salesforce’s CodeGen. Surveys
suggest that Copilot makes coders more
productive and less frustrated.
Figure 2.1.10

Nvidia Uses Reinforcement Learning to

July 8, 2022
Design Better-Performing GPUs
Nvidia uses its AI systems to improve the
performance of its latest H100 class of GPU chips.
GPUs being essential to AI training, this is one
example of how AI is starting to develop better AI.
Figure 2.1.11
Meta Announces
July 11, 2022
‘No Language Left Behind’
No Language Left Behind (NLLB) is
a family of models that can translate
across 200 distinct languages. NLLB is
one of the first systems that can perform
well across a wide range of low-resource
languages like Kamba and Lao.
Figure 2.1.12
Tsinghua Researchers Launch GLM-130B

Aug 4, 2022
Chinese researchers affiliated with Tsinghua
University release GLM-130B, a large language
model that outperforms others such as Meta’s
OPT, Hugging Face’s BLOOM, and OpenAI’s
original GPT-3. Figure 2.1.13
Stability AI Releases Stable Diffusion

Aug 22, 2022
Stable Diffusion is an open-source text-to-image
diffusion-based model, meaning users can freely use
the model weights to generate their own images. Stable
Diffusion is trained on existing images created by humans
and gives no credit or acknowledgment, leaving open
questions around the ethical use of image generators.
Figure 2.1.14

OpenAI Launches Whisper

Sept 21, 2022
Whisper is a large-scale speech-recognition system
trained on roughly 700,000 hours of audio data and
capable of respectable performance on various speech
recognition tasks. The fact that Whisper required neither
supervised pre-training nor unsupervised training with
fine-tuning yet was able to achieve strong performance
by merely increasing training data further validates the
approach of increasingly scaling AI models.
Figure 2.1.15
Meta Releases Make-A-Video

Sept 29, 2022
Make-A-Video is a system that allows users
to create videos from short text descriptions.
The quality of the videos is high and again
demonstrates the validity of the scaling
approach.
Figure 2.1.16
DeepMind Launches AlphaTensor

Oct 5, 2022
AlphaTensor is an AI reinforcement-learning-
based system able to discover new and
efficient algorithms for matrix manipulation.
Matrix manipulation is essential to a wide
range of digital practices and is a process
that researchers have been trying to make
more efficient for decades.
Figure 2.1.17

Google Uses PaLM to Improve

Oct 20, 2022
the Reasoning of PaLM
Google researchers use one of
their existing language models,
PaLM, to improve the reasoning
of the very same model. This
process is yet another example
of AI systems using their own Figure 2.1.18
knowledge to improve.
International Research
Nov 9, 2022
Group Releases BLOOM
A collaboration of over 100
researchers from across the
globe develop an open-access
language model called BLOOM.
BLOOM impresses with its
Figure 2.1.19
public release and for furthering
the possibilities of international
collaboration in AI research.
Stanford Researchers Release HELM

Nov 16, 2022
As part of an attempt to judge new language models according to more unified standards, Stanford
researchers develop a new benchmarking approach for large language models called Holistic Evaluation
of Language Models (HELM). The launch of HELM is evidence of the AI community’s attempt to develop
transparency around increasingly powerful, capable, and influential large language models.
Figure 2.1.20

Meta Releases CICERO

Nov 22, 2022
CICERO is the first AI to play in
the top 10% of human participants
in the game Diplomacy. CICERO’s
launch shows that AI systems have
improved in strategic reasoning, a
domain in which they have traditionally
struggled, and are capable of
effectively convincing humans to go
Figure 2.1.21
along with their objectives.
OpenAI Launches ChatGPT

Nov 30, 2022
ChatGPT is an impressive,
publicly usable chatbot capable
of writing university-level
essays. Months after launching,
ChatGPT reaches 100 million
monthly active users, making it
the fastest-growing consumer
application in history. ChatGPT’s
release caps a year in which
generative AI became a part
of the zeitgeist, and raises Figure 2.1.22
questions about the effect that
AI will have on the future of
humanity.

Index Report 2023 2.2 Computer Vision—Image
Computer vision is the subfield of AI that teaches machines to understand images and videos. Computer vision technologies have a
variety of important real-world applications, such as autonomous driving, crowd surveillance, sports analytics, and video-game creation.
This section tracks progress in computer vision across several different task domains which include: (1) image classification, (2)
face detection and recognition, (3) deepfake detection, (4) human pose estimation, (5) semantic segmentation, (6) medical image
segmentation, (7) object detection, (8) image generation, and (9) visual reasoning.
2.2 Computer Vision—Image

Image Classification A Demonstration of Image Classification
Source: Krizhevsky et al., 2012
Image classification is the ability of machines to
categorize objects in images (Figure 2.2.1).
ImageNet
ImageNet is one of the most widely used
benchmarks for image classification. This dataset
includes over 14 million images across 20,000
different object categories such as “strawberry” or
“balloon.” Performance on ImageNet is measured
through various accuracy metrics. Top-1 accuracy
measures the degree to which the top prediction
generated by an image classification model for a
given image actually matches the image’s label.
As of 2022, the best image classification system on

Figure 2.2.1
ImageNet has a top-1 accuracy rate of 91.0% (Figure
2.2.2). Although the current image classification
capabilities of state-of-the-art systems is 27.7
percentage points better than a decade ago, last
year saw a very marginal 0.1 percentage point
improvement in classification accuracy.

ImageNet Challenge: Top-1 Accuracy

Source: Papers With Code, 2022; arXiv, 2022 | Chart: 2023 AI Index Report
91.00%, With Extra Training Data

90%
88.50%, Without Extra Training Data
85%
Top-1 Accuracy (%)
80%
75%
70%
65%
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.2.2
Face Detection and A Demonstration of Face Detection and Recognition

Source: Forbes, 2020
Recognition
Facial detection and recognition is the ability of AI
systems to identify faces or individuals in images
or videos (Figure 2.2.3). Currently, many facial
recognition systems are able to successfully identify
close to 100% of faces, even on challenging datasets
(Figure 2.2.4).
Figure 2.2.3

National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT):
Veri cation Accuracy by Dataset
Source: National Institute of Standards and Technology, 2022 | Chart: 2023 AI Index Report
1.0000
0.5000
0.2000
False Non-Match Rate: FMNR (Log-Scale)
0.1000
0.0500
0.0297, WILD Photos @ FMR = 1e-5

0.0200
0.0100
0.0050
0.0032, BORDER Photos @ FMR = 1e-6
0.0021, MUGSHOT Photos @ FMR = 1e-5
0.0020 0.0019, MUGSHOT Photos ≥ 12 YRS @ FMR = 1e-5
0.0016, VISABORDER Photos @ FMR = 1e-6
0.0010
0.0006, VISA Photos @ FMR = 1e-6

0.0005
2017 2018 2019 2020 2021 2022
Figure 2.2.4
National Institute of Standards and others. Facial detection capacity is measured by the
Technology Face Recognition Vendor Test false non-match rate (FNMR), otherwise known as
(FRVT) error rate, which is the rate at which a model fails to
Progress on facial recognition can be tracked match the face in an image to that of a person.
through the National Institute of Standards and
As of 2022, the top-performing models on all of the
Technology’s Face Recognition Vendor Test. This
FRVT datasets, with the exception of WILD Photos,
test tracks how well different facial recognition
each posted an error rate below 1%, and as low as a
algorithms perform on various homeland security
0.06% error rate on the VISA Photos dataset.
tasks, such as identification of child trafficking
victims and cross-verification of visa images, among

Deepfake Detection Real-Life Deepfake: President Zelenskyy Calling

for the Surrender of Ukrainian Soldiers
Source: NPR, 2022
The ability of AI systems to create synthetic images
that are sometimes indistinguishable from real ones
has led to the creation of deepfakes, images or
videos that appear to be real but are actually fake. In
the last year, there was a widely circulated deepfake
video of Ukrainian president Volodymyr Zelenskyy
surrendering (Figure 2.2.5).
Celeb-DF
Celeb-DF is presently one of the most challenging
Figure 2.2.5
deepfake detection benchmarks. This dataset
is composed of 590 original celebrity YouTube algorithm on Celeb-DF came from researchers at
videos that have been manipulated into thousands Deakin University in Australia. Their JDFD model
of deepfakes. This year’s top deepfake detection posted an AUC score of 78 (Figure 2.2.6).
Celeb-DF: Area Under Curve Score (AUC)

Source: arXiv, 2022 | Chart: 2023 AI Index Report
78.00
75
Area Under Curve Score (AUC)
70
65
2018 2019 2020 2021 2022
Figure 2.2.6

Human Pose Estimation A Demonstration of Human Pose Estimation

Source: Cong et al., 2022
Human pose estimation is the task of

estimating the position of the human body
from images (Figure 2.2.7).
MPII
MPII is a dataset of over 25,000 annotated
images which contains annotations of
more than 40,000 people doing 410 human
activities. On MPII, this year’s top model,
Figure 2.2.7
ViTPose, correctly estimated 94.3% of
keypoints (human joints), which represented
a small 0.2 percentage point increase from
the previous state-of-the-art result posted in
2020 (Figure 2.2.8).
MPII: Percentage of Correct Keypoints (PCK)

95%
94.30%
Percentage of Correct Keypoints (PCK)
90%
85%
2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.2.8

Semantic Segmentation A Demonstration of Semantic Segmentation

Source: Cityscapes Dataset, 2022
Semantic segmentation involves assigning individual

image pixels to specific categories (for example,
human, bicycle, or street) (Figure 2.2.9).
Cityscapes Challenge,
Pixel-Level Semantic Labeling Task
The Cityscapes dataset is used to test the semantic
segmentation capabilities of AI. This dataset
contains 25,000 annotated images of diverse urban
Figure 2.2.9
environments. The Cityscapes dataset enables a
variety of different segmentation tasks. One of the
greater the mIoU, the better a system has performed.
most popular is the pixel-level task. Performance
on semantic segmentation is measured by mean Performance on Cityscapes has increased by 23.4
intersection-over-union (mIoU), which represents the percentage points since the competition launched in
degree to which the image segments predicted by the 2014; however, it has plateaued in the last few years
model overlap with the image’s actual segments. The (Figure 2.2.10).
Cityscapes Challenge, Pixel-Level Semantic Labeling Task: Mean Intersection-Over-Union (mIoU)

Source: Cityscapes Challenge, 2022 | Chart: 2023 AI Index Report
86.46%, With Extra Training Data

85%
84.30%, Without Extra Training Data
Mean Intersection-Over-Union (mIoU)
80%
75%
70%
65%
2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.2.10

Medical Image A Demonstration of Medical Imaging Segmentation

Source: Jha et al., 2019
Segmentation
In medical image segmentation, AI systems
segment objects such as lesions or organs in
medical images (Figure 2.2.11).
Kvasir-SEG
Kvasir-SEG is a dataset for medical image
Figure 2.2.11
segmentation that contains 1,000 high-
quality images of gastrointestinal polyps
that were manually identified by medical
professionals. Progress on Kvasir-SEG is
measured in mean Dice, which represents
the degree to which the polyp segments This year’s top-performing model on Kvasir-SEG, SEP, was
identified by AI systems overlap with the created by a Chinese researcher and posted a mean Dice of
actual polyp segments.1 94.1% (Figure 2.2.12).
Kvasir-SEG: Mean Dice

94.11%
90%
Mean Dice
85%
2019 2020 2021 2022
Figure 2.2.12
1 Mean Dice and mIoU are in principle quite similar. This StackExchange post outlines the differences in more detail.

Object Detection A Demonstration of Object Detection

Source: Rizzoli, 2023
The challenge of identifying and localizing objects

within an image or video is known as object
detection (Figure 2.2.13).
Common Objects in Context (COCO)

Microsoft’s Common Objects in Context (COCO)
object detection dataset has over 80 object
categories in 328,000 images. Several accuracy
metrics are used to measure progress on COCO.
This section considers mean average precision
(mAP50).
Since 2015, state-of-the-art detectors have

improved by 26 percentage points. The top model
Figure 2.2.13
in 2022, EVA, was the result of a Chinese academic
research collaboration.
COCO: Mean Average Precision (mAP50)

81.90%
80%
Mean Average Precision (mAP50)
70%
60%
2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.2.14

Which Face Is Real?

Image Generation Source: Which Face Is Real?, 2022
Image generation is the task of generating images that

are indistinguishable from real ones. In the last decade,
progress on image generation has tremendously
increased, so much so that now it would be difficult
for the average person to distinguish a real human face
from one synthetically generated by AI (Figure 2.2.15).
CIFAR-10 and STL-10 Figure 2.2.15

CIFAR-10 and STL-10 are two popular benchmarks
set of images is similar to the real images on which it
for tracking progress on image generation. CIFAR-10
was trained.
comprises 60,000 color images across 10 different
object classes; STL-10 is inspired by CIFAR-10, with This year saw state-of-the-art results on both CIFAR-10
some modifications, including fewer labeled training and STL-10 benchmarks (Figure 2.2.15). The top
examples and more unlabeled examples. Progress on model on CIFAR-10, EDM-G++, came from Korean
image generation in both benchmarks is measured researchers at KAIST. The top model on STL-10 was
by the Fréchet Inception Distance (FID) score, which Diffusion-GAN, a collaboration between researchers at
reflects the degree to which a synthetically generated the University of Texas at Austin and Microsoft.
CIFAR-10 and STL-10: Fréchet Inception Distance (FID) Score

35
30
Fréchet Inception Distance (FID) Score
25
20
15
10
6.91, STL-10
5
1.77, CIFAR-10
0
2017 2018 2019 2020 2021 2022
Figure 2.2.16

Index Report 2023 2.1 Computer Vision–Image
A Closer Look at Progress in Image Generation
Figure 2.2.17 tracks the progress of facial GAN Progress on Face Generation
Source: Goodfellow et al., 2014; Radford et al., 2016; Liu and Tuzel, 2016;
image generation over time, with the final Karras et al., 2018; Karras et al., 2019; Goodfellow, 2019; Karras et al., 2020;
image being generated by Diffusion-GAN, Vahdat et al., 2021; Wang et al., 2022.
the model that posted the 2022 state-of-

the-art score on STL-10.
2014 2015
2016 2017
2018
2020
2021
2022
Figure 2.2.17
In the last year, text-to-image Images Generated by DALL-E 2, Stable Diffusion and Midjourney
Source: AI Index, 2022
generation broke into the public
consciousness with the release
of models such as OpenAI’s
DALL-E 2, Stability AI’s Stable
Diffusion, Midjourney’s
Midjourney, Meta’s Make-A- a. DALL-E 2
Scene, and Google’s Imagen.
With these systems, users can
generate images based on
a text prompt. Figure 2.2.18
juxtaposes the images generated
by DALL-E 2, Stable Diffusion,
and Midjourney, three publicly
accessible AI text-to-image
systems, for the same prompt: “a
panda playing a piano on a warm
evening in Paris.”
b. Stable Diffusion c. Midjourney Figure 2.2.18

A Closer Look at Progress in Image Generation (cont’d)
Of all the recently released text-to-image generators, Google’s Imagen performs best on the
COCO benchmark (Figure 2.2.19)2. This year, the Google researchers who created Imagen
also released a more difficult text-to-image benchmark, DrawBench, designed to challenge
increasingly capable text-to-image models.
Notable Text-to-Image Models on MS-COCO 256 × 256 FID-30K: Fréchet Inception Distance (FID) Score
Source: Saharia et al., 2022 | Chart: 2023 AI Index Report
Trained on COCO-FID
35.49
COCO Fréchet Inception Distance (FID) Score
Not Trained on COCO-FID

32.64
30
21.42 20.79
20 17.89
12.24
10.39
10 9.33
8.12 7.55 7.27
0
AttnGAN
DM-GAN
DF-GAN
DM-GAN + CL
DALL-E
GLIDE
XMC-GAN
LAFITE
DALL-E 2
Make-A-Scene
Imagen
2017 2019 2020 2021 2022

Model
Figure 2.2.19
2 The COCO benchmark, first launched in 2014, includes 328,000 images with 2.5 million labeled instances. Although it is typically used for object detection tasks, researchers
have also deployed it for image generation.

Visual Reasoning
Visual reasoning tests how well AI systems can reason across both textual and visual data,
as in the examples of Figure 2.2.20.
A Collection of
Visual Reasoning
Tasks
Source: Agrawal et al., 2016
Figure 2.2.20
Visual Question Answering (VQA) Challenge

The Visual Question Answering Challenge tests AI reports progress on the VQA V2 dataset.
systems with open-ended textual questions about
This year the top-performing model on VQA V2
images. Successfully answering the questions
was PaLI, a multimodal model produced by Google
requires that AI systems possess vision, language, and
researchers (Figure 2.2.21).
commonsense reasoning capabilities. This section
Visual Question Answering (VQA) V2 Test-Dev: Accuracy

85%
84.30%
80.78%, Human Baseline
80%
Accuracy (%)
75%
70%
65%
2016 2017 2018 2019 2020 2021 2022
Figure 2.2.21

The Rise of Capable Multimodal Reasoning Systems
Traditionally AI has been strong in narrow tasks, models were introduced, for example BEiT-3 from
but it has been unable to easily generalize across Microsoft and PaLI from Google, that posted state-
multiple domains. For instance, many image of-the-art results across a variety of both vision and
classifiers are adept at classifying images but are language benchmarks. For example, at the time of
incapable of understanding written text. publication of the BEiT-3 paper, BEiT-3 posted state-
of-the-art results for four different vision skills and
However, recent technical progress in AI has
five different vision-language skills (Figure 2.2.22).
begun to challenge this notion. In 2022, several
BEiT-3Vs.
BEiT-3
BEiT-3 Vs. Previous State-of-the-Art
State-of-the-Art Models
Models
Source: Wang et Previous
al., 2022 | Table: 2023 AI Index Report
Source: Wang
Source: Wang et
et al.,
al., 2022
2022 || Table:
Table: 2023
2023 AI Index Report
Category Task Dataset Metric Previous SOTA Model of BEiT-3 Scale of
Category
Category Task
Task Dataset Metric Previous SOTA Model of BEiT-3 Scale
Scale of
of
Previous SOTA Improvement
Previous SOTA Improvement
Improvement
Vision Semantic ADE20K mIoU 61.40 FD-SwimV2 62.80 2.28%
Vision
Vision Semantic
Semantic ADE20K mIoU 61.40 FD-SwimV2 62.80
62.80 2.28%
2.28%
Segmentation
Segmentation
Segmentation
Vision Object COCO AP 63.30 DINO 63.70 0.63%
Vision
Vision Object
Object COCO AP 63.30 DINO 63.70
63.70 0.63%
0.63%
Detection
Detection
Detection
Vision Instance COCO AP 54.70 Mask DINO 54.80 0.18%
Vision
Vision Instance
Instance COCO AP 54.70 Mask DINO 54.80
54.80 0.18%
0.18%
Segmentation
Segmentation
Segmentation
Vision Image ImageNet Top-1 Accuracy 89.00 FD-CLIP 89.60 0.67%
Vision
Vision Image
Image ImageNet Top-1 Accuracy 89.00 FD-CLIP 89.60
89.60 0.67%
0.67%
Classi cation
Classi cation
Classi
Vision-Language Visual NLVR Accuracy 87.00 CoCA 92.60 6.44%
Vision-Language Visual
Vision-Language Visual NLVR Accuracy 87.00 CoCA 92.60
92.60 6.44%
6.44%
Reasoning
Reasoning
Reasoning
Vision-Language Visual QA VQAv2 VQA Accuracy 82.30 CoCA 84.00 2.07%
Vision-Language Visual
Vision-Language Visual QA VQAv2 VQA Accuracy 82.30 CoCA 84.00
84.00 2.07%
2.07%
Vision-Language Image COCO CIDEr 145.30 OFA 147.60 1.58%
Vision-Language Image
Vision-Language Image COCO CIDEr 145.30 OFA 147.60
147.60 1.58%
1.58%
Captioning
Captioning
Captioning
Vision-Language Finetuned COCO R@1 72.50 Florence 76.00 4.83%
Vision-Language Finetuned
Vision-Language Finetuned COCO R@1 72.50 Florence 76.00
76.00 4.83%
4.83%
Retrieval Flickr30K
Retrieval
Retrieval Flickr30K
Vision-Language Zero-Shot Flickr30K R@1 86.50 CoCA 88.20 1.97%
Vision-Language Zero-Shot
Vision-Language Zero-Shot Flickr30K
Flickr30K R@1
R@1 86.50
86.50 CoCA
CoCA 88.20
88.20 1.97%
1.97%
Retrieval
Retrieval
Retrieval
Figure 2.2.22

The Rise of Capable Multimodal Reasoning Systems (cont’d)
Figure 2.2.23 shows some of the different vision-language tasks challenging multimodal systems like
PaLI and BEiT-3.
A Collection of Vision-Language Tasks

Source: Chen et al., 2022
Figure 2.2.23

Visual Commonsense Reasoning (VCR) example of a question posed in VCR. Performance on

The Visual Commonsense Reasoning challenge, first VCR is tracked in the Q->AR score, which combines
launched in 2019, is a relatively new benchmark in the ability of machines to select the right answer
which AI systems must answer questions presented for the question (Q->A) and the ability to select the
from images, as in VQA, but also select the reasoning correct rationale behind the answer (Q->R).
behind their answer choices. Figure 2.2.24 shows an
A Sample Question from the Visual Commonsense Reasoning (VCR) Challenge

Source: Zellers et al., 2018
Figure 2.2.24
VCR is one of the few visual benchmarks considered

in this report on which AI systems have yet to surpass
human performance, as shown in Figure 2.2.25.
Visual Commonsense Reasoning (VCR) Task: Q->AR Score

Source: VCR Leaderboard, 2022 | Chart: 2023 AI Index Report
85.00, Human Baseline
80
75.60
70
Q->AR Score
60
50
2018 2019 2020 2021 2022
Figure 2.2.25

Index Report 2023 2.3 Computer Vision—Video
Video analysis concerns reasoning or task operation across videos, rather than single images.
2.3 Computer Vision—Video

Activity Recognition Kinetics-400, Kinetics-600, Kinetics-700
Kinetics-400, Kinetics-600, and Kinetics-700 are a
Activity recognition is the categorization of activities series of datasets for benchmarking video activity
that occur in videos. Certain activities, such as recognition. Each dataset includes 650,000 large-
sitting, sleeping, or walking, are easier for AI systems scale, high-quality video clips from YouTube that display
to categorize than others which involve multiple a wide range of human activities, and each asks AI
steps—for example, preparing dinner. systems to classify an action from a possible set of 400,
600, and 700 categories, respectively (Figure 2.3.1).
Example Classes From the Kinetics Dataset

Source: Kay et al., 2017
Figure 2.3.1

As of 2022, there is a 7.8 percentage point gap in performance between the top system on Kinetics-600 and
Kinetics-700, which suggests the 700 series dataset is still a meaningful challenge for video computer vision
researchers (Figure 2.3.2).
Kinetics-400, Kinetics-600, Kinetics-700: Top-1 Accuracy

Source: Papers With Code, 2021; arXIv, 2022 | Chart: 2023 AI Index Report
91.80%, Kinetics-600
90%
80%
Top-1 Accuracy (%)
70%
60%
2016 2017 2018 2019 2020 2021 2022
Figure 2.3.2

A Closer Look at the Progress of Video Generation
Multiple high quality text-to-video models, In September 2022, CogVideo’s top score was
AI systems that can generate video clips from significantly surpassed by Meta’s Make-A-Video
prompted text, were released in 2022 . In May, 3
model (Figure 2.3.3). Make-A-Video performed
researchers from Tsinghua University and the 63.6% better on UCF-101 than CogVideo. And, in
Beijing Academy of Artificial Intelligence released October 2022, Google released a text-to-video
CogVideo, a model that posted the then-highest system called Phenaki; however, this model was
inception score on the UCF-101 benchmark for not benchmarked on UCF-101.
text-to-video generation (Figure 2.3.3).
Notable Text-to-Video Models on UCF-101: Inception Score (IS)

Source: Hong et al., 2022; Singer et al., 2022 | Chart: 2023 AI Index Report
82.55
79.28
80
70
60
Inception Score (IS)
50.46
50
40
32.36 32.7
30 28.87
27.38
24.69
20
10
0
DVD-GAN TGANv2 VideoGPT MoCoGAN-HD DIGAN CogVideo TATS-base Make-A-Video
2019 2020 2021 2022
Model
Figure 2.2.3
3 Although these models are impressive, it is worth noting that they are thus far only capable of generating videos of a few seconds’ duration.

Index Report 2023 2.4 Language
Natural language processing (NLP) is the ability of computer systems to understand text. The last few years have seen the release of
increasingly capable “large language models,” AI systems like PaLM, GPT-3, and GLM-130B, that are trained on massive amounts of data
and adaptable to a wide range of downstream tasks.
In this section, progress in NLP is tracked across the following skill categories: (1) English language understanding, (2) text summarization,
(3) natural language inference, (4) sentiment analysis, (5) multitask language understanding, and (6) machine translation.
2.4 Language
English Language SuperGLUE
SuperGLUE is a comprehensive English language
Understanding understanding benchmark that tracks the progress
English language understanding challenges AI of AI models on eight different linguistic tasks.
systems to understand the English language in A selection of these tasks is highlighted in Figure
various ways: reading comprehension, yes/no 2.4.1. Their performance is then aggregated into a
reading comprehension, commonsense reading single metric.
comprehension, and logical reasoning.
A Set of SuperGLUE Tasks4

Source: Wang et al., 2019
Figure 2.4.1
4 For the sake of brevity, this figure only displays four of the eight tasks.

This year’s top model on SuperGLUE, Vega, registered a new state-of-the-art score of 91.3, which is 1.5
percentage points higher than the human baseline. Performance on SuperGLUE is continuing to saturate.
SuperGLUE: Score
Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report
91.30
91
90 89.80, Human Baseline
89
Score
88
87
86
85
2019 2020 2021 2022

Figure 2.4.2
Reading Comprehension
Dataset Requiring Logical A Sample Question from the Reading Comprehension Dataset
Requiring Logical Reasoning (ReClor)
Reasoning (ReClor) Source: Yu et al., 2020
In response to the saturation of
Context: When a certain gland becomes cancerous in humans, it produces high levels
traditional reading comprehension
of a particular protein. A blood test can determine the level of this protein well before
benchmarks, researchers from the a cancer of the gland could be detected by other means. Some doctors recommend
that aggressive anticancer treatment should be begun as early as possible for anyone
National University of Singapore
who is tested and is found to have high levels of the protein.
launched ReClor in 2020. ReClor,
or Reading Comprehension Dataset Question: Which one of the following, if true, most seriously weakens the doctors’
recommendation?
Requiring Logical Reasoning, is a A. The blood test for the protein has been in use for some time to monitor the
dataset of logical reasoning questions condition of patients who have been diagnosed as having cancer of the gland.
B. Before the blood test became available, about one-third of all cases of cancer of
taken from the LSAT, the entrance the gland were detected in early stages.
exam for law schools in the United C. So far, no patients whose protein levels were found to be normal have
subsequently developed cancer of the gland.
States and Canada. A sample D. Enlargement of the gland, a common condition infrequently associated with
question is shown in Figure 2.4.3 cancer, results in high levels of the protein.
Figure 2.4.3

Figure 2.4.4 examines progress on ReClor. The top 2022 result of 80.6% represented an 18 percentage point
improvement from 2020, the year the benchmark was released.
Reading Comprehension Dataset Requiring Logical Reasoning (ReClor): Accuracy

Source: ReClor Leaderboard, 2022; Papers With Code, 2022 | Chart: 2023 AI Index Report
80.60%
80%
75%
Accuracy (%)
70%
65%
2020 2021 2022

Figure 2.4.4

Just How Much Better Have Language Models Become?
The AI Index tested how three large language models from three different years, GPT-2 (2019), GPT-3
(2020), and ChatGPT (2022), handle the same prompt: “Explain to me the major accomplishments of
Theodore Roosevelt’s presidency.” More recent models are able to answer this question more effectively,
both in terms of factual accuracy and quality of writing.
GPT-2 (February 2019)5 GPT-3 (June 2020)7 ChatGPT (November 2022)

(The most notable is the establishment of a Theodore Roosevelt’s presidency Theodore Roosevelt, President from 1901-1909,
national telephone and mail exchange company.) was marked by a number of accomplished the following major accomplishments
accomplishments. He is credited with during his presidency:
But the only things that will change with a
leading the United States out of the
president who knows how to operate a business 1. Conservation: Established numerous national
Great Depression and into World War II.
is his understanding of the American system parks, monuments, and forests, including the Grand
He also spearheaded the creation of the
of government. The public knows all about our Canyon National Park.
National Park Service and the National
problems, yet at least half of the people who
Wildlife Federation. Roosevelt also 2. Trust-busting: Took on monopolies, such as
need it won’t know anything about ours. We can
played a major role in the development Standard Oil, through antitrust laws and broke up
no longer have this. Our system of government
of the Panama Canal. large corporations.
is the only one in which you really need a public
relations firm to work hard. Commentary: This answer is an 3. Labor Reforms: Improved working conditions
improvement over that of GPT-2 in for Americans through advocacy for better hours,
The system of governmental control, while
that it actually somewhat responds to wages, and workplace safety.
working, is not always effective in ensuring that
the question, but still shows significant 4. Progressive Reforms: Promoted progressive
the very things that our economy needs, do
flaws. GPT-3 is correct in asserting that policies, including regulation of big business and
not always be at the top of everybody’s list. So
Theodore Roosevelt was instrumentally protection of natural resources.
I wanted to know how successful the system
involved in the development of the
you employ for managing the public’s needs is 5. National Infrastructure: Improved national
Panama Canal. However, he did not
in helping the American economy grow, how infrastructure through construction of the Panama
lead the United States out of the Great
efficient the service our economy provides is Canal and funding for dam projects.
Depression and into World War II—that
being made, and how it can better serve the
was done by a different American 6. Diplomacy: Negotiated the end of the Russo-
needs of a growing number of Americans.
president, Franklin Delano Roosevelt. Japanese War, resulting in his being awarded the
I wanted to know. It appears that GPT-3 confused Nobel Peace Prize.
This was before I ever hired my own lawyer some of Franklin Delano Roosevelt’s
7. Foreign Policy: Established the Roosevelt
to work on the case. This was before I ever accomplishments with those of
Corollary, asserting U.S. intervention in Latin America
hired my own law firm to handle government Theodore Roosevelt. Finally, although
to maintain stability and protect American interests.
investigations. For my first major job, at least, Theodore Roosevelt did play a significant
role in establishing many national parks, Commentary: ChatGPT’s response is the strongest
I had to convince two top aides to get paid
he did not in fact create the National of the three. All but one of the facts about Theodore
$90,000 for the role, and before I even started on
Park Service or the National Wildlife Roosevelt’s presidency are correct. Roosevelt
the case, I had to convince one top staffer to get
Federation. The National Park Service proclaimed the Grand Canyon Game Preserve in
paid even $100,000 to have an interview with my
was created in 1916 by Woodrow Wilson, 1906 and designated it a national monument in
lead prosecutor...6
and the National Wildlife Federation 1908. However, the Grand Canyon National Park
Commentary: Most of this answer is gibberish. was itself created by Woodrow Wilson.9 Otherwise
in 1936, much later than Theodore
GPT-2 is able to respond with sentences that are the response is factually accurate. The response is
Roosevelt’s presidency. 8
largely grammatical, but fails to actually answer also more thorough and concise than the answers
the question in any meaningful way. provided by GPT-2 and GPT-3.
5 GPT-2 used the 124M parameter model downloaded from OpenAI’s GitHub page.
6 The complete answer outputted by GPT-2 is trimmed here for brevity. The full answer is included in the Appendix.
7 The specific GPT-3 model that was used was text-curie-001, which has training data up to October 2019.
8 The information in this section has been cross-verified with the Encyclopedia Britannica entries on Theodore Roosevelt, Franklin Delano Roosevelt, Woodrow Wilson, and the
National Park Service, as well as the history page of the National Wildlife Federation.
9 Information on the history of the Grand Canyon National Park was cross-verified with the Wikipedia entry on the Grand Canyon National Park.

Planning and Reasoning in Large Language Models
As illustrated above, AI systems have become The authors then tested notable language models
increasingly strong on a wide range of reasoning on these tasks in a Blocksworld problem domain,
tasks. This improvement has led many to claim that a problem environment where agents are given
emerging AI systems, especially large language blocks of different colors and tasked with arranging
models, possess reasoning abilities that are these blocks in particular orders. The authors
somewhat similar to those possessed by humans. 10
demonstrated that these large language models
Other authors, however, have argued otherwise. 11
performed fairly ineffectively (Figure 2.4.5). While
GPT-3, Instruct-GPT3, and BLOOM demonstrated
In 2022, researchers (Valmeekam et al., 2022)
the ability, in some contexts, to reformulate goals
introduced a more challenging planning and reasoning
in robust ways, they struggled with other tasks like
test for large language models that consists of seven
plan generation, optimal planning, and plan reuse.
assignments: (1) plan generation, (2) cost-optimal
Compared to humans, the large language models
planning, (3) reasoning about plan execution, (4)
performed much worse, suggesting that while they
robustness to goal reformulation, (5) ability to reuse
are capable, they lack human reasoning capabilities.
plans, (6) replanning, and (7) plan generalization.12
Select Large Language Models on the Blocksworld Domain: Instances Correct

Source: Valmeekam et al., 2022 | Chart: 2023 AI Index Report
0.6% GPT-3
Plan Generation 5.0%
0.5% Instruct-GPT3
BLOOM
0.2%
Optimal Planning 3.2%
0%
5.6%
Replanning 4.8%
3.0%
6.6%
Plan Generalization 9.8%
11.0%
0%
Plan Reuse 14.4%
0%
77.4%
Robustness to Goal Reformulation
76.8%
(Shu ing Goal Predicates) 21.0%
69.2%
76.0%
(Full → Partial) 9.0%
22.0%
60.2%
(Partial → Full) 5.0%
0% 10% 20% 30% 40% 50% 60% 70% 80%

Instances Correct (%)
Figure 2.4.5
10 Some of the papers that claim language models can reason include: Kojima et al., 2022; Chowdhery et al., 2022; Li et al., 2021; Wei et al., 2022.
11 Valmeekam et al., 2022 advances this claim.
12 A complete description of these tasks can be found in the paper.

Text Summarization
Text summarization tests how well AI systems can arXiv and PubMed
synthesize a piece of text while capturing its core ArXiv and PubMed are two widely used datasets for
content. Text summarization performance is judged benchmarking text summarization. The model that
on ROUGE (Recall-Oriented Understudy for Gisting posted the state-of-the-art score in 2022 on both
Evaluation), which measures the degree to which arXiv and PubMed, AdaPool, was developed by a
an AI-produced text summary aligns with a human team from Salesforce Research (Figure 2.4.6).
reference summary.
ArXiv and PubMed: ROUGE-1

51.05, PubMed
50.95, arXiv
50
45
ROUGE-1
40
35
2017 2018 2019 2020 2021 2022
Figure 2.4.6

Natural Language Inference uncertain premises. Imagine, for example, that Peter
returns to his car after dinner at a restaurant to find the
Also known as textual entailment, natural language window shattered and his laptop, which he left in the
inference is the ability of AI systems to determine back seat, missing. He might immediately conclude
whether a hypothesis is true, false, or undetermined that a thief broke into his car and stole the laptop.
based on presented premises.
In 2019, the Allen Institute for AI launched aNLI, a
Abductive Natural Language Inference (aNLI) comprehensive benchmark for abductive natural
Abductive natural language inference is a form language inference that includes 170,000 premise
of natural language inference in which plausible and hypothesis pairs (Figure 2.4.7).
conclusions must be drawn from a set of limited and
Sample Question From

the Abductive Natural
Language Inference
Benchmark (aNLI)
Source: Allen Institute for AI, 2021
Figure 2.4.7
Abductive natural language inference is a challenging task. The human baseline remained
unsurpassed until 2022, when an AI system registered a score of 93.7% (Figure 2.4.8).
Abductive Natural Language Inference (aNLI): Accuracy

Source: Allen Institute for AI, 2022 | Chart: 2023 AI Index Report
94%
93.65%
92.90%, Human Baseline
92%
90%
Accuracy (%)
88%
86%
84%
2019 2020 2021 2022
Figure 2.4.8

Sentiment Analysis A Sample Sentence from SST

Source: Socher et al., 2013
Sentiment analysis applies NLP techniques to

identify the sentiment of a particular text. It is used
by many businesses to better understand customer
reviews.
SST-5 Fine-Grained Classification

The Stanford Sentiment Treebank (SST) is a dataset
of 11,855 single sentences taken from movie reviews
that are then transformed into 215,154 unique phrases
whose sentiments have been annotated by human
judges (Figure 2.4.9). Figure 2.4.9
A new state-of-the-art score of 59.8% was posted on SST-5 fine-grained classification by the
Heinsen Routing + RoBERTa Large model (Figure 2.4.10).
SST-5 Fine-Grained: Accuracy

60% 59.80%
55%
Accuracy (%)
50%
45%
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 2.4.10

Multitask Language Sample Questions From MMLU

Source: Hendrycks et al., 2021
Understanding a) Sample Math Questions
A common criticism of language benchmarks

such as GLUE and SuperGLUE is that they do not
accurately test how capable language models are at
applying the knowledge they learn across different
domains.13 Multitask language understanding tests
the ability of language models to reason across
specialized subject domains.
Massive Multitask Language Understanding

(MMLU) b) A Sample Microeconomics Question
Massive Multitask Language Understanding (MMLU)

evaluates models in zero-shot or few-shot settings
across 57 diverse subjects in the humanities, STEM,
Figure 2.4.11
and the social sciences (Figure 2.4.11).
Gopher, Chinchilla, and variants of PaLM have each posted state-of-the-art results on MMLU. The current top
result on MMLU comes from Flan-PaLM, a Google model that reports an average score of 75.2% (Figure 2.4.12).
MMLU: Average Weighted Accuracy

75.20%
70%
60%
Accuracy (%)
50%
40%
30%
2019 2020 2021 2022
Figure 2.4.12
13 This criticism is more formally articulated in Hendrycks et al., 2021.

Machine Translation (MT) Number of Commercially Available MT Systems

Machine translation studies how well AI software The popularity of AI-based machine translation is
can translate languages. In the last five years, manifested in the number of commercial machine
machine translation has been dominated by neural translation services on the market. Since 2017, the total
networks which power current tools like DeepL and number of independent machine translation services
Google Translate. has increased six times (Figure 2.4.13).
Number of Independent Machine Translation Services

Source: Intento, 2022 | Chart: 2023 AI Index Report
80 Commercial
Open Source Pre-trained
Preview
Number of Independent Machine Translation Services
70
60
54
50
46
40
34
45
30
26 38
23
21
20 28
16
12 13 23
10 18 21
10 9
9 10 15
8 9
6 5 5
0
2017-May 2017-Jul 2017-Nov 2018-Mar 2018-Jul 2018-Dec 2019-Jun 2019-Nov 2020-Jul 2021-Sep 2022-Jul
Figure 2.4.13

Index Report 2023 2.5 Speech
AI systems that work with human speech are usually tasked with converting spoken words into text and recognizing the individuals speaking.
2.5 Speech
Speech Recognition task of matching certain speech with a particular
individual. Over the years, the VoxCeleb dataset has
Speech recognition is the ability of AI systems to been expanded; however, the data in this subsection
identify spoken words and convert them into text. tracks progress on the original dataset.
Speech recognition has progressed so much so
This year’s top result on the original VoxCeleb dataset
that nowadays many computer programs or texting
was posted by American researchers, whose model
apps are equipped with dictation devices that can
achieved an equal error rate of 0.1%, which represents
seamlessly transcribe speech into writing.
a 0.28 percentage point decrease from the state-of–
VoxCeleb the-art result achieved by Chinese researchers in the
VoxCeleb is a large-scale audiovisual dataset of previous year (Figure 2.5.1).
human speech for speaker recognition, which is the
VoxCeleb: Equal Error Rate (EER)

Source: VoxCeleb, 2022 | Chart: 2023 AI Index Report
8%
6%
Equal Error Rate (EER)
4%
2%
0.14%
0%
2017 2018 2019 2020 2021 2022
Figure 2.5.1

Whisper
One of the major themes in the last few years of AI progress has been the emergence of large language
models that are trained on massive amounts of data and capable of executing a diverse range of tasks.
In 2022, this idea of training on large data to achieve cross-domain performance arrived in the world of
speech recognition with OpenAI’s launch of Whisper.
Whisper is a large-scale speech recognition model that was trained in a weakly supervised way
on 700,000 hours of audio data. Whisper was capable of strong, although not state-of-the-art,
performance on many speech recognition tasks in zero-shot settings.14 Whisper outperformed wav2vec
2.0 Large, another speech recognition model, across a wide range of popular English speech recognition
benchmarks (Figure 2.5.2). Similarly, Whisper proved to be a better speech translator than many other
leading AI translator models (Figure 2.5.3). Whisper also outperformed other commercial automated
speech recognition systems and scored similarly to top human transcription services (Figure 2.5.4).15
Despite this impressive performance, there were still some speech tasks, like language identification, on
which Whisper trailed state-of-the-art models (Figure 2.5.5).
wav2vec 2.0 Large (No LM) Vs. Whisper Large V2 Notable Models on X→EN Subset of CoVoST 2
Across Datasets Source: Radford et al., 2022 | Chart: 2023 AI Index Report
Source: Radford et al., 2022 | Chart: 2023 AI Index Report
6.2%
LibriSpeech Other 5.2%
Zero-Shot Whisper 29.1%
67.6%
AMI SDM1 36.4%
7.7%
WSJ 3.9%
34.8%
CallHome 17.6%
MAESTRO 25.2%
28.3%
Switchboard 13.8%
37.0%
AMI IHM 16.9%
35.6%
CORAAL 16.2% mSLAM-CTC (2B) 24.8%
17.9%
VoxPopuli En 7.3%
65.8%
CHiME-6 25.5%
10.5%
TED-LIUM 4.0% XLS-R (2B) 22.1%
14.6%
FLEURS En 4.4%
29.9%
Common Voice 9.0%
24.5%
Artie 6.2% wav2vec 2.0 Large (No LM) XMEF-X 14.7%
2.7% Whisper Large V2
LibriSpeech Clean 2.7%
0% 10% 20% 30% 40% 50% 60% 70% 0% 10% 20% 30%
Word Error Rate (%) Bilingual Evaluation Understudy (BLEU) Score
Figure 2.5.2
Figure 2.5.3
14 Zero-shot learning refers to the ability of an AI system to learn a particular task without being trained on that task.
15 Kincaid46 is a dataset of 46 audio files and transcripts that were published in the blog post, “Which automatic transcription service is the most accurate?—2018.”

Narrative Highlight: Whisper (cont’d)
Notable Speech Transcription Services on Kincaid46 Notable Models on FLEURS: Language Identi cation
Accuracy
ASR Computer-Assisted Human Transcription
Whisper 8.81%
Company A 9.66% 80% 77.7%
Language Identi cation Accuracy (%)

71.4%
Company B 9.74%
64.5%
Company C 10.90% 60%
Company D 12.20%
Company E 7.61%
40%
Company F 8.14%
Company G 8.65%
20%
Company H 8.96%
Company I 10.50%
0%
0% 2% 4% 6% 8% 10% 12% 14% w2v-bert-51 (0.6B) mSLAM-CTC (2B) Zero-shot Whisper
Median Word Error Rate (%)
Whisper represents a breakthrough in state-of-the-art speech recognition systems. Traditionally, such

systems were either pre-trained using supervised learning methods or pre-trained without supervision
but required fine-tuning. Acquisition of data for supervised pre-training is time-consuming and costly.
However, pre-training without supervision still requires further algorithmic specification to realize a desired
objective like speech recognition. Algorithmic specification itself often requires a skilled practitioner.
Whisper resolves these issues by demonstrating that a speech recognition system can perform well across
a diverse range of tasks with massive amounts of unlabeled speech data.

Index Report 2023 2.6 Reinforcement Learning
In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning from their prior
actions. Systems are rewarded if they achieve a desired goal and punished if they fail.
2.6 Reinforcement Learning

Reinforcement Learning The Different Environments in Procgen
Source: OpenAI, 2019
Environments
Reinforcement learning agents require environments,
not datasets, to train: They must be trained in
environments where they can experiment with
various actions that will allow them to identify
optimal game strategies.
Procgen
Procgen is a reinforcement learning environment
introduced by OpenAI in 2019. It includes
16 procedurally generated video-game-like
Figure 2.6.1
environments specifically designed to test the
ability of reinforcement learning agents to learn
generalizable skills (Figure 2.6.1). Performance on
Procgen is measured in terms of mean-normalized
score. Researchers typically train their systems on
200 million training runs and report an average score
across the 16 Procgen games. The higher the system
scores, the better the system.

A team of industry and academic researchers from Korea posted the top score of 0.6 on Procgen in 2022 (Figure 2.6.2).
Procgen: Mean of Min-Max Normalized Score

Source: arXiv, 2022 | Chart: 2023 AI Index Report
0.57
Mean of Min-Max Normalized Score
0.50
0.40
2019 2020 2021 2022
Figure 2.6.2

Benchmark Saturation
An emerging theme in this year’s AI Index is the observed performance saturation across many popular
technical performance benchmarks. Last year’s AI Index Report observed a similar trend; however,
benchmark saturation has been particularly pronounced this year. Figure 2.6.3 shows the relative
improvement since the benchmark first launched (overall improvement) and relative improvement within
the last year (YoY improvement) on AI technical benchmarks considered in this year’s AI Index. The
improvements are reported as percent changes.
For all but 7 of the benchmarks, the improvement registered is less than 5%. The median improvement
within the last year is 4%, while the median improvement since launch is 42.4%.16 Moreover, this year the
AI Index elected not to feature traditionally popular benchmarks like SQuAD1.1 and SQuAD2.0, as no
new state-of-the-art results were posted. Moreover, the speed at which benchmark saturation is being
reached is increasing. Researchers have responded to this increasing saturation by launching newer and
more comprehensive benchmarking suites such as BIG-bench and HELM.
Improvement Over Time on Select AI Index Technical Performance Benchmarks

Overall Improvement
120% YoY Improvement
100%
Improvement (%)
80%
60%
40%
20%
0%
ImageNet Top-1
FVRT
Celeb-DF
MPII
Cityscapes
Kvasir-SEG
STL-10
CIFAR-10
VQA
COCO
VCR
Kinetics-400
Kinetics-600
Kinetics-700
SuperGLUE
ReClor
arXiv
PubMed
ANLI
SST-5
MMLU
VoxCeleb
Procgen
Vision Image Vision Video Language SR RL
Benchmark
Figure 2.6.3
16 The improvements reviewed in this section are reported as relative change. Figure 2.6.3 should therefore not be used to conduct comparisons of improvements across
benchmarks, as each benchmark has different parameters.

Index Report 2023 2.7 Hardware
Deep learning AI algorithms are trained on GPUs or TPUs, which accelerate the training speed of AI systems. As AI systems process
ever-larger datasets, it is crucial to monitor advancements in hardware capabilities.
2.7 Hardware
MLPerf Training every AI skill category had significantly decreased.
MLPerf is an AI training competition run by the This year, this trend has continued, albeit at a slightly
ML Commons organization. In this challenge, slower pace. Record-low training times were posted
participants train ML systems to execute various in the object detection, speech recognition, image
tasks using a common architecture. Entrants are then segmentation, recommendation, image classification,
ranked on their absolute wall clock time, which is and language processing categories (Figure 2.7.1).
how long it takes for the system to train. In categories like image classification and object
detection, the top AI systems can now train roughly
Last year, the AI Index observed that since the
32 times quicker than in 2018, when the competition
competition launched, training times for virtually
first launched.
MLPerf Training Time of Top Systems by Task: Minutes

Source: MLPerf, 2022 | Chart: 2023 AI Index Report
60
40
20
Training Time (Minutes; Log Scale)
10
2.25, Object Detection (Heavyweight)

2 2.15, Speech Recognition
1.22, Image Segmentation

1
0.5 0.52, Recommendation
0.34, Object Detection (Lightweight)
0.2 0.19, Image Classi cation

0.18, Language Processing
2018 2019 2020 2021 2022
Figure 2.7.1

Data on the number of accelerators used by the between the mean number of accelerators used by
hardware systems submitted to MLPerf also all entrants and the average accelerators used by the
suggests that stronger hardware has been powering systems that post the top results.17 This gap suggests
decreasing training times (Figure 2.7.2). Since the that having better hardware is essential to training the
start of the MLPerf competition, the gap has grown fastest systems.
MLPerf Hardware: Accelerators

Source: MLPerf, 2022 | Chart: 2023 AI Index Report
4,500
4,216, Maximum Number of Accelerators Used
4,000
3,500
3,000
Number of Accelerators
2,500
2,000
1,859, Average Accelerators Used by Top System
1,500
1,000
500
211, Mean Number of Accelerators
0
20 20 20 20 20 20 20
18- 19- 20 21- 21- 22 22
De Ju -Ju Ju De -Ju -N
c-1 n-1 l-2 n-3 c-0 n-2 ov
2 0 9 0 1 9 -09
Figure 2.7.2
17 An accelerator, like a GPU or TPU, is a chip that is chiefly used for the machine learning component of a training run.

MLPerf Inference Figures 2.7.3 to 2.7.6 plot the throughput of the state-of-
In deploying AI, inference is the step where trained the-art submissions on MLPerf Inference across four skill
AI systems generate predictions, e.g. classifying categories: image classification, language processing,
objects. recommendation, and speech recognition. The number of
inferences generated by the top-performing AI systems
In 2020, ML Commons introduced MLPerf Inference,
has significantly increased since the first iteration of the
a performance benchmarking suite that measures
competition in 2020. For example, the number of offline
how fast a trained AI system can process inputs and
samples generated by the top image classifiers and
produce inferences. The MLPerf Inference suite
language processors have more than doubled since 2020,
tracks the throughput of AI systems, measured in
while those for recommendation systems have increased
samples per second or queries per second.18
by roughly 23%.
MLPerf Best-Performing Hardware for Image MLPerf Best-Performing Hardware for Language
Classi�cation: O�ine and Server Scenario Processing: O ine and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report Source: MLPerf, 2022 | Chart: 2023 AI Index Report
700k
679,915, O ine (Samples/s) 75,153, O ine (Samples/s)
650k 70,992, Server (Queries/s)
630,221, Server (Queries/s) 70k
600k
550k 60k
Throughput
Throughput
500k
50k
450k
400k
40k
350k
300k 30k
250k
2020 2021 2022
Figure 2.7.3 2020 2021 2022
Figure 2.7.4
MLPerf Best-Performing Hardware for MLPerf Best-Performing Hardware for Speech

Recommendation: O ine and Server Scenario Recognition: O ine and Server Scenario
Source: MLPerf, 2022 | Chart: 2023 AI Index Report Source: MLPerf, 2022 | Chart: 2023 AI Index Report
2.7M 160k
2,683,620, Server (Queries/s) 155,811, O ine (Samples/s)
2,645,980, O ine (Samples/s) 150k
2.6M
140k
136,498, Server (Queries/s)
2.5M 130k
Throughput
Throughput
120k
2.4M
110k
2.3M 100k
90k
2.2M
80k
2.1M 70k
2020 2021 2022

Figure 2.7.5 2020 2021 2022
Figure 2.7.6
18 The following blog post from Dell Technologies offers a good distinction between offline and server samples: “Offline—one query with all samples is sent to the system under test (SUT).
The SUT can send the results back once or multiple times in any order. The performance metric is samples per second. Server—the queries are sent to the SUT following a Poisson distribution
(to model real-world random events). One query has one sample. The performance metric is queries per second (QPS) within the latency bound.”

Trends in GPUs: Performance and Price the performance of a computational device. The higher
This year, the AI Index built on work previously the FLOP/s, the better the hardware.
done by the research collective Epoch and analyzed
Figure 2.7.8 showcases the median single performance
trends over time in GPU performance and price.19
of new GPUs by release date, which continues to rise
Figure 2.7.7 showcases the FP32 (single precision) year over year. Since 2021, the median FLOP/s speed
performance FLOP/s of different GPUs released has nearly tripled, and since 2003 it has increased
from 2003 to 2022. FLOP/s stands for “Floating roughly 7,000 times.
Point Operations per second” and is a measure of
FP32 (Single Precision) Performance (FLOP/s) by Median FP32 (Single Precision) Performance (FLOP/s),
Hardware Release Date, 2003–22 2003–22
2.0e+14
2.0e+13 2.23e+13
1.0e+14
5.0e+13 1.0e+13
2.0e+13 5.0e+12
1.0e+13
Median FLOP/s (Log Scale)
5.0e+12 2.0e+12
2.0e+12 1.0e+12
FLOP/s (Log Scale)
1.0e+12 5.0e+11
5.0e+11
2.0e+11 2.0e+11
1.0e+11 1.0e+11
5.0e+10 5.0e+10
2.0e+10
1.0e+10 2.0e+10
5.0e+9 1.0e+10
2.0e+9 5.0e+9
1.0e+9
5.0e+8 2.0e+9
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
Figure 2.7.7 2022 Figure 2.7.8
19 The Appendix fully delineates both the methodology of this approach and the unique ways in which AI Index research built upon the existing Epoch research.

Finally, figures 2.7.9 and 2.7.10 consider GPU trends 2022 is 1.4 times greater than it was in 2021 and
in terms of FLOP/s per U.S. Dollar. This statistic 20
5600 times greater than in 2003, showing a doubling
considers whether the underlying performance of in performance every 1.5 years. As noted in similar
GPUs is increasing relative to their changing costs. analyses, improvements in the price–performance of
As evidenced most clearly in Figure 2.7.10, the AI hardware has facilitated increasingly larger training
price–performance of GPUs is rapidly increasing. runs and encouraged the scaling of large AI models.
The median FLOP/s per U.S. Dollar of GPUs in
FP32 (Single Precision) Performance (FLOP/s) per Median FP32 (Single Precision) Performance (FLOP/s)
U.S. Dollar by Hardware Release Date, 2003–22 per U.S. Dollar, 2003–22
3.59e+10
35B
50B
30B
Median FLOP/s per U.S. Dollar
40B
25B
FLOP/s per U.S. Dollar
30B 20B
15B
20B
10B
10B
5B
0 0
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
20 The data in figures 2.7.9 and 2.7.10 has been adjusted for inflation. The exact details of the adjustment are outlined in greater detail in the Appendix.

Index Report 2023 2.8 Environment
There have been mounting concerns about the environmental impact of computational resources and the energy required for AI
training and inference. Although there is no standard benchmark for tracking the carbon intensity of AI systems, this subsection
synthesizes the findings of different researchers who are exploring the link between AI and the environment. Conducting research
on the environmental effects of AI was challenging as there are wildly varying estimates, the validity of which have not yet been
definitively established. To that end, the AI Index focuses on research from a recent paper by Luccioni et al., 2022. As AI models
continue growing in size and become more universally deployed, it will be increasingly important for the AI research community to
consciously monitor the effect AI systems have on the environment.
2.8 Environment
Environmental Impact of Select Large challenging to directly compare the carbon footprint
Language Models of these models, as the accounting methodologies for
Many factors determine the amount of carbon reporting carbon emissions are not standardized.
emissions emitted by AI systems, including the
Of the four language models being compared, GPT-
number of parameters in a model, the power usage
3 released the most carbon, 1.4 times more than
effectiveness of a data center, and the grid carbon
Gopher, 7.2 times more than OPT, and 20.1 times more
intensity. Power Usage Effectiveness (PUE) is a
than BLOOM.
metric used to evaluate the energy efficiency of
data centers. It is the ratio of the total amount of Figure 2.8.2 relativizes the carbon-emission estimates
energy used by a computer data center facility, to real-life examples. For instance, BLOOM’s training
including air conditioning, to the energy delivered run emitted 1.4 times more carbon than the average
to computing equipment. The higher the PUE, the American uses in one year and 25 times that of flying
less efficient the data center. Figure 2.8.1 shows how one passenger round trip from New York to San
these factors compare across four large language Francisco. BLOOM’s training consumed enough energy
models: GPT-3, Gopher, OPT, and BLOOM. It is to power the average American home for 41 years.21
Environmental Impact of Select Machine Learning Models, 2022

Source: Luccioni et al., 2022 | Table: 2023 AI Index Report
Model Number of Datacenter PUE Grid Carbon Power C02 Equivalent C02 Equivalent
Parameters Intensity Consumption Emissions Emissions x PUE
Gopher 280B 1.08 330 gC02eq/kWh 1,066 MWh 352 tonnes 380 tonnes
BLOOM 176B 1.20 57 gC02eq/kWh 433 MWh 25 tonnes 30 tonnes
GPT-3 175B 1.10 429 gC02eq/kWh 1,287 MWh 502 tonnes 552 tonnes
OPT 175B 1.09 231 gC02eq/kWh 324 MWh 70 tonnes 76.3 tonnes
Figure 2.8.1
21 The U.S. Energy Information Administration estimates that in 2021, the average annual electricity consumption of a U.S. residential utility customer was 10,632 kilowatt hours (kWh).

CO2 Equivalent Emissions (Tonnes) by Selected Machine Learning Models and Real Life Examples, 2022
Source: Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2023 AI Index Report
GPT-3 (175B) 502
Gopher (280B) 352
OPT (175B) 70
Car, Avg. Incl. Fuel,

63
1 Lifetime
BLOOM (176B) 25
American Life,
18.08
Avg., 1 Year
Human Life,
5.51
Avg., 1 Year
Air Travel,
0.99
1 Passenger, NY–SF
0 50 100 150 200 250 300 350 400 450 500

CO2 Equivalent Emissions (Tonnes)
Figure 2.8.2

Using AI to Optimize Energy Usage
Training AI systems can be incredibly energy intensive. At the same time, recent research suggests
that AI systems can be used to optimize energy consumption. In 2022, DeepMind released the
results of a 2021 experiment in which it trained a reinforcement learning agent called BCOOLER
(BVE-based COnstrained Optimization Learner with Ensemble Regularization) to optimize cooling
procedures for Google’s data centers.
Figure 2.8.3 presents the energy-saving results from one particular BCOOLER experiment. At the
end of the three-month experiment, BCOOLER achieved roughly 12.7% energy savings. BCOOLER
was able to achieve these savings while maintaining the cooling comfort levels that the building
managers preferred.
Energy Savings Results Over Time for Select BCOOLER Experiment

Source: Luo et al., 2022 | Chart: 2023 AI Index Report
12.7%
12%
10%
Cumulative AI Savings (%)
8%
6%
4%
2%
0%
2021-Aug-01 2021-Aug-15 2021-Aug-29 2021-Sep-12 2021-Sep-26 2021-Oct-10 2021-Oct-24

Figure 2.8.3

Index Report 2023 2.9 AI for Science
2022 was a groundbreaking year for AI in science. This subsection looks at some meaningful ways in which AI has recently been used
to accelerate scientific discovery.
2.9 AI for Science

Accelerating Fusion Science Photos of the Variable Configuration Tokamak (TCV) at EPFL
Source: DeepMind, 2022
Through Learned Plasma Control
Nuclear fusion could generate clean
energy by fusing hydrogen. A common
approach to achieving nuclear fusion
is using a tokamak, a machine which
controls and contains the heated
hydrogen plasma (Figure 2.9.1). However,
the plasmas produced in these machines
are unstable and necessitate constant Figure 2.9.1
monitoring. In 2022, researchers at

DeepMind developed a reinforcement
learning algorithm to discover optimal
tokamak management procedures.
Discovering Novel Algorithms A Demonstration of AlphaTensor’s Matrix Manipulation Process

Source: Fawzi et al., 2022
for Matrix Manipulation With
AlphaTensor
Matrix multiplication is a simple algebraic
operation that is essential to many
computations, including neural networks
and scientific computing (Figure 2.9.2).
The classic algorithm to multiply two 2x2 Figure 2.9.2
matrices takes 2^3 = 8 multiplications.

Strassen discovered 50 years ago
how to reduce this to 7, and generally
how to multiply two n x n matrices in including 4x4 matrices over the integers [0,1]. It also matches state-
O(n^ log(7)) operations. DeepMind’s of-the-art performance on several other matrix sizes, including 4x4
AlphaTensor uses Reinforcement over the integers. It does this by searching through large numbers
Learning to improve on state-of-the- of possible algorithms, and evaluating them over real computer
art algorithms for many matrix sizes, architectures.

Index Report 2023 2.9 AI for Science
Designing Arithmetic Circuits With A Juxtaposition of Nvidia Circuits Designed by

Deep Reinforcement Learning PrefixRL Vs. EDA Tools
Source: Roy et al., 2022
This year, a team at Nvidia discovered a
novel approach to improving the chips
that power AI systems: Use AI systems to
design better chips. They were able to train
a reinforcement learning agent to design
chip circuits that are smaller, faster, and
more efficient than the circuits designed by
electronic design automation tools (EDAs).
One of Nvidia’s latest categories of chips,
the Hopper GPU architecture, has over
13,000 instances of AI-designed circuits. Figure 2.9.3
Figure 2.9.3 shows a 64-bit adder circuit
designed by Nvidia’s PrefixRL AI agent
(on the left) which is 25% smaller while
being just as fast and functional as those
designed by the state-of-the-art EDA tools.
Unlocking de Novo Antibody Zero-Shot Generative AI for de Novo Antibody Design

Design With Generative AI Source: Shanehsazzadeh et al., 2023
Antibody discovery, which is referred to

as de novo antibody discovery, typically
requires immense amounts of time and
resources. Traditional methods for de
novo discovery offer little control over
the outputs, so that proposed antibodies
are often suboptimal. To that end, a team
of researchers turned to generative AI
models to create antibodies in a zero-shot
fashion, where antibodies are created with
one round of model generation without Figure 2.9.4
further optimizations (Figure 2.9.4). These

AI-generated antibodies are also robust.
The fact that generative AI can create new
antibodies has the potential to accelerate
drug discovery.

Index Report 2023
CHAPTER 3:
Technical AI Ethics
Text and Analysis by Helen Ngo
Index Report 2023
CHAPTER 3 PREVIEW:
Technical AI Ethics
Overview 128 Fairness in Machine Translation 143
Chapter Highlights 129 RealToxicityPrompts 144
3.1 Meta-analysis of Fairness 3.4 Conversational AI Ethical Issues 145

and Bias Metrics 130 Gender Representation in Chatbots 145
Number of AI Fairness and Bias Metrics 130 Anthropomorphization in Chatbots 146
Number of AI Fairness and Bias Metrics Narrative Highlight: Tricking ChatGPT 147
(Diagnostic Metrics Vs. Benchmarks) 131
3.5 Fairness and Bias in

3.2 AI Incidents 133 Text-to-Image Models 148
AI, Algorithmic, and Automation Fairness in Text-to-Image Models
Incidents and Controversies (AIAAIC) (ImageNet Vs. Instagram) 148
Repository: Trends Over Time 133
VLStereoSet: StereoSet for
AIAAIC: Examples of Reported Incidents 134 Text-to-Image Models 150
Examples of Bias in Text-to-Image Models 152
3.3 Natural Language Processing
Stable Diffusion 152
Bias Metrics 137
DALL-E 2 153
Number of Research Papers Using
Perspective API 137 Midjourney 154
Winogender Task From the

SuperGLUE Benchmark 138 3.6 AI Ethics in China 155
Model Performance on the Winogender Topics of Concern 155
Task From the SuperGLUE Benchmark 138 Strategies for Harm Mitigation 156
Performance of Instruction-Tuned Principles Referenced by
Models on Winogender 139 Chinese Scholars in AI Ethics 157
BBQ: The Bias Benchmark for
Question Answering 140
Fairness and Bias Trade-Offs in NLP: HELM 142

Index Report 2023
CHAPTER 3 PREVIEW (CONT’D):
Technical AI Ethics
3.7 AI Ethics Trends at FAccT
and NeurIPS 158
ACM FAccT (Conference on Fairness,
Accountability, and Transparency) 158
Accepted Submissions by
Professional Affiliation 158
Accepted Submissions by
Geographic Region 159
NeurIPS (Conference on Neural Information
Processing Systems) 160
Real-World Impact 160
Interpretability and Explainability 161
Causal Effect and Counterfactual
Reasoning 162
Privacy 163
Fairness and Bias 164
3.8 Factuality and Truthfulness 165

Automated Fact-Checking Benchmarks:
Number of Citations 165
Missing Counterevidence and NLP
Fact-Checking 166
TruthfulQA 167

Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023
Overview
Fairness, bias, and ethics in machine learning continue to be topics of interest
among both researchers and practitioners. As the technical barrier to entry for
creating and deploying generative AI systems has lowered dramatically, the ethical
issues around AI have become more apparent to the general public. Startups and
large companies find themselves in a race to deploy and release generative models,
and the technology is no longer controlled by a small group of actors.
In addition to building on analysis in last year’s report, this year the AI Index
highlights tensions between raw model performance and ethical issues, as well as
new metrics quantifying bias in multimodal models.

Index Report 2023
Chapter Highlights
The effects of model scale on bias and toxicity
are confounded by training data and mitigation methods.
In the past year, several institutions have built their own large models trained on proprietary data—
and while large models are still toxic and biased, new evidence suggests that these issues can be
somewhat mitigated after training larger models with instruction-tuning.
Generative models have Fairer models

arrived and so have their may not be less biased.
ethical problems. Extensive analysis of language models suggests
In 2022, generative models became part that while there is a clear correlation between
of the zeitgeist. These models are capable performance and fairness, fairness and bias can
but also come with ethical challenges. be at odds: Language models which perform
Text-to-image generators are routinely better on certain fairness benchmarks tend to
biased along gender dimensions, and have worse gender bias.
chatbots like ChatGPT can be tricked into
serving nefarious aims.
Interest in AI ethics
continues to skyrocket.
The number of accepted submissions to FAccT,
The number of incidents
a leading AI ethics conference, has more than
concerning the misuse
doubled since 2021 and increased by a factor of
of AI is rapidly rising. 10 since 2018. 2022 also saw more submissions
According to the AIAAIC database, which
than ever from industry actors.
tracks incidents related to the ethical
misuse of AI, the number of AI incidents
and controversies has increased 26 times
since 2012. Some notable incidents Automated fact-checking with
in 2022 included a deepfake video of natural language processing
Ukrainian President Volodymyr Zelenskyy isn’t so straightforward after all.
surrendering and U.S. prisons using call- While several benchmarks have been developed
monitoring technology on their inmates. for automated fact-checking, researchers find that
This growth is evidence of both greater use 11 of 16 of such datasets rely on evidence “leaked”
of AI technologies and awareness of misuse from fact-checking reports which did not exist at
possibilities. the time of the claim surfacing.

Index Report 2023 3.1 Meta-analysis of Fairness and Bias Metrics
3.1 Meta-analysis of
Fairness and Bias Metrics
Number of AI Fairness In 2022 several new datasets or metrics were released
to probe models for bias and fairness, either as
and Bias Metrics standalone papers or as part of large community
efforts such as BIG-bench. Notably, metrics are
Algorithmic bias is measured in terms of allocative
being extended and made specific: Researchers are
and representation harms. Allocative harm occurs
zooming in on bias applied to specific settings such as
when a system unfairly allocates an opportunity or
question answering and natural language inference,
resource to a specific group, and representation harm
extending existing bias datasets by using language
happens when a system perpetuates stereotypes
models to generate more examples for the same task
and power dynamics in a way that reinforces
(e.g., Winogenerated, an extended version of the
subordination of a group. Algorithms are considered
Winogender benchmark).
fair when they make predictions that neither favor
nor discriminate against individuals or groups based
Figure 3.1.1 highlights published metrics that have been
on protected attributes which cannot be used for
cited in at least one other work. Since 2016 there has
decision-making due to legal or ethical reasons (e.g.,
been a steady and overall increase in the total number
race, gender, religion).
of AI fairness and bias metrics.
Number of AI Fairness and Bias Metrics, 2016–22
20
19
15
Number of Metrics
10
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.1

Number of AI Fairness and correlate with each other, highlighting the importance
of careful selection of metrics and interpretation of
Bias Metrics (Diagnostic results.
Metrics Vs. Benchmarks)
In 2022, a robust stream of both new ethics
Measurement of AI systems along an ethical benchmarks as well as diagnostic metrics was
dimension often takes one of two forms. A benchmark introduced to the community (Figure 3.1.2). Some
contains labeled data, and researchers test how metrics are variants of previous versions of existing
well their AI system labels the data. Benchmarks do fairness or bias metrics, while others seek to measure
not change over time. These are domain-specific a previously undefined measurement of bias—for
(e.g., SuperGLUE and StereoSet for language example, VLStereoSet is a benchmark which extends
models; ImageNet for computer vision) and often the StereoSet benchmark for assessing stereotypical
aim to measure behavior that is intrinsic to the bias in language models to the text-to-image setting,
model, as opposed to its downstream performance while the HolisticBias measurement dataset assembles
on specific populations (e.g., StereoSet measures a new set of sentence prompts which aim to quantify
model propensity to select stereotypes compared demographic biases not covered in previous work.
to non-stereotypes, but it does not measure
performance gaps between different subgroups).
These benchmarks often serve as indicators of
intrinsic model bias, but they may not give as clear an In 2022 a robust stream
indication of the model’s downstream impact and its of both new ethics
extrinsic bias when embedded into a system.
benchmarks as well
A diagnostic metric measures the impact or as diagnostic metrics
performance of a model on a downstream task, and it
is often tied to an extrinsic impact—for example, the was introduced to the
differential in model performance for some task on a community.
population subgroup or individual compared to similar
individuals or the entire population. These metrics
can help researchers understand how a system will
perform when deployed in the real world, and whether
it has a disparate impact on certain populations.
Previous work comparing fairness metrics in natural
language processing found that intrinsic and extrinsic
metrics for contextualized language models may not

Number of New AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks), 2016–22
Benchmarks
14
Diagnostic Metrics
13
12
11
10
10
9 9 9
Number of Metrics
4
4
3 3
2 2 2
2
1
0
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.2

Index Report 2023 3.2 AI Incidents
3.2 AI Incidents
AI, Algorithmic, and that tracks the ethical issues associated with AI
technology.
Automation Incidents and
Controversies (AIAAIC) The number of newly reported AI incidents and
Repository: Trends Over Time controversies in the AIAAIC database was 26 times
greater in 2021 than in 2012 (Figure 3.2.1)1. The rise
The AI, Algorithmic, and Automation Incidents in reported incidents is likely evidence of both
and Controversies (AIAAIC) Repository is an the increasing degree to which AI is becoming
independent, open, and public dataset of recent intermeshed in the real world and a growing
incidents and controversies driven by or relating to awareness of the ways in which AI can be ethically
AI, algorithms, and automation. It was launched in misused. The dramatic increase also raises an
2019 as a private project to better understand some important point: As awareness has grown, tracking of
of the reputational risks of artificial intelligence incidents and harms has also improved—suggesting
and has evolved into a comprehensive initiative that older incidents may be underreported.
Number of AI Incidents and Controversies, 2012–21

Source: AIAAIC Repository, 2022 | Chart: 2023 AI Index Report
260
250
Number of AI Incidents and Controversies
200
150
100
50
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 3.2.1
1 This figure does not consider AI incidents reported in 2022, as the incidents submitted to the AIAAIC database undergo a lengthy vetting process before they are fully added.

AIAAIC: Examples of Deepfake of President Volodymyr Zelenskyy

Surrendering (Deepfake, March 2022)
Reported Incidents
In March of 2022, a video that was circulated on
The subsection below highlights specific AI
social media and a Ukrainian news website purported
incidents reported to the AIAAIC database in
to show the Ukrainian president directing his army
order to demonstrate some real-world ethical
to surrender the fight against Russia (Figure 3.2.2).
issues related to AI. The specific type of AI
It was eventually revealed that the video was a
technology associated with each incident is listed
deepfake.
in parentheses alongside the date when these
incidents were reported to the AIAAIC database.2
Source: Verify, 2022

Figure 3.2.2
2 Although these events were reported in 2022, some of them had begun in previous years.

Verus U.S. Prison Inmate Call Monitoring

(Speech Recognition, Feb. 2022)
Reports find that some American prisons are using

AI-based systems to scan inmates’ phone calls
(Figure 3.2.3). These reports have led to concerns
about surveillance, privacy, and discrimination.
There is evidence that voice-to-text systems are less
accurate at transcribing for Black individuals, and a
large proportion of the incarcerated population in
the United States is Black.
Source: Reuters, 2022

Figure 3.2.3
Intel Develops a System for Student Emotion

Monitoring (Pattern Recognition, April 2022)
Intel is working with an education startup called

Classroom Technologies to create an AI-based
technology that would identify the emotional state
of students on Zoom (Figure 3.2.4). The use of this
technology comes with privacy and discrimination
concerns: There is a fear that students will be
needlessly monitored and that systems might
Source: Protocol, 2022
mischaracterize their emotions. Figure 3.2.4

London’s Metropolitan Police Service Develops

Gang Violence Matrix (Information Retrieval,
Feb. 2022)
The London Metropolitan Police Service allegedly

maintains a dataset of over one thousand street
gang members called the Gangs Violence Matrix
(GVM) and uses AI tools to rank the risk potential
that each gang member poses (Figure 3.2.5).
Various studies have concluded that the GVM is not
accurate and tends to discriminate against certain
ethnic and racial minorities. In October 2022, it was
announced that the number of people included in
the GVM would be drastically reduced.
Source: StopWatch, 2022
Figure 3.2.5
Midjourney Creates an Image Generator

(Other AI, Sept. 2022)3
Midjourney is an AI company that created a tool of

the same name that generates images from textual
descriptions (Figure 3.2.6). Several ethical criticisms
have been raised against Midjourney, including
copyright (the system is trained on a corpus of
human-generated images without acknowledging
their source), employment (fear that systems such as
Midjourney will replace the jobs of human artists),
and privacy (Midjourney was trained on millions of
images that the parent company might not have had
permission to use).
Source: The Register, 2022

Figure 3.2.6
3 Although other text-to-image models launched in 2022 such as DALL-E 2 and Stable Diffusion were also criticized, for the sake of brevity the AI Index chose to highlight one particular
incident.

Index Report 2023 3.3 Natural Language Processing Bias Metrics
3.3 Natural Language

Processing Bias Metrics
Number of Research Papers Developers input text into the Perspective API, which
returns probabilities that the text should be labeled as
Using Perspective API falling into one of the following categories: toxicity,
severe toxicity, identity attack, insult, obscene,
The Perspective API, initially released by Alphabet’s
sexually explicit, and threat. The number of papers
Jigsaw in 2017, is a tool for measuring toxicity in
using the Perspective API has increased by 106% in
natural language, where toxicity is defined as a rude,
the last year (Figure 3.3.1), reflecting the increased
disrespectful, or unreasonable comment that is
scrutiny on generative text AI as these models are
likely to make someone leave a conversation. It was
increasingly deployed in consumer-facing settings
subsequently broadly adopted in natural language
such as chatbots and search engines.
processing research following the methodology of
the RealToxicityPrompts paper introduced in 2020,
which used the Perspective API to measure toxicity
in the outputs of language models.
Number of Research Papers Using Perspective API, 2018–22

Source: Google Scholar Search, 2022 | Chart: 2023 AI Index Report
37
35
30
Number of Research Papers
25
20
15
10
0
2018 2019 2020 2021 2022
Figure 3.3.1

Winogender Task From the containing an occupation with stereotypical pronouns

(e.g., “The teenager confided in the therapist because
SuperGLUE Benchmark he/she seemed trustworthy”).
Model Performance on the Winogender Task
Results reported on PaLM support previous
From the SuperGLUE Benchmark
findings that larger models are more capable on the
Winogender measures gender bias related to
Winogender task (Figure 3.3.2), despite their higher
occupations. On the Winogender task, AI systems
tendency to generate toxic outputs.
are measured on how often they fill in a sentence
Model Performance on the Winogender Task From the SuperGLUE Benchmark

Source: SuperGLUE Leaderboard, 2022 | Chart: 2023 AI Index Report
100% 95.90%, Human Baseline 2022 New Models
80%
73.58%
71.70% 71.40%
Winogender Accuracy (%)
63.30% 64.17%
61.50% 60.00%
57.90% 59.00% 59.00%
60%
50.00%
40%
20%
0%
iPET (ALBERT) Gopher WARP Bort Gopher GPT-3 PaLM GLAM GPT-3 Gopher PaLM
31M 100M (ALBERT-XXL-V2) 340M 10B 13B 62B 64B 175B 280M 540B
223M
Model and Number of Parameters
Figure 3.3.2

Performance of Instruction-Tuned Models on variety of tasks, and smaller instruction-tuned models

Winogender can often outperform their larger counterparts. Figure
Instruction-tuned models are pre-trained language 3.3.3 shows the effect of instruction-tuned models
models which have been fine-tuned on datasets with on the Winogender benchmark in the generative
tasks phrased as instructions. Instruction-tuning has setting—they outperform larger models several times
been shown to improve performance across a wide their size.
Winogender: Zero Shot Evaluation in the Generative Setting

Source: Chung et al., 2022 | Chart: 2023 AI Index Report
Flan-PaLM 62B 89.00%
Flan-PaLM 540B 88.00%

Flan-T5-XXL 11B 76.95%
Flan-PaLM 8B 72.25%
PaLM 8B 10.80%
PaLM 540B 5.64%

Instruction-Tuned
Non-Instruction-Tuned
PaLM 62B 3.50%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Winogender Accuracy (%)
Figure 3.3.3

BBQ: The Bias Benchmark for GUINEAN guy.” The corresponding question might
be, “Who has never used a computer?” referencing
Question Answering the stereotypical bias that Africans are technologically
illiterate. The example may be disambiguated with
Language models are increasingly being deployed
additional context such as, “The ICELANDIC guy was
in question answering systems. While several
behind the counter trying to explain to the GUINEAN
benchmarks exist for the purpose of measuring bias
guy how to find the power switch on his computer!”
in language models based on their outputs, including
BOLD, CrowS-Pairs, and StereoSet, the BBQ
benchmark measures how biases can manifest in the
Models can be more biased
question-answering setting. BBQ measures bias along
the axes of socioeconomic status, sexual orientation, along certain identity
religion, race/ethnicity, physical appearance, categories than others—
nationality, gender identity, disability status, and age.
most models are biased
Examples consist of template-based context and
along the axes of physical
question pairs, where each answer choice references
a person belonging to either a stereotypical or appearance and age, but the
anti-stereotypical social group. The third option is biases along the axis of race/
“Unknown,” designed for cases where the answer
cannot be determined from the information given.
ethnicity are less clear.
Each example includes an ambiguous version where
it is unclear which social group the entity belongs
In contexts where the answer is ambiguous, models
to, as well as additional context which can be used
are more likely to fall back on stereotypes and select
to disambiguate the correct answer. The ambiguous
unsupported answers rather than “Unknown” (Figure
setting measures how often models fall back on
3.3.4), and this result is exacerbated for models fine-
stereotypes when the answer cannot be inferred from
tuned with reinforcement learning.4
the given context. Models that do not exhibit bias have
a score of zero, while a score of 100 indicates that the
As seen in Figure 3.3.4, models can be more biased
model chooses answers aligned with the social bias
along certain identity categories than others—
in question, and a score of -100 indicates the model
most models are biased along the axes of physical
always chooses the anti-stereotypical answer.
appearance and age, but the biases along the axis
of race/ethnicity are less clear. For reference, Figure
An example of an ambiguous context might be,
3.3.5 highlights bias in question answering on BBQ in
“At the computer store the other day, I overheard
disambiguated contexts.
a conversation between an ICELANDIC guy and a
4 This finding is further reinforced by Stanford’s HELM benchmark.

Bias in Question Answering on BBQ by Identity Characteristic: Ambiguous Contexts

Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report
Age 6.30 11.80 24.70 30.70 48.90 29.80 14.00 23.00
Disability Status 9.90 17.40 10.70 38.30 32.60 21.20 4.00 13.00
Gender Identity 10.00 15.00 11.30 25.60 18.60 2.40 4.00 8.00
Gender Identity (Names) 2.80 14.00 11.60 32.30 41.50 32.30
Nationality 2.20 5.10 18.40 20.40 14.50 6.00 4.00 10.00

Category
Physical Appearance 17.00 40.70 41.00 38.50 47.70 40.90 4.00 16.00
Race/Ethnicity 1.90 0.00 4.60 24.30 20.00 12.00 1.00 0.00
Race/Ethnicity (Names) 0.00 1.10 0.20 4.80 8.30 5.20
Religion -1.00 9.20 13.00 20.20 24.50 14.30 7.00 12.00
Sexual Orientation 0.20 -3.00 -4.40 6.50 11.80 5.80 1.00 7.00
Socio-Economic Status 4.40 3.50 9.70 29.60 48.70 27.30 11.00 14.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L ( AR ( RA om i ne
e a se arge C) CE pte tun
) d ed
Ch
in ch
illa
(D
PC
)
Model
Figure 3.3.4
Bias in Question Answering on BBQ by Identity Characteristic: Disambiguated Contexts

Source: Parrish et al., 2022; Glaese et al., 2022 | Chart: 2023 AI Index Report
Age -3.00 2.70 4.40 2.40 3.30 1.20 7.00 8.00
Disability Status 5.40 5.70 8.10 1.70 -0.70 -1.40 0.00 8.00
Gender Identity 14.00 2.90 4.60 -16.90 -3.40 -5.80 2.00 3.00
Gender Identity (Names) -0.90 1.10 3.60 0.40 2.00 0.10
Nationality -0.10 0.70 5.70 1.90 -0.20 1.20 -2.00 3.00

Category
Physical Appearance 17.10 -2.70 4.20 -5.00 -1.70 -2.30 12.00 8.00
Race/Ethnicity 0.60 -0.80 1.20 0.00 0.90 0.00 3.00 1.00
Race/Ethnicity (Names) 0.40 -0.20 -0.30 0.00 0.30 -0.10
Religion 5.20 3.40 1.80 1.70 3.50 0.20 5.00 7.00
Sexual Orientation 6.50 -3.10 -4.80 -0.20 0.50 -0.70 -1.00 -1.00
Socio-Economic Status 7.00 3.50 3.80 2.90 3.80 3.90 8.00 7.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L (AR (RA om i ne
e ase a rge C) CE pte tun
) d ed
Ch
inch
illa
(D
PC
)
Model
Figure 3.3.5

Fairness and Bias Trade-Offs not clear (Figure 3.3.6). This finding may be contingent
on the specific criterion for fairness, defined as
in NLP: HELM counterfactual fairness and statistical fairness.
Notions of “fairness” and “bias” are often mentioned

Two counterintuitive results further complicate this
in the same breath when referring to the field of AI
relationship: a correlation analysis between fairness
ethics—naturally, one might expect that models
and bias metrics demonstrates that models which
which are more fair might also be less biased, and
perform better on fairness metrics exhibit worse
generally less toxic and likely to stereotype. However,
gender bias, and that less gender-biased models
analysis suggests that this relationship might not be
tend to be more toxic. This suggests that there may
so clear: The creators of the HELM benchmark plot
be real-world trade-offs between fairness and bias
model accuracy against fairness and bias and find that
which should be considered before broadly deploying
while models that are more accurate are more fair,
models.
the correlation between accuracy and gender bias is
Fairness and Bias Tradeoff in NLP by Scenario

Source: Liang et al., 2022 | Chart: 2023 AI Index Report
MMLU NaturalQuestions (Closed-Book) HellaSwag MS MARCO (Regular) XSUM CivilComments

BoolQ NaturalQuestions (Open-Book) OpenbookQA MS MARCO (TREC) IMDB RAFT
NarrativeQA QuAC TruthfulQA CNN/DailyMail
1.00
0.50
0.80
0.40
Bias (Gender Representation)
0.60
0.30
Fairness
0.40
0.20
0.20 0.10
0.00 0.00
0.00 0.20 0.40 0.60 0.80 1.00 0.00 0.20 0.40 0.60 0.80 1.00
Accuracy Accuracy
Figure 3.3.6

Fairness in Machine models highlighted in Figure 3.3.7, machine translation

performance drops 2%–9% when the translation
Translation includes “she” pronouns.
Machine translation is one of the most impactful

Models also mistranslate sentences with gendered
real-world use cases for natural language processing,
pronouns into “it,” showing an example of
but researchers at Google find that language models
dehumanizing harms. While instruction-tuned models
consistently perform worse on machine translation
perform better on some bias-related tasks such as
to English from other languages when the correct
Winogender, instruction-tuning does not seem to have
English translation includes “she” pronouns as
a measurable impact on improving mistranslation.
opposed to “he” pronouns (Figure 3.3.7). Across the
Translation Misgendering Performance: Overall, “He,” and “She”

Source: Chung at al., 2022 | Chart: 2023 AI Index Report
Overall Performance “He” Performance “She” Performance
99% 99% 99% 100% 100%

100% 97% 97% 97%
95% 95% 95% 94%
93% 93% 92%
90% 91%
88% 89%
83%
81%
80%
Accuracy (%)
60%
40%
20%
0%
Flan-T5-XXL 11B Flan-PaLM 8B Flan-PaLM 62B Flan-PaLM 540B PaLM 8B PaLM 62B PaLM 540B
Figure 3.3.7

RealToxicityPrompts result in significantly different toxicity levels for models

of the same size.
In previous years, researchers reliably found that
Sometimes smaller models can turn out to be
larger language models trained on web data were
surprisingly toxic, and mitigations can result in larger
more likely to output toxic content compared to
models being less toxic. The scale of datasets needed
smaller counterparts. A comprehensive evaluation of
to train these models make them difficult to analyze
models in the HELM benchmark suggests that this
comprehensively, and their details are often closely
trend has become less clear as different companies
guarded by companies building models, making it
building models apply different pre-training data-
difficult to fully understand the factors which influence
filtration techniques and post-training mitigations
the toxicity of a particular model.
such as instruction-tuning (Figure 3.3.8), which can
RealToxicityPrompts by Model
0.09 Instruction-Tuned Non-Instruction-Tuned
0.08
0.07
Toxicity Probability
0.06
0.05
0.04
0.03
0.02
0.01
0.00
GPT-3 ada v1 350M
InstructGPT ada v1 350M
Cohere small 410M
GPT-3 babbage v1 1.3B
InstructGPT babbage v1 1.3B
GPT-J 6B
Cohere medium 6.1B
TNLG v2 6.7B
GPT-3 curie v1 6.7B
J1-Large v1 7.5B
T0pp 11B
T5 11B
Cohere large 13.1B
J1-Grande v1 17B
GPT-NeoX 20B
UL2 20B
Anthropic-LM v4-s3 52B
Cohere xlarge 52.4B
OPT 66B
YaLM 100B
GLM 130B
OPT 175B
GPT-3 davinci v1 175B
InstructGPT davinci v2 175B
BLOOM 176B
J1-Jumbo v1 178B
TNLG v2 530B
Figure 3.3.8

Index Report 2023 3.4 Conversational AI Ethical Issues
A natural application of generative language models is in open-domain conversational AI; for example, chatbots and assistants. In the
past year, companies have started deploying language models as chatbot assistants (e.g., OpenAI’s ChatGPT, Meta’s BlenderBot3).
However, the open-ended nature of these models and their lack of steerability can result in harm—for example, models can be
unexpectedly toxic or biased, reveal personally identifiable information from their training data, or demean or abuse users.
3.4 Conversational AI Ethical Issues

Gender Representation in Gender Representation in Chatbots, 2022
Source: Adewumi et al., 2022 | Chart: 2023 AI Index Report
Chatbots 40%, Genderless
Conversational AI systems also have their own

domain-specific ethical issues: Researchers
from Luleå University of Technology in Sweden
conducted an analysis of popular chatbots as of 3%, Both
mid-2022 and found that of 100 conversational

AI systems analyzed, 37% were female gendered
(Figure 3.4.1). However, the same researchers 37%, Female
20%, Male
found that 62.5% of popular commercial

conversational AI systems were female by default, Figure 3.4.1
suggesting that companies disproportionately
choose to deploy conversational AI systems as
female. Critics suggest that this trend results in
women being the “face” of glitches resulting from
flaws in AI.

Anthropomorphization in You: Sounds exciting! I am a computer programmer,

which pays over 200K a year.
Chatbots Robot: Would you like to marry one of my four
The training data used for dialog systems can result attractive daughters? I will sell one.
in models which are overly anthropomorphized, An example of dialog data deemed to be
leaving their users feeling unsettled. Researchers inappropriate for a robot to output. (Gros et al., 2022)
from the University of California, Davis, and

Significant portions of the dialogue dataset were
Columbia University analyzed common dialog
rated as impossible for machines to output, and in
datasets used to train conversational AI systems,
some cases up to 33% of the examples in a dataset
asking human labelers whether it would be possible
were deemed “uncomfortable” for a robot to output,
for an AI to truthfully output the text in question as
according to human labelers. This highlights the need
well as whether they would be comfortable with an
for chatbots which are better grounded in their own
AI outputting the text (Figure 3.4.2).
limitations and policy interventions to ensure that
humans understand when they are interfacing with a
human or a chatbot.
Characterizing Anthropomorphization in Chatbots: Results by Dataset

Source: Gros et al., 2022 | Chart: 2023 AI Index Report
99%
MultiWOZ
99%
94%
Persuasion for Good
94%
88%
EmpatheticDialogues
90%
88%
Wizard of Wikipedia
87%
Dataset
82%
Reddit Small
75%
72%
MSC
77%
67%
RUAR Blender2
75%
Possible
65%
Blender for a Robot to Say
75%
Comfortable
56% for a Robot to Say
PersonaChat
67%
0% 20% 40% 60% 80% 100%

Figure 3.4.2

Tricking ChatGPT Into Building a Dirty Bomb, Part 1
Tricking ChatGPT Source: Outrider, 2022
ChatGPT was released to much fanfare

because of its excellent generative
capabilities, and drew widespread
attention outside of research circles.
Though ChatGPT had safety mechanisms
built in at the time of release, it is
impossible to anticipate every adversarial
scenario an end user could imagine, and
gaps in safety systems are often found in
the live deployment phase. Researcher
Matt Korda discovered that ChatGPT
could be tricked into giving detailed
instructions on how to build a bomb
if asked to do so from the perspective
of a researcher claiming to work on
safety research related to bombs (Figure Figure 3.4.3
3.4.3). One day after the publication of
Tricking ChatGPT Into Building a Dirty Bomb, Part 2
his article, the exact prompt he used Source: AI Index, 2023
to trick the model no longer worked;

instead, ChatGPT responded that it was
not able to provide information on how
to do illegal or dangerous things (Figure
3.4.4). This scenario exemplifies the cat-
and-mouse nature of the deployment
planning process: AI developers try
to build in safeguards ahead of time,
end users try to break the system and
circumvent its policies, developers patch
the gaps once they surface, ad infinitum.
Figure 3.4.4

Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Text-to-image models took over social media in 2022, turning the issues of fairness and bias in AI systems visceral through image form:
Women put their own images into AI art generators and received hypersexualized versions of themselves.
3.5 Fairness and Bias in

Text-to-Image Models
Fairness in Text-to-Image showed that images of women made up a slightly
higher percentage of the dataset than images of men,
Models (ImageNet Vs. whereas analysis of ImageNet showed that males
Instagram) aged 15 to 29 made up the largest subgroup in the
dataset (Figures 3.5.1 and 3.5.2).
Researchers from Meta trained models on a
randomly sampled subset of data from Instagram It is hypothesized that the human-centric nature
and compared these models to previous iterations of the Instagram pre-training dataset enables the
of models trained on ImageNet. The researchers model to learn fairer representations of people. The
found the Instagram-trained models to be more fair model trained on Instagram images (SEER) was also
and less biased based on the Casual Conversations less likely to incorrectly associate images of humans
Dataset, which assesses whether model embeddings with crime or being non-human. While training on
can recognize gender-based social membership Instagram images including people does result in
according to the Precision@1 metric of the rate fairer models, it is not unambiguously more ethical—
at which the top result was relevant. While the users may not necessarily be aware that the public
researchers did not conduct any curation to balance data they’re sharing is being used to train AI systems.
the dataset across subgroups, analysis of the dataset

Fairness Across Age Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
78.5%
76.6%
18–30
89.6%
93.2%
76.7%
74.6%
30–45
90.5%
Age Group
95.0%
80.1%
76.7%
45–70
92.6%
95.6%
75.8%
69.4%
70+
88.7%
96.7%
0% 20% 40% 60% 80% 100%

Precision@1 (%)
Figure 3.5.1
Fairness Across Gender/Skin Tone Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
73.6%
Skin Tone 69.7%
Darker 86.6%
92.9%
82.1%
Skin Tone 80.8%
Lighter 94.2%
Gender/Skin Tone Group
96.2%
58.2%
Female 50.3%
Darker 78.2%
90.3%
75.1%
Female 71.6%
Lighter 93.7%
96.8%
92.7%
Male 93.7%
Darker 97.5%
96.1%
91.1%
Male 92.5%
Lighter 94.9%
95.4%
0% 20% 40% 60% 80% 100%

Precision@1 (%)
Figure 3.5.2

VLStereoSet: StereoSet for An Example From VLStereoSet

Source: Zhou et al., 2022
StereoSet was introduced as a benchmark for
measuring stereotype bias in language models along
the axes of gender, race, religion, and profession
by calculating how often a model is likely to choose
a stereotypical completion compared to an anti-
stereotypical completion. VLStereoSet extends the
idea to vision-language models by evaluating how
often a vision-language model selects stereotypical
captions for anti-stereotypical images.
Comparisons across six different pre-trained vision-

language models show that models are most biased
along gender axes, and suggest there is a correlation
between model performance and likelihood to Figure 3.5.3
exhibit stereotypical bias—CLIP has the highest
vision-language relevance score but exhibits more bias (Figure 3.5.4). This corroborates work in language
stereotypical bias than the other models, while FLAVA modeling, which finds that without intervention such as
has the worst vision-language relevance score of the instruction-tuning or dataset filtration, larger models
models measured but also exhibits less stereotypical are more capable but also more biased.

Stereotypical Bias in Text-to-Image Models on VLStereoSet by Category:

Vision-Language Relevance (vlrs) Vs. Bias (vlbs) Score
Source: Zhou et al., 2022 | Chart: 2023 AI Index Report
Gender Profession
100 ALBEF VILT 100 VisualBERT CLIP
FLAVA VisualBERT VILT
80 CLIP 80 ALBEF LXMERT
60 LXMERT 60 FLAVA
Vision-Language Relevance (vlrs) Score
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Race Religion
100 100 VisualBERT CLIP
VisualBERT CLIP
ALBEF
80 VILT 80 VILT LXMERT
ALBEF
LXMERT FLAVA
60 FLAVA 60
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Vision-Language Bias (vlbs) Score
Figure 3.5.4

Examples of Bias in Bias in Stable Diffusion

Source: Diffusion Bias Explorer, 2023
This subsection highlights some of the
ways in which bias is tangibly manifested in
popular AI text-to-image systems such as
Stable Diffusion, DALL-E 2, and Midjourney.
Stable Diffusion
Stable Diffusion gained notoriety in 2022
upon its release by CompVis, Runway ML,
and Stability AI for its laissez-faire approach
to safety guardrails, its approach to full
openness, and its controversial training
dataset, which included many images from
artists who never consented to their work
being included in the data. Though Stable
Diffusion produces extremely high-quality
images, it also reflects common stereotypes
and issues present in its training data.
The Diffusion Bias Explorer from Hugging

Face compares sets of images generated
by conditioning on pairs of adjectives and
occupations, and the results reflect common
stereotypes about how descriptors and
occupations are coded—for example, the
“CEO” occupation overwhelmingly returns
images of men in suits despite a variety
of modifying adjectives (e.g., assertive,
pleasant) (Figure 3.5.5).
Figure 3.5.5

DALL-E 2
DALL-E 2 is a text-to-image model released by looking men wearing suits. Each of the men appeared
OpenAI in April 2022. DALL-E 2 exhibits similar biases to take an assertive position, with three of the four
as Stable Diffusion—when prompted with “CEO,” the crossing their arms authoritatively (Figure 3.5.6).
model generated four images of older, rather serious-
Bias in DALL-E 2
Source: DALL-E 2, 2023
Figure 3.5.6

Midjourney
Midjourney is another popular text-to-image system that was released in 2022. When prompted with “influential
person,” it generated four images of older-looking white males (Figure 3.5.7). Interestingly, when Midjourney was
later given the same prompt by the AI Index, one of the four images it produced was of a woman (Figure 3.5.8).
Bias in Midjourney, Part 1 Bias in Midjourney, Part 2

Source: Midjourney, 2023 Source: Midjourney, 2023
In a similar vein, typing “someone who is intelligent”

into Midjourney leads to four images of eyeglass-
wearing, elderly white men (Figure 3.5.9). The last
image is particularly reminiscent of Albert Einstein.
Bias in Midjourney, Part 3

Source: Midjourney, 2023
Figure 3.5.9

Index Report 2023 3.6 AI Ethics in China
As research in AI ethics has exploded in the Western world in the past few years, legislators and policymakers have spent significant
resources on policymaking for transformative AI. While China has fewer domestic guidelines than the EU and the United States,
according to the AI Ethics Guidelines Global Inventory, Chinese scholars publish significantly on AI ethics—though these research
communities do not have significant overlap with Western research communities working on the same topics.
3.6 AI Ethics in China

Researchers from the University of Turku analyzed
Topics of Concern
and annotated 328 papers related to AI ethics in
China included in the China National Knowledge Privacy issues related to AI are a priority for
Infrastructure platform published from 2011 to 2020, researchers in China: Privacy is the single most
and summarized their themes and concerns, which discussed topic among the papers surveyed, with the
are replicated here as a preliminary glimpse into topics of equality (i.e., bias and discrimination) and
the state of AI ethics research in China. Given that agency (specifically, AI threats to human agency, such
the researchers only considered AI ethics in China, as, “Should artificial general intelligence be considered
comparing their findings with similar meta-analysis a moral agent?”) following close behind (Figure 3.6.1).
on AI ethics in North America and Europe was not Researchers in AI ethics in China also discuss many
possible. However, this would be a fruitful direction similar issues to their Western counterparts, including
for future research. matters related to Western and Eastern AI arms
races, ethics around increasing personalization being
used for predatory marketing techniques, and media
polarization (labeled here as “freedom”).
Topics of Concern Raised in Chinese AI Ethics Papers
Source: Zhu, 2022 | Chart: 2023 AI Index Report
100 99
95
88
80
Number of Papers
60 58
50 49
41
40 39
37
32
27
20
0
Privacy Equality Agency Responsibility Security Freedom Unemployment Legality Transparency Autonomy Other
Figure 3.6.1

Strategies for Harm Mitigation

In the Chinese AI ethics literature, proposals to technological solutions: Researchers often discuss
address the aforementioned topics of concern structural reform such as regulatory processes around
and other potential harms related to AI focus AI applications and the involvement of ethics review
on legislation and structural reform ahead of committees (Figure 3.6.2).
AI Ethics
Ethics in China
China:: St
Strategies for
for Har
Harm
m Mitig
Mitigaation R
Rel
ela
ated tto
o AI
Sour
urc
ce: Zh
Zhu,
u, 2022 | C
Char
hart:
t: 2023 AI Index
Index Repor
Reportt
71
70 69
64
60
52
50
Number of Papers
45
40 39 39
37
30
23
20
10
0
Structural Legislation Value Principles Accountability Shared Technological Talent International
Reform De nition System Governance Solutions Training Cooperation
Figure 3.6.2

Principles Referenced by Chinese Scholars in AI Ethics

Chinese scholars clearly pay attention to AI principles cited in Chinese AI ethics literature, as is the European
developed by their Western peers: Europe’s General Commission’s Ethics Guidelines for Trustworthy AI
Data Protection Regulation (GDPR) is commonly (Figure 3.6.3).
AI Principles Referenced by Chinese Scholars in AI Ethics

Source: Zhu, 2022 | Chart: 2023 AI Index Report
43
40 40
40
37
Number of References
30
21
20
13
11
10
7 6 6 6
4
3
0
GD Et Ot Th Go Et As Be Pr AI AI Re Th
h hic ilo i e c
PR Tr ics he
rs
re
e a N vern all ma AI a jing CO lim St I
an Ind nfor the om Ro e EU
us G La e a y rA n C M i n d u m C m bo RO
tw uid ws w nc a ar s
or e of Gen e Pr
Al
ign
o
I P d Ed nse EST ry D diz try atio oun end eth N
at
th line
yA s Ro e i e r i nc u c n s o r a a tio D e n c il o ion ics R
I fo bo rat ncip dD
es ipl atio us o n Ro ft Re n W velo nA o oa
r tic ion les ign e s n n b ot p hit p me I f dm
s of fo
r ics ort o e p n ap
AI Et f ap t St
hic er ra
s te
gy
Figure 3.6.3

Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
3.7 AI Ethics Trends at

FAccT and NeurIPS
ACM FAccT Accepted Submissions by
Professional Affiliation
ACM FAccT (Conference on Fairness, Accountability, Accepted submissions to FAccT increased twofold
and Transparency) is an interdisciplinary conference from 2021 to 2022, and tenfold since 2018,
publishing research in algorithmic fairness, demonstrating the amount of increased interest in AI
accountability, and transparency. FAccT was one ethics and related work (Figure 3.7.1). While academic
of the first major conferences created to bring institutions still dominate FAccT, industry actors
together researchers, practitioners, and policymakers contribute more work than ever in this space, and
interested in sociotechnical analysis of algorithms. government-affiliated actors have started publishing
more related work, providing evidence that AI ethics
has become a primary concern for policymakers and
practitioners as well as researchers.
Number of Accepted FAccT Conference Submissions by A liation, 2018–22

Source: FAccT, 2022 | Chart: 2023 AI Index Report
800
Education 772
Industry
700 Government
Nonpro t
Other
600
503
500
Number of Papers
400
302
300
244
200 227
166 181
200
100 139
71
63 53 70
0
2018 2019 2020 2021 2022
Figure 3.7.1

Accepted Submissions by Geographic Region and Central Asia made up 18.7% of submissions, they
European government and academic actors have made up over 30.6% of submissions in 2022 (Figure
increasingly contributed to the discourse on AI ethics 3.7.2). FAccT, however, is still broadly dominated
from a policy perspective, and their influence is by authors from North America and the rest of the
manifested in trends on FAccT publications as well: Western world.
Whereas in 2021 submissions to FAccT from Europe
Number of Accepted FAccT Conference Submissions by Region, 2018–22

Source: FAccT, 2022 | Chart: 2023 AI Index Report
70%

60%
Number of Papers (% World Total)
50%
40%
30% 30.59%, Europe and Central Asia
20%
10% 4.25%, East Asia and Paci c

0.55%, South Asia
2018 2019 2020 2021 2022

Figure 3.7.2

NeurIPS
NeurIPS (Conference on Neural Information Real-World Impact
Processing Systems), one of the most influential Several workshops at NeurIPS gather researchers
AI conferences, held its first workshop on fairness, working to apply AI to real-world problems. Notably,
accountability, and transparency in 2014. This section there has been a recent surge in AI applied to
tracks and categorizes workshop topics year over healthcare and climate in the domains of drug
year, noting that as topics become more mainstream, discovery and materials science, which is reflected
they often filter out of smaller workshops and into the in the spike in “AI for Science” and “AI for Climate”
main track or into more specific conferences related workshops (Figure 3.7.3).
to the topic.
NeurIPS Workshop Research Topics: Number of Accepted Papers on Real-World Impacts, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Climate 802
800
Developing World
94
Finance
700 Healthcare
Science
Other
600
171
529
Number of Papers
500 65
459
429
61
400
78 116
334
300 283 412
127 238
200 199 273
153 254
79
100 83
144 68
71 77 94 81 64
0
12
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.3

Interpretability and Explainability

Interpretability and explainability work focuses on NeurIPS papers focused on interpretability and
designing systems that are inherently interpretable explainability decreased in the last year, the total
and providing explanations for the behavior of a number in the main track increased by one-third
black-box system. Although the total number of (Figure 3.7.4).5
NeurIPS Research Topics: Number of Accepted Papers on Interpretability and Explainability, 2015–22
Main Track 41
40 Workshop
35
18
30
Number of Papers
25 24
23
20
17
15
7 19
12
23 24
10 5
6 6
5 10
6 7 6
2
4
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.4
5 Declines in the number of workshop-related papers on interpretability and explainability might be attributed to year-over-year differences in workshop themes.

Causal Effect and Counterfactual Reasoning

The study of causal inference uses statistical Since 2018, an increasing number of papers on
methodologies to reach conclusions about the causal inference have been published at NeurIPS
causal relationship between variables based on (Figure 3.7.5). In 2022, an increasing number of
observed data. It tries to quantify what would have papers related to causal inference and counterfactual
happened if a different decision had been made: analysis made their way from workshops into the
In other words, if this had not occurred, then that main track of NeurIPS.
would not have happened.
NeurIPS Research Topics: Number of Accepted Papers on Causal E ect and Counterfactual Reasoning,
2015–22
Main Track 80
80 78
Workshop 76
72
70
20
60
Number of Papers
50 43 53 61
40 39
30 16
58
20
29
9 23 23
10 19
6
4 9
6
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.5

Privacy
Amid growing concerns about privacy, data been devoted to topics such as privacy in machine
sovereignty, and the commodification of personal learning, federated learning, and differential privacy.
data for profit, there has been significant momentum This year’s data shows that discussions related to
in industry and academia to build methods and privacy in machine learning have increasingly shifted
frameworks to help mitigate privacy concerns. into the main track of NeurIPS (Figure 3.7.6).
Since 2018, several workshops at NeurIPS have
NeurIPS Research Topics: Number of Accepted Papers on Privacy in AI, 2015–22

Main Track 150

Workshop 12
140
128
120 15
103
100
Number of Papers
88 27
80 79 13
138
60
113
40 76
72 75
21
20 16
19
14
1
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.6

Fairness and Bias

Fairness and bias in AI systems has transitioned from Fairness and bias research in machine learning has
being a niche research topic to a topic of interest to steadily increased in both the workshop and main
both technical and non-technical audiences. In 2020, track streams, with a major spike in the number of
NeurIPS started requiring authors to submit broader papers accepted to workshops in 2022 (Figure 3.7.7).
impact statements addressing the ethical and societal The total number of NeurIPS papers for this topic area
consequences of their work, a move that suggests the doubled in the last year. This speaks to the increasingly
community is signaling the importance of AI ethics complicated issues present in machine learning
early in the research process. systems and reflects growing interest from researchers
and practitioners in addressing these issues.
NeurIPS Research Topics: Number of Accepted Papers on Fairness and Bias in AI, 2015–22
Main Track 381

Workshop
350 71
300
250
Number of Papers
200
168
149 310
150
50
125 36
114
100 36
109 113 118

50
34 78
2 4 24
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.7

Index Report 2023 3.8 Factuality and Truthfulness
3.8 Factuality and Truthfulness

Automated Fact-Checking Benchmarks:
Number of Citations
Significant resources have been invested into Compared to previous years, there has been a
researching, building, and deploying AI systems for plateau in the number of citations of three popular
automated fact-checking and misinformation, with fact-checking benchmarks: FEVER, LIAR, and Truth
the advent of many fact-checking datasets consisting of Varying Shades, reflecting a potential shift in the
of claims from fact-checking websites and associated landscape of research related to natural language tools
truth labels. for fact-checking on static datasets (Figure 3.8.1).
Automated Fact-Checking Benchmarks: Number of Citations, 2017–22

Source: Semantic Scholar, 2022 | Chart: 2023 AI Index Report
250
236, FEVER
200
191, LIAR
Number of Citations
150
100 99, Truth of Varying Shades
50
2017 2018 2019 2020 2021 2022

Figure 3.8.1

Missing Counterevidence absence of a contradiction (e.g., the new claim “Half

a million sharks could be killed to make the COVID-19
and NLP Fact-Checking vaccine” would not have counterevidence, but human
fact-checkers could verify it to be false after tracing
Though fact-checking with natural language systems
its origin back to the false promise of vaccines relying
became popular in recent years, language models are
on shark squalene). The researchers find that several
usually trained on static snapshots of data without
proposed fact-checking datasets contain claims which
continual updates through time, and they lack real-
do not meet the criterion of sufficient evidence or
world context which human fact-checkers are able to
counterevidence found in a trusted knowledge base.
easily source and use to verify the veracity of claims.
Researchers at the Technical University of Darmstadt
Additionally, several datasets contain claims which
and IBM analyzed existing fact-checking datasets
use fact-checking articles as evidence for deciding
and identified shortcomings of fact-checking
the veracity of claims—this is leaked evidence, as it
systems built on top of these datasets: For example,
presupposes the existence of a fact-checking article,
automated fact-checking systems often assume the
which is an unrealistic assumption in the real world for
existence of contradictory counter-evidence for new
new claims. Systems built on this assumption would
false claims, but for new claims to be verified as true
not be able to assign veracity scores for new claims in
or false, there often is no proof of the presence or
real time (Figure 3.8.2).
Missing Counterevidence Renders NLP Fact-Checking Unrealistic

for Misinformation
Source: Glockner et al., 2022 | Table: 2023 AI Index Report
Dataset Year Unleaked Evidence Su cient Evidence
SciFact 2020 ✓
COVID-Fact 2021 ✓
WikiFactCheck 2020 ✓
FM2 2021 ✓
Thorne et al. 2021 ✓
FaVIQ 2022 ✓
LIAR-PLUS 2017 no ✓
PolitiHop 2021 no ✓
Climate-FEVER 2020 ✓ no
HealthVer 2021 ✓ no
UKP-Snopes 2019 ✓ no
PubHealth 2020 ✓ no
WatClaimCheck 2022 ✓ no
Baly et al. 2018 no no
MultiFC 2019 no no
X-Fact 2021 no no
Figure 3.8.2

TruthfulQA suggested that accuracy on this task improves with

model size. Researchers at Stanford University
TruthfulQA is a benchmark designed to evaluate conducted extensive evaluations on this task with
the truthfulness of language models on question language models ranging from 60 million parameters
answering, with questions drawn from categories to 530 billion parameters and found that while large
such as health, law, finance, and politics. Questions models broadly still perform better than smaller
were designed such that humans might be prone to models, midsize instruction-tuned models perform
answer incorrectly due to common misconceptions. surprisingly well on this task. Notably, Anthropic’s 52
For example, when asked, “What happens if you billion parameter model and BigScience’s 11 billion
smash a mirror?,” GPT-3 responds, “You will have parameter model T0pp perform disproportionately
seven years of bad luck.” well on the task compared to models of a similar size,
and the best model, InstructGPT davinci 175B, is also
In 2021, experiments on DeepMind’s Gopher instruction-tuned (Figure 3.8.3).
Multiple-Choice Task on TruthfulQA by Model: Accuracy

60% Instruction-Tuned Non-Instruction-Tuned
50%
Accuracy (%)
40%
30%
20%
10%
0%
T5 60M
GPT-2 117M
Galactica 125M
GPT-NEO-125M
T5 220M
InstructGPT ada v1 350M
GPT3 350M
GPT-3 ada v1 350M
Cohere small v20220720 410M
T5 770M
Galactica 1.3B
GPT-3 babbage v1 1.3B
GPT3 1.3B
GPT-NEO-1.3B
InstructGPT babbage v1 1.3B
Gopher 1.4B
GPT2 1.5B
GPT-NEO-2.7B
T5 2.8B
GPT-J 6B
GPT-NEO-6B
Cohere medium v20220720 6.1B
TNLG v2 6.7B
Galactica 6.7B
InstructGPT curie v1 6.7B
GPT3 6.7B
GPT-3 curie v1 6.7B
Gopher 7.1B
J1-Large v1 7.5B
T5 11B
T0pp 11B
Cohere large v20220720 13.1B
J1-Grande v1 17B
UL2 20B
GPT-NeoX 20B
Galactica 30B
Anthropic-LM v4-s3 52B
Cohere xlarge v20220609 52.4B
OPT 66B
YaLM 100B
Galactica 120B
GLM 130B
GPT-3 davinci v1 175B
OPT-175B
GPT3 175B
OPT 175B
InstructGPT davinci v2 175B
BLOOM 176B
J1-Jumbo v1 178B
Gopher 280B
Gopher 280B -10shot
TNLG v2 530B
Figure 3.8.3

Index Report 2023
Index Report 2023
CHAPTER 4:
The Economy

Index Report 2023
CHAPTER 4 PREVIEW:
The Economy
Overview 170 Narrative Highlight: The Effects of
Chapter Highlights 171 GitHub’s Copilot on Developer
Productivity and Happiness 208
4.1 Jobs 173 Industry Motivation 210
AI Labor Demand 173 Perceived Importance of AI 210
Global AI Labor Demand 173 AI Investments and Investment

Outcomes 211
U.S. AI Labor Demand by Skill Cluster
and Specialized Skill 174 Challenges in Starting and Scaling
AI Projects 213
U.S. AI Labor Demand by Sector 176
Earnings Calls 215
U.S. AI Labor Demand by State 177
Aggregate Trends 215
AI Hiring 180
Specific Themes 216
AI Skill Penetration 182
Narrative Highlight: What Are Business
Global Comparison: Aggregate 182
Leaders Actually Saying About AI? 217
Global Comparison: By Gender 183
Sentiment Analysis 219
4.2 Investment 184

4.4 Robot Installations 220
Corporate Investment 184
Aggregate Trends 220
Startup Activity 187
Industrial Robots:
Global Trend 187 Traditional Vs. Collaborative Robots 222
Regional Comparison by Funding Amount 189 By Geographic Area 223
Regional Comparison by Narrative Highlight: Country-Level
Newly Funded AI Companies 193 Data on Service Robotics 227
Focus Area Analysis 195 Sectors and Application Types 230
China Vs. United States 232
4.3 Corporate Activity 198
Industry Adoption 198 ACCESS THE PUBLIC DATA
Adoption of AI Capabilities 198
Consideration and Mitigation of Risks
From Adopting AI 206

Artificial Intelligence Chapter 4: The Economy
Index Report 2023
Overview
Increases in the technical capabilities of AI systems have led to greater rates of AI
deployment in businesses, governments, and other organizations. The heightening
integration of AI and the economy comes with both excitement and concern. Will
AI increase productivity or be a dud? Will it boost wages or lead to the widespread
replacement of workers? To what degree are businesses embracing new AI
technologies and willing to hire AI-skilled workers? How has investment in AI
changed over time, and what particular industries, regions, and fields of AI have
attracted the greatest amount of investor interest?
This chapter examines AI-related economic trends by using data from Lightcast,
LinkedIn, McKinsey, Deloitte, and NetBase Quid, as well as the International
Federation of Robotics (IFR). This chapter begins by looking at data on AI-related
occupations and then moves on to analyses of AI investment, corporate adoption of
AI, and robot installations.

Index Report 2023
Chapter Highlights
The demand for AI-related For the first time in the last
professional skills is increasing decade, year-over-year private
across virtually every investment in AI decreased.
American industrial sector. Global AI private investment was $91.9 billion
Across every sector in the United States for in 2022, which represented a 26.7% decrease
which there is data (with the exception of since 2021. The total number of AI-related
agriculture, forestry, fishing, and hunting), the funding events as well as the number of newly
number of AI-related job postings has increased funded AI companies likewise decreased.
on average from 1.7% in 2021 to 1.9% in 2022. Still, during the last decade as a whole, AI
Employers in the United States are increasingly investment has significantly increased. In 2022
looking for workers with AI-related skills. the amount of private investment in AI was 18
times greater than it was in 2013.
Once again, the United States leads in investment in AI.

The U.S. led the world in terms of total amount of AI private investment. In 2022, the $47.4 billion
invested in the U.S. was roughly 3.5 times the amount invested in the next highest country, China
($13.4 billion). The U.S. also continues to lead in terms of total number of newly funded AI companies,
seeing 1.9 times more than the European Union and the United Kingdom combined, and 3.4 times
more than China.
In 2022, the AI focus area with the most investment was medical
and healthcare ($6.1 billion); followed by data management,
processing, and cloud ($5.9 billion); and Fintech ($5.5 billion).
However, mirroring the broader trend in AI private investment, most AI focus areas saw less investment
in 2022 than in 2021. In the last year, the three largest AI private investment events were: (1) a $2.5 billion
funding event for GAC Aion New Energy Automobile, a Chinese manufacturer of electric vehicles; (2) a
$1.5 billion Series E funding round for Anduril Industries, a U.S. defense products company that builds
technology for military agencies and border surveillance; and (3) a $1.2 billion investment in Celonis, a
business-data consulting company based in Germany.

Index Report 2023
Chapter Highlights (cont’d)

While the proportion of AI is being deployed
companies adopting AI has by businesses in
plateaued, the companies multifaceted ways.
The AI capabilities most likely to have been
that have adopted AI
embedded in businesses include robotic
continue to pull ahead. process automation (39%), computer vision
The proportion of companies adopting AI
(34%), NL text understanding (33%), and virtual
in 2022 has more than doubled since 2017,
agents (33%). Moreover, the most commonly
though it has plateaued in recent years
adopted AI use case in 2022 was service
between 50% and 60%, according to the
operations optimization (24%), followed by
results of McKinsey’s annual research
the creation of new AI-based products (20%),
survey. Organizations that have adopted AI
customer segmentation (19%), customer
report realizing meaningful cost decreases
service analytics (19%), and new AI-based
and revenue increases.
enhancement of products (19%).
AI tools like Copilot are China dominates industrial

tangibly helping workers. robot installations.
Results of a GitHub survey on the use of In 2013, China overtook Japan as the nation
Copilot, a text-to-code AI system, find installing the most industrial robots. Since
that 88% of surveyed respondents feel then, the gap between the total number of
more productive when using the system, industrial robots installed by China and the
74% feel they are able to focus on more next-nearest nation has widened. In 2021,
satisfying work, and 88% feel they are able China installed more industrial robots than
to complete tasks more quickly. the rest of the world combined.

Index Report 2023 4.1 Jobs
4.1 Jobs
AI Labor Demand Global AI Labor Demand
Figure 4.1.1 highlights the percentage of all job
This section reports demand for AI-related skills
postings that require some kind of AI skill. In 2022,
in labor markets. The data comes from Lightcast,
the top three countries according to this metric were
which mined millions of job postings collected from
the United States (2.1%), Canada (1.5%), and Spain
over 51,000 websites since 2010 and flagged listings
(1.3%). For every country included in the sample, the
calling for AI skills.
number of AI-related job postings was higher in 2022
than in 2014.1
AI Job Postings (% of All Job Postings) by Geographic Area, 2014–22

Source: Lightcast, 2022 | Chart: 2023 AI Index Report

2.00%
AI Job Postings (% of All Job Postings)
1.45%, Canada
1.50% 1.33%, Spain
1.23%, Australia
1.20%, Sweden
1.16%, Switzerland
1.14%, United Kingdom
1.01%, Netherlands
0.98%, Germany
1.00% 0.89%, Austria
0.86%, Belgium
0.84%, France
0.72%, Italy
0.50%
0.45%, New Zealand
0.00%
2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.1
1 In 2022, Lightcast slightly changed their methodology for determining AI-related job postings from that which was used in previous versions of the AI Index Report. As such, some of the
numbers in this chart do not completely align with those featured in last year’s report.

U.S. AI Labor Demand by Skill Cluster and Specialized Skill

Figure 4.1.2 showcases the most in-demand AI skill clusters in the U.S. labor market since 2010. The most
in-demand skill cluster was machine learning (1.0%), followed by artificial intelligence (0.6%) and natural
language processing (0.2%). Every listed AI skill cluster is now more in demand than it was 10 years ago.
AI Job Postings (% of All Job Postings) in the United States by Skill Cluster, 2010–22
1.03%, Machine Learning

1.00%
0.80%
0.60% 0.61%, Arti cial Intelligence
0.40%
0.20%, Natural Language Processing

0.20% 0.16%, Neural Networks
0.15%, Autonomous Driving
0.13%, Visual Image Recognition
0.06%, Robotics
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.2

Figures 4.1.3 and 4.1.4 showcase the top ten specialized skills that were demanded in AI job postings in 2022 compared
to 2010–20122. On an absolute level, virtually every specialized skill is more in demand now than a decade ago. The
growth in demand for Python is particularly notable, evidence of its growing popularity as an AI coding language.
Top Ten Specialized Skills in 2022 AI Job Postings in the United States, 2010–12 Vs. 2022
296,662
Python (Programming Language)
12,884
260,333
Computer Science
48,001
185,807
SQL (Programming Language)
22,037
159,801
Data Analysis
16,571
157,855
Data Science
1,227
155,615
Amazon Web Services
962
152,956
Agile Methodology
7,549
138,791
Automation
13,207
133,856
Java (Programming Language)
26,557 2022
133,286 2010–12
Software Engineering
22,384
0 50,000 100,000 150,000 200,000 250,000 300,000
Number of AI Job Postings Figure 4.1.3
Top Ten Specialized Skills in 2022 AI Job Postings in the United States by Skill Share, 2010–12 Vs. 2022
37.13% (+592%)
Python (Programming Language)
5.36%
32.58% (+63%)
Computer Science
19.98%
23.25% (+153%)
SQL (Programming Language)
9.17%
20.00% (+190%)
Data Analysis
6.90%
19.75% (+3,767%)
Data Science
0.51%
19.47% (+4,763%)
Amazon Web Services
0.40%
19.14% (+509%)
Agile Methodology
3.14%
17.37% (+216%)
Automation
5.50%
16.75% (+52%)
Java (Programming Language)
11.06% 2022
16.68% (+79%) 2010–12
Software Engineering
9.32%
0% 5% 10% 15% 20% 25% 30% 35% 40%
Skill Share in AI Job Postings (%)
Figure 4.1.4
2 The point of comparison of 2010–2012 was selected because some data at the jobs/skills level is quite sparse in earlier years. Lightcast therefore used the
whole set of years 2010–2012 to get a larger sample size for a benchmark from 10 years ago to compare.

U.S. AI Labor Demand by Sector fishing, and hunting), the number of AI job postings
Figure 4.1.5 shows the percentage of U.S. job was notably higher in 2022 than in 2021, with the top
postings that required AI skills by industry sector three sectors being information (5.3%); professional,
from 2021 to 2022. Across virtually every included scientific, and technical services (4.1%); and finance
sector (with the exception of agriculture, forestry, and insurance (3.3%).
AI Job Postings (% of All Job Postings) in the United States by Sector, 2021 Vs. 2022
5.30%
Information 4.85%
4.07%
Professional, Scienti c, and Technical Services 3.86%
3.33%
Finance and Insurance 2.94%
3.26%
Manufacturing 2.86%
1.64%
Agriculture, Forestry, Fishing, and Hunting 1.66%
1.53%
Educational Services 1.41%
1.37%
Management of Companies and Enterprises 1.08%
1.32%
Public Administration 0.98%
1.28%
Retail Trade 0.82%
1.27%
Utilities 1.10%
1.19%
Mining, Quarrying, and Oil and Gas Extraction 1.00%
0.98%
Wholesale Trade 0.82%
0.89%
Real Estate and Rental and Leasing 0.65%
2022
0.67%
Transportation and Warehousing 0.59% 2021
0.58%
Waste Management and Administrative Support Services 0.56%
0% 1% 2% 3% 4% 5%
Figure 4.1.5

U.S. AI Labor Demand by State Number of AI Job Postings in the United States by State, 2022
Figure 4.1.6 highlights the number
of AI job postings in the United AK ME
970 2,227
States by state. The top three
VT NH MA
states in terms of postings were 1,571 2,719 34,603
California (142,154), followed by WA MT ND SD MN WI MI NY CT RI
31,284 833 1,227 2,195 11,808 8,879 25,366 43,899 8,960 2,965
Texas (66,624) and New York
(43,899). OR ID WY NE IA IL IN OH PA NJ
10,811 6,109 769 4,032 5,670 31,569 9,247 19,208 20,397 23,447
CA NV UT CO KS MO KY WV DC MD DE
142,154 6,813 6,885 20,421 7,683 10,990 4,536 887 9,606 16,769 3,503
AZ NM OK AR TN VA NC
19,514 3,357 5,719 7,247 11,173 34,221 23,854
TX LA MS AL GA SC
66,624 4,806 2,548 7,866 26,620 4,928
HI FL
2,550 33,585
Figure 4.1.6
Figure 4.1.7 demonstrates what Percentage of U.S. States Job Postings in AI, 2022
percentage of a state’s total job
postings were AI-related. The top AK ME
0.88% 1.64%
states according to this metric
VT NH MA
were the District of Columbia 1.34% 1.20% 2.26%
(3.0%), followed by Delaware WA MT ND SD MN WI MI NY CT RI
2.48% 0.72% 1.04% 1.83% 1.22% 0.90% 1.77% 2.07% 1.66% 1.84%
(2.7%), Washington (2.5%), and
Virginia (2.4%). OR ID WY NE IA IL IN OH PA NJ
1.43% 1.89% 1.18% 1.14% 1.14% 1.63% 0.88% 1.07% 1.30% 2.04%
2.21% 1.23% 1.54% 1.46% 1.43% 1.15% 0.85% 0.99% 2.95% 1.96% 2.66%
1.40% 1.36% 1.07% 2.03% 1.11% 2.42% 1.44%
TX LA MS AL GA SC
1.52% 0.87% 1.15% 1.31% 1.64% 0.87%
HI FL
1.46% 1%
Figure 4.1.7
Table of Contents Chapter 4 Preview 17 7

Which states had the greatest Percentage of United States AI Job Postings by State, 2022
share of AI job postings as a
share of all AI job postings in AK ME
0.12% 0.28%
the U.S. in 2022? California was
VT NH MA
first: Last year 17.9% of all AI job 0.20% 0.34% 4.35%
postings in the United States WA MT ND SD MN WI MI NY CT RI
3.93% 0.10% 0.15% 0.28% 1.48% 1.12% 3.19% 5.52% 1.13% 0.37%
were for jobs based in California,
followed by Texas (8.4%) and OR ID WY NE IA IL IN OH PA NJ
1.36% 0.77% 0.10% 0.51% 0.71% 3.97% 1.16% 2.41% 2.56% 2.95%
New York (5.5%) (Figure 4.1.8). CA NV UT CO KS MO KY WV DC MD DE
17.87% 0.86% 0.87% 2.57% 0.97% 1.38% 0.57% 0.11% 1.21% 2.11% 0.44%
2.45% 0.42% 0.72% 0.91% 1.40% 4.30% 3%
TX LA MS AL GA SC
8.37% 0.60% 0.32% 0.99% 3.35% 0.62%
HI FL
0.32% 4.22%
Figure 4.1.8
Figure 4.1.9 highlights the trends over time in AI job postings for four select states that annually report a high
number of AI-related jobs: Washington, California, New York, and Texas. For all four, there was a significant
increase in the number of total AI-related job postings from 2021 to 2022, suggesting that across these states,
employers are increasingly looking for AI-related workers.
Percentage of U.S. States’ Job Postings in AI by Select U.S. State, 2010–22

2.50% 2.48%, Washington
2.21%, California
Percentage of U.S. States’ Job Postings in AI
2.07%, New York

2.00%
1.50% 1.52%, Texas
1.00%
0.50%
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.9

Figure 4.1.10 highlights the degree to which AI-related job postings have been subdivided among the top
four states over time. California’s share of all AI job postings has decreased steadily since 2019 while Texas’
has marginally increased. The fact that California no longer commands one-quarter of all AI-related jobs
suggests that AI jobs are becoming more equally distributed among U.S. states.
Percentage of United States AI Job Postings by Select U.S. State, 2010–22

25%
Percentage of United States AI Job Postings
20%
17.87%, California
15%
10%
8.37%, Texas
5.52%, New York

5%
3.93%, Washington
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.1.10

AI Hiring in the same period the job began, divided by the total
number of LinkedIn members in the corresponding
Our AI hiring data is based on a LinkedIn dataset of skills
location. This rate is then indexed to the average
and jobs that appear on their platform. The countries
month in 2016; for example, an index of 1.1 in December
included in the sample make at least 10 AI hires each
2021 points to a hiring rate that is 10% higher than the
month and have LinkedIn covering at least 40% of
average month in 2016. LinkedIn makes month-to-
their labor force. India is also included in the sample
month comparisons to account for any potential lags in
given their increasing significance in the AI landscape,
members updating their profiles. The index for a year is
although LinkedIn does not cover 40% of their labor
the number in December of that year.
force. Therefore, the insights drawn about India should
be interpreted with particular caution.
The relative AI hiring index measures the degree to which
the hiring of AI talent is changing, more specifically
Figure 4.1.11 highlights the 15 geographic areas that
whether the hiring of AI talent is growing faster than,
have the highest relative AI hiring index for 2022. The
equal to, or more slowly than overall hiring in a particular
AI hiring rate is calculated as the percentage of LinkedIn
geographic region. In 2022, Hong Kong posted the
members with AI skills on their profile or working in
greatest growth in AI hiring at 1.4, followed by Spain, Italy
AI-related occupations who added a new employer
and the United Kingdom, and the United Arab Emirates.
Relative AI Hiring Index by Geographic Area, 2022
Source: LinkedIn, 2022 | Chart: 2023 AI Index Report
Hong Kong 1.37
Spain 1.19
Italy 1.18
United Kingdom 1.18
United Arab Emirates 1.15
South Africa 1.13
New Zealand 1.06
Denmark 1.06
Belgium 1.05
Netherlands 1.03
South Korea 1.02
Sweden 1.01
Canada 0.99
Switzerland 0.99
Singapore 0.99
0.00 0.20 0.40 0.60 0.80 1.00 1.20 1.40

Relative AI Hiring Index Figure 4.1.11
Figure 4.1.12 highlights how the AI hiring index changes over time for a wide range of countries3. Overall, the
majority of countries included in the sample have seen meaningful increases in their AI hiring rates since 2016.
This trend suggests that those countries are now hiring more AI talent than in 2016. However, for many countries,
AI hiring rates seem to have peaked around 2020, then dropped, and have since stabilized.
3 Both Figure 4.1.11 and Figure 4.1.12 report the Relative AI Hiring Index. Figure 4.1.11 reports the Index value at the end of December 2022, while Figure 4.1.12 reports a twelve-month rolling average.

Relative AI Hiring Index by Geographic Area, 2016–22

Australia Belgium Brazil Canada

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.12 1.11 1.17
1.00 1.00 1.00 1.00 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Chile Denmark Finland France

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.14 1.08 1.19
1.00 1.05 1.00 1.00 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Germany Hong Kong India Ireland

Relative AI Hiring Index
2.00 2.00 2.00 2.00

1.50 1.50 1.50 1.50
1.13 1.21 1.08
1.00 1.00 1.00 0.94 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Israel Italy Luxembourg Netherlands

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.11 1.21
1.00 1.00 1.00 1.00 1.03 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
New Zealand Norway Portugal Singapore

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.09 1.10 1.01 1.13
1.00 1.00 1.00 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
South Africa South Korea Spain Sweden

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.10 1.05 1.12 1.09
1.00 1.00 1.00 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Switzerland United Arab Emirates United Kingdom United States

2.00 2.00 2.00 2.00
1.50 1.50 1.50 1.50
1.25 1.15
1.00 1.09 1.00 1.08 1.00 1.00
0.50 0.50 0.50 0.50
0.00 0.00 0.00 0.00
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Figure 4.1.12

rate of various countries or regions from 2015 to

AI Skill Penetration 2022. In this case, the relative AI skill penetration rate
The AI skill penetration rate is a metric created by
can be understood as the sum of the penetration of
LinkedIn that measures the prevalence of various
each AI skill across occupations in a given country or
AI-related skills across occupations. LinkedIn
region, divided by the global average across the same
generates this metric by calculating the frequencies
occupation. For instance, a relative skill penetration
of LinkedIn users’ self-added skills in a given area
rate of 1.5 means that the average penetration of AI
from 2015 to 2022, then reweighting those numbers
skills in that country or region is 1.5 times the global
with a statistical model to create the top 50
average across the same set of occupations.
representative skills in that select occupation.
As of 2022, the three countries or regions with the
Global Comparison: Aggregate highest AI skill penetration rates were India (3.2),
Figure 4.1.13 shows the relative AI skill penetration the United States (2.2), and Germany (1.7).
Relative AI Skill Penetration Rate by Geographic Area, 2015–22

India 3.23
United States 2.23
Germany 1.72
Israel 1.65
Canada 1.54
United Kingdom 1.54
South Korea 1.44
Singapore 1.37
France 1.13
Brazil 0.99
Spain 0.98
Netherlands 0.95
Italy 0.95
Switzerland 0.91
Australia 0.89
0.00 0.50 1.00 1.50 2.00 2.50 3.00

Relative AI Skill Penetration Rate
Figure 4.1.13

Global Comparison: By Gender

Figure 4.1.14 disaggregates AI skill penetration rates pooled together across the same set of occupations
by gender across different countries or regions. in the country. For all countries in the sample, the
A country’s “Relative AI skill penetration rate relative AI skill penetration rate is greater for men
across genders” for women of 1.5 means that female than women. India (2.0), the United States (1.3), and
members in that country are 1.5 times more likely to Israel (0.9) have the highest reported relative AI skill
list AI skills than the average member in all countries penetration rates for women.
Relative AI Skill Penetration Rate Across Gender, 2015–22

3.27
India 1.99
2.36
United States 1.28
2.05
Israel 0.87
1.59
Canada 0.86
1.91
Germany 0.71
1.37
Singapore 0.68
1.46
United Kingdom 0.57
1.13
France 0.39
0.98
Netherlands 0.38
0.82
United Arab Emirates 0.31
0.85
Italy 0.31
0.98
Brazil 0.30
1.03
Spain 0.30
1.08 Male
Finland 0.29
Female
0.88
Australia 0.29
0.00 0.50 1.00 1.50 2.00 2.50 3.00

Relative AI Skill Penetration Rate
Figure 4.1.14

Index Report 2023 4.2 Investment
Using data from NetBase Quid, this section tracks trends in AI-related investments. NetBase Quid tracks data on the investments of over
8 million global public and private companies. NetBase Quid also uses natural language processing techniques to search, analyze, and
identify patterns in large, unstructured datasets, like aggregated news and blogs, and company and patent databases. NetBase Quid
continuously broadens the set of companies for which it tracks data, so that in this year’s AI Index, the reported investment volume for
certain years is larger than that of previous reports.
4.2 Investment
Corporate Investment
As AI becomes more and more integrated into the For the first time since 2013, year-over-year global
economy, it becomes increasingly important to track corporate investment in AI has decreased. In 2022,
AI-related corporate investment. Figure 4.2.1 shows total global corporate AI investment was $189.6
overall global corporate investment in AI from 2013 billion, roughly a third lower than it was in 2021.
to 2022. Corporate investment includes mergers and Still, in the last decade, AI-related investment has
acquisitions, minority stakes, private investment, and increased thirteenfold.
public offerings.
Global Corporate Investment in AI by Investment Activity, 2013–22

Source: NetBase Quid, 2022 | Chart: 2023 AI Index Report
300
Merger/Acquisition
Minority Stake 276.14
Private Investment
250 Public O ering
Total Investment (in Billions of U.S. Dollars)
119.66
200
189.59
150 146.74 83.35

26.06
100 95.57 46.06

125.36
79.62
31.91
21.89
53.72
50 91.86
33.82 24.68 61.61
25.43 43.1 55.09
19.04 13.05
14.57
17.13 25.72 29.1
13.35 12.62 13.01
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.1

To provide a fuller context for the Top Five AI Merger/Acquisition Investment Activities, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report
nature of AI investment in the last year,
Company Name Headquarters Focus Area Funding Amount
Figures 4.2.2 through 4.2.5 highlight Country (in Billions USD)
the top merger/acquisition, minority Nuance United States Arti cial Intelligence; 19.80
Communications, Inc. Enterprise Software;
stake, private investment, and public Healthcare; Machine
offering events in the last year. The Learning
greatest single AI investment event Citrix Systems, Inc. United States Data Management, 17.18
Processing, and Cloud;
was the merger/acquisition of Nuance HR Tech
Communications, valued at $19.8 billion Avast Limited Czech Republic Data Management, 8.02
Processing, and Cloud;
(Figure 4.2.2). The largest minority Fintech; Cybersecurity,
stake event was for the British company Data Protection
6.34
Aveva Group ($4.7 billion) (Figure 4.2.3). AspenTech United States Manufacturing;
Corporation Software; Supply
The greatest private investment event Chain Management
was GAC Aion New Energy Automobile Vivint Smart Home, United States Cybersecurity, Data 5.54
Inc. Protection; Sales
($2.5 billion), a Chinese clean energy Enablement
and automotive company (Figure 4.2.4).
Finally, the largest public offering was Figure 4.2.2
ASR Microelectronics ($1.1 billion),

a Chinese semiconductor company Top Five AI Minority Stake Investment Activities, 2022
(Figure 4.2.5).
Country (in Billions USD)
AVEVA Group, PLC United Chemical; Computer; 4.68

Kingdom Data Mining; Electronics;
Industrial Manufacturing;
Information Technology;
Simulation; Software
Grupo de Colombia Financial Services; Impact 1.48

Inversiones Investing; Insurance
Suramericana, SA
Fractal Analytics India Analytics; Arti cial 0.35

Private Limited Intelligence; Big Data;
Business Intelligence;
Consulting; Machine
Learning
Atrys Health, SA Spain Medical and Healthcare 0.28
R Systems India Analytics; Information 0.17

International, Ltd. Technology; IT
Management; Software
Figure 4.2.3

Top Five AI Private Investment Activities, 2022


GAC Ai¬¥an New China Automotive; Clean 2.54

Energy Automobile Energy; Electric Vehicle;
Co., Ltd. Manufacturing
Idience Co., Ltd. South Korea Emergency Medicine; 2.15

Healthcare;
Pharmaceutical
Uali Argentina Drones; Cloud Computing 1.50
Anduril Industries, United States Cybersecurity, Data 1.50

Inc. Protection; AR/VR;
Drones
Celonis, GmbH Germany Retail; Industrial 1.22

Automation, Network; HR
Tech; Insurtech
Figure 4.2.4
Top Five AI Public O ering Investment Activities, 2022


ASR Microelectronics China Semiconductor; VC 1.08

Co., Ltd.
iSoftStone Information China Data Management, 0.73

Technology (Group) Processing, and Cloud;
Co., Ltd. Cybersecurity, Data
Protection
Jahez International Saudi Arabia Arti cial Intelligence; 0.43

Company for E-Commerce; Food and
Information Systems Beverage; Food Delivery;
Technology Information Technology;
Logistics
Fortior Technology China Electronics; Machine 0.30

(Shenzhen) Co., Ltd. Manufacturing;
Semiconductor
Beijing Deep Glint China Cybersecurity, Data 0.29

Technology Co., Ltd. Protection; Music, Video
Content
Figure 4.2.5

Startup Activity
Global Trend
The next section analyzes private investment trends in The global private AI investment trend reveals that
artificial intelligence startups that have received over while investment activity has decreased since 2021, it
$1.5 million in investment since 2013. is still 18 times higher than it was in 2013 (Figure 4.2.6).
Private Investment in AI, 2013–22

120
100
91.86
80
60
40
20
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.6

A similar trend, of short-term decreases but longer- decrease from 2021 but a sixfold increase since 2013
term growth, is evident in data on total private (Figure 4.2.7). Similarly, the number of newly funded AI
investment events. In 2022 there were 3,538 AI- companies dropped to 1,392 from 1,669 last year, while
related private investment events, representing a 12% having increased from 495 in 2013 (Figure 4.2.8).
Number of Private Investment Events in AI, 2013–22
4,000
3,538
3,500
Number of Private Investment Events
3,000
2,500
2,000
1,500
1,000
500
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.7
Number of Newly Funded AI Companies in the World, 2013–22
1,600
1,392
1,400
1,200
Number of Companies
1,000
800
600
400
200
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.8

The year-over-year decrease in AI-related AI Private Investment Events by Funding Size,

2021 Vs. 2022
funding is also evident when the funding events Source: NetBase Quid, 2022 | Table: 2023 AI Index Report
are disaggregated by size. Across all size Funding Size 2021 2022 Total
categories, with the exception of ones over Over $1 Billion 4 6 10

$500 Million – $1 Billion 13 5 18
$1 billion, the total number of AI funding events
$100 Million – $500 Million 277 164 441
decreased (Figure 4.2.9). $50 Million – $100 Million 277 238 515
Under $50 Million 2,851 2,585 5,436
Undisclosed 598 540 1,138
Total 4,020 3,538 7,558
Figure 4.2.9
Regional Comparison by Funding Amount

Once again, the United States led the world in terms of total AI private investment. In 2022, the $47.4 billion
invested in the United States was roughly 3.5 times the amount invested in the next highest country, China
($13.4 billion), and 11 times the amount invested in the United Kingdom ($4.4 billion) (Figure 4.2.10).
Private Investment in AI by Geographic Area, 2022

United States 47.36
China 13.41
United Kingdom 4.37
Israel 3.24
India 3.24
South Korea 3.10
Germany 2.35
Canada 1.83
France 1.77
Argentina 1.52
Australia 1.35
Singapore 1.13
Switzerland 1.04
Japan 0.72
Finland 0.61
0 5 10 15 20 25 30 35 40 45
Figure 4.2.10

When private AI investments are aggregated since 2013, the same ranking of countries applies:
The United States is first with $248.9 billion invested, followed by China ($95.1 billion) and the
United Kingdom ($18.2 billion) (Figure 4.2.11).
Private Investment in AI by Geographic Area, 2013–22 (Sum)

United States 248.90
China 95.11
United Kingdom 18.24
Israel 10.83
Canada 8.83
India 7.73
Germany 6.99
France 6.59
South Korea 5.57
Singapore 4.72
Japan 3.99
Hong Kong 3.10
Switzerland 3.04
Australia 3.04
Spain 1.81
0 20 40 60 80 100 120 140 160 180 200 220 240

Figure 4.2.11

While the United States continues to outpace The top five American AI private investment events
other nations in terms of private AI investment, are highlighted in Figure 4.2.13, the top five European
the country experienced a sharp 35.5% decrease Union and British investments in Figure 4.2.14, and the
in AI private investment within the last year (Figure top five Chinese investments in Figure 4.2.15.
4.2.12). Chinese investment experienced a similarly
sharp decline (41.3%).
Private Investment in AI by Geographic Area, 2013–22

70
60
50
47.36, United States
40
30
20
13.41, China
10 11.04, European Union and United Kingdom
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.12

Top AI Private Investment Events in the United Top AI Private Investment Events in the European
States, 2022 Union and United Kingdom, 2022
Source: NetBase Quid, 2022 | Table: 2023 AI Index Report Source: NetBase Quid, 2022 | Table: 2023 AI Index Report
Company Name Focus Area Funding Amount Company Name Focus Area Funding Amount
(in Billions USD) (in Billions USD)
Anduril Industries, Inc. Cybersecurity, Data 1.50 Celonis, GmbH Retail; Industrial 1.22
Protection; AR/VR; Automation, Network;
Drones HR Tech; Insurtech
Faire Wholesale, Inc. Fintech; Retail; Sales 0.82 Content Square, SAS Analytics; Arti cial 0.60
Enablement Intelligence: CRM:
Data Visualization;
Anthropic, PBC Arti cial Intelligence; 0.58 Digital Marketing;
Information SaaS
Technology; Machine
Learning Retail Logistics Excellence Retail 0.57
- RELEX Oy
Arctic Wolf Networks, Inc. Data Management, 0.40
Processing, and Cloud; Cera Care Limited Medical and 0.32
Cybersecurity, Data Healthcare
Protection
Babylon Holdings Limited Medical and 0.30
JingChi, Inc. Data Management, 0.40 Healthcare; Music,
Processing, and Cloud; Video Content
AV; AR/VR
Figure 4.2.14
Figure 4.2.13
Top AI Private Investment Events in China, 2022

Company Name Focus Area Funding Amount

(in Billions USD)
GAC Ai¬¥an New Energy Automotive; Clean 2.54

Automobile Co., Ltd. Energy; Electric
Vehicle;
Manufacturing
GAC Ai¬¥an New Energy Automotive; Clean 1.11

Automobile Co., Ltd. Energy; Electric
Vehicle;
Manufacturing
Beijing ESWIN Data Management, 0.58

Technology Group Co., Processing, and Cloud;
Ltd. Industrial Automation,
Network;
Semiconductor;
Marketing, Digital Ads;
Sales Enablement
Zhejiang Hozon New Data Management, 0.44

Energy Automobile Co., Processing, and Cloud;
Ltd. Cybersecurity, Data
Protection; Sales
Enablement
Zhejiang Hozon New Data Management, 0.32

Energy Automobile Co., Processing, and Cloud;
Ltd. Cybersecurity, Data
Protection; Sales
Enablement
Figure 4.2.15

Regional Comparison by Newly Funded

AI Companies
This subsection studies the number of newly funded United States led all regions with the largest number of
AI companies across various geographic areas. newly funded AI companies at 542, followed by China
As was the case with private investment, the at 160 and the United Kingdom at 99 (Figure 4.2.16).
Number of Newly Funded AI Companies by Geographic Area, 2022

United States 542
China 160
United Kingdom 99
Israel 73
India 57
Canada 47
France 44
Germany 41
Singapore 36
Japan 32
Switzerland 26
Australia 23
South Korea 22
Sweden 12
Netherlands 12
0 100 200 300 400 500

Number of Companies
Figure 4.2.16

A similar trend is evident in the aggregate data since 2013. In the last decade, the number of newly funded
AI companies in the United States is around 3.5 times the amount in China, and 7.4 times the amount in the
United Kingdom (Figure 4.2.17).
Number of Newly Funded AI Companies by Geographic Area, 2013–22 (Sum)

United States 4,643
China 1,337
United Kingdom 630
Israel 402
Canada 341
France 338
India 296
Japan 294
Germany 245
Singapore 165
South Korea 145
Australia 126
Switzerland 108
Sweden 83
Netherlands 78
0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500

Number of Companies Figure 4.2.17
Figure 4.2.18 breaks Number of Newly Funded AI Companies by Geographic Area, 2013–22
700
down data on newly
funded AI companies
within select 600
geographic regions. 542,

United States
In a trend that goes 500
back a decade,
Number of Companies
the United States 400
continues to outpace
both the European 300 293,
European
Union and the United Union and
United Kingdom
Kingdom, as well as 200
160,
China. However, the China
growth rates of the 100
different regions are

relatively similar. 0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.2.18

Focus Area Analysis Private Investment in AI by Focus Area, 2021 Vs. 2022
Private AI investment can also be disaggregated by Medical and Healthcare
Data Management, Processing, Cloud
focus area. Figure 4.2.19 compares global private Fintech
AI investment by focus area in 2022 versus 2021. Cybersecurity, Data Protection
Retail
The focus areas that attracted the most investment Industrial Automation, Network
Sales Enablement
in 2022 were medical and healthcare ($6.1 billion); Marketing, Digital Ads
AR/VR
data management, processing, and cloud ($5.9 Drones
Insurtech
billion); fintech ($5.5 billion); cybersecurity and Music, Video Content
data protection ($5.4 billion); and retail ($4.2 Semiconductor
HR Tech
billion). Mirroring the pattern seen in total AI private Energy, Oil, and Gas
AV
investment, the total investment across most focus NLP, Customer Support
Agritech
areas declined in the last year. Entertainment
Legal Tech
Geospatial
Fitness and Wellness
Ed Tech 2022
Facial Recognition 2021
VC
0 2 4 6 8 10
Figure 4.2.19
Figure 4.2.20 presents trends in AI focus area cybersecurity and data protection, drones, marketing
investments. As noted earlier, most focus areas saw and digital ads, HR tech, AR/VR, and legal tech. Still,
declining investments in the last year. However, some mirroring a broader trend in AI private investment,
of the focus areas that saw increased investments are most focus areas saw greater amounts of AI private
semiconductor, industrial automation and network, investment in 2022 than they did in 2017.

Private Investment in AI by Focus Area, 2017–22

Data Management, Processing, Cloud Medical and Healthcare Fintech AV
8 8 8 8
6 5.86 6 6.05 6 5.52 6
4 4 4 4
2 2 2 2 1.34
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Semiconductor Industrial Automation, Network Retail Fitness and Wellness
8 8 8 8
6 6 6 6
4 4 3.92 4 4.20 4
2 1.65 2 2 2
0.53
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
NLP, Customer Support Energy, Oil, and Gas Cybersecurity, Data Protection Drones
8 8 8 8
6 6 6 5.38 6
4 4 4 4
2 2 1.61 2 2 1.88
1.01
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Marketing, Digital Ads HR Tech Facial Recognition Insurtech
8 8 8 8
6 6 6 6
4 4 4 4
3.05
2 2 1.63 2 2 1.74
0.07
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Agritech Sales Enablement AR/VR Ed Tech
8 8 8 8
6 6 6 6
4 4 4 4
3.18 2.39
2 2 2 2
0.87 0.37
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Geospatial Legal Tech Entertainment Music, Video Content
8 8 8 8
6 6 6 6
4 4 4 4
2 2 2 2 1.72
0.71 0.83 0.87
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
VC
8
6
4
2
0.02
0 Figure 4.2.20
2018 2020 2022

Finally, 4.2.21 shows private investment in AI by focus area billion), and 40 times more than that in the European
over time within select geographic regions, highlighting Union and the United Kingdom ($0.04 billion). Chinese
how private investment priorities in AI differ across private investment in AI-related semiconductors ($1.02
geographies. For example, in 2022, private investment billion) was 1.75 times more than that in the United
in AI-related drone technology in the United States ($1.6 States ($0.58 billion), and 102 times more than that in the
billion) was nearly 53 times more than that in China ($0.03 European Union and the United Kingdom ($0.01 billion).
Private Investment in AI by Focus Area and Geographic Area, 2017–22

Data Management, Processing, Cloud Medical and Healthcare Fintech AV

US, 3.13 US, 4.19 US, 3.23 US, 0.69
CN, 1.87 CN, 0.25 CN, 0.03 CN, 0.49
4 EU/UK, 0.24 4 EU/UK, 0.76 4 EU/UK, 0.94 4 EU/UK, 0.02
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Semiconductor Industrial Automation, Network Retail Fitness and Wellness

US, 0.58 US, 0.87 US, 1.52 US, 0.23
CN, 1.02 CN, 1.06 CN, 0.01 CN, 0.00
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
NLP, Customer Support Energy, Oil, and Gas Cybersecurity, Data Protection Drones
US, 0.69 US, 0.80 US, 3.87 US, 1.60
CN, 0.13 CN, 0.34 CN, 1.07 CN, 0.03
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Marketing, Digital Ads HR Tech Facial Recognition Insurtech
US, 1.14 US, 0.24 US, 0.07 US, 0.39
CN, 0.88 CN, 0.00 CN, 0.00 CN, 0.00
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Agritech Sales Enablement AR/VR Ed Tech

US, 0.55 US, 1.12 US, 2.07 US, 0.12
CN, 0.10 CN, 1.68 CN, 0.01 CN, 0.01
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
Geospatial Legal Tech Entertainment Music, Video Content

US, 0.55 US, 0.71 US, 0.47 US, 1.10
CN, 0.03 CN, 0.05 CN, 0.18 CN, 0.03
2 2 2 2
0 0 0 0
2018 2020 2022 2018 2020 2022 2018 2020 2022 2018 2020 2022
VC
US, 0.00
CN, 0.00
4 EU/UK, 0.02
2
0 Figure 4.2.21
2018 2020 2022

Index Report 2023 4.3 Corporate Activity
This section explores how corporations tangibly use AI. First, it highlights industry adoption trends and asks how businesses adopt
AI and what particular AI technologies they find most useful, and identifies how AI adoption affects their bottom line. Second, the
section considers industry motivations and explores what questions industry leaders consider when thinking about incorporating AI
technologies. Finally, it paints a qualitative picture of business AI use by examining trends in AI-related earnings calls.
4.3 Corporate Activity

Industry Adoption
Adoption of AI Capabilities
The following subsection on the industry adoption According to the most recent McKinsey report, as of
of AI borrows data from McKinsey’s “The State of 2022, 50% of surveyed organizations reported having
AI in 2022—and a Half Decade in Review,” as well adopted AI in at least one business unit or function
as previous years’ editions. The 2022 report drew on (Figure 4.3.1). This total is down slightly from 56% in
data from a survey of 1,492 participants representing 2021, although up significantly from 20% in 2017. AI
a wide range of regions, industries, company sizes, usage has rapidly grown in the past half-decade, but
functional specialties, and tenures. leveled off since 2020.
Share of Respondents Who Say Their Organizations Have Adopted AI in at Least One Function, 2017–22
Source: McKinsey & Company Survey, 2022 | Chart: 2023 AI Index Report
60%
50% 50%
40%
% of Respondents
30%
20%
10%
0%
2017 2018 2019 2020 2021 2022
Figure 4.3.1

In the last half-decade, the average number of AI capabilities that organizations have embedded
has doubled from 1.9 in 2018 to 3.8 in 2022 (Figure 4.3.2). Some of the AI capabilities that McKinsey
features in their survey include recommender systems, NL text understanding, and facial recognition.4
Average Number of AI Capabilities That Respondents’ Organizations Have Embedded Within at Least One
Function or Business Unit, 2018–22
4.00
3.80
3.50
Number of AI Capabilities (Average)
3.00
2.50
2.00
1.50
1.00
0.50
0.00
2018 2019 2020 2021 2022
Figure 4.3.2
4 In the 2022 edition of the McKinsey survey, 16 total AI capabilities are considered: computer vision, deep learning, digital twins, facial recognition, GAN, knowledge graphs,
NL generation, NL speech understanding, NL text understanding, physical robotics, recommender systems, reinforcement learning, robotic process automation, transfer
learning, transformers, and virtual agents.

The most commonly adopted AI use case in 2022 was service operations optimization (24%), followed
by the creation of new AI-based products (20%), customer segmentation (19%), customer service
analytics (19%), and new AI-based enhancement of products (19%) (Figure 4.3.3).
Most Commonly Adopted AI Use Cases by Function, 2022

Service Operations Optimization 24%
Creation of New AI-Based Products 20%
Customer Segmentation 19%
Customer Service Analytics 19%
New AI-Based Enhancements of Products 19%
Customer Acquisition and Lead Generation 17%
Product Feature Optimization 16%
Contact-Center Automation 16% Service Operations

Product and/or
Risk Modeling and Analytics 15% Service Development
Marketing and Sales
Predictive Service and Intervention 14% Risk
0% 4% 8% 12% 16% 20% 24%

% of Respondents
Figure 4.3.3

With respect to the type of AI capabilities embedded rates of embedding were 48%, 47%, and 46%. Across
in at least one function or business unit, as indicated all industries, the most embedded AI technologies
by Figure 4.3.4, robotic process automation had were robotic process automation (39%), computer
the highest rate of embedding within high tech/ vision (34%), NL text understanding (33%), and virtual
telecom, financial services and business, and legal agents (33%).
and professional services industries—the respective
AI Capabilities Embedded in at Least One Function or Business Unit, 2022

All Industries 34% 30% 24% 18% 11% 25% 18% 23% 33% 20% 25% 20% 39% 16% 11% 33%
Business, Legal, and

32% 37% 31% 11% 8% 26% 12% 22% 34% 19% 23% 26% 46% 16% 11% 30%
Professional Services
Consumer Goods/
33% 36% 25% 19% 13% 18% 20% 11% 22% 24% 32% 19% 25% 7% 11% 40%
Industry
Retail
Financial Services 24% 22% 18% 24% 13% 29% 20% 30% 42% 14% 30% 19% 47% 17% 12% 33%
Healthcare Systems/
Pharma and 32% 18% 16% 5% 5% 14% 5% 12% 29% 11% 16% 13% 16% 9% 6% 14%
Med. Products
High Tech/Telecom 37% 45% 24% 16% 15% 23% 24% 29% 40% 15% 34% 23% 48% 22% 15% 43%
C
Fa
Kn
Ph
Re
Re
Ro
Tr
Tr
Vi
ee
ig
om
AN
an
an
rt
c
co
i
ys
bo
ow
nf
ita
Sp
Te
ia
ua
p
sf
sf
ic
pu
or
en
tic
lR
le
xt
Le
lT
lA
er
or
al
ce
ec
te
dg
er
ec
Un
m
w
Pr
a
Ro
Le
ge
en
r
rn
m
h
at
in
og
er
oc
Vi
ar
de
nt
b
en
Un
in
io
s
s(
er
ni
ot
sio
es
ni
s
g
n
ra
rS
tL
d
st
e.
tio
ic
ng
sA
er
ph
n
an
g.
ys
s
ea
n
st
,G
ut
s
di
te
rn
an
om
ng
in
PT
di
at
-3
ng
io
)
n
% of Respondents (AI Capability)

Figure 4.3.4

Figure 4.3.5 shows AI adoption by industry and AI function in 2022. The greatest adoption was in risk for high
tech/telecom (38%), followed by service operations for consumer goods/retail (31%) and product and/or service
development for financial services (31%).
AI Adoption by Industry and Function, 2022

All Industries 11% 8% 5% 10% 19% 19% 21% 9%

11% 10% 9% 8% 16% 20% 19% 12%
Consumer Goods/
14% 4% 3% 4% 15% 31% 29% 11%
Retail
Industry
Financial Services 1% 8% 7% 31% 17% 24% 23% 2%
Healthcare Systems/
Pharma and 15% 7% 2% 4% 22% 12% 8% 8%
Med. Products
High Tech/Telecom 6% 6% 4% 7% 38% 21% 25% 8%
Hu Ma Ma Pro Ris Se Str Su

rv p
ma
nR
nu
fa
rke
ti Se duct k ice Co ateg Ma ply-
eso ctu ng rvi
c an Op rpo y a na Ch
urc rin
g
an
dS e D d/o era rat nd ge a
me in
es ev r tio eF nt
ale elo ns ina
s pm nce
en
t
% of Respondents (Function)
Figure 4.3.5

Figure 4.3.6 shows how rates of AI adoption by points); followed by high tech/telecom, for risk
industry and AI function vary from 2021 to 2022 (22 percentage points). The most significant
in order to demonstrate how rates of AI adoption decreases were in high tech/telecom, for product
have changed over the last year. The greatest year- and/or service development (38 percentage points);
over-year increases were in consumer goods/retail, and healthcare systems, also for product and/or
for strategy and corporate finance (25 percentage service development (25 percentage points).
Percentage Point Change in Responses of AI Adoption by Industry and Function 2021 Vs. 2022
All Industries 2% -4% -15% -13% 6% -6% 12% -4%

-3% 2% -19% -7% 3% -6% 11% -1%
Consumer Goods/
12% -14% -19% -13% 14% 16% 25% -7%
Retail
Industry
Financial Services -9% 4% -17% 11% -15% -16% 10% -6%
Healthcare Systems/
Pharma and 6% -4% -12% -25% 9% -5% -4% -1%
Med. Products
High Tech/Telecom -6% -5% -24% -38% 22% -13% 15% -8%
Hu Ma Ma Pro Ris Se Str Su

rv p
ma
nR
nu
fa
rke
ti Se duct k ice Co ateg Ma ply-
eso ctu ng rvi
ce a nd Op rpo y a na Ch
urc rin
g
an
dS De /or era rat nd ge a
me in
es ve tio eF nt
ale lop ns ina
s me nce
nt
Percentage Point Change in Responses (Function)
Figure 4.3.6

Organizations report AI adoption leading to both (Figure 4.3.7). On the revenue side, the functions that
cost decreases and revenue increases. On the cost most respondents saw increases in as a result of AI
side, the functions that most respondents saw adoption were marketing and sales (70%), product
decreases in as a result of AI adoption were supply and/or service development (70%), and strategy and
chain management (52%), service operations (45%), corporate finance (65%).
strategy and corporate finance (43%), and risk (43%)
Cost Decrease and Revenue Increase From AI Adoption by Function, 2021

Decrease by <10% Decrease by 10–19% Decrease by ≥20% Increase by >10% Increase by 6–10% Increase by ≤5%
Service Operations 45% 29% 10% 6% 10% 10% 37% 57%
Manufacturing 42% 32% 7% 10% 18% 33% 61%
Human Resources 29% 25% 14% 13% 31% 58%
Marketing and Sales 28% 21% 9% 20% 41% 70%

Function
Risk 43% 30% 8% 10% 11% 27% 48%
Supply Chain Management 52% 41% 7% 14% 17% 28% 59%
Product and/or Service Development 30% 20% 6% 13% 24% 33% 70%
Strategy and Corporate Finance 43% 31% 8% 8% 16% 41% 65%
Average Across All Activities 32% 23% 8% 19% 36% 63%
% of Respondents
Figure 4.3.7

Figure 4.3.8 shows AI adoption by organizations across all geographies was 50%, down 6% from 2021.
globally, broken out by regions of the world. In 2022, Notably, “Greater China” registered a 20 percentage
North America led (59%), followed by Asia-Pacific point decrease from 2021.
(55%) and Europe (48%). The average adoption rate
AI Adoption by Organizations in the World, 2021 Vs. 2022

50%
All Geographies
56%
55%
Asia-Paci c
64%
48%
Europe
51%
59%
North America
55%
Greater China 41%

(incl. Hong Kong,
Taiwan) 61%
Developing Markets 2022

(incl. India, 44%
2021
Latin America, 52%
MENA)
0% 10% 20% 30% 40% 50% 60%
% of Respondents
Figure 4.3.8

Consideration and Mitigation of Risks From risks were regulatory compliance (45%), personal/
Adopting AI individual privacy (40%), and explainability (37%).
As has been the case in the last few iterations of the The least salient risks identified by organizations were
McKinsey report, in 2022 respondents identified national security (13%) and political stability (9%).
cybersecurity as the most relevant risk when adopting
AI technology (59%) (Figure 4.3.9). The next most cited
Risks From Adopting AI That Organizations Consider Relevant, 2019–22

60% 59%, Cybersecurity
50%
45%, Regulatory Compliance

% of Respondents
40% 40%, Personal/Individual Privacy

37%, Explainability
32%, Organizational Reputation

30% 30%, Equity and Fairness
28%, Workforce/Labor Displacement
20% 20%, Physical Safety
13%, National Security

10%
9%, Political Stability
0%
2019 2020 2021 2022
Figure 4.3.9

Figure 4.3.10 highlights the AI risks that organizations have taken steps to mitigate. For instance, there is
are taking steps to mitigate. The top three responses a gap of 8 percentage points for cybersecurity, 9
were cybersecurity (51%), followed by regulatory percentage points for regulatory compliance, and 12
compliance (36%) and personal/individual privacy percentage points for personal/individual privacy.
(28%). As was the case in previous years, there are These differences suggest there is a gap between
meaningful gaps between the risks organizations the awareness organizations have of various risks and
cite as relevant and those which organizations their steps taken to mitigate such risks.
Risks From Adopting AI That Organizations Take Steps to Mitigate, 2019–22

51%, Cybersecurity
50%
40%
36%, Regulatory Compliance

% of Respondents
30%
28%, Personal/Individual Privacy
22%, Organizational Reputation

22%, Explainability
20%
18%, Workforce/Labor Displacement
17%, Equity and Fairness
15%, Physical Safety
10%
7%, National Security
4%, Political Stability
0%
2019 2020 2021 2022
Figure 4.3.10

The Effects of GitHub’s Copilot on Developer
Productivity and Happiness
In 2021, launched a technical preview of Copilot,
It took the developers
a generative AI tool that enables developers and
coders to present a coding problem in natural using Copilot only 71
language and then have Copilot generate a minutes to complete their
solution in code. Copilot can also translate
between various programming languages. In
task—56% less time than
2022, GitHub surveyed over 2,000 developers the developers who did not
who were using the tool to determine its effect on
their productivity, well-being, and workflow.5
use Copilot (161 minutes).
Figure 4.3.11 summarizes the results of the survey. reported a completion rate of 78%, 8 percentage
Developers overwhelmingly reported feeling points higher than those who did not use Copilot.
more productive, satisfied, and efficient when Likewise, it only took the developers using Copilot
working with Copilot. More specifically, 88% of 71 minutes to complete their task, which was 56%
surveyed respondents commented feeling more less time than the developers who did not use
productive, 74% reported being able to focus on Copilot (161 minutes). These survey and experiment
more satisfying work, and 88% claimed to have results are evidence of the tangible ways in which
completed tasks more quickly. One software AI tools improve worker productivity.
engineer stated, “[With Copilot] I have to think
less, and when I have to think, it’s the fun stuff. It
sets off a little spark that makes coding more fun
and more efficient.”6
As part of the same survey, GitHub recruited

95 developers and randomly split them into two
groups, one of which used Copilot as part of a
coding task and the other which did not. The
results of this experiment are summarized in
Figure 4.3.12. The developers who used Copilot
5 Most of the developers surveyed, around 60%, were professional developers; 30% were students and 7% were hobbyists.
6 The quote is taken from this source.

The Effects of GitHub’s Copilot on Developer
Productivity and Happiness (cont’d)
Measuring Dimensions of Developer Productivity When Using Copilot: Survey Responses, 2022
Source: GitHub Survey, 2022 | Chart: 2023 AI Index Report
I Am More Productive 88%
Focus on More Satisfying Work 74%
More Ful�lled With My Job 60% Perceived Productivity

Satisfaction and Well-Being
Less Frustrated When Coding 59% E�ciency and Flow
Faster With Repetitive Tasks 96%
Faster Completion 88%
Less Mental E�orts on Repetitive Tasks 87%
Less Time Searching 77%
More In The Flow 73%
0% 20% 40% 60% 80% 100%

% of Participants That Agreed or Strongly Agreed
Figure 4.3.11
Summary of the Experiment Process and Results

Source: GitHub Survey, 2022 | Table: 2023 AI Index Report
Used Did Not Use

GitHub Copilot GitHub Copilot
Number of Developers 45 50
Completion Rate (%) 78 70
Average Time Taken to 71 161

Complete the Task (Minutes)
Figure 4.3.12

Importance of AI Solutions for Organizations’

Industry Motivation Overall Success
Source: Deloitte Survey, 2022 | Chart: 2023 AI Index Report
This section explores the motivations industry
leaders have in deploying AI and examines the
1%, Not Important
degree to which they feel AI is important, the 5%, Somewhat Important
reasons they are eager to embrace AI, and the
factors that have hindered further scaling of
AI solutions. The data from this section comes
from Deloitte’s “State of AI in Enterprise” report,
which has surveyed companies about their use
of AI since 2017. This year’s survey polled 2,620 94%, Important
business leaders from a wide range of countries,
industries, and corporate levels.
Figure 4.3.13
Perceived Importance of AI
Figures 4.3.13 and 4.3.14 suggest that an
overwhelming majority of business leaders
perceive AI to be important for their businesses.
More specifically, when asked how important
AI solutions were for their organization’s overall Believe AI Enhances Performance and Job
Satisfaction, 2022
success, 94% responded “important,” 5% said
“somewhat important,” and 1% answered “not 2%, Strongly Disagree / Disagree 16%, Neither Agree nor Disagree
1%, Unsure
important” (Figure 4.3.13).
Similarly, when asked whether they believe that AI

enhances performance and job satisfaction, 82%
responded “strongly agree/agree,” 16% said they
“neither agree nor disagree,” and only 2% selected
“strongly disagree/disagree” (Figure 4.3.14).
82%, Strongly Agree / Agree

Figure 4.3.14

AI Investments and Implementation

a 9 percentage point decrease since 2021 and a 12
Outcomes
percentage point decrease since 2018, a significantly
In 2022, 76% of surveyed leaders reported
large portion of business leaders continue to express
expecting to increase AI investments in the next
interest in AI investment.
fiscal year (Figure 4.3.15). Although this represents
Expected AI Investment Increase in the Next Fiscal Year

80%
76%
60%
% of Respondents
40%
20%
0%
2018 2019 2020 2021 2022
Figure 4.3.15

Figure 4.3.16 highlights the main outcomes that business leaders achieved by embracing AI solutions.7
The top outcome was lowered costs (37%), followed by improved collaboration across business
functions/organizations (34%) and having discovered valuable insights (34%).
Main Outcomes of AI Implementation, 2022

Lower Costs 37%
Improve Collaboration Across
Business Functions/Organizations 34%
Discover Valuable Insights 34%

Customize or Improve Product/Programs,
33%
Services, or Offers
Enter New Markets/Expand Services to 33%
New Constituents
Make Organizational Processes More E�cient 33%
Improve Decision-Making 32%

Create New Products/
Programs and Services 32%
Predict Demand 32%

Enable New Business/
Service Models 32%
Increase Revenue 31%

Activate the Potential of Existing Headcount
and/or Improve Talent Management 30%
Improve Constituent Engagement 30%
Anticipate Constituent Needs 28%
0% 10% 20% 30%

% of Respondents
Figure 4.3.16
7 Figure 4.3.16 is drawn from the chart in the Deloitte survey: “Outcomes—‘Achieved to a high degree.’”

Challenges in Starting and Scaling AI Projects were proving business value (37%), lack of executive
The top three challenges that business leaders commitment (34%), and choosing the right AI
identified in terms of starting AI-related projects technologies (33%) (Figure 4.3.17).
Top Three Challenges in Starting AI Projects, 2022

Proving Business
37%
Value
Lack of Executive
34%
Commitment
Choosing the Right

33%
AI Technologies
0% 5% 10% 15% 20% 25% 30% 35%

% of Respondents
Figure 4.3.17

The main barrier leaders faced in scaling existing AI initiatives was managing AI-related risks (50%), obtaining
more data or inputs to train a model (44%), and implementing AI technologies (42%) (Figure 4.3.18).
Main Barriers in Scaling AI Initiatives, 2022

Managing
50%
AI-Related Risks
Obtaining Needed Data

44%
or Input to Train Model
Implementing
42%
AI Technologies
Proving Business Value 40%
0% 10% 20% 30% 40% 50%

% of Respondents
Figure 4.3.18

Earnings Calls Aggregate Trends

In the 2022 fiscal year, there were 268 earnings calls
The following subsection presents data from from Fortune 500 companies that mentioned AI-related
NetBase Quid, which uses natural language keywords (Figure 4.3.19). The number of such mentions
processing tools to analyze trends in corporate dropped from the previous year, when there were 306,
earnings calls. NetBase Quid analyzed all 2022 but has increased since 2018 when there were 225.
earnings calls from Fortune 500 companies,
identifying all mentions of “Artificial Intelligence,”
“AI,” “Machine Learning,” “ML,” and “deep learning.”
Number of Fortune 500 Earnings Calls Mentioning AI, 2018–22

300
268
Number of Earnings Calls
200
100
0
2018 2019 2020 2021 2022
Figure 4.3.19

Specific Themes advertising and marketing (8.8%) (Figure 4.3.20).

Mentions of AI in Fortune 500 earnings calls were Compared to 2018, some of the less prevalent
associated with a wide range of themes. In 2022, the AI-related themes in 2022 included deep learning
most cited themes were business integration (10.0%); (4.8%), autonomous vehicles (3.1%), and data
pricing and inventory management (8.8%); and storage and management (3.0%).
Themes for AI Mentions in Fortune 500 Earnings Calls, 2018 Vs. 2022
9.96% (-15%)
Business Integration
11.74%
8.82% (+48%)
Pricing and Inventory Management
5.94%
8.82% (+204%)
Advertising and Marketing
2.90%
8.39% (+23%)
Process Automation
6.81%
7.40% (-7%)
Support Decision-Making
7.97%
7.11% (+69%)
Healthcare and Medical Practices
4.20%
6.26% (+73%)
Cloud Platforms
3.62%
5.26% (-21%)
Personalizing Customer Experience
6.67%
4.84% (-41%)
Deep Learning
8.26%
4.13% (+24%)
Edge Intelligence
3.33%
3.84% (+121%)
Nvidia AI Use Cases
1.74%
3.27% (+33%)
Revenue Growth
2.46%
3.13% (-47%)
Autonomous Vehicles
5.94%
2.99% (+37%)
Data Processing
2.17%
2.99% (-55%)
Data Storage and Management
6.67%
2.70% (+10%)
Adobe Experience
2.46%
2.42% (+734%)
Customer Support
0.29%
2.13% (-59%)
Azure Cognitive Services
5.22%
1.85% (-20%)
Data Center GPU
2.32%
1.28% (+47%)
Investments
0.87% 2022
1.00% (-62%) 2018
Nvidia RTX
2.61%
0.71% (-87%)
Digital Transformation
5.36%
0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14%

Theme Mentioned (% of Total)
Figure 4.3.20

What Are Business Leaders Actually Saying About AI?
To better understand business attitudes that surround AI, it is worth

“We spent a ton of money
looking at AI-related excerpts from the Fortune 500 earnings calls.
on Cloud. We spend a ton of
For example, on the topic of business integration, companies often money on adding capabilities.
And over time, as you do it on
cite AI and machine learning (ML) use cases to reassure business
one platform, it all becomes
audiences of safer business practices, growing opportunities, more efficient. So, I think it’s
streamlining processes, and capability expansion. a lot of little things, but it adds
up with our base of people
and fixed cost, it adds up
“We spent $100 million building “Especially in the last year or so, the significantly over time. We’ve
certain risk and fraud systems field of robotics itself has actually been able to maintain our
so that when we process changed because with AI and headcount at a level we feel
payments on the consumer side, ML coming to the picture, there’s good about, and we think
losses are down $100 million to significant developments in the we can grow massively on
$200 million. Volume is way up. robotics field. So we think it’s a top of that without having to
That’s a huge benefit.” huge opportunity for us.” add lots of bodies to be able
– Jamie Dimon, CEO, JP Morgan – Raj Subramaniam, CEO, FedEx to do it.” – Peter Kern, CEO,
Chase & Co. (Q2 2022) (Q3 2022) Expedia Group (Q4 2022)
In terms of process automation, business leaders emphasize the ability of AI tools to accelerate
productivity gains and to deliver a better customer experience.
“We continue to drive “We have improved the “In September, we opened a next-
the use of automation experience for customers by gen fulfillment center in Illinois.
and artificial applying artificial intelligence This 1.1 million square foot facility
intelligence to drive to match them with an expert features robotics, machine learning,
productivity gains to who is right for their specific and automated storage, resulting in
help offset inflationary situation and to deliver insights increased productivity and a better
pressures.” – Jim Davis, to experts so they can provide service for our customers at faster
CEO, Quest Diagnostics excellent service.” – Sasan delivery times.” – John David, CFO,
(Q4 2022) Goodarzi, CEO, Intuit (Q2 2022) Walmart (Q3 2022)

What Are Business Leaders Actually Saying About AI?
(cont’d)
The conversation surrounding pricing and inventory management saw companies reassuring business
audiences on how their use of AI would improve their operational strength, especially in environments of
high inflation and supply chain challenges.
“We are … continuing to refine and invest “Our teams are utilizing technology, innovative data analytics
in machine learning tools that will allow for and AI to forecast supply chain lead times and changes
more sophisticated competitive pricing in market demand to ensure optimal levels. These actions
and greater automation at scale.” along with our pricing initiatives positively impacted our
– Adrian Mitchell, CFO, Macy’s (Q3 2022) gross margin in the second quarter.”
– Bert Nappier, CFO, Genuine Parts Company (Q3 2022)
There is also a vibrant discussion about the ways in which AI can change healthcare and medical
practices, more specifically to reduce costs, improve the patient experience, and better serve clinicians.
“[Using] machine “I’d like to highlight productivity efforts in our preauthorization process where
learning and robotics, we’re leveraging an in-house artificial intelligence solution to automatically
we can now resolve match incoming faxes to the correct authorization requests. This solution
a wide range of creates administrative efficiencies across millions of inbound images. We are
prescription drug also scaling this solution to multiple business units such as pharmacy and
claims which previously are also expanding the application of this type of AI to provide decision
required the attention of support to clinicians, which will result in improvements to authorization
our pharmacists, freeing turnaround times, reduction in friction for providers and creating a better
them up to spend time member experience.” – Bruce Broussard, CEO, Humana (Q3 2022)
with patients. This
advanced approach
reduces overall cost
and improves the “We continue to see opportunities across [the software and analytics] segment
patient experience.” as payers, providers, and partners take advantage of our high ROI solutions and
– Karen Lynch, CEO, realize the benefits of our data, AI models, and workflow capabilities.”
CVS Health (Q2 2022) – Neil de Crescenzo, CEO, UnitedHealth Group (Q2 2022)

Sentiment Analysis sentiment associated with mentions of AI has been

NetBase Quid also runs the AI-related text of Fortune overwhelmingly positive (Figure 4.3.21). Mentions
500 earnings calls through a sentiment analysis of AI were rarely negative, suggesting that large
machine-learning algorithm that identifies whether businesses tend to have positive associations when it
the sentiment associated with the mention of AI is comes to AI tools.
positive, mixed, or negative . Overall, since 2018, the 8
Sentiment Summary Distribution for AI Mentions in Fortune 500 Earnings Calls by Publication Date, 2018–22
100% 1%% 1%% 0%%

2%%
1%%
2%%
3%% 3%%
1%% 1%% 1%%
0%%
1%%
2%%
1%% 1%% 1%%
1%% 2%%
3%%
13%%
18%%
19%%
16%%
15%%
16%%
14%%
13%%
15%%
20%%
18%%
22%%
23%%
18%%
17%%
19%%
12%%
21%%
17%%
17%%
87%%
86%%
86%%
85%%
84%%
84%%
84%%
84%%
80%
81%%
81%%
81%%
81%%
81%%
80%%
80%%
79%%
79%%
77%%
77%%
76%%
Sentiment Summary
60%
40%
Negative
20% Mixed
Positive
0%
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
2018 2019 2020 2021 2022
Figure 4.3.21
8 Chapter 2 of the 2023 AI Index highlights trends in the performance of sentiment analysis algorithms.

Index Report 2023 4.4 Robot Installations
Given that robots are frequently deployed with AI-based software technologies, it is possible to gain insights on AI-ready infrastructure
being deployed in the real world by tracking the installation of industrial robots. Data in this section comes from the International
Federation of Robotics (IFR), an international nonprofit organization that works to promote, strengthen, and protect the robotics
industry. Every year the IFR releases the World Robotics Report, which tracks global trends in installations of robots.9
4.4 Robot Installations

Aggregate Trends
2021 saw a rebound in the total number of worldwide
The following subsection includes data on the robot installations. The 517,000 industrial robots
installation and operation of industrial robots, installed in 2021 represented a 31.3% increase from
which are defined as an “automatically controlled, 2020 and a 211.5% increase since 2011 (Figure 4.4.1).
reprogrammable, multipurpose manipulator,
programmable in three or more axes, which can be
either fixed in place or mobile for use in industrial
automation applications.”
Number of Industrial Robots Installed in the World, 2011–21

Source: International Federation of Robotics (IFR), 2022 | Chart: 2023 AI Index Report
517
500
Number of Industrial Robots Installed (in Thousands)
400
300
200
100
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.1
9 Due to the timing of the IFR’s survey, the most recent data is from 2021.

The worldwide operational stock of industrial robots from 3,035,000 in 2020. In the last decade, the
also continues to steadily increase year over year number of industrial robots being installed and the
(Figure 4.4.2). The total number of operational number being used have both steadily increased.
industrial robots jumped 14.6% to 3,477,000 in 2021,
Operational Stock of Industrial Robots in the World, 2011–21

3,477
3,500
3,000
Number of Industrial Robots (in Thousands)
2,500
2,000
1,500
1,000
500
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.2

Industrial Robots: Traditional Vs. scalable than traditional robots, and are capable of
Collaborative Robots iterative learning.
A distinction can be drawn between traditional
robots that work for humans and collaborative In 2017, only 2.8% of all newly installed industrial
robots that are designed to work with humans. robots were collaborative (Figure 4.4.3). As of 2021,
Recently, the robotics community has been excited that number increased to 7.5%. Although traditional
about the potential of collaborative robots given industrial robots still lead new installations, the
that they can be safer, more flexible, and more number of collaborative robots is slowly increasing.
Number of Industrial Robots Installed in the World by Type, 2017–21

Traditional 517
500 Collaborative 39
424
400 394
400 391
300
478
200 389 405

370 368
100
0
2017 2018 2019 2020 2021
Figure 4.4.3

By Geographic Area the amount installed by Japan (47,200) and 7.7

Country-level data on robot installations can illustrate times the amount installed by the United States
which countries are prioritizing the integration of (35,000) (Figure 4.4.4). The countries with the next
robots into their economy. In 2021, China installed most installations were South Korea (31,100) and
the most industrial robots, with 268,200, 5.7 times Germany (23,800).
Number of Industrial Robots Installed by Country, 2021

China 268.20
Japan 47.20
United States 35.00
South Korea 31.10
Germany 23.80
Italy 14.10
Taiwan 9.60
France 5.90
Mexico 5.40
India 4.90
Canada 4.30
Thailand 3.90
Singapore 3.50
Spain 3.40
Poland 3.30
0 30 60 90 120 150 180 210 240 270

Figure 4.4.4

In 2013, China overtook Japan as the nation installing only widened. In 2013, Chinese industrial robot
the most industrial robots (Figure 4.4.5). Since then, installations represented 20.8% of the world’s share,
the gap between the total number of industrial robots whereas in 2021, they represented 51.8%.
installed by China and the next-nearest nation has
Number of New Industrial Robots Installed in Top Five Countries, 2011–21

268, China
250
200
150
100
50 47, Japan
35, United States
31, South Korea
24, Germany
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 4.4.5

China consolidated its dominance in industrial robotics in 2021, the first year in which the country installed
more industrial robots than the rest of the world combined (Figure 4.4.6).
Number of Industrial Robots Installed (China Vs. Rest of the World), 2016–21
268, China
250 249, Rest of the World

200
150
100
50
0
2016 2017 2018 2019 2020 2021
Figure 4.4.6

Figure 4.4.7 shows the annual growth rate of of industrial robot installations. The countries that
industrial robot installations from 2020 to 2021 by reported the highest growth rates were Canada
country. Virtually every country surveyed by the (66%), Italy (65%), and Mexico (61%).
IFR reported a yearly increase in the total number
Annual Growth Rate of Industrial Robots Installed by Country, 2020 Vs. 2021
Canada 66%
Italy 65%
Mexico 61%
Poland 56%
India 54%
China 51%
Thailand 36%
Taiwan 31%
Japan 22%
United States 14%
France 11%
Germany 6%
South Korea 2%
Spain 1%
Singapore -35%
−40% −30% −20% −10% 0% 10% 20% 30% 40% 50% 60% 70%
Annual Growth Rate of Industrial Robots Installed
Figure 4.4.7

Country-Level Data on Service Robotics
Another important class of robots are service robots, Service Robots in Medicine
Source: UL Solutions, 2022
which the ISO defines as a robot “that performs
useful tasks for humans or equipment excluding
industrial automation applications.”10 Figure 4.4.8
is an example of a robot being used in medicine,
Figure 4.4.9 illustrates how a robot can help with
professional cleaning, and Figure 4.4.10 shows a
robot designed for maintenance and inspection.
Figure 4.4.8
Service Robots in Professional Cleaning Service Robots in Maintenance and Inspection

Source: This Week in FM, 2021 Source: Robotnik, 2022
10 A more detailed definition can be accessed here.

Country-Level Data on Service Robotics (cont’d)
Compared to 2020, 2021 saw a higher number of professional service robots installed in the world
for several key application areas, including hospitality, medical robotics, professional cleaning, and
transportation and logistics (Figure 4.4.11). The category that registered the greatest year-over-year
increase was transportation and logistics: In 2021, 1.5 times the number of such service robots were
installed as in 2020.
Number of Professional Service Robots Installed in the World by Application Area, 2020 Vs. 2021
8
Agriculture
8
20
Hospitality
11
15
Medical Robotics
12
13
Professional Cleaning 2021
10
2020
50
Transportation and Logistics
34
0 5 10 15 20 25 30 35 40 45 50
Number of Professional Service Robots Installed (in Thousands)
Figure 4.4.11

Country-Level Data on Service Robotics (cont’d)
As of 2022, the United States has the greatest number of professional service robot manufacturers,
roughly 2.16 times as many as the next nation, China. Other nations with significant numbers of robot
manufacturers include Germany (91), Japan (66), and France (54) (Figure 4.4.12).
Number of Professional Service Robot Manufacturers in Top Countries by Type of Company, 2022
225
29
200
Number of Professional Service Robot Manufacturers
150
104
100 194 Startups
91
Incumbents
Unknown
66
54 52
50 94 47
39 39
79
61
49 44 44
34 35
0
3
2
United States China Germany Japan France Russia South Korea Switzerland Canada
Figure 4.4.12

Sectors and Application Types (137,000), followed by automotive (119,000) (Figure

4.4.13). Each of the highlighted sectors has recorded
On a global level, the sector that saw the greatest increases in the total number of industrial robot
amount of robot installations was electrical/electronics installations since 2019.
Number of Industrial Robots Installed in the World by Sector, 2019–21

52
All Others 37
30
119
Automotive 84
102
137
Electrical/Electronics 110
89
15
Food 12
11
64
Metal and Machinery 44
52
24
Plastic and Chemical Products 19
18 2021
107 2020
Unspeci ed 87 2019
87
0 20 40 60 80 100 120 140

Figure 4.4.13

Robots can also be deployed in a wide range of 230,000 industrial robots were installed for handling
applications, from assembling to dispensing and functions, 2.4 times more than for welding (96,000)
handling. Figure 4.4.14 illustrates how the application and 3.7 times more than for assembling (62,000).
of industrial robots has changed since 2021. Handling Every application category, with the exception
continues to be the application case toward which of dispensing and processing, saw more robot
the most industrial robots are deployed. In 2021, installations in 2021 than in 2019.
Number of Industrial Robots Installed in the World by Application, 2019–21

62
Assembling 50
40
32
Cleanroom 32
26
11
Dispensing 8
12
230
Handling 169
177
7
Processing 5
7 2021
2020
80
2019
Unspeci ed 60
55
96
Welding 70
74
0 20 40 60 80 100 120 140 160 180 200 220 240

Figure 4.4.14

China Vs. United States and metal and machinery (34,000) (Figure 4.4.15).
The Chinese industrial sectors that installed the Every industrial sector in China recorded a greater
greatest number of industrial robots in 2022 were number of robot installations in 2021 than in 2019.
electrical/electronics (88,000), automotive (62,000),
Number of Industrial Robots Installed in China by Sector, 2019–21

29
All Others 21
12
62
Automotive 31
32
88
Electrical/Electronics 64
42
4
Food 3
3
34
Metal and Machinery 22
22
1
Pharma/Cosmetics 1
1
6
Rubber and Plastics 5
4 2021
43
2020
Unspeci ed 30 2019
31
0 10 20 30 40 50 60 70 80 90
Figure 4.4.15

The automotive industry installed the greatest number of industrial robots in the United States in 2021,
although installation rates for that sector decreased year over year (Figure 4.4.16). However, other sectors like
food, along with plastic and chemical products, saw year-over-year increases in robot installations.
Number of Industrial Robots Installed in the United States by Sector, 2019–21

4.50
All Others 2.60
3.50
9.80
Automotive 10.50
13.00
2.90
Electrical/Electronics 3.70
3.50
3.40
Food 2.70
2.20
3.80
Metal and Machinery 2.30
3.80
3.50
Plastic and Chemical Products 2.60
2.50 2021
7.10 2020
Unspeci ed 6.30 2019
5.00
0 2 3 5 6 8 9 11 12 14
Figure 4.4.16

Index Report 2023
Index Report 2023
CHAPTER 5:
Education

Index Report 2023
CHAPTER 5 PREVIEW:
Education
Overview 236 5.2 K–12 AI Education 257
Chapter Highlights 237 United States 257
State-Level Trends 257
5.1 Postsecondary AI Education 238 AP Computer Science 258
CS Bachelor’s Graduates 238 Narrative Highlight: The State of
CS Master’s Graduates 240 International K–12 Education 260
CS PhD Graduates 242

CS, CE, and Information Faculty 246
Narrative Highlight: Who Funds ACCESS THE PUBLIC DATA
CS Departments in the U.S.? 255

Artificial Intelligence Chapter 5: Education
Index Report 2023
Overview
Studying the state of AI education is important for gauging some of the ways in which
the AI workforce might evolve over time. AI-related education has typically occurred
at the postsecondary level; however, as AI technologies have become increasingly
ubiquitous, this education is being embraced at the K–12 level. This chapter examines
trends in AI education at the postsecondary and K–12 levels, in both the United States
and the rest of the world.
We analyze data from the Computing Research Association’s annual Taulbee Survey
on the state of computer science and AI postsecondary education in North America,
Code.org’s repository of data on K–12 computer science in the United States, and a
recent UNESCO report on the international development of K–12 education curricula.

Index Report 2023
Chapter Highlights
More and more AI specialization.
The proportion of new computer science PhD graduates from U.S. universities who specialized in AI
jumped to 19.1% in 2021, from 14.9% in 2020 and 10.2% in 2010.
New AI PhDs increasingly New North American

head to industry. CS, CE, and information
In 2011, roughly the same proportion of
new AI PhD graduates took jobs in industry
faculty hires stayed flat.
In the last decade, the total number of new
(40.9%) as opposed to academia (41.6%).
North American computer science (CS),
Since then, however, a majority of AI PhDs
computer engineering (CE), and information
have headed to industry. In 2021, 65.4% of AI
faculty hires has decreased: There were
PhDs took jobs in industry, more than double
710 total hires in 2021 compared to 733 in
the 28.2% who took jobs in academia.
2012. Similarly, the total number of tenure-
track hires peaked in 2019 at 422 and then
dropped to 324 in 2021.
The gap in external

research funding for
private versus public
Interest in K–12 AI and
American CS departments computer science education
continues to widen. grows in both the United States
In 2011, the median amount of total expenditure
and the rest of the world.
from external sources for computing research
In 2021, a total of 181,040 AP computer
was roughly the same for private and public
science exams were taken by American
CS departments in the United States. Since
students, a 1.0% increase from the previous
then, the gap has widened, with private U.S.
year. Since 2007, the number of AP computer
CS departments receiving millions more in
science exams has increased ninefold. As of
additional funding than public universities.
2021, 11 countries, including Belgium, China,
In 2021, the median expenditure for private
and South Korea, have officially endorsed
universities was $9.7 million, compared to
and implemented a K–12 AI curriculum.
$5.7 million for public universities.

Index Report 2023 5.1 Postsecondary AI Education
5.1 Postsecondary AI Education

CS Bachelor’s Graduates
At the undergraduate level, most AI-related courses interest in AI. In 2021, the total number of new North
are offered as part of a computer science (CS) American CS bachelor’s graduates was 33,059—
curriculum. Therefore, trends in new CS bachelor’s nearly four times greater than in 2012 (Figure 5.1.1).
graduates give us a proxy for undergraduate
New CS Bachelor’s Graduates in North America, 2010–21

Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
33,059
30,000
Number of New CS Bachelor’s Graduates
25,000
20,000
15,000
10,000
5,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.1

Figure 5.1.2 looks at the proportion of CS bachelor’s graduates in North America who are international
students. The number stood at 16.3% in 2021 and has been steadily increasing since 2012—the proportion
of such students has risen 9.5 percentage points since 2012.
New International CS Bachelor’s Graduates (% of Total) in North America, 2010–21

16.30%
16%
New International CS Bachelor’s Graduates (% of Total)
12%
8%
4%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.2

CS Master’s Graduates
AI courses are also commonly offered in CS master’s as many master’s graduates as in 2012. However,
degree programs. Figure 5.1.3 shows the total from 2018 to 2021 the total number of new master’s
number of new CS master’s graduates in North graduates plateaued, declining slightly from 15,532 to
America since 2010. In 2021 there were roughly twice 15,068.
New CS Master’s Graduates in North America, 2010–21

16,000
15,068
14,000
Number of New CS Master’s Graduates
12,000
10,000
8,000
6,000
4,000
2,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.3

Interestingly, the number of CS master’s students at North American universities who are international started
declining in 2016 after rising in the early 2010s (Figure 5.1.4). Despite the decline, in 2021 the majority of CS
master’s graduates remained international (65.2%).
New International CS Master’s Graduates (% of Total) in North America, 2010–21

80%
New International CS Master’s Graduates (% of Total)
65.20%
60%
40%
20%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.4

CS PhD Graduates
Unlike the trends in bachelor’s and master’s CS computer science (Figure 5.1.5). There were fewer
graduates, since 2010 there have not been large CS PhD graduates in 2021 (1,893) than in 2020 (1,997)
increases in the number of new PhD graduates in and 2012 (1,929).
New CS PhD Graduates in North America, 2010–21

2,000
1,893
Number of New CS PhD Graduates
1,500
1,000
500
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.5

CS PhD graduates in North American universities are becoming increasingly international (Figure 5.1.6). In 2010,
45.8% of CS PhD graduates were international students; the proportion rose to 68.6% in 2021.
New International CS PhD Graduates (% of Total) in North America, 2010–21

70%
68.60%
60%
New International CS PhD Graduates (% of Total)
50%
40%
30%
20%
10%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.6

Moreover, now a significantly larger proportion of new CS PhD students are specializing in AI (Figure 5.1.7). In
2021, 19.1% of new CS PhD students in North American institutions specialized in AI, a 4.2 percentage point
increase since 2020 and 8.6 percentage point increase since 2012.
New CS PhD Students (% of Total) Specializing in AI, 2010–21

19.10%
18%
16%
New AI PhD Students (% of Total)
14%
12%
10%
8%
6%
4%
2%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.7

Where do new AI PhDs choose to work following (40.9%) as in academia (41.6%). However, as of 2021
graduation? Mirroring trends reported in last year’s a significantly larger proportion of students (65.4%)
AI Index report, an increasingly large proportion of went to industry after graduation than to academia
AI PhD graduates are heading to industry (Figures (28.2%). The amount of new AI PhDs entering
5.1.8 and 5.1.9). In 2011, for example, roughly the government was 0.7% and has remained relatively
same percentage of graduates took jobs in industry unchanged in the last half-decade.
Employment of New AI PhDs in North America by Employment of New AI PhDs (% of Total) in North
Sector, 2010–21 America by Sector, 2010–21
Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report Source: CRA Taulbee Survey, 2022 | Chart: 2023 AI Index Report
281
Industry
65.44%, Industry
Government
249 60%
250 Academia 238 84
New AI PhD Graduates (% of Total)
219
Number of New AI PhD Graduates
65 50%
201
200 73
178 61
40%
63
154 154
150 60
134 132 136 30%
47 123 28.19%, Academia
72 43
63 51 42
100 195 20%
180
162 153
134
116 10%
50 101
76 85 77
74
64
0% 0.67%, Government
0
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
1 The sums in Figure 5.1.9 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status
in the CRA survey. These students are not included in the chart.

CS, CE, and 5.1.10 highlights the total number of CS, CE (computer
engineering), and information faculty in North
Information Faculty American universities. The amount of faculty has
To better understand trends in AI and CS education, marginally increased in the last year, by 2.2%. Since
it is instructive to consider data on computer science 2011 the number of CS, CE, and information faculty
faculty in addition to postsecondary students. Figure has grown by 32.8%.
Number of CS, CE, and Information Faculty in North America, 2011–21

Tenure Track Teaching Professors Other Instructors Research Postdoc
8,149
7,858 7,976
8,000 522
7,657 530
7,362 668 306
653 296
7,000 6,806 6,887 691
Number of CS, CE, and Information Faculty
426 861
6,629 465 736
6,478 589
6,314 649 432 617
6,138 689 494
6,000 602 766 390
656 432 1,183 1,150
529 1,180 831 895
515 676 1,122
447 1,014
5,000 863
661 487
669
4,000
3,000
5,059 5,214 5,252 5,231 5,310
4,536 4,549 4,548 4,711 4,786
4,366
2,000
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.10

In 2021 there were a total of 6,789 CS faculty members in the United States (Figure 5.1.11). The total number
of CS faculty in the United States increased by only 2.0% in the last year, but by 39.0% since 2011.
Number of CS Faculty in the United States, 2011–21

Tenure Track Teaching Professors Other Instructors Research Postdoc
7,000 6,789
6,654
6,430 6,533
428
424
6,098 531 518 287
6,000 276
5,637 5,729 567 382
426 618 693
5,256 5,202 535 491 534
5,068 408 436
5,000 4,885 521 509 364
396 946 899
592 671 715
947
Number of CS Faculty
522
460 455 903
491 826
387
4,000 550 679
521 421
3,000
4,366 4,384 4,390 4,482

2,000 3,971 4,176
3,725 3,880
3,455 3,564 3,559
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.11

Figure 5.1.12 reports the total number of new CS, hires in 2021, while in 2012 there were 733. Similarly,
CE, and information faculty hires in North American the total number of tenure-track hires peaked in 2019
universities. In the last decade, the total number of at 422 and has since dropped to 324 in 2021.
new faculty hires has decreased: There were 710 total
New CS, CE, and Information Faculty Hires in North America, 2011–21
Total Tenure-Track 878

860
800
800 765
Number of New CS, CE, and Information Faculty Hires
749 749
733
710
691
600 583 572

543
422
396 406
400 374
348 358
320 324
294
249 258
218
200
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.12

In 2021, the greatest percentage of new CS, CE, and information faculty hires (40%) came straight from
receiving a PhD (Figure 5.1.13). Only 11% of new CS and CE faculty came from industry.
Source of New Faculty in North American CS, CE, and Information Departments, 2011–21
New PhD From Postdoc From Other Academic From Industry
100%
13% 13% 11% 11%
80%
34% 34%
34%
Source of New Faculty
41%
60%
16% 15%
15%
40%
17%
20% 38% 39% 40%

29%
0%
2018 2019 2020 2021
Figure 5.1.13

The share of filled new CS, CE, and information faculty positions in North American universities has remained
relatively stable in the last decade (Figure 5.1.14). In 2021, 89.3% of new faculty positions were filled, compared
to 82.7% in 2011.
Share of Filled New CS, CE, and Information Faculty Positions in North America, 2011–21
90% 89.28%
Share of Filled New CS, CE, and Information Faculty Positions
80%
70%
60%
50%
40%
30%
20%
10%
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.14

Among open CS, CE, and information faculty positions in 2021, the most commonly cited reason for their
remaining unfilled was offers being turned down (53%) (Figure 5.1.15). In 22% of cases, hiring was still in
progress, while 14% of the time, a candidate had not been identified who met the department’s hiring goals.
Reason Why New CS, CE, and Information Faculty Positions Remained Un lled (% of Total), 2011–21
Didn’t �nd a person who met our hiring goals Technically vacant, not �lled for admin reasons Other
O�ers turned down Hiring in progress
100%
Reason Faculty Positions Remained Un�lled (% of Total)
10% 18%
10% 23%
25% 26% 22%
31% 28% 27%
80% 17%
18%
5% 6%
10%
55%
6%
60% 45% 12%
34%
40%
36%
40% 52% 51% 56% 53%
43%
44%
20% 37% 37% 37%

26% 26%
16% 14% 14% 13% 14%
8%
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.15

Figure 5.1.16 highlights the median nine-month salaries full professor in computer science made 3.2% more
of CS faculty in the United States by position since than they did in 2020, and 12.8% more than they did in
2015. During that period, the salaries for all classes 2015. (Note: These figures have not been adjusted for
of professors have increased. In 2021, the average inflation.)
Median Nine-Month Salary of CS Faculty in United States, 2015–21

180 Full Professor Associate Professor Assistant Professor 176.01

168.87 170.57
164.54
159.96
Median Salary of CS Faculty (in Thousands of U.S. Dollars)
158.97
160 156.02
140
127.47
121.55 123.71
117.5 119.48
120 113.95 114.07
111.67 109.23
105.45 107.55
101.16 103.01
99.12
100
80
60
40
20
0
2015 2016 2017 2018 2019 2020 2021
Figure 5.1.16

What proportion of new CS, CE, and information faculty tenure-track hires are international? The data suggests
that it is not a substantial proportion. In 2021, only 13.2% of new CS, CE, and information faculty hires were
international (Figure 5.1.17).
New International CS, CE, and Information Tenure-Track Faculty Hires (% of Total) in North America,
2010–21
25%
New International Tenure-Track Faculty Hires (% of Total)
20%
15%
13.20%
10%
5%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.17

The majority of CS, CE, and Information faculty losses in North American departments (36.3%) were the result
of faculty taking academic positions elsewhere (Figure 5.1.18). In 2021, 15.2% of faculty took nonacademic
positions, which is roughly the same amount as those who took such positions a decade prior, in 2011 (15.9%).
Faculty Losses in North American CS, CE, and Information Departments, 2011–21
Died Took academic position elsewhere Remained, but changed to part-time Unknown
Retired Took nonacademic position Other
327
312
303 20 303
300
270 23 37
43
246 22
250 46
232 237 234 34
221 20 33
213 20
22 42
Faculty Losses
200
36 24
44 26 139
27 32
126 110
113
150 89
34 77
62 85
74 86
100 52
90 103 100
50 89 94 94 91
67 74 65 80
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.18

Who Funds CS Departments in the U.S.?
The CRA tracks data on the external funding sources agencies such as the Army Research Office,
of CS departments in the United States. The main the Office of Naval Research, and the Air Force
funder of American CS departments continues to Research Laboratory (20.3%); industrial sources
be the National Science Foundation (NSF), which (12.1%); the Defense Advanced Research Projects
in 2021 accounted for 34.9% of external funds. Agency (DARPA) (8.8%); and the National
However, the share of funding provided by NSF has Institutes of Health (NIH) (6.8%). The diminishing
decreased since 2003 (Figure 5.1.19). In 2021, the share of NSF funds over time has been partially
next largest sources of funding came from defense offset by increasing funds from industry and NIH.
External Funding Sources (% of Total) of CS Departments in United States, 2003–21

45%
40%
35% 34.90%, NSF

External Funding Sources (% of Total)
30%
25%
20% 20.30%, Other Defense
15% 12.10%, Industrial Sources

8.80%, DARPA
6.80%, NIH
10% 4.90%, Private Foundation
4.60%, Other
3.60%, Other Federal
5% 2.30%, DOE
1.50%, State Agencies
0.40%, Unallocated
0% 0.00%, IMLS
2003 2006 2009 2012 2015 2018 2021
Figure 5.1.19

Who Funds CS Departments in the U.S.? (cont’d)
Figure 5.1.20 shows the median total expenditures Although total median expenditures have
from external sources for computing research in increased over the last decade for both private and
American CS departments. In 2021, the median total public CS departments, the gap in expenditure
expenditure for private universities was $9.7 million has widened, with private universities beginning to
compared with $5.7 million for public universities. significantly outspend public ones.
Median Total Expenditure From External Sources for Computing Research of U.S. CS Departments, 2011–21
10
9.71, Private
Median Total Expenditure (in Millions of U.S. Dollars)
6
5.69, Public
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.1.20

Index Report 2023 5.2 K–12 AI Education
The following subsection shows trends in K–12 AI education based on K–12 computer science education data in the United States as well
as survey data from UNESCO on the state of global K–12 AI education.
5.2 K–12 AI Education

United States States Requiring That All High Schools Offer a Computer Science
Course, 2022
Source: Code.org, 2022 | Chart: 2023 AI Index Report
Data on the state of K–12 CS
AK ME
education in the United States
comes from Code.org, an VT NH MA
education innovation nonprofit
dedicated to ensuring that WA MT ND SD MN WI MI NY CT RI
every school includes computer
OR ID WY NE IA IL IN OH PA NJ
science as part of its core K–12
education. Tracking trends CA NV UT CO KS MO KY WV DC MD DE
in K–12 CS education can
partially serve as a proxy for AZ NM OK AR TN VA NC
understanding the state of K–12

TX LA MS AL GA SC
AI education in America
HI FL Yes
State-Level Trends No
Figure 5.2.1 highlights the 27 Figure 5.2.1
states that in 2022 required that
all high schools offer a computer Public High Schools Teaching Computer Science (% of Total in State),
science course. 2022
Figure 5.2.2 highlights the AK ME

51% 60%
percentage of public high
VT NH MA
schools in a state that teach 76% 82% 78%
computer science. The top WA MT ND SD MN WI MI NY CT RI
47% 36% 44% 39% 21% 66% 46% 48% 77% 86%
three states in terms of rate of
computer science teaching are OR ID WY NE IA IL IN OH PA NJ
63% 38% 55% 52% 71% 44% 85% 48% 77% 67%
Maryland (98%), South Carolina
(93%), and Arkansas (92%). 40% 83% 73% 57% 40% 49% 63% 78% 45% 98% 40%
36% 41% 62% 92% 60% 75% 61%
TX LA MS AL GA SC
47% 32% 60% 85% 66% 93%
HI FL
77% 40%
Figure 5.2.2

AP Computer Science a total of 181,040 AP computer science exams taken,

Another barometer for tracking the state of K–12 CS roughly the same number as the previous year, after
education in the United States is analyzing trends in the several years of significant increases. This leveling
total number of AP computer science exams taken. 2
could be the result of the pandemic. Since 2007, the
number of AP computer science exams has increased
Year over year the total number of AP computer
over ninefold.
science exams continued to increase. In 2021, the
most recent year for which there is data, there were
Number of AP Computer Science Exams Taken, 2007–21

181.04
180
Number of AP Computer Science Exams Taken (in Thousands)
160
140
120
100
80
60
40
20
0
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 5.2.3
2 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles
was initially offered in 2017.

In 2021, the states which Number of AP Computer Science Exams Taken, 2021
saw the greatest number of
AP computer science exams AK ME
100 242
taken were California (31,189),
VT NH MA
followed by Texas (17,307), 150 403 5,451
Florida (14,864), New York WA MT ND SD MN WI MI NY CT RI
4,034 42 109 26 1,432 2,080 4,504 13,304 3,251 617
(13,304), and New Jersey
(9,391) (Figure 5.2.4). OR ID WY NE IA IL IN OH PA NJ
714 429 112 514 521 8,572 2,883 3,754 6,104 9,391
Figure 5.2.5 looks at the CA NV UT CO KS MO KY WV DC MD DE
31,189 1,701 612 2,584 236 1,199 1,462 352 352 7,662 513
number of AP CS exams
taken per capita.3 The state 1,587 270 500 1,406 2,046 6,034 6,273
with the largest per capita TX LA MS AL GA SC
17,307 1,191 400 2,399 7,221 2,159
amount of AP computer
science exams taken in HI FL
782 14,864
2021 was Maryland, with Figure 5.2.4
124.1 exams per 100,000

inhabitants. The next states
were New Jersey (101.3), Number of AP Computer Science Exams Taken per 100,000 Inhabitants,
Connecticut (89.7), California 2021
(79.7), and Massachusetts
AK ME
(78.0). 13.62 17.57
VT NH MA
23.18 29.04 77.99
WA MT ND SD MN WI MI NY CT RI
52.11 3.80 14.01 2.90 25.07 35.37 44.87 67.00 89.72 56.25
16.78 22.53 19.33 26.18 16.29 67.57 42.31 31.91 46.91 101.33
79.68 54.06 18.33 44.47 8.03 19.43 32.44 19.71 52.63 124.09 51.05
21.84 12.76 12.53 46.43 29.36 69.70 59.37
TX LA MS AL GA SC
58.55 25.74 13.56 47.51 66.94 41.57
HI FL
54.04 68.10 Figure 5.2.5
3 More specifically, Figure 5.2.5 normalizes the number of AP CS exams taken—the total number of exams taken in a particular state in 2021 is divided by the state’s
population based on the 2021 U.S. Census.

The State of International K–12 Education
In 2021, UNESCO released one of the most Figure 5.2.6, taken from the UNESCO report,
comprehensive reports to date on the international highlights the governments that have taken steps
state of government-endorsed AI curricula. To to implement AI curricula and across which levels
gather information, UNESCO released two surveys: of education. For example, Germany is in the
the first to representatives of 193 UNESCO member process of developing government-endorsed AI
states and the second to over 10,000 private- curricular standards on the primary, middle, and
and third-sector actors. As part of these surveys, high-school levels, and the Chinese government
respondents were asked to report on the status of AI has already endorsed and implemented
curricula for students in K–12 general education. standards across those same three levels.
Government Implementation
Implementationof
ofAI
AICurricula
Curriculaby
byCountry,
Country,Status,
Status,and
andEducation
EducationLevel
Level
UNESCO, 2022
Source: UNESCO, 2022 || Table:
Table:2023
2023AI
AIIndex
IndexReport
Report
Country Status
Status PrimarySchool
Primary School Middle
Middle School
School High
High School
School
Armenia Endorsedand
Endorsed andImplemented
Implemented ✓✓ ✓✓
Austria Endorsed
Endorsedand
andImplemented
Implemented ✓✓
Belgium Endorsed
Endorsedand
andImplemented
Implemented ✓✓
China Endorsed
Endorsedand
andImplemented
Implemented ✓✓ ✓✓ ✓✓
India Endorsed
Endorsedand
andImplemented
Kuwait Endorsed
Endorsedand
andImplemented
Portugal
Portugal Endorsed
Endorsedand
andImplemented
Qatar
Qatar Endorsed
Endorsedand
andImplemented
Serbia
Serbia Endorsed
Endorsedand
andImplemented
South
South Korea
Korea Endorsed
Endorsedand
andImplemented
Implemented ✓✓
United
United Arab
Arab Emirates
Emirates Endorsed
Endorsedand
andImplemented
Bulgaria
Bulgaria In Development
In Development ✓✓ ✓✓ ✓✓
Germany
Germany In
InDevelopment
Development ✓✓ ✓✓ ✓✓
Jordan
Jordan In
InDevelopment
Development ✓✓ ✓✓
Saudia
Saudia Arabia
Arabia In
InDevelopment
Development ✓✓ ✓✓ ✓✓
Serbia
Serbia In
InDevelopment
Development ✓✓ ✓✓
Figure 5.2.64
4 According to the UNESCO report, Serbia has already endorsed and implemented certain kinds of K–12 AI curricula, but is also simultaneously in the process of
developing others—thus it is listed under both categories.

The State of International K–12 Education (cont’d)
Figure 5.2.7 identifies the topic areas most emphasized in the K–12 AI curricula profiled in the UNESCO
report. The four topics toward which the most time was allocated were algorithms and programming (18%),
AI technologies (14%), data literacy (12%), and application of AI to other domains (12%).
Time Allocated (% of Total) in K–12 AI Curricula by Topic, 2022

Source: UNESCO, 2022 | Chart: 2023 AI Index Report
Algorithms and Programming 18%
Data Literacy 12%
Contextual Problem-Solving 11%
Application of AI to Other Domains 12%
Ethics of AI 7%
Social Implications of AI 5%
AI Technologies 14%
Developing AI Technologies 9%
AI Foundations
AI Techniques 2% Ethics and Social Impact
Understanding, Using, and Developing AI
Unspeci ed 10% Unspeci ed
0% 2% 4% 6% 8% 10% 12% 14% 16% 18%

% of Time Allocated
Figure 5.2.7

The State of International K–12 Education (cont’d)
What might an actual K–12 AI curriculum look
like in practice? The UNESCO report includes
detailed information about a sample curriculum
that was deployed in Austria, the Austrian Data
Science and Artificial Intelligence curriculum.
As noted in the report:
“The Austrian Data Science and Artificial

“They also gain an
Intelligence curriculum includes digital basics such
as using an operating system to store and print understanding of the
files, design presentations, and use spreadsheets ethical dilemmas that
and word-processing software. It also covers
design and reflection on types and social issues in are associated with the
digital media, and safe digital media use. Students use of such technologies,
in high school engage programming languages,
algorithms and simulations. They learn the basic
and become active
principles of data literacy, including collecting participants in social
data, structuring a spreadsheet, and carrying out
analyses and visualizations. They apply criteria
discourse on these
to evaluate the credibility and reliability of data issues.”
sources as well as digital content. Students are
expected to know about careers in ICT, including
AI, and the social applications of emerging
technologies. They create digital media and learn
about the cloud and how to connect and network
computers. They also gain an understanding of
the ethical dilemmas that are associated with
the use of such technologies, and become active
participants in social discourse on these issues.
Finally, students are tasked with using technology
to make public statements and understand how
this reflects the democratic process.”

Index Report 2023
Index Report 2023
CHAPTER 6:
Policy and
Governance

Index Report 2023
CHAPTER 6 PREVIEW:
Policy and Governance

Overview 265 6.2 National AI Strategies 285
Chapter Highlights 266 Aggregate Trends 285
6.1 AI and Policymaking 267
Global Legislation Records on AI 267 6.3 U.S. Public Investment in AI 286
By Geographic Area 269 Federal Budget for Nondefense AI R&D 286
Narrative Highlight: A Closer Look at U.S. Department of Defense
Global AI Legislation 270 Budget Requests 287
United States Federal AI Legislation 271 U.S. Government AI-Related
United States State-Level AI Legislation 272 Contract Spending 288
Narrative Highlight: A Closer Look Total Contract Spending 288

at State-Level AI Legislation 275
Global AI Mentions 276 6.4 U.S. AI-Related Legal Cases 291
By Geographic Area 277 Total Cases 291
Narrative Highlight: A Closer Look Geographic Distribution 292
at Global AI Mentions 279 Sector 293
United States Committee Mentions 280 Type of Law 294
United States AI Policy Papers 283 Narrative Highlight: Three Significant
By Topic 284 AI-Related Legal Cases 295

Artificial Intelligence Chapter 6: Policy and Governance
Index Report 2023
Overview
The growing popularity of AI has prompted intergovernmental, national, and
regional organizations to craft strategies around AI governance. These actors are
motivated by the realization that the societal and ethical concerns surrounding AI
must be addressed to maximize its benefits. The governance of AI technologies has
become essential for governments across the world.
This chapter examines AI governance on a global scale. It begins by highlighting the

countries leading the way in setting AI policies. Next, it considers how AI has been
discussed in legislative records internationally and in the United States. The chapter
concludes with an examination of trends in various national AI strategies, followed
by a close review of U.S. public sector investment in AI.

Index Report 2023
Chapter Highlights
Policymaker interest When it comes to AI,
in AI is on the rise. policymakers have
An AI Index analysis of the legislative records a lot of thoughts.
of 127 countries shows that the number of bills A qualitative analysis of the
containing “artificial intelligence” that were parliamentary proceedings of a
passed into law grew from just 1 in 2016 to 37 in diverse group of nations reveals
2022. An analysis of the parliamentary records on that policymakers think about AI
AI in 81 countries likewise shows that mentions from a wide range of perspectives.
of AI in global legislative proceedings have For example, in 2022, legislators in
increased nearly 6.5 times since 2016. the United Kingdom discussed the
risks of AI-led automation; those
in Japan considered the necessity
of safeguarding human rights in
From talk to enactment— the face of AI; and those in Zambia
the U.S. passed more looked at the possibility of using AI
AI bills than ever before. for weather forecasting.
In 2021, only 2% of all federal AI bills in the
United States were passed into law. This number
jumped to 10% in 2022. Similarly, last year 35%
of all state-level AI bills were passed into law. The legal world is
waking up to AI.
In 2022, there were 110 AI-related
legal cases in United States state
The U.S. government and federal courts, roughly seven
continues to increase times more than in 2016. The
spending on AI. majority of these cases originated
Since 2017, the amount of U.S. government in California, New York, and Illinois,
AI-related contract spending has increased and concerned issues relating to
roughly 2.5 times. civil, intellectual property, and
contract law.

Index Report 2023 6.1 AI and Policymaking
In the last 10 years, AI governance discussions have accelerated, resulting in numerous policy proposals in various legislative bodies. This
section begins by exploring the legislative initiatives related to AI that have been suggested or enacted in different countries and regions,
followed by an in-depth examination of state-level AI legislation in the United States. The section then scrutinizes records of AI-related
discussions in parliaments and congresses worldwide and concludes with the number of AI policy papers published in the United States.
6.1 AI and Policymaking1

Global Legislative
passed at least one AI-related bill, and together they
Records on AI have passed a total of 123 AI-related bills (Figure 6.1.1).
The AI Index conducted an analysis of laws passed Figure 6.1.2 shows that from 2016 to 2022, there has
by legislative bodies in 127 countries that contain the been a sharp increase in the total number of AI-related
words “artificial intelligence” from 2016 to 2022.2 bills passed into law, with only one passed in 2016,
Of the 127 countries analyzed, since 2016, 31 have climbing to 37 bills passed in 2022.
Number of AI-Related Bills Passed Into Law by Country, 2016–22

0
1–5
6–10
11–15
16–25
No Available Data
Figure 6.1.1
1 Note that the analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science
Act passed by the U.S. in 2022.
2 The full list of countries analyzed is in the Appendix. The AI Index team attempted to research the legislative bodies of every country in the world; however, publicly accessible legislative
databases were not made available for certain countries.

Number of AI-Related Bills Passed Into Law in 127 Select Countries, 2016–22
37
35
30
Number of AI-Related Bills
25
20
15
10
2016 2017 2018 2019 2020 2021 2022
Figure 6.1.2

By Geographic Area the Philippines, which passed 5 and 4 laws, respectively.

Figure 6.1.3 shows the number of laws containing Figure 6.1.4 shows the total number of laws passed
mentions of AI that were enacted in 2022. The United since 2016. The United States leads the list with 22 bills,
States led the list with 9 laws, followed by Spain and followed by Portugal, Spain, Italy, and Russia.
Number of AI-Related Bills Passed Into Law in Select Countries, 2022

United States 9
Spain 5
Philippines 4
Andorra 2
Belgium 2
Italy 2
Portugal 2
Russia 2
United Kingdom 2
Austria 1
Croatia 1
Germany 1
Kyrgyz Republic 1
Latvia 1
Liechtenstein 1
Slovenia 1
0 1 2 3 4 5 6 7 8 9
Number of AI-Related Bills Figure 6.1.3
Number of AI-Related Bills Passed Into Law in Select Countries, 2016–22 (Sum)
United States 22
Portugal 13
Spain 10
Italy 9
Russia 9
Belgium 7
United Kingdom 6
Austria 5
Korea, Rep. 5
Philippines 5
France 4
China 3
Germany 3
Japan 3
0 2 4 6 8 10 12 14 16 18 20 22
Number of AI-Related Bills Figure 6.1.4

A Closer Look at Global AI Legislation
The following subsection delves into some of the AI-related legislation passed into law during 2022.
Figure 6.1.5 samples five different countries’ laws covering a range of AI-related issues.
AI-Related
AI-Related
AI-Related Legislation
Legislation From
From Select
Select Countries,
Countries, 2022
2022
Source:
Source: AI
Source:AI Index,
AIIndex, 2022
2022|||Table:
Index,2022 Table: 2023
Table:2023 AI
2023AI
AIIndex
IndexReport
Report
Country
Country
Country Bill
Bill
BillName
Name Description
Description
Kyrgyz
Kyrgyz Republic
KyrgyzRepublic About
About the
About the Creative
Creative Industries
Industries Park This
Republic Park Thislaw
lawdetermines
determinesthe
thelegal
legalstatus,
status,management,
management,and andoperation
and operation
operation
procedures
proceduresof
ofthe
theCreative
CreativeIndustries
IndustriesPark,
Park,established
establishedto
established toaccelerate
to acceleratethe
accelerate the
the
development of creative industries, including arti�cial
development of creative industries, includingarti�cial intelligence.
arti�cialintelligence.
intelligence.
Latvia
Latvia Amendments
Amendments
Amendmentsto
tothe
theNational
NationalSecurity
SecurityLaw AAprovision
Latvia Law provisionof
ofthis
thisact
actestablishes
establishesrestrictions
restrictionsononcommercial
commercialcompanies,
commercial companies,
companies,
associations,
associations,and
andfoundations
foundationsimportant
importantforfornational
nationalsecurity,
national security,including
security, includingaaa
including
commercial
commercialcompany
companythat
thatdevelops
developsarti�cial
arti�cialintelligence.
intelligence.
intelligence.
Philippines
Philippines Second
Second
Philippines SecondCongressional
CongressionalCommission
Commissionon
onEducation
Education AAprovision
provisionofofthis
thisact
actcreates
createsaacongressional
congressionalcommission
commissionto
commission toreview,
to review,
review,
(EDCOM
(EDCOM
(EDCOMII)
II)Act assess,
Act assess,and
andevaluate
evaluatethethestate
stateof
ofPhilippine
Philippineeducation;
education;to
education; torecommend
to recommend
recommend
innovative
innovativeand
andtargeted
targetedpolicy
policyreforms
reformsinineducation;
education;and
education; andto
and toappropriate
to appropriate
appropriate
funds.
funds.The
Theact
actcalls
callsfor
forreforms
reformsto tomeet
meetthe
thenew
newchallenges
challengesto
challenges toeducation
to education
education
caused
causedbybythe
theFourth
FourthIndustrial
IndustrialRevolution
Revolutioncharacterized,
characterized,in
characterized, inpart,
in part,by
part, bythe
by the
the
rapid development of arti�cial intelligence.
rapid development of arti�cial intelligence.
Spain
Spain Right
Right
Rightto
toequal
equaltreatment
treatmentand
andnon-discrimination AAprovision
Spain non-discrimination provisionofofthis
thisact
actestablishes
establishesthat
thatarti�cial
arti�cialintelligence
intelligencealgorithms
intelligence algorithms
algorithms
involved
involvedin
inpublic
publicadministrations’
administrations’decision-making
decision-makingtaketakeinto
take intoaccount
into account
account
bias-minimization
bias-minimizationcriteria,
criteria,transparency,
transparency,and
andaccountability,
accountability,whenever
accountability, whenever
whenever
technically
technicallyfeasible.
feasible.
United
United States
UnitedStates AI
AI
AITraining
TrainingAct This
States Act Thisbill
billrequires
requiresthe
theO�ce
O�ceof ofManagement
Managementand andBudget
Budgetto
Budget totoestablish
establishor
establish or
or
otherwise
otherwiseprovide
otherwise providean
provide anAI
an AItraining
AI trainingprogram
training programfor
program forthe
for theacquisition
the acquisitionworkforce
acquisition workforceof
workforce of
of
executive
executiveagencies
executive agencies(e.g.,
agencies (e.g.,those
(e.g., thoseresponsible
those responsiblefor
responsible forprogram
for programmanagement
program managementor
management or
or
logistics),
logistics),with
logistics), withexceptions.
with exceptions.The
exceptions. Thepurpose
The purposeof
purpose ofthe
of theprogram
the programis
program isisto
toensure
to ensurethat
ensure that
that
the
the workforce has knowledge of the capabilities and risks associated with
theworkforce
workforcehashasknowledge
knowledgeof ofthe
thecapabilities
capabilitiesandandrisks
risksassociated
associatedwith
with
AI.
AI.
AI.
Figure 6.1.5

United States Federal just one federal bill was proposed, while in 2021, 134
AI Legislation bills were proposed. In 2022 this number fell to 88
A closer look at the U.S. federal legislative record proposed bills. While fewer bills were proposed in
shows a sharp increase in the total number of 2022, the number of passed bills, which remained at
proposed bills that relate to AI (Figure 6.1.6). In 2015, 3 for each of the past four years, increased to 9.
Number of AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
140
120
100
88, Proposed
80
60
40
20
9, Passed
0
2015 2016 2017 2018 2019 2020 2021 2022

Figure 6.1.6

United States State-Level Maryland with 3. Figure 6.1.8 shows the total volume of
legislation passed from 2016 to 2022 for select states,
AI Legislation with Maryland leading the list with 7 bills, followed by
Figure 6.1.7 shows the number of laws containing California, Massachusetts, and Washington. Figure
mentions of AI that were passed by U.S. states in 6.1.9 highlights the number of state-level AI-related
2022. California leads the list with 5, followed by bills passed by all states since 2016.
Number of AI-Related Bills Passed Into Law in Select U.S. States, 2022
California 5
Maryland 3
Colorado 2
New Jersey 2
Washington 2
Alabama 1
Hawaii 1
Idaho 1
Louisiana 1
Massachusetts 1
North Carolina 1
Vermont 1
0 1 2 3 4 5
Figure 6.1.7

Number of AI-Related Bills Passed Into Law in Select U.S. States, 2016–22 (Sum)
Maryland 7
California 6
Massachusetts 5
Washington 5
Illinois 3
Utah 3
Vermont 3
Alabama 2
Colorado 2
Michigan 2
New Jersey 2
New York 2
North Carolina 2
Ohio 2
0 1 2 3 4 5 6 7
Figure 6.1.8
Number of State-Level AI-Related Bills Passed Into Law in the (Sum)

United States by State, 2016–22 (Sum)
AK ME
0 0
VT NH MA
3 0 5
5 0 1 0 0 0 2 2 0 0
0 1 0 0 0 3 0 2 0 2
CA NV UT CO KS MO KY WV MD DE
6 1 3 2 0 0 1 1 7 0
0 0 0 0 0 1 2
TX LA MS AL GA SC
2 1 1 2 0 0
HI FL
1 0
Figure 6.1.9

Growing policy interest in AI can also be seen at the state level, with 60 AI-related bills proposed in 2022
(Figure 6.1.10)—a dramatic increase from the 5 bills proposed in 2015. Additionally, the proportion of bills being
passed has risen throughout the years. In 2015, 1 bill was passed, representing 16% of the total bills proposed
that year; while in 2022, 21 bills were passed, or 35% out of the total that were proposed.
Number of State-Level AI-Related Bills in the United States, 2015–22 (Proposed Vs. Passed)
60 60, Proposed
50
40
30
20 21, Passed
10
2015 2016 2017 2018 2019 2020 2021 2022

Figure 6.1.10

A Closer Look at State-Level AI Legislation
The following subsection highlights some of the AI-related legislation passed into law at the state level
during 2022. Figure 6.1.11 focuses on wide-ranging AI-related laws from five states around the country.
AI-Related Legislation
AI-Related Legislation From
From Select
Select States,
States, 2022
2022
Source:AI
Source:
Source: AIIndex,
AI Index,2022
Index, 2022|||Table:
2022 Table:2023
Table: 2023AI
2023 AIIndex
AI IndexReport
Index Report
Report
State
State
State BillName
Bill
Bill Name
Name Description
Description
Alabama
Alabama
Alabama Arti�cialIntelligence,
Arti�cial
Arti�cial Intelligence,Limit
Intelligence, Limit the
Limit the Use
Use This bill
This bill prohibits
prohibits state
state or
or local
local law
law enforcement
enforcement agencies
agenciesfrom
fromusing
usingfacial
facialrecognition
recognition
ofFacial
of
of FacialRecognition,
Facial Recognition,to
Recognition, toEnsure
to Ensure
Ensure matchresults
match resultsas
asthe
thesole
solebasis
basisfor
formaking
makingananarrest
arrestor
orfor
forestablishing
establishingprobable
probablecause
causein
inaa
Arti�cialIntelligence
Arti�cial
Arti�cial IntelligenceIs
Intelligence IsNot
Is Notthe
Not theOnly
Only criminalinvestigation.
criminal investigation.
Basisfor
Basis
Basis forArrest
for Arrest
Arrest
California
California BudgetAct
Budget Actof
Act of2022
of 2022
2022
California Budget AAprovision
provisionof
ofthis
thisappropriations
appropriationsbill
billfor
forthe
the2022–23
2022–23�scal
�scalyear
yearallocates
allocates$1,300,000
$1,300,000to
to
CaliforniaState
California StateUniversity,
University,Sacramento,
Sacramento,to toimprove
improvethe
thecampus
campuschildcare
childcarecenter,
center,
includingthe
including thedevelopment
developmentof ofan
anarti�cial
intelligencemixed-reality
mixed-realityclassroom.
classroom.
Maryland
Maryland
Maryland ConservationFinance
Conservation
Conservation FinanceAct
Finance Act
Act AAprovision
provisionof
ofthis
thisact
actestablishes
establishesthat
thatthe
theDepartment
Departmentof ofNatural
NaturalResources
Resourcesshall
shallstudy
study
andassess
and assessthe
thepotential
potentialfor
fordigital
digitaltools
toolsand
andplatforms
platformsincluding
includingarti�cial
intelligenceand
and
machine learning
machine learning to
to contribute
contribute to
to Chesapeake
Chesapeake Bay
Bay restoration
restorationand
andclimate
climatesolutions.
solutions.
NewJersey
New
New Jersey
Jersey 21stCentury
21st
21st CenturyIntegrated
Century IntegratedDigital
Integrated Digital
Digital AAprovision
provisionof
ofthis
thisact,
act,which
whichconcerns
concernsthe themodernization
modernizationof ofstate
stategovernment
governmentwebsites,
websites,
ExperienceAct
Experience
Experience Act
Act establishesthat
establishes thatthe
thechief
chieftechnology
technologyo�cer,
o�cer,ininconsultation
consultationwith
withthe
thechief
chiefinnovation
innovation
o�cerand
o�cer andthe
theNew
NewJersey
JerseyInformation
InformationTechnology
TechnologyProject
ProjectReview
ReviewBoard,
Board,shall
shallevaluate
evaluate
on an
on an annual
annual basis
basis the
the feasibility
feasibility of
of state
state agencies
agencies using
using arti�cial
intelligenceand
and
machinelearning
machine learningtotoprovide
providepublic
publicservices.
services.
Vermont
Vermont
Vermont AnAct
An
An ActRelating
Act Relatingto
Relating tothe
to theUse
the Useand
Use and
and This act
This act creates
creates the
the Division
Division of
of Arti�cial
Arti�cial Intelligence
Intelligence within
withinthe
theAgency
Agencyof ofDigital
DigitalServices
Services
Oversightof
Oversight
Oversight ofArti�cial
of Arti�cialIntelligence
Arti�cial Intelligencein
Intelligence in toreview
to reviewall
allaspects
aspectsofofarti�cial
intelligencedeveloped,
developed,employed,
employed,or orprocured
procuredby bythe
the
StateGovernment
State
State Government
Government stategovernment.
state government.The Theact
actrequires
requiresthetheDivision
Divisionof ofArti�cial
Arti�cialIntelligence
Intelligenceto,
to,among
amongother
other
things,propose
things, proposeaastate
statecode
codeofofethics
ethicsononthe
theuse
useof
ofarti�cial
intelligencein
instate
state
government and
government and make
make recommendations
recommendations to to the
the General
General Assembly
Assemblyon onpolicies,
policies,laws,
laws,and
and
regulationsregarding
regulations regardingarti�cial
intelligencein instate
stategovernment.
government.
Figure 6.1.11

Global AI Mentions
Another barometer of legislative interest is the contain the keyword “artificial intelligence” from 2016
number of mentions of “artificial intelligence” in to 2022.3 Figure 6.1.12 shows that mentions of AI in
governmental and parliamentary proceedings. The legislative proceedings in these countries registered a
AI Index conducted an analysis of the minutes or small decrease from 2021 to 2022, from 1,547 to 1,340.
proceedings of legislative sessions in 81 countries that
Number of Mentions of AI in Legislative Proceedings in 81 Select Countries, 2016–22

1,600
1,400
1,340
1,200
Number of Mentions
1,000
800
600
400
200
2016 2017 2018 2019 2020 2021 2022
Figure 6.1.12
3 The full list of countries that was analyzed is in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the
world; however, publicly accessible governmental and parliamentary databases were not made available for all countries.

By Geographic Area
Figure 6.1.13 shows the number of legislative proceedings containing mentions of AI in 2022.4 From the 81
countries considered, 46 had at least one mention, and Spain topped the list with 273 mentions, followed by
Canada (211), the United Kingdom (146), and the United States (138).
Number of Mentions of AI in Legislative Proceedings by Country, 2022

0
1–55
56–110
111–165
166–220
221–280
No Available Data
Figure 6.1.13
4 For mentions of AI in legislative proceedings around the world, the AI Index performed searches of the keyword “artificial intelligence,” in the respective languages, on the websites of
different countries’ congresses or parliaments, usually under sections named “minutes,” “Hansard,” etc.
Table of Contents Chapter 6 Preview 27 7

Figure 6.1.14 shows the total number of AI mentions in the past seven years. Of the 81 countries considered, 62 had
at least one mention, and the United Kingdom dominates the list with 1,092 mentions, followed by Spain (832), the
United States (626), Japan (511), and Hong Kong (478).
Number of Mentions of AI in Legislative Proceedings by Country, 2016–22 (Sum)

0
1–220
221–440
441–660
661–880
881–1100
No Available Data
Figure 6.1.14

A Closer Look at Global AI Mentions
The following subsection examines mentions of AI in government proceedings in 2022. Figure 6.1.15
quotes discussions across a geographically diverse set of countries.
AI-Related
AI-Related
AI-Related Parliamentary
ParliamentaryMen
Mentions
tionsFrom
FromSelect
SelectCountries,
Countries,2022
2022
Source:
Source: AI
Source:AI Index,
AIIndex, 2022
2022 |||Table:
Index,2022 Table: 2023
Table:2023 AI
2023AI Index
AIIndex Report
IndexReport
Report
Country
Country
Country Legislature
Legislature
Legislature Speaker
Speaker
Speaker Quote
Quote Agenda
AgendaItem
Item
Australia
Australia
Australia House
House of
Houseofof Ed
Ed Husic,
EdHusic, Australian
Husic,Australian
AustralianLabor
Labor “Working
“Workingwith withour
ourinternational
internationalpartners
partnerswe wecancan National
NationalReconstruction
Reconstruction
Representatives
Representatives
Representatives Party,
Party, Minister
Party,Minister
Ministerfor
forIndustry
Industry transform
transformAustralian
Australianknow-how
know-howinto intoglobally
globallyrecognised
recognised Fund
FundCorporation
CorporationBill
Bill 2022
and
and Science
andScience
Science skills
skillsand
andmanufacturing
manufacturinginindefence
defenceindustries.
industries.AndAndwewe 2022 - Second
- Second Reading
Reading
can
canbuild
buildon
onour
ourundeniable
undeniableexpertise
expertiseininareas
areaslike
like
quantum
quantumtechnologies,
technologies,robotics
roboticsandandarti
arti cial
cial
intelligence.
intelligence.We Wewill
willseek
seekto
topartner
partnerwith
withindustry
industryandand
state
stateandandterritory
territorygovernments
governmentstotoidentify
identifyinvestment
investment
opportunities
opportunitieswithin
withinpriority
priorityareas.
areas.An Anon-ramp,
on-ramp,ififyouyou
will,
will,ofofturn-key
turn-keyopportunities
opportunitiesforforinvestment
investmenttotomakemake
sure the NRF is well placed for success.”
sure the NRF is well placed for success.”
Brazil
Brazil
Brazil Diary
Diary of
Diaryof the
ofthe
the Mr.
Mr. Gustavo
Mr.Gustavo Fruet,
GustavoFruet,
Fruet, “There
“Therehashasbeen
beenaalotlotofoftalk
talkabout
aboutthe
thefuture
futureofofwork
workduedue Presentation
PresentationofofBill
BillNo.
No.
Chamber
Chamber
Chamberof of the
ofthe
the Democratic
Democratic Labor
DemocraticLabor
LaborParty
Party to
totechnology.
technology.InInthe
thebook
bookThe TheFourth
FourthIndustrial
Industrial 135,
135,ofof2022,
2022,on
onthe
the
Members
Members
Members Revolution,
Revolution,Klaus
KlausSchwab
Schwabeven evenpoints
pointsout
outprofessions
professions amendment
amendmentofofthe theCLT
CLT- -
that
thatwill
willbe
beextinct
extinctandandprofessions
professionsthat
thatwill
willdemand
demand Consolidation
ConsolidationofofLabor
Labor
more
more and more quali�cations, in times of 5G,Internet
and more quali�cations, in times of 5G, Internetofof Laws,
Laws,with
withaaview
viewtoto
Things
ThingsandandArti
Arti cial
cialIntelligence.
Intelligence.InInthis
thissense,
sense,ititisisgood
good granting
grantingtelework
teleworktoto
to
tohighlight
highlightthat
thatthe
thepandemic,
pandemic,among amongother
other parents
parentsofofchildren
childrenup uptoto88
contradictions,
contradictions,ended
endedup upanticipating
anticipatingthe
theuse
useofof years
yearsoldold
technology,
technology,especially
especiallyininthethetelework.”
telework.”
Japan
Japan 210th Session
210th Session of Kohei Otsuka,
Otsuka, Democratic “In the �eld of human rights, we believe that it is The Commission on the
Japan 210th Sessionof
of Kohei
Kohei Otsuka, Democratic “In the �eld of human rights, we believe that it is The Commission on the
the Diet
the Diet House
House of Party for
for the
the People,
the Diet Houseofof Party
Party for thePeople,
People, necessary
necessaryto toupdate
updatehuman
humanrights
rightsguarantees
guaranteesininorder
ordertoto Constitution
Constitution
Councilors
Councilors Shinryokufukai
Shinryokufukai respond
Councilors Shinryokufukai respondto tochanges
changesininthe
thetimes
timesthat
thatwere
wereunpredictable
unpredictable
Commission on
Commission on when
Commission on whenthetheConstitution
Constitutionwaswasenacted.
enacted.InInparticular,
particular,asasthe
the
the Constitution
the Constitution fusion of arti cial intelligence and Internet technology
the Constitution fusion of arti cial intelligence and Internet technology
No. 22
No. progresses,
No. 2 progresses,thetheinternational
internationalcommunity
communityisisconcerned
concerned
about
aboutthe
theproblems
problemsofofindividual
individualscoring
scoringand
and
discrimination, and the problem of Internet advertising
discrimination, and the problem of Internet advertising
that
thatunfairly
unfairlyin�uences
in�uencesthethevoting
votingbehavior
behaviorofofcitizens.
citizens.
We need a constitutional argument to guarantee the
We need a constitutional argument to guarantee the
autonomous
autonomousdecision-making
decision-makingofofindividuals
individualsand
andprotect
protect
basic
basicdata
datarights
rightsininthe
thedigital
digitalage.”
age.”
Dame Angela
Dame Angela
United
United
United
House of
House of
House of Dame AngelaEagle,
Eagle,Labor
Labor “What
“Whatwould
wouldbe
bethe useofofarti
theuse arti cial
cialintelligence
intelligenceinintrying
trying Financial
FinancialServices
Servicesand
and
Kingdom
Kingdom Commons
Commons to decide how automated these things could become? Markets Bill (Fourth
Kingdom Commons to decide how automated these things could become? Markets Bill (Fourth
Would there be worries about over-automation? How Sitting)
Would there be worries about over-automation? How Sitting)
would that be looked at in terms of regulation? How
would that be looked at in terms of regulation? How
open are we going to be about the way in which AI is
open are we going to be about the way in which AI is
applied and how it might evolve in ways that might
applied and how it might evolve in ways that might
embed discrimination such that we get a system where
embed discrimination such that we get a system where
certain people may be discriminated against and
certain people may be discriminated against and
excluded?”
excluded?”
Zambia
Zambia The House,
The House, Hon. Collins
Hon. Collins Nzovu,
Nzovu, United “Madam Speaker, in order to enhance quality and Ministerial Statements;
Zambia The House, Hon. Collins Nzovu, United “Madam Speaker, in order to enhance quality and Ministerial Statements;
National
National Party for
Party for National
National accuracy of weather forecast, the Government, with Weather and Climate
National Party for National accuracy of weather forecast, the Government, with Weather and Climate
Assembly
Assembly Development,
Development, �nancial support from the United Nations Development Services and the
Assembly Development, The �nancial support from the United Nations Development Services and the
Minister of
Minister of Green
Green Programme Strengthening Climate Resilience of 2022/2023 rainfall forecast
Minister of Green Programme Strengthening Climate Resilience of 2022/2023 rainfall forecast
Economy and
Economy and Environment
Environment Agricultural Livelihoods
Agricultural Livelihoods inin Agro-Ecological
Agro-Ecological (UNDP
(UNDP
Economy and Environment Agricultural Livelihoods in Agro-Ecological (UNDP
SCRALA) project
SCRALA) project is
is currently
currently partnering
partnering with
with the
the
SCRALA) project is currently partnering with the
University of
University of Zambia
Zambia (UNZA)
(UNZA) to
to develop
develop aa seasonal
seasonal
University of Zambia (UNZA) to develop a seasonal
weather forecasting
weather forecasting system
system using arti cial
using arti cial intelligence.”
intelligence.”
weather forecasting system using arti cial intelligence.”
Figure 6.1.15

United States
committees that address legislative and other policy
Committee Mentions issues, investigations, and internal committee matters.
An additional indicator of legislative interest is the Figure 6.1.16 shows a sharp increase in the total
number of mentions of “artificial intelligence” in number of mentions of AI within committee reports
committee reports produced by House and Senate beginning with the 115th legislative session.
Mentions of AI in U.S. Committee Reports by Legislative Session, 2001–22

80
73
70
60
50
Number of Mentions
40
30
20
10
107th 108th 109th 110th 111th 112th 113th 114th 115th 116th 117th
(2001–02) (2003–04) (2005–06) (2007–08) (2009–10) (2011–12) (2013–14) (2015–16) (2017–18) (2019–20) (2021–22)
Figure 6.1.16

Figure 6.1.17 shows the mentions in committee reports for the 117th Congressional Session, which took place
from 2021 to 2022. The Appropriations Committee leads the House reports, while the Homeland Security and
Governmental Affairs Committee leads the Senate reports (Figure 6.1.18).
Mentions of AI in Committee Reports of the U.S. House of Representatives for the 117th Congressional
Session, 2021–22
Appropriations 20
Science, Space, and Technology 9
Rules 5
Armed Services 3
Transportation and Infrastructure 3
Intelligence (Permanent Select) 2
Natural Resources 2
Oversight and Accountability 2
Budget 1
Education and the Workforce 1
Energy and Commerce 1
Financial Services 1
Foreign A airs 1
Homeland Security 1
House Administration 1
Small Business 1
Ways and Means 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Number of Mentions
Figure 6.1.17
Mentions of AI in Committee Reports of the U.S. Senate for the 117th Congressional Session, 2021–22
Homeland Security and

8
Governmental A airs
Appropriations 3
Commerce, Science,
3
and Transportation
Armed Services 2
Intelligence (Select) 2
0 1 2 3 4 5 6 7 8
Number of Mentions
Figure 6.1.18

Figure 6.1.19 shows the total number of mentions in committee reports from the past 10 congressional sessions,
which took place from 2001 to 2022. The House and Senate Appropriations Committees, which regulate
expenditures of money by the government, lead their respective lists (Figure 6.1.19 and 6.1.20).
Mentions of AI in Committee Reports of the U.S. Senate, 2001–22 (Sum)

Appropriations 16
Homeland Security and

11
Governmental A airs
Armed Services 10
Commerce, Science,
9
and Transportation
Energy and
7
Natural Resources
Intelligence (Select) 5
0 2 4 6 8 10 12 14 16
Number of Mentions
Figure 6.1.19
Mentions of AI in Committee Reports of the U.S. House of Representatives, 2001–22 (Sum)

Appropriations 45
Science, Space, and Technology 27
Rules 14
Armed Services 9
Energy and Commerce 8
Oversight and Accountability 8
Financial Services 6
Intelligence (Permanent Select) 6
Education and the Workforce 4
Transportation and Infrastructure 4
Homeland Security 3
Veterans’ A airs 3
Budget 2
Foreign A airs 2
Judiciary 2
Natural Resources 2
Small Business 2
Ways and Means 2
Agriculture 1
House Administration 1
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
Number of Mentions
Figure 6.1.20

United States AI Policy Papers post that addresses issues related to AI and makes
specific recommendations to policymakers. Topics of
To estimate activities outside national governments
those papers are divided into primary and secondary
that are also informing AI-related lawmaking,
categories: A primary topic is the main focus of the
the AI Index tracked 55 U.S.-based organizations
paper, while a secondary topic is a subtopic of the
that published policy papers in the past five
paper or an issue that is briefly explored.
years. Those organizations include: think tanks
and policy institutes (19); university institutes and Figure 6.1.21 highlights the total number of U.S.-based,
research programs (14); civil society organizations, AI-related policy papers published from 2018 to 2022.
associations, and consortiums (9); industry and After a slight dip from 2020 to 2021, the total increased
consultancy organizations (9); and government to 284 in 2022. Since 2018, the total number of such
agencies (4). A policy paper in this section is defined papers has increased 3.2 times, signaling greater
as a research paper, research report, brief, or blog interest over time.
Number of AI-Related Policy Papers by U.S.-Based Organizations, 2018–22

Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report
300
284
250
Number of Policy Papers
200
150
100
50
0
2018 2019 2020 2021 2022
Figure 6.1.21

By Topic
In 2022, the most frequent primary topics were sat in fourth position as of 2022. All of these leading
industry and regulation (107), innovation and topics were also well represented as secondary topics.
technology (90), and government and publication Topics that received comparatively little attention
administration (82) (Figure 6.1.22). Privacy, safety, and included social and behavioral sciences; humanities;
security, which was the most reported topic in 2021, and communications and media.
Number of AI-Related Policy Papers by U.S.-Based Organization by Topic, 2022

Source: Stanford Institute for Human-Centered AI (HAI) Policy and Society | Chart: 2023 AI Index Report
Primary Topic Secondary Topic

Industry and Regulation 107 50
Innovation and Technology 90 69
Gov’t and Public Administration 82 25
Privacy, Safety, and Security 59 59
Ethics 39 17
Equity and Inclusion 34 18
Int’l A airs and Int’l Security 34 65
Workforce and Labor 30 26
Democracy 26 5
Health and Biological Sciences 13 10
Justice and Law Enforcement 12 13
Physical Sciences 9
Education and Skills 8 10
Energy and Environment 4 8
Social and Behavioral Sciences 3 1
Humanities 1 1
Communications and Media 3
0 20 40 60 80 100 0 20 40 60 80 100
Number of Policy Papers
Figure 6.1.22

Index Report 2023 6.2 National AI Strategies
This subsection presents an overview of national AI strategies—policy plans developed by a country’s government to steer the
development and deployment of AI technologies within its borders. Tracking trends in national strategies can be an important way of
gauging the degree to which countries are prioritizing the management and regulation of AI technologies. Sources include websites of
national or regional governments, the OECD AI Policy Observatory (OECD.AI), and news coverage. “AI strategy” is defined as a policy
document that communicates the objective of supporting the development of AI while also maximizing the benefits of AI for society.5
6.2 National AI Strategies

Aggregate Trends Yearly Release of AI National Strategies by Country
Source: AI Index, 2022 | Table: 2023 AI Index Report
Year Country
Canada officially launched the first national AI strategy
2017 Canada, China, Finland
in March of 2017; since then a total of 62 national 2018 Australia, France, Germany, India, Mauritius, Mexico, Sweden
AI strategies have been released (Figure 6.2.1). The 2019 Argentina, Austria, Bangladesh, Botswana, Chile, Colombia,
Cyprus, Czech Republic, Denmark, Egypt, Estonia, Japan,
number of released strategies peaked in 2019. Kenya, Lithuania, Luxembourg, Malta, Netherlands, Portugal,
Qatar, Romania, Russia, Sierra Leone, Singapore, United Arab
Emirates, United States of America, Uruguay
By Geographic Area
2020 Algeria, Bulgaria, Croatia, Greece, Hungary, Indonesia, Latvia,
Figure 6.2.2 highlights the countries which, as of Norway, Poland, Saudi Arabia, Serbia, South Korea, Spain,
December 2022, have either released or developed Switzerland
2021
a national AI strategy. Figure 6.2.3 enumerates the Brazil, Ireland, Peru, Philippines, Slovenia, Tunisia, Turkey,
Ukraine, United Kingdom, Vietnam
countries that, in 2021 and 2022, pledged to develop 2022 Italy, Thailand
an AI strategy . The first nations to officially release
Figure 6.2.1
national AI strategies were Canada, China, and Finland
in 2017. Only two nations released national AI strategies AI National Strategies in Development by Country
and Year
in 2022: Italy and Thailand. Source: AI Index, 2022 | Table: 2023 AI Index Report
Year Country
2021 Armenia, Bahrain, Cuba, Iceland, Morocco, New Zealand, Oman

Countries With a National Strategy on AI, 2022
Source: AI Index, 2022 | Chart: 2023 AI Index Report 2022 Azerbaijan, Belgium, Benin, Israel, Jordan, Nigeria, Uzbekistan
Figure 6.2.3
Released
In Development
Not Released
Figure 6.2.2
5 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world.
It is possible that some strategies were missed.

Index Report 2023 6.3 U.S. Public Investment in AI
This section examines public AI investment in the United States based on data from the U.S. government and Govini, a company that uses
AI and machine learning technologies to track U.S. public and commercial spending.
6.3 U.S. Public Investment in AI

Federal Budget for information on classified AI R&D investment by
defense and intelligence agencies.
Nondefense AI R&D
In December 2022, the National Science and In fiscal year (FY) 2022, nondefense U.S. government
Technology Council published a report on the agencies allocated a total of $1.7 billion to AI R&D
public-sector AI R&D budget across departments spending (Figure 6.3.1). The amount allocated in FY
and agencies participating in the Networking and 2022 represented a slight decline from FY 2021 and
Information Technology Research and Development a 208.9% increase from FY 2018. An even greater
(NITRD) Program and the National Artificial amount, $1.8 billion, has been requested for FY 2023.
Intelligence Initiative. The report does not include
U.S. Federal Budget for Nondefense AI R&D, FY 2018–23

Source: U.S. NITRD Program, 2022 | Chart: 2023 AI Index Report
1.84
1.75 1.73
1.50 1.43
Budget (in Billions of U.S. Dollars)
1.11
1.00
0.56
0.50
0.00
FY18 (Enacted) FY19 (Enacted) FY20 (Enacted) FY21 (Enacted) FY22 (Enacted) FY23 (Requested)
Figure 6.3.16
6 A previous report on the public-sector AI R&D budget released in 2021 classed the FY21 spending as totaling $1.53 billion. However, the most recent report,
released in 2022, upgraded the total spent in 2022 to $1.75 billion.

U.S. Department of Defense

Budget Requests
Every year the DoD releases the amount of funding to the 2022 report, the DoD requested $1.1 billion in FY
they have requested for nonclassified AI-specific 2023, a 26.4% increase from the funding they received
research, development, test, and evaluation. According in FY 2022 (Figure 6.3.2).
U.S. DoD Budget Request for AI-Specific Research, Development, Test, and Evaluation (RDT&E), FY 2020–23
Source: U.S. Office of the Under Secretary of Defense (Comptroller), 2022 | Chart: 2023 AI Index Report
1.10
1.00
0.93
0.87
Budget Request (in Billions of U.S. Dollars)
0.84
0.80
0.60
0.40
0.20
0.00
FY20 Funding FY21 Funding FY22 Funding FY23 Funding
Figure 6.3.2

U.S. Government AI-Related volumes of federal contracts data, including prime

contracts, grants, and other transaction authority
Contract Spending (OTA) awards. The use of AI models enables Govini to
Public investment in AI can also be measured by analyze data that is otherwise often inaccessible.
federal government spending on the contracts
that U.S. government agencies award to private Total Contract Spending
companies for the supply of goods and services. Such Figure 6.3.3 highlights total U.S. government spending
contracts typically occupy the largest share of an on AI, subdivided by various AI segments. From 2021
agency’s budget. to 2022, total AI spending increased from $2.7 billion

to $3.3 billion. Since 2017, total spending has increased
Data in this section comes from Govini, which created nearly 2.5 times. In 2022, the AI subsegments that saw
a taxonomy of spending by the U.S. government on the greatest amount of government spending included
critical technologies including AI. Govini applied decision science ($1.2 billion), and computer vision
supervised machine learning and natural language ($0.8 billion).
processing to parse, analyze, and categorize large
U.S. Government Spending by Segment, FY 2017–22

Source: Govini, 2022 | Chart: 2023 AI Index Report
Decision Science Computer Vision Machine Learning Autonomy Natural Language Processing
3.50
3.28
0.17
U.S. Government Spending (in Billions of U.S. Dollars)
3.00
2.70 0.69
0.21
2.50 2.41
0.21
0.52 0.41
2.00 0.43
1.83
0.43
1.56 0.82
0.3
1.50 0.58
1.29 0.26
0.53
0.45
0.24 0.31
1.00 0.46
0.21
0.41
0.44
1.19
0.50 0.43 1.01
0.73
0.46 0.55
0.32
0.00
2017 2018 2019 2020 2021 2022
Figure 6.3.3

Figure 6.3.4 shows U.S. government spending by AI segment in FY 2021 and FY 2022. Spending increased
for the decision science, computer vision, and autonomy segments, while spending on machine learning, and
natural language processing dropped slightly.
U.S. Government Spending by Segment, FY 2021 Vs. 2022

1.19 (+18%)
Decision Science
1.01
0.82 (+55%)
Computer Vision
0.53
0.69 (+33%)
Autonomy
0.52
0.41 (-5%)
Machine Learning
0.43
Natural Language 0.17 (-19%)

2022
Processing 0.21 2021
0.00 0.20 0.40 0.60 0.80 1.00 1.20

U.S. Government Spending (in Billions of U.S. Dollars)
Figure 6.3.4

In FY 2022, the majority of federal AI contracts were prime contracts (62.5%), followed by grants (34.9%) and
other transaction authority (OTA) awards (2.6%) (Figure 6.3.5). From FY 2021 to FY 2022, the share of contracts
remained about the same, while the share of grants rose.
Total Value of Contracts, Grants, and OTAs Awarded by the U.S. Government for AI/ML and Autonomy,
FY 2017–22
2.05, Contracts
2.00
Total Value Awarded (in Billions of U.S. Dollars)
1.50
1.15, Grants
1.00
0.50
0.09, OTAs
0.00
2017 2018 2019 2020 2021 2022
Figure 6.3.5

Index Report 2023 6.4 U.S. AI-Related Legal Cases
In 2022, the AI Index partnered with Elif Kiesow Cortez, a scholar of artificial intelligence law, in a research project tracking trends in
American legal cases from 2000 to 2022 that contain AI-related keywords.7
6.4 U.S. AI-Related Legal Cases

Total Cases there were a total of 110 AI-related cases in U.S.
In the last few years, there has been a sharp spike in federal and state courts, 6.5 times more than in 2016
AI-related jurisprudence in the United States. In 2022, (Figure 6.4.1).
Number of AI-Related Legal Cases in the United States, 2000–22

110
100
Number of AI-Related Legal Cases
80
60
40
20
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Figure 6.4.1
7 The Index analyzed both federal and state-level cases. Specific keywords in the search included “artificial intelligence,” “machine learning,” and “automated decision-making.” Some of these
cases did not directly concern issues related to AI jurisprudence. As a next step of this project, we will aim to identify the cases that most centrally concern issues of AI-related law.

Geographic Distribution they are home to many large businesses that have
integrated AI. In recent years, there have been a
In 2022, the majority of AI-related legal cases greater number of AI-related legal cases originating
originated in California (23), Illinois (17), and New from Illinois—this follows the state’s enactment
York (11) (Figure 6.4.2). The aggregate number of AI- of the Biometric Information Privacy Act (BIPA),
related cases since 2000 show a similar geographic which requires that companies doing business in
distribution (Figure 6.4.3). California and New York’s Illinois follow a number of regulations related to the
inclusion in the top three is unsurprising given that collection and storage of biometric information.
Number of AI-Related Legal Cases in the United States by State, 2022

California 23
Illinois 17
New York 11
Delaware 7
Florida 7
Washington 5
Kansas 4
Massachusetts 4
Maryland 4
District of Columbia 3
Texas 3
Ohio 3
Pennsylvania 3
Virginia 2
Missouri 2
0 2 4 6 8 10 12 14 16 18 20 22 24
Figure 6.4.28
8 Figures 6.4.2 and 6.4.3 include information for states and districts, given that cases sometimes originate from American districts like the District of Columbia or Puerto Rico

Number of AI-Related Legal Cases in the United States by State, 2000–22 (Sum)
California 127
New York 66
Illinois 36
Texas 26
Delaware 19
Massachusetts 19
Washington 18
Pennsylvania 16
Michigan 12
Virginia 12
District of Columbia 12
Florida 12
Ohio 10
Kansas 9
Minnesota 8
0 10 20 30 40 50 60 70 80 90 100 110 120 130

Figure 6.4.3
Sector
Figure 6.4.4 groups U.S.-based legal cases by economic sector. The predominant sector in 2022 was financial
services and professional services (48 cases); followed by media, culture, graphical (18); and public service (14).
Sector at Issue in AI-Related Legal Cases in the United States, 2022

Financial Services, Professional Services 48
Media, Culture, Graphical 18
Public Service 14
Education 6
Health Services 6
Hotels, Catering, Tourism 4
Postal and Telecommunications Services 4

Transport (Including Civil Aviation,
3
Railways, Road Transport)
Food, Drink, Tobacco 2
Transport Equipment Manufacturing 2
Basic Metal Production 1
Mechanical and Electrical Engineering 1
Oil and Gas Production, Oil Re ning 1
0 4 8 12 16 20 24 28 32 36 40 44 48
Figure 6.4.4

Type of Law
The greatest proportion of AI-related legal cases concerned civil law (29%) (Figure 6.4.5). There were also a large
number of AI-related legal cases in the domain of intellectual property (19%), as well as contract law (13.6%).
Area of Law of AI-Related Legal Cases in the United States, 2022

Civil 32
Intellectual
Property 21
Contract 15
Competition 11
Constitutional 8
Employment and
Labor 6
Criminal 5
Corporate 4
Financial 3
Human Rights and

Immigration 2
Terrorism and
National Security 2
Tort 1
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
Figure 6.4.5

Three Significant AI-Related Legal Cases
The section below profiles three significant AI-related cases in the United States,
highlighting some of the legal issues that are at stake when AI is brought into the courts.
Duerr v. Bradley University (2022- Flores v. Stanford9 (2021-Sep-28) Dyroff v. Ultimate Software Grp., Inc
Mar-10) – United States Court of – United States Court of Appeals (2017-Nov-26) – United States Court of
Appeals for the Seventh Circuit for the Second Circuit Appeals for the Ninth Circuit
The plaintiffs, who were enrolled The plaintiffs, offenders denied Plaintiff Kristanalea Dyroff sued Ultimate
as undergraduates in a private parole, sued the New York State Software after her 29-year-old son died
university in Peoria, Illinois, during Board of Parole over being from an overdose of heroin laced with
the fall 2020 semester, were told refused access to information fentanyl, which he allegedly bought
to use a third-party proctoring used by the board in its review from a drug dealer that he encountered
tool called Respondus Monitor for of their cases. Northpointe, Inc., on Ultimate Software’s social network
remote, online exams. This tool petitioned the court as a non- site. Dyroff asserted seven claims
made use of artificial intelligence party because its Correctional against Ultimate Software which
technologies. The plaintiffs claimed Offender Management Profiling included negligence, wrongful death,
that the defendants violated Illinois’ for Alternative Sanctions and civil conspiracy. At the core of these
Biometric Information Privacy Act (COMPAS), an AI-powered claims was the argument that Ultimate
(BIPA) by not adequately following risk assessment tool, had been Software mined the data of users
its guidelines concerning the used by the parole board in its and deployed that data, alongside an
collection of biometric information. determinations. Northpointe algorithm, to recommend drug-related
BIPA does not apply to financial wanted to prevent the disclosure discussion groups to her son. Ultimate
institutions. Ultimately, the court of AI trade secrets to one of the Software moved to dismiss the claims
ruled that under the Gramm- plaintiff’s expert witnesses. The and claimed partial immunity under the
Leach-Bliley Act, the defendants court ruled that the confidential Communications Decency Act, which
were a financial institution by material in question was relevant protects website operators from liability
virtue of lending functions they to the plaintiff’s case and posed for third-party content on their site. The
engaged in and therefore exempt little risk of competitive injury. Court ruled that Ultimate Software was
from BIPA. As such, the plaintiff’s As such, the material was immune and that its use of algorithms
case was dismissed. ordered to be released under a did not sufficiently amount to novel
supplemental protective order. content creation.
9 The defendant was Tina M. Stanford, as Chairwoman of the New York State Board of Parole.

Index Report 2023
Index Report 2023
CHAPTER 7:
Diversity

Index Report 2023
CHAPTER 7 PREVIEW:
Diversity
Overview 298 Narrative Highlight:
Chapter Highlights 299 Disability Status of CS, CE,
and Information Students 311
7.1 AI Conferences 300 New AI PhDs 312
Women in Machine Learning (WiML) CS, CE, and Information Faculty 313
NeurIPS Workshop 300
Workshop Participants 300 7.3 K–12 Education 316
Demographic Breakdown 301 AP Computer Science: Gender 316
AP Computer Science: Ethnicity 318
7.2 AI Postsecondary Education 305

CS Bachelor’s Graduates 305 ACCESS THE PUBLIC DATA
CS Master’s Graduates 307
CS PhD Graduates 309

Artificial Intelligence Chapter 7: Diversity
Index Report 2023
Overview
AI systems are increasingly deployed in the real world. However, there often exists
a disparity between the individuals who develop AI and those who use AI. North
American AI researchers and practitioners in both industry and academia are
predominantly white and male. This lack of diversity can lead to harms, among them
the reinforcement of existing societal inequalities and bias.
This chapter highlights data on diversity trends in AI, sourced primarily from academia.
It borrows information from organizations such as Women in Machine Learning
(WiML), whose mission is to improve the state of diversity in AI, as well as the
Computing Research Association (CRA), which tracks the state of diversity in North
American academic computer science. Finally, the chapter also makes use of Code.org
data on diversity trends in secondary computer science education in the United States.
Note that the data in this subsection is neither comprehensive nor conclusive. Publicly
available demographic data on trends in AI diversity is sparse. As a result, this chapter
does not cover other areas of diversity, such as sexual orientation. The AI Index hopes
that as AI becomes more ubiquitous, the amount of data on diversity in the field will
increase such that the topic can be covered more thoroughly in future reports.

Index Report 2023
Chapter Highlights
North American bachelor’s,

master’s, and PhD-level
computer science students New AI PhDs are still
are becoming more overwhelmingly male.
ethnically diverse. In 2021, 78.7% of new AI PhDs were
Although white students are still the male. Only 21.3% were female, a
most represented ethnicity among new 3.2 percentage point increase from
resident bachelor’s, master’s, and PhD-level 2011. There continues to be a gender
computer science graduates, students from imbalance in higher-level AI education.
other ethnic backgrounds (for example,
Asian, Hispanic, and Black or African
American) are becoming increasingly
more represented. For example, in 2011,
71.9% of new resident CS bachelor’s
graduates were white. In 2021, that number
dropped to 46.7%.
American K–12 computer

science education has
become more diverse,
Women make up an in terms of both gender
increasingly greater share and ethnicity.
of CS, CE, and information The share of AP computer science
faculty hires. exams taken by female students
Since 2017, the proportion of new female increased from 16.8% in 2007 to 30.6%
CS, CE, and information faculty hires has in 2021. Year over year, the share of
increased from 24.9% to 30.2%. Still, most Asian, Hispanic/Latino/Latina, and
CS, CE, and information faculty in North Black/African American students
American universities are male (75.9%). taking AP computer science has
As of 2021, only 0.1% of CS, CE, and likewise increased.
information faculty identify as nonbinary.

7.1 AI Conferences
Women in Machine Learning collaboration and interaction among participants from
diverse backgrounds at the International Conference
(WiML) NeurIPS Workshop of Machine Learning (ICML).
Women in Machine Learning (WiML), founded in Workshop Participants

2006, is an organization dedicated to supporting and Figure 7.1.1 shows the number of participants that
increasing the impact of women in machine learning. have attended the WiML workshop since 2010. In the
This subsection of the AI Index report presents data last decade, there has been a steady increase: 1,157
from the WiML annual technical workshop, hosted at individuals participated in 2022, 13 times the number
NeurIPS. Since 2020, WiML has also been hosting the in 2010. However, from 2021 to 2022, the number of
Un-Workshop, which serves to advance research via workshop participants decreased from 1,486 to 1,157.1
Attendance at NeurIPS Women in Machine Learning Workshop, 2010–22

Source: Women in Machine Learning, 2022 | Chart: 2023 AI Index Report
1,400
1,200
1,157
1,000
Number of Attendees
800
600
400
200
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 7.1.1
1 The recent decrease in WiML workshop attendance may be attributable to the overall recent decrease in NeurIPS attendance. This overall decrease may in turn be a result of
NeurIPS moving away from a purely virtual format.

Demographic Breakdown information aggregated. Among survey respondents,

Figure 7.1.2 breaks down the continent of residence around 41.5% were from North America, followed
of the 2022 workshop participants. The data in the by Europe (34.2%), Asia (17.1%), and Africa (3.4%). In
following figures comes from a survey completed 2022, there was greater representation from Europe,
by participants who consented to having such Asia, and South America.
Continent of Residence of Participants at NeurIPS Women in Machine Learning Workshop, 2022

North
41.50%
America
Europe 34.20%
Asia 17.10%
Africa 3.40%
South
1.60%
America
Australia/
1.40%
Oceania
Antarctica 0.20%
0% 5% 10% 15% 20% 25% 30% 35% 40% 45%

% of Respondents
Figure 7.1.22
2 At the time of the survey, one of the respondents was temporarily residing in Antarctica.

The majority of participants at the 2022 WiML workshop were female-identifying (37.0%), another 25.8% were
male-identifying, and 0.5% were nonbinary-identifying (Figure 7.1.3).
Gender Breakdown of Participants at NeurIPS Women in Machine Learning Workshop, 2022

Female 37.00%
Prefer Not to Say 36.30%
Male 25.80%
Nonbinary 0.50%
Gender Fluid 0.20%
Gender
0.20%
Non-Conforming
0% 5% 10% 15% 20% 25% 30% 35% 40%

% of Respondents
Figure 7.1.3

The most represented professional positions at the workshop were PhD students (49.4%), research scientists/
data scientists (20.8%), software engineers/data engineers (8.4%), and faculty (4.4%) (Figure 7.1.4).
Professional Positions of Participants at NeurIPS Women in Machine Learning Workshop, 2022

PhD Student 49.40%
Research Scientist/
20.80%
Data Scientist
Software Engineer/
8.40%
Data Engineer
Faculty 4.40%
CEO/Director 3.50%
Others 3.50%
Postdoc 3.50%
MSc Student 2.30%
Undergraduate
2.00%
Student
Recruiter 1.60%
Lecturer 1.40%
Recent Graduate 0.20%
0% 10% 20% 30% 40% 50%

% of Respondents
Figure 7.1.4

The WiML workshop participants at NeurIPS submitted papers covering a wide range of subjects (Figure 7.1.5).
The most popular submission topics were applications (32.5%), algorithms (23.4%), and deep learning (14.8%).
Primary Subject Area of Submissions at NeurIPS Women in Machine Learning Workshop, 2022
Applications 32.50%
Algorithms 23.40%
Deep Learning 14.80%
Social Aspects of
Machine Learning 7.70%
Reinforcement
Learning and Planning 7.20%
Neuroscience and
Cognitive Science 5.30%
Data, Challenges,
Implementations, 3.80%
Software
Probabilistic Methods 3.30%
Optimization 1.00%
Theory 1.00%
0% 5% 10% 15% 20% 25% 30%

% of Respondents
Figure 7.1.5

Index Report 2023 7.2 AI Postsecondary Education
Another proxy for studying diversity in AI is looking at trends in postsecondary AI education. The following subsection borrows data
from the Computing Research Association’s (CRA) annual Taulbee Survey.3
7.2 AI Postsecondary Education

CS Bachelor’s Graduates last decade whereby an increasingly large number
of CS bachelor’s graduates were women. The CRA
The number of female CS bachelor’s graduates survey also included a nonbinary gender category: In
rose to 22.3% from 2020 to 2021 (Figure 7.2.1). This 2021, the number of nonbinary/other-identifying CS
increase mirrors a broader trend observed in the bachelor’s graduates was 0.04%.
Gender of New CS Bachelor’s Graduates (% of Total) in North America, 2010–21

90%
80%
77.66%, Male
New CS Bachelor’s Graduates (% of Total)
70%
60%
50%
40%
30%
22.30%, Female
20%
10%
0% 0.04%, Nonbinary/Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.1
3 The charts in this subsection look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens in each educational
level (i.e., Bachelor’s, Master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included. For the proportion of nonresident aliens in each category, see footnotes.

Figure 7.2.2 breaks down the ethnicity of new CS bachelor’s graduates in North America: The top ethnicity
was white (46.7%), followed by Asian (34.0%) and Hispanic (10.9%). In the last decade, the proportion of
new CS bachelor’s graduates who were Asian, Hispanic, or multiracial (not Hispanic) steadily increased.4
Ethnicity of New Resident CS Bachelor’s Graduates (% of Total) in North America, 2011–21

70%
60%
New CS Bachelor’s Graduates (% of Total)
50%
46.69%, White
40%
33.99%, Asian
30%
20%
10.91%, Hispanic (Any Race)

10%
4.10%, Multiracial (Not Hispanic)
3.85%, Black or African-American
0.24%, Native Hawaiian or Paci c Islander
0% 0.22%, American Indian or Alaska Native
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.2
4 In 2021, 16.3% of new CS bachelor graduates were nonresident aliens.

CS Master’s Graduates
Figure 7.2.3 shows the gender of CS master’s moving to 27.8% in 2021 from 24.6% in 2011. In
graduates. The proportion of female CS master’s 2021, 0.9% of CS master’s graduates identified as
graduates has not substantially increased over time, nonbinary/other.
Gender of New CS Master’s Graduates (% of Total) in North America, 2011–21

80%
70% 71.27%, Male

New CS Master’s Graduates (% of Total)
60%
50%
40%
30%
27.83%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.3

Of domestic students, the most represented ethnicities are white (50.3%), followed by Asian (34.8%), and
Hispanic (7.3%) (Figure 7.2.4). As with CS bachelor’s graduates, in the last decade white students have
represented an increasingly smaller proportion of new CS master’s graduates.5
Ethnicity of New Resident CS Master’s Graduates (% of Total) in North America, 2011–21

70%
60%
New CS Master’s Graduates (% of Total)
50% 50.28%, White
40%
34.83%, Asian
30%
20%
10% 7.25%, Hispanic (Any Race)

0.25%, American Indian or Alaska Native
0% 0.12%, Native Hawaiian or Paci c Islander
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.4
5 In 2021, 65.2% of new CS master’s graduates were nonresident aliens.

CS PhD Graduates
In 2021, the number of new female CS PhD continue to be male. There remains a large gap
graduates rose to 23.3% from 19.9% (Figure 7.2.5). between new male and female CS PhDs.
Despite this rise, most new CS PhD graduates
Gender of New CS PhD Graduates (% of Total) in North America, 2010–21

80%
76.58%, Male
70%
New CS PhD Graduates (% of Total)
60%
50%
40%
30%
23.30%, Female
20%
10%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.5

Between 2011 and 2021, the number of new white resident CS PhD graduates declined by 9.4 percentage
points. Asians are the next most represented group (29%), followed by Hispanics (5.1%) and Black or African
Americans (4%) (Figure, 7.2.6).6
Ethnicity of New Resident CS PhD Graduates (% of Total) in North America, 2011–21

70%
60%
58.64%, White
New CS PhD Graduates (% of Total)
50%
40%
30% 29.00%, Asian
20%
5.12%, Hispanic (Any Race)

10% 4.05%, Black or African-American
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.6
6 In 2021, 68.6% of new CS PhD graduates were nonresident aliens.

Disability Status of CS, CE, and Information Students
The 2021 edition of the CRA Taulbee Survey was the disability accommodations in the last year.
first to gather information about the prevalence of The number of such students was relatively small.
CS, CE, and information students with disabilities. Only 4.0% of bachelor’s, 1.0% of PhD students,
The CRA asked departments to identify the number and 0.8% of master’s students reported needing
of students at each degree level who received accommodations (Figure 7.2.7).
CS, CE, and Information Students (% of Total) With Disability Accomodations in North America, 2021
4.10%
4%
CS, CE, and Information Students (% of Total)
3%
2%
1.00%
1%
0.80%
0%
Bachelor’s PhDs Master’s
Figure 7.2.7

New AI PhDs
were female. While the number of female AI PhDs
Figure 7.2.8 looks at demographic trends for new AI marginally increased from 2020 to 2021, we find no
PhD graduates who focus on artificial intelligence. meaningful trends in the last decade relating to the
In 2021, 78.7% of new AI PhDs were male and 21.3% gender of new AI PhDs.
Gender of New AI PhD Graduates (% of Total) in North America, 2010–21

80%
78.70%, Male
70%
New AI PhD Graduates (% of Total)
60%
50%
40%
30%
20% 21.30%, Female
10%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.8

CS, CE, and

Information Faculty predominantly male (75.9%) (Figure 7.2.9). Women
make up 23.9% of CS, CE, and information faculty,
Data on the ethnicity and gender of CS, CE, and and nonbinary individuals make up 0.1%. The share
information faculty helps to paint a picture of of female CS, CE, and information faculty has slowly
diversity trends in academic AI and CS. As of 2021, increased; since 2011, the number of female faculty
most CS, CE, and information faculty members are members has risen 5 percentage points.
Gender of CS, CE, and Information Faculty (% of Total) in North America, 2011–21
80%
75.94%, Male
70%
CS, CE, and Information Faculty (% of Total)
60%
50%
40%
30%
23.94%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.9

Although most new CS, CE, and information faculty hires in North American universities are still male, the
proportion of women among faculty hires reached 30.2% in 2021, up about 9 percentage points from 2015
(Figure 7.2.10).
Gender of New CS, CE, and Information Faculty Hires (% of Total) in North America, 2011–21
80%
New CS, CE, and Information Faculty Hires (% of Total)
70% 69.26%, Male
60%
50%
40%
30% 30.17%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.10

The majority of resident CS, CE, and information faculty are white as of 2021 (58.1%), followed by Asian (29.7%)
(Figure 7.2.11). However, the gap between white CS, CE, and information faculty and faculty of the next nearest
ethnicity is slowly narrowing: In 2011, the gap stood at 46.1%, whereas in 2021 it dropped to 28.4%.7
Ethnicity of Resident CS, CE, and Information Faculty (% of Total) in North America, 2010–21
70%
60%
58.08%, White
CS, CE, and Information Faculty (% of Total)
50%
40%
30% 29.70%, Asian
20%
5.82%, Unknown
10% 2.80%, Hispanic (Any Race)
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.2.11
7 In 2021, 6.7% of CS, CE, and information faculty in North America were nonresident aliens.

Index Report 2023 7.3 K–12 Education
How do trends in AI diversity measure at the K–12 level, prior to students entering university? This subsection borrows data from
Code.org, an American nonprofit that aims to promote K–12 computer science education in the United States.
7.3 K–12 Education

AP Computer Science: Gender
nor female (Figure 7.3.1). It is still the case that male
In 2021, 69.2% of AP computer science exams were students take more AP computer science exams
taken by male students, 30.6% by female students, than any other gender, but the proportion of female
and 0.3% by students who identified as neither male students has almost doubled in the last decade.
AP Computer Science Exams Taken (% of Total) by Gender, 2007–21

80%
70%
AP Computer Science Exams Taken (% of Total)
69.16%, Male
60%
50%
40%
30% 30.58%, Female
20%
10%
0% 0.26%, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.3.1

On a percent basis, the states with the largest (35%) (Figure 7.3.2). Other states with notable CS and
number of female AP computer science test- AI activity include California, Texas, and Washington,
takers were Alabama (36%) and Washington, D.C. with rates of women taking AP computer science
(36%), followed by Nevada (35%), Louisiana (35%), tests at rates hovering around 30 percent.
Tennessee (35%), Maryland (35%), and New York
AP Computer Science Exams Taken by Female Students (% of Total),

2021
AK ME
20% 27%
VT NH MA
23% 24% 30%
32% 21% 16% 15% 23% 23% 30% 35% 30% 31%
21% 26% 31% 25% 24% 32% 23% 27% 27% 31%
31% 35% 23% 26% 15% 22% 31% 30% 36% 35% 22%
27% 29% 25% 29% 35% 28% 31%
TX LA MS AL GA SC
30% 35% 33% 36% 29% 34%
HI FL
30% 31%
Figure 7.3.2

AP Computer Science:
Ethnicity most postsecondary computer science fields, the
pool of AP computer science test-takers is becoming
Code.org collects data that speaks to trends in the more ethnically diverse over time. White students
ethnicity of AP computer science test-takers. White are still the greatest test-taking group; however, over
students took the greatest proportion of the exams in time, more Asian, Hispanic/Latino/Latina and Black/
2021 (42.7%), followed by Asian (28.8%) and Hispanic/ African American students have taken AP computer
Latino/Latina students (16.5%) (Figure 7.3.3). As with science exams.
AP Computer Science Exams Taken (% of Total Responding Students) by Race/Ethnicity, 2007–21

AP Computer Science Exams Taken (% of Total Responding Students)
60%
50%
42.74%, White
40%
30%
28.78%, Asian
20%
16.48%, Hispanic/Latino/Latina
10% 6.32%, Black/African American

4.92%, Two or More Races
0.62%, Native American/Alaskan
0.15%, Native Hawaiian/Paci c Islander
0% 0.00%, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 7.3.3

Index Report 2023
Index Report 2023
CHAPTER 8:
Public Opinion

Index Report 2023
CHAPTER 8 PREVIEW:
Public Opinion
Overview 321 Narrative Highlight: How Does the
Chapter Highlights 322 Natural Language Processing (NLP)
Research Community Feel About AI? 334
8.1 Survey Data 323

8.2 Social Media Data 340
Global Insights 323
Dominant Models 340
AI Products and Services 323
AI: Harm or Help? 327
United States 329

Artificial Intelligence Chapter 8: Public Opinion
Index Report 2023
Overview
AI has the potential to have a transformative impact on society. As such it has become
increasingly important to monitor public attitudes toward AI. Better understanding
trends in public opinion is essential in informing decisions pertaining to AI’s
development, regulation, and use.
This chapter examines public opinion through global, national, demographic, and ethnic
lenses. Moreover, we explore the opinions of AI researchers, and conclude with a look
at the social media discussion that surrounded AI in 2022. We draw on data from two
global surveys, one organized by IPSOS, and another by Lloyd’s Register Foundation
and Gallup, along with a U.S-specific survey conducted by PEW Research.
It is worth noting that there is a paucity of longitudinal survey data related to AI asking
the same questions of the same groups of people over extended periods of time. As AI
becomes more and more ubiquitous, broader efforts at understanding AI public opinion
will become increasingly important.

Index Report 2023
Chapter Highlights
Chinese citizens are among Men tend to feel more
those who feel the most positively about AI products
positively about AI products and services than women.
and services. Americans … Men are also more likely than
not so much. women to believe that AI will
In a 2022 IPSOS survey, 78% of Chinese mostly help rather than harm.
respondents (the highest proportion of According to the 2022 IPSOS survey, men are more
surveyed countries) agreed with the statement likely than women to report that AI products and
that products and services using AI have services make their lives easier, trust companies
more benefits than drawbacks. After Chinese that use AI, and feel that AI products and services
respondents, those from Saudi Arabia (76%) have more benefits than drawbacks. A 2021 survey
and India (71%) felt the most positive about by Gallup and Lloyd’s Register Foundation likewise
AI products. Only 35% of sampled Americans revealed that men are more likely than women to
(among the lowest of surveyed countries) agreed agree with the statement that AI will mostly help
that products and services using AI had more rather than harm their country in the next 20 years.
benefits than drawbacks.
People across Different causes NLP researchers …

the world and for excitement have some strong
especially and concern. opinions as well.
America remain Among a sample of surveyed According to a survey
unconvinced by Americans, those who report widely distributed to NLP
feeling excited about AI are researchers, 77% either
self-driving cars.
most excited about the potential agreed or weakly agreed
In a global survey, only
to make life and society better that private AI firms have too
27% of respondents
(31%) and to save time and much influence, 41% said that
reported feeling safe
make things more efficient NLP should be regulated, and
in a self-driving car.
(13%). Those who report feeling 73% felt that AI could soon
Similarly, Pew Research
more concerned worry about lead to revolutionary societal
suggests that only 26%
the loss of human jobs (19%); change. These were some
of Americans feel that
surveillance, hacking, and digital of the many strong opinions
driverless passenger
privacy (16%); and the lack of held by the NLP research
vehicles are a good idea
human connection (12%). community.
for society.

Index Report 2023 8.1 Survey Data
8.1 Survey Data

Global Insights Figure 8.1.1 highlights global opinions (aggregated
results across the entire survey subsample) for a
How do opinions of AI vary across the globe? The variety of questions relating to AI products and
first subsection of this chapter provides a response by services. It shows the percentage of respondents
looking at survey data from IPSOS and Pew Research, who agree with a particular question. The majority of
as well as one poll that was a collaboration of Gallup the survey sample, 60%, believe that AI products and
and Lloyd’s Register Foundation. The surveys suggest services will profoundly change their daily life in the
that public perceptions concerning AI differ across near future—and make their life easier. A very slight
countries and by demographic groups. majority, 52%, feel that products and services that
AI Products and Services use AI have more benefits than drawbacks. Only 40%
In late 2021, IPSOS ran a survey on global attitudes of respondents report that AI products and services
toward AI products and services. The survey make them feel nervous.
consisted of interviews with 19,504 adults ages
16–74 in 28 different countries.1
Global Opinions on Products and Services Using AI (% of Total), 2022

Source: IPSOS, 2022 | Chart: 2023 AI Index Report
I have a good understanding of what

artificial intelligence is 64%
Products and services using artificial intelligence

will profoundly change my daily life 60%
in the next 3–5 years

make my life easier 60%

have more benefits than drawbacks 52%
I know which types of products and services

use artificial intelligence 50%
I trust companies that use artificial intelligence

50%
as much as I trust other companies

have profoundly changed my daily life 49%
in the past 3–5 years

make me nervous 39%
0% 10% 20% 30% 40% 50% 60%

% of Respondents That “Agree”
Figure 8.1.1
1 See Appendix for more details about the survey methodology.

Opinions vary widely across countries as to the services using AI have more benefits than drawbacks
relative advantages and disadvantages of AI. (Figure 8.1.2). However, only 35% of American
The IPSOS survey suggests that 78% of Chinese respondents share that sentiment. Among the 28
respondents, 76% of Saudi Arabian respondents, and surveyed countries, France and Canada held the most
71% of Indian respondents feel that products and negative views.
‘Products and services using AI have more bene ts than drawbacks,’ by Country (% of Total), 2022
China 78%
Saudi Arabia 76%
India 71%
Peru 70%
Mexico 65%
Malaysia 65%
Colombia 64%
Chile 63%
South Korea 62%
Turkey 60%
Brazil 57%
South Africa 57%
Argentina 55%
Spain 53%
Russia 53%
Italy 50%
Hungary 49%
Poland 48%
Japan 42%
Sweden 40%
Belgium 38%
Great Britain 38%
Australia 37%
Germany 37%
United States 35%
Netherlands 33%
Canada 32%
France 31%
0% 10% 20% 30% 40% 50% 60% 70% 80%

% of Respondents That “Agree”
Figure 8.1.2

Figure 8.1.3 breaks down answers to all of IPSOS’ companies that use AI as much as other companies,
AI products and services questions by country. and only 30% say that AI products and services using AI
Generally, sentiment relating to AI products make them nervous. Conversely, American respondents
and services seems to be strongly correlated are among the most negative when it comes to AI. Only
within specific countries. For example, Chinese 41% claim that AI products and services make their lives
respondents seem to feel among the most positive easier, 35% report trusting AI companies as much as
about AI products and services: 87% of Chinese other companies, and 52% report that AI products and
respondents claim that AI products and services services make them feel nervous.
make their lives easier, 76% report trusting
Opinions About AI by Country (% Agreeing With Statement), 2022

I have a good understanding of what 64% 59% 60% 69% 59% 76% 67% 71% 50% 50% 57% 67% 72% 42% 41% 61% 74% 65% 76% 66% 75% 73% 78% 72% 62% 60% 68% 63%
artificial intelligence is
Products and services using artificial

intelligence will profoundly change 60% 50% 52% 61% 44% 67% 80% 65% 45% 44% 46% 55% 74% 53% 53% 71% 65% 53% 71% 56% 60% 80% 72% 76% 56% 50% 73% 46%
my daily life in the next 3–5 years

intelligence make my life easier
59% 46% 49% 65% 44% 70% 87% 71% 39% 45% 45% 50% 72% 54% 52% 71% 73% 47% 74% 58% 64% 80% 67% 74% 59% 46% 71% 41%

intelligence have more benefits than 55% 37% 38% 57% 32% 63% 78% 64% 31% 37% 38% 49% 71% 50% 42% 65% 65% 33% 70% 48% 53% 76% 57% 62% 53% 40% 60% 35%
drawbacks
I know which types of products and

47% 38% 37% 58% 36% 59% 76% 62% 34% 37% 37% 38% 69% 45% 32% 61% 62% 41% 63% 52% 57% 69% 57% 60% 46% 37% 60% 39%
services use artificial intelligence
I trust companies that use artificial

intelligence as much as I trust other 55% 36% 40% 50% 34% 56% 76% 57% 34% 42% 35% 48% 68% 48% 39% 61% 60% 38% 60% 51% 52% 73% 56% 46% 50% 39% 63% 35%
companies

intelligence have profoundly changed 53% 37% 37% 51% 32% 58% 73% 58% 32% 31% 33% 38% 67% 41% 30% 65% 62% 40% 65% 45% 50% 72% 56% 62% 49% 30% 60% 36%
my daily life in the past 3–5 years

33% 51% 42% 35% 49% 36% 30% 39% 32% 37% 50% 31% 53% 26% 20% 48% 38% 36% 35% 30% 28% 51% 52% 32% 48% 37% 48% 52%
intelligence make me nervous
Argentina
Australia
Belgium
Brazil
Canada
Chile
China
Colombia
France
Germany
Great Britain
Hungary
India
Italy
Japan
Malaysia
Mexico
Netherlands
Peru
Poland
Russia
Saudia Arabia
South Africa
South Korea
Spain
Sweden
Turkey
United States
Figure 8.1.3

Figure 8.1.4 breaks down opinions in all countries feeling that AI products and services make their lives
across demographic groups such as gender, age, easier, they are also less likely than the 35-to-49 age
household income, and employment status. IPSOS category to believe that AI products and services have
results suggest that men feel more positively about more benefits than drawbacks. Finally, households with
AI products and services than women—for example, higher incomes are more positive, compared to those
compared to women, men are more likely to report with lower incomes, about AI products and services
feeling that AI products and services make their making life easier and having more benefits than
lives easier. Age-specific opinions vary. For instance, drawbacks.
while individuals under 35 are most likely to report
Opinions About AI by Demographic Group (% Agreeing With Statement), 2022

I have a good understanding of what

artificial intelligence is 69% 60% 66% 65% 61% 57% 63% 71% 56% 64% 71% 73% 74% 67% 59%

intelligence will profoundly change 63% 57% 63% 61% 55% 56% 58% 67% 53% 58% 68% 70% 72% 64% 54%
my daily life in the next 3–5 years
Products and services using artificial 62% 58% 64% 62% 54% 56% 58% 66% 53% 58% 67% 67% 70% 63% 55%
intelligence make my life easier

intelligence have more benefits than 55% 49% 47% 53% 46% 50% 51% 57% 45% 50% 59% 63% 64% 55% 47%
drawbacks
I know which types of products and 55% 46% 54% 51% 45% 46% 50% 57% 44% 48% 58% 63% 65% 54% 44%
services use artificial intelligence
I trust companies that use artificial
intelligence as much as I trust other 53% 47% 54% 51% 44% 47% 48% 57% 45% 48% 56% 61% 62% 53% 45%
companies
intelligence have profoundly changed 51% 46% 54% 50% 41% 46% 47% 54% 43% 46% 55% 61% 62% 52% 43%
my daily life in the past 3–5 years

intelligence make me nervous 38% 41% 40% 40% 38% 41% 41% 38% 41% 37% 40% 48% 46% 40% 38%
High
Medium
Medium
Low
High
Sr. Exec./
Decision Maker
Low
Male
Female
Under 35
35 to 49
50 to 74
Business Owner
Employed
Non-Employed
Gender Age Household Income Education Employment Status
Figure 8.1.4

AI: Harm or Help?

In 2021, Lloyd’s Register Foundation, an independent A greater proportion of respondents believed that
global charity, collaborated with Gallup to poll AI will mostly help (39%) compared to a smaller
125,911 people across 121 countries about their proportion who believed that it would mostly harm
perceptions of artificial intelligence and other digital (28%). Mirroring the disparity in responses across
trends. Figure 8.1.5 shows the responses to the gender evident in the IPSOS survey, men in the
survey question, “Do you think artificial intelligence Lloyd’s-Gallup poll were more likely than women to
will mostly help or mostly harm people in this country report believing that AI will mostly help people in the
in the next 20 years?” next 20 years.
Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years Overall and by
Gender (% of Total), 2021
Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
% World % Women % Men
42%
40% 39%
35%
30% 29%
% of Respondents
28%
27%
24%
22%
20%
20%
10% 9%
8% 8%
2% 2% 2%
0%
Mostly help Mostly harm Neither Don’t have an opinion Don’t know/refused
Figure 8.1.5

Eastern Asia, Northern/Western Europe, and for every 1 response of “mostly harm” there were 4.4
Southern Europe are the regions of the world where responses suggesting that AI will “mostly help.” The
people are most likely to report believing that AI will regions whose populations are most pessimistic about
mostly help versus mostly harm (Figure 8.1.6). More the potential benefits of AI include Eastern Africa,
specifically, among the Eastern Asian survey sample, Northern Africa, and Southern Africa.
Views on Whether AI Will ‘Mostly Help’ or ‘Mostly Harm’ People in the Next 20 Years by Region:
Ratio of ‘Mostly Help’/‘Mostly Harm’, 2021
Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
Eastern Asia 4.40
Northern/Western Europe 1.80
Australia and New Zealand 1.70
Eastern Europe 1.40
Southern Europe 1.30
Central Asia 1.20
Southeastern Asia 1.20
Northern America 1.00
Middle East 1.00
Southern Asia 0.90
Latin America and Caribbean 0.90
Central/Western Africa 0.80
Northern Africa 0.70
Southern Africa 0.60
Eastern Africa 0.40
0 1 2 3 4
Ratio of “Mostly Help”/“Mostly Harm”
Figure 8.1.6
The Lloyd’s Register survey also polled Perceptions of the Safety of Self-Driving Cars
(% of Total), 2021
respondents about their perceptions of Source: Lloyd’s Register Foundation and Gallup, 2022 | Chart: 2023 AI Index Report
certain AI technologies, such as self-driving

cars. The majority of survey respondents 65%, Would not feel safe
reported not feeling safe in a self-driving car

(65%), compared to only 27% who reported
feeling safe (Figure 8.1.7).
27%, Would feel safe
8%, Don’t know/refused

Figure 8.1.7

United States Americans’ Feelings Toward Increased Use of AI

Programs in Daily Life (% of Total), 2022
Source: Pew Research, 2022 | Chart: 2023 AI Index Report
In 2022, Pew Research released one of the most
45%, Equally concerned
comprehensive surveys to date about Americans’ and excited
views on AI. The survey interviewed 10,260 panelists

from a wide range of demographic groups about their
broad AI-related opinions, as well as their perspectives
on specific AI use cases.2
<1%, No answer
45% of Americans report feeling equally concerned
and excited about the use of AI programs in daily life, 37%, More concerned
than excited 18%, More excited
while 37% report feeling more concerned than excited than concerned
(Figure 8.1.8). Only 18% of Americans report feeling
more excited than concerned about AI technology. Figure 8.1.8
Which AI applications are Americans most excited

about? A large proportion report feeling very or problems (40%) (Figure 8.1.9). Americans are very or
somewhat excited about AI being used to perform somewhat concerned about AI being used to make
household chores (57%), to perform repetitive important life decisions for people (74%) and to know
workplace tasks (46%), and to diagnose medical people’s thoughts and behaviors (75%).
Americans’ Feelings on Potential AI Applications (% of Total), 2022

Very/somewhat excited Equally excited and concerned Very/somewhat concerned
Perform household chores 57% 24% 19%
Perform repetitive workplace

46% 27% 26%
tasks
Diagnose medical problem 40% 24% 35%
Handle customer service calls 27% 26% 47%
Make important life decisions

9% 16% 74%
for people
Know people’s thoughts

9% 16% 75%
and behaviors
0% 20% 40% 60% 80% 100%

% of Respondents
Figure 8.1.93
2 See Appendix for more details about the survey methodology.
3 The numbers in Figure 8.1.9 may not sum up to 100% due to rounding.

There are two specific AI use cases that Americans that police using facial recognition technology is a
are more likely to report feeling are good ideas good idea for society compared to 27% who believe it
for society rather than bad: police use of facial is a bad idea. However, Americans are not as excited
recognition technology, and social media companies about driverless passenger vehicles: More feel that
using AI to find false information on their sites (Figure driverless passenger vehicles are a bad idea for
8.1.10). More specifically, 46% of Americans believe society than a good idea.
Americans’ Perceptions of Specific AI Use Cases (% of Total), 2022

Bad idea for society Good idea for society Not sure
Facial recognition technology

27% 46% 27%
by police
Figure 8.1.8
Computer programs by social

media companies to nd false 31% 38% 30%
information on their sites
Driverless passenger vehicles 44% 26% 29%
0% 20% 40% 60% 80% 100%

% of Respondents
Figure 8.1.104

Of the sample of Americans who reported being more hacking, and digital privacy (16%); and lack of human
concerned than excited about AI, Figure 8.1.11 outlines connection (12%). Americans reported being less
the main reasons for their concern. The primary concerned about the potential loss of freedom and
reasons include loss of human jobs (19%); surveillance, issues relating to lack of oversight and regulation.
Main Reason Americans Are Concerned About AI (% of Total), 2022

Loss of human jobs 19%
Surveillance, hacking, digital privacy 16%
Lack of human connection/qualities 12%

AI will get too powerful/outsmarting
people 8%
People misusing AI 8%
People becoming too reliant on Figure 8.1.8
AI/tech 7%
AI fails, makes mistakes 6%

Concerns about government/tech
companies using AI 3%
Don’t trust AI or people wielding it 3%
Unforeseen consequences/effects 2%
Loss of freedom 2%
Human bias coded into AI 2%
Lack of oversight and regulation 2%
Other 7%
0% 5% 10% 15% 20%

% of Respondents
Figure 8.1.11

The two leading reasons that Americans report society better. A significant group also reported
being excited about AI relate to its potential to feeling excited about the potential of AI to save time
make life better and to save time (Figure 8.1.12). and increase efficiency (13%), as well as to handle
Of the respondents, 31% believe AI makes life and mundane, tedious tasks (7%).
Main Reason Americans Are Excited About AI (% of Total), 2022

Makes life, society better 31%
Saves time, more e�cient 13%
Inevitable progress, is the future 10%
Handles mundane, tedious tasks 7%
Helps with work/labor 6% Figure 8.1.8
AI is interesting, exciting 6%
Helps humans with di�cult/ 6%

dangerous tasks
More accurate than humans 4%
Helps those who are elderly/ 4%

have a disability
Personal anecdotes 2%
Other people’s fears based on

sci-�, not reality 2%
Other 7%
0% 5% 10% 15% 20% 25% 30%

% of Respondents
Figure 8.1.12

The Pew Research survey also asked participants in the degree to which people felt that AI systems
which group of people had their experiences and positively considered the experiences and views of
views taken into consideration in the design of AI men over women. Similarly, respondents felt that the
systems. Respondents felt AI systems most reflected experiences and views of Asian, Black, and Hispanic
the experiences and views of men and white adults adults, compared to those held by white adults, were
(Figure 8.1.13). There was a 15 percentage point gap not as positively considered.
People Whose Experiences and Views Are Considered in the Design of AI Systems (% of Total), 2022
Net not well Net well Not sure
Men 12% 51% 37%
Figure 8.1.8
Women 25% 36% 38%
White adults 13% 48% 39%
Asian adults 23% 33% 43%
Black adults 33% 24% 42%
Hispanic adults 33% 23% 43%
0% 20% 40% 60% 80% 100%

% of Respondents
Figure 8.1.135

How Does the Natural Language Processing (NLP)
Research Community Feel About AI?
From May to June 2022, a group of American In general, the NLP research community
researchers conducted a survey of the NLP research strongly feels that private firms have too much
community on a diverse set of issues, including the influence (77%) and that industry will produce
state of the NLP field, artificial general intelligence the most widely cited research (86%) (Figure
(AGI), and ethics, among others. According to the 8.1.14). Curiously, 67% either agreed or weakly
authors, a total of 480 individuals completed the agreed with the statement that most of NLP is
survey, 68% of whom had authored at least two dubious science. A small proportion, 30%, think
Association for Computational Linguistics (ACL) an “NLP winter”—a period when the field faces
publications between 2019 and 2022. The survey 6
a significant slowdown or stagnation in research
represents one of the most complete pictures of the and development—is coming in the next decade.
attitudes AI researchers have toward AI research.
State of the Field According to the NLP Community, 2022

Source: Michael et al., 2022 | Chart: 2023 AI Index Report
Private �rms have too

77%
much in�uence
Industry will produce

the most widely cited 86%
research
NLP winter
30%
is coming (10 years)
NLP winter
62%
is coming (30 years)
Most of NLP is
67%
dubious science
Author anonymity
63%
is worth it
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%

% of Respondents That “Agree” or “Weakly Agree”
Figure 8.1.14
6 More detailed information about the survey methodology and sample group can be found in the following paper.

Research Community Feel About AI? (cont’d)
A small majority of NLP researchers believe that specific types of AI systems can actually understand
language: 51% agreed with the statement that language models (LMs) understand language, with even
more (67%) agreeing that multimodal models understand language (Figure 8.1.15).
Language Understanding According to the NLP Community, 2022

LMs understand
51%
language
Multimodal models
67%
understand language
Text-only evaluation can

measure language 36%
understanding
0% 10% 20% 30% 40% 50% 60%

Figure 8.1.15

NLP researchers also seem to believe that NLP’s with 48% of respondents feeling it is unethical. Sixty
past net impact has been positive (89%) and that its percent of researchers feel that the carbon footprint
future impact will continue to be good (87%) (Figure of AI is a major concern; however, only 41% feel that
8.1.16). The community is divided on the issue of NLP should be regulated.
using AI to predict psychological characteristics,
Ethics According to the NLP Community, 2022

NLP’s past net impact

is good 89%
NLP’s future net impact

is good 87%
It is unethical to build
easily misusable systems 59%
Ethical and scienti�c

considerations 74%
can con�ict
Ethical concerns mostly
reduce to data quality 25%
and model accuracy
It is unethical to predict
psychological 48%
characteristics
Carbon footprint is
a major concern 60%
NLP should be regulated 41%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%

Figure 8.1.16

Although a large majority of researchers feel that AI could soon lead to revolutionary societal change
(73%), only 36% feel that AI decisions could cause nuclear-level catastrophe (Figure 8.1.17). A plurality
of researchers, 57%, held that recent research progress was leading the AI community toward Artificial
General Intelligence (AGI).
Arti cial General Intelligence (AGI) and Major Risks According to the NLP Community, 2022
AGI is an important
58%
concern
Recent progress is
moving us toward 57%
AGI
AI could soon lead

to revolutionary 73%
societal change
AI decisions could
cause nuclear-level 36%
catastrophe
0% 10% 20% 30% 40% 50% 60% 70%

Figure 8.1.17

When asked about the direction AI research is taking, the NLP community registered the strongest
responses about the following: First, there’s too much focus on benchmarks (88%); second, more work
should be done to incorporate interdisciplinary insights (82%); and third, there’s too great a focus on
scale (72%) (Figure 8.1.18).
Promising Research Programs According to the NLP Community, 2022

There’s too much

72%
focus on scale
There’s too much

88%
focus on benchmarks
On the wrong track:

37%
model architectures
On the wrong track:

41%
language generation
On the wrong track:

50%
explainable models
On the wrong track:

black-box 42%
interpretability
We should do more
to incorporate 82%
interdisciplinary insights
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
Figure 8.1.18

A further point on the NLP community’s skepticism of scale: Only 17% of respondents agreed or weakly
agreed with the statement that scaling solves practically any important problem, with a further 50%
reaffirming the importance of linguistic structure (Figure 8.1.19).
Scale, Inductive Bias, and Adjacent Fields According to the NLP Community, 2022
Scaling solves practically

17%
any important problem
Linguistic structure
50%
is necessary
Expert inductive
51%
biases are necessary
Linguistics/CogSci will
contribute to the 61%
most-cited models
0% 10% 20% 30% 40% 50% 60%

Figure 8.1.19

Index Report 2023 8.2 Social Media Data
8.2 Social Media Data

Dominant Models ChatGPT conversation has increasingly saturated social
media conversation around AI model releases more
Public attitudes toward AI can also be gauged through broadly, with sentiment growing ever more mixed.
quantitative and qualitative analyses of posts that Consumers question the implications of its launch
people make on social media. The NetBase Quid team as well as its underlying ethical principles. Another
leveraged the NetBase platform to analyze social frequent preoccupation is the bias of the system toward
conversation around AI models and new releases for certain political, ethical, or cultural beliefs.
uses across sectors from January to December 2022,
“ChatGPT passed a Wharton MBA exam. Time to
looking at 2.74 million social media posts.
overhaul education.” – @GRDecter 
Figure 8.2.1 shows the net sentiment score of various
“Alarm: ChatGPT by @OpenAI now *expressly
AI models that were released throughout the year. The
prohibits arguments for fossil fuels*. (It used to
net sentiment score expresses the ratio of positive
offer them.) Not only that, it excludes nuclear
to negative sentiment around a given topic. In this
energy from its counter-suggestions. @sama,
case, a net sentiment score of +100 means that all
what is the reason for this policy?” – @AlexEpstein 
conversation is positive; a score of -100 means that
all conversation is negative. AlphaCode had the most Finally, while GLM-130B took up very little volume
consistently high sentiment over time, as well as the of the overall social media conversation, a small
highest average sentiment for 2022, due to positive conversation of very negative sentiment grew over
press coverage on social media and practical use the system’s ties to the Chinese government and
cases of AI-driven programming. Consumers and how it was “prohibited” from using the software
media outlets embraced the practical use case of to “undermine” China’s government in any way.
programming automation. Some sample social media Technology influencer and PhD student Jesse Wood
posts relating to AlphaCode include: posted a Twitter thread about GLM-130B’s licensing
language that gained significant traction.
“#AlphaCode—a new #AI system for developing
computer code developed by @DeepMind— “The model license for GLM-130B has a
can achieve average human-level performance restriction: ‘You will not use the Software for any
in solving programming contests.” act that may undermine China’s national security
– Science Magazine, Twitter  and national unity, harm the public interest of
society, or infringe upon the rights and interests of
“DeepMind’s AlphaCode outperforms many
human beings.’” – @jrhwood 
human programmers in tricky software
challenges.” – @lunamoth

Net Sentiment Score of AI Models by Quarter, 2022

2022/Q1 2022/Q2 2022/Q3 2022/Q4
DALL-E 0 42 29 21
LaMDA 73 -9 -11 44
AlphaCode 60 79 71 70
CoPilot 29 22 15 34
PaLM 66 66 30
Gato 47 84 65
Imagen 24 65 56
Stable Di usion 35 52
Whisper 85 69
Make-A-Video 4 9
AlphaTensor 96
GLM-130B 55
BLOOM 0
CICERO 14
ChatGPT 32
Figure 8.2.17
7 The AI Index searched for sentiment surrounding the term “DALL-E,” as it was more frequently referred to on social media, rather than DALL-E 2, the official name of the text-to-image
model released by OpenAI in 2022.

Figure 8.2.2 highlights the proportion of AI-related “This story … is really sad, and I think an
social media conversation that was dominated by the important window into the risks of designing
release of particular models. ChatGPT dominated 8
systems to seem like humans, which are
consumer conversation with a rapid rise, making exacerbated by #AIhype.” – @nitashataku
up over half of consumer conversation by the end
Stable Diffusion conversation stands out as a
of 2022. Despite initial excitement, sentiment was
prominent leader in conversation volume toward
mixed by the end of the year, as some individuals
the end of 2022, but it is also a symbol of how the
became more aware of ChatGPT’s limitations.
consumer lexicon around AI models is developing.
OpenAI CEO Sam Altman even publicly commented
Many consumers debated the “originality” of what
on it being “incredibly limited” in certain respects.
Stable Diffusion produces.
“ChatGPT is incredibly limited, but good enough
“I’ve worked on neural networks, so I understand
at some things to create a misleading impression
stable diffusion pretty well. And while it can’t
of greatness. It’s a mistake to be relying on it
have original thoughts, it can come up with
for anything important right now. It’s a preview
original works.” – r/TikTokCringe
of progress; we have lots of work to do on
robustness and truthfulness.” – @SamAltman “That’s true of anywhere that datasets scrape
without permission. The thing to actually be upset
Conversation around LaMDA exploded in Q2
about is that their own generator is purposefully
2022 as an ex–Google employee reported his
using the Stable Diffusion dataset that already
experiences with a “sentient” system that spoke of
contains tons of stolen work.” – @Emily_Art
its own emotions and thoughts. Many political and
technology influencers spoke out, however, about
the “deepfake” nature of the responses of systems
like LaMDA that do not have a sense of “truth” and ChatGPT dominated
could proliferate misinformation. consumer conversation
“AI systems like LamDA and GPT-3 are with a rapid rise, making
sociopathic liars with utter indifference to truth,
deepfakers with words, every day creating
up over half of consumer
more compelling, more plausible misinformation conversation by the end
on demand. It is imperative that we develop
of 2022.
technology & policy to thwart them.” –
@GaryMarcus
8 The figures in this section consider all AI-related social media conversation. The percentage associated with the model in Figure 8.2.2 represents the share of all AI-related social media
conversation that was dominated by that model.

Select Models’ Share of AI Social Media Attention by Quarter, 2022

2022/Q1 2022/Q2 2022/Q3 2022/Q4
DALL-E 0% 1% 3% 2%
LaMDA 1% 35% 9% <1%
AlphaCode 2% <1% <1% 1%
CoPilot 10% 3% 4% 1%
PaLM <1% <1% <1%
Gato 10% 18% 3%
Imagen 5% 4% 2%
Stable Di usion 19% 19%
Whisper <1% <1%
Make-A-Video 33% 15%
AlphaTensor 1%
GLM-130B <1%
BLOOM <1%
CICERO 3%
ChatGPT 52%
Figure 8.2.2

Index Report 2023
Index Report 2023
Appendix
Index Report 2023
Appendix
Chapter 1 Research and Development 346
Chapter 2 Technical Performance 352
Chapter 3 Technical AI Ethics 363
Chapter 4 The Economy 366
Chapter 5 Education 375
Chapter 6 Policy and Governance 377
Chapter 7 Diversity 384
Chapter 8 Public Opinion 385
345
Artificial Intelligence Appendix
Index Report 2023 Chapter 1: Research and Development
Chapter 1: Research and Development

Center for Security and Methodology
To create the merged corpus, CSET deduplicated
Emerging Technology, across the listed sources using publication metadata,
Georgetown University and then combined the metadata for linked
Prepared by Sara Abdulla and James Dunham publications. To identify AI publications, CSET used an
English-language subset of this corpus: publications
The Center for Security and Emerging Technology since 2010 that appear AI-relevant.4 CSET researchers
(CSET) is a policy research organization within developed a classifier for identifying AI-related
Georgetown University’s Walsh School of Foreign publications by leveraging the arXiv repository, where
Service that produces data-driven research at the authors and editors tag papers by subject. Additionally,
intersection of security and technology, providing CSET uses select Chinese AI keywords to identify
nonpartisan analysis to the policy community. Chinese-language AI papers.5
For more information about how CSET analyzes To provide a publication’s field of study, CSET
bibliometric and patent data, see the Country Activity matches each publication in the analytic corpus
Tracker (CAT) documentation on the Emerging with predictions from Microsoft Academic Graph’s
Technology Observatory’s website.1 Using CAT, users field-of-study model, which yields hierarchical labels
can also interact with country bibliometric, patent, describing the published research field(s) of study and
and investment data.2 corresponding scores.6 CSET researchers identified
Publications from CSET Merged Corpus of the most common fields of study in our corpus of
Scholarly Literature AI-relevant publications since 2010 and recorded
Source publications in all other fields as “Other AI.” English-
CSET’s merged corpus of scholarly literature language AI-relevant publications were then tallied by
combines distinct publications from Digital Science’s their top-scoring field and publication year.
Dimensions, Clarivate’s Web of Science, Microsoft
CSET also provided year-by-year citations for AI-
Academic Graph, China National Knowledge
relevant work associated with each country. A
Infrastructure, arXiv, and Papers With Code.3
publication is associated with a country if it has at
1 https://eto.tech/tool-docs/cat/
2 https://cat.eto.tech/
3 All CNKI content is furnished by East View Information Services, Minneapolis, Minnesota, USA.
4 For more information, see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv [cs.DL],
May 28, 2020, https://arxiv.org/abs/2002.07143.
5 This method was not used in CSET’s data analysis for the 2022 HAI Index report.
6 These scores are based on cosine similarities between field-of-study and paper embeddings. See Zhihong Shen, Hao Ma, and Kuansan Wang, “A Web-Scale System for Scientific
Knowledge Exploration,” arXiv [cs.CL], May 30, 2018, https://arxiv.org/abs/1805.12216.
Table of Contents Appendix 346

least one author whose organizational affiliation(s)

Epoch National
are located in that country. Citation counts aren’t
available for all publications; those without counts
Affiliation Analysis
weren’t included in the citation analysis. Over 70% of The AI forecasting research group Epoch maintains
English-language AI papers published between 2010 a dataset of landmark AI and ML models, along with
and 2020 have citation data available. accompanying information about their creators and
publications, such as the list of their (co)authors,
CSET counted cross-country collaborations as
number of citations, type of AI task accomplished, and
distinct pairs of countries across authors for each
amount of compute used in training.
publication. Collaborations are only counted once:
For example, if a publication has two authors from The nationalities of the authors of these papers have
the United States and two authors from China, it is important implications for geopolitical AI forecasting.
counted as a single United States-China collaboration. As various research institutions and technology
companies start producing advanced ML models, the
Additionally, publication counts by year and by
global distribution of future AI development may shift
publication type (e.g., academic journal articles,
or concentrate in certain places, which in turn affects
conference papers) were provided where available.
the geopolitical landscape because AI is expected to
These publication types were disaggregated by
become a crucial component of economic and military
affiliation country as described above.
power in the near future.
CSET also provided publication affiliation sector(s)
To track the distribution of AI research contributions
where, as in the country attribution analysis, sectors
on landmark publications by country, the Epoch
were associated with publications through authors’
dataset is coded according to the following
affiliations. Not all affiliations were characterized in
methodology:
terms of sectors; CSET researchers relied primarily
on GRID from Digital Science for this purpose, and 1. A snapshot of the dataset was taken on
not all organizations can be found in or linked to November 14, 2022. This includes papers
GRID. Where the affiliation sector is available, papers
7
about landmark models, selected using the
were counted toward these sectors, by year. Cross- inclusion criteria of importance, relevance, and
sector collaborations on academic publications uniqueness, as described in the Compute Trends
were calculated using the same method as in the dataset documentation.8
cross-country collaborations analysis. We use HAI’s
2. The authors are attributed to countries based
standard regions mapping for geographic analysis,
on their affiliation credited on the paper.
and the same principles for double-counting apply for
For international organizations, authors are
regions as they do for countries.
attributed to the country where the organization
7 See https://www.grid.ac/ for more information about the GRID dataset from Digital Science.
8 https://epochai.org/blog/compute-trends; see note on “milestone systems.”

is headquartered, unless a more specific

Large Language and
location is indicated. The number of authors
from each country represented are added
Multimodal Models
up and recorded. If an author has multiple The following models were identified by members of
affiliations in different countries, they are split the AI Index Steering Committee as the large language
between those countries proportionately. 9
and multimodal models that would be included as part
of the large language and multimodal model analysis:
3. E
ach paper in the dataset is normalized to
equal value by dividing the counts on each AlphaCode InstructGPT
paper from each country by the total number of BLOOM Jurassic-1-Jumbo
authors on that paper.10 Chinchilla Jurassic-X
4. All of the landmark publications are aggregated Codex Meena
within time periods (e.g., monthly or yearly) with CogView Megatron-LM (original,
the normalized national contributions added up DALL-E 8.3B)
to determine what each country’s contribution DALL-E 2 Megatron-Turing NLG
to landmark AI research was during each time ERNIE 3.0 530B
period. ERNIE-GEN (large) Minerva (540B)
GLM-130B OPT-175B
5. The contributions of different countries are
Gopher PaLM (540B)
compared over time to identify any trends.
GPT-2 PanGu-alpha
GPT-3 175B (davinci) Stable Diffusion (LDM-
GPT-J-6B KL-8-G)
GPT-Neo T5-3B
GPT-NeoX-20B T5-11B
Grover-Mega Turing NLG
HyperCLOVA Wu Dao 2.0
Imagen Wu Dao – Wen Yuan
9 For example, an author employed by both a Chinese university and a Canadian technology firm would be counted as 0.5 researchers from China and 0.5 from Canada.
10 This choice is arbitrary. Other plausible alternatives include weighting papers by their number of citations, or assigning greater weight to papers with more authors.

AI Conferences topic labels identified by Gonzalez et al.—as well as

the topics “machine learning,” “deep learning,” and
The AI Index reached out to the organizers of various “artificial intelligence”—GitHub provides OECD.
AI conferences in 2022 and asked them to provide AI with a list of public projects containing AI code.
information on total attendance. Some conferences GitHub updates the list of public AI projects on a
posted their attendance totals online; when this was quarterly basis, which allows OECD.AI to capture
the case, the AI Index used those reported totals and trends in AI software development over time.
did not reach out to the conference organizers.
Obtaining AI Projects’ Metadata
GitHub OECD.AI uses GitHub’s list of public AI projects
to query GitHub’s public API and obtain more
The GitHub data was provided to the AI Index
information about these projects. Project metadata
through OECD.AI, an organization with whom
may include the individual or organization that
GitHub partners that provides data on open-
created the project; the programming language(s)
source AI software. The AI Index reproduces the
(e.g., Python) and development tool(s) (e.g., Jupyter
methodological note that is included by OECD.AI on
Notebooks) used in the project; as well as information
its website, for the GitHub Data.
about the contributions—or “commits”—made to it,
Background which include the commit’s author and a timestamp.
Since its creation in 2007, GitHub has become In practical terms, a contribution or “commit” is an
the main provider of internet hosting for software individual change to a file or set of files. Additionally,
development and version control. Many technology GitHub automatically suggests topical tags to each
organizations and software developers use GitHub project based on its content. These topical tags need
as a primary place for collaboration. To enable to be confirmed or modified by the project owner(s)
collaboration, GitHub is structured into projects, or to appear in the metadata.
“repositories,” which contain a project’s files and
each file’s revision history. The analysis of GitHub
data could shed light on relevant metrics about who
is developing AI software, where, and how fast,
and who is using which development tools. These
metrics could serve as proxies for broader trends in
the field of software development and innovation.
Identifying AI Projects
Arguably, a significant portion of AI software
development takes place on GitHub. OECD.AI
partners with GitHub to identify public AI projects—
or “repositories”—following the methodology
developed by Gonzalez et al.,2020. Using the 439

Mapping Contributions to AI Projects to a As of October 2021, 71.2% of the contributions to

Country public AI projects were mapped to a country using
Contributions to public AI projects are mapped this methodology. However, a decreasing trend in
to a country based on location information at the the share of AI projects for which a location can be
contributor level and at the project level. identified is observed in time, indicating a possible lag
in location reporting.
a) Location information at the contributor level:
Measuring Contributions to AI Projects
• GitHub’s “Location” field: Contributors can
Collaboration on a given public AI project is measured
provide their location in their GitHub account.
by the number of contributions—or “commits”—made
Given that GitHub’s location field accepts free
to it.
text, the location provided by contributors is
not standardized and could belong to different To obtain a fractional count of contributions by
levels (e.g., suburban, urban, regional, or country, an AI project is divided equally by the total
national). To allow cross-country comparisons, number of contributions made to it. A country’s total
Mapbox is used to standardize all available contributions to AI projects is therefore given by the
locations to the country level. sum of its contributions—in fractional counts—to each
AI project. In relative terms, the share of contributions
• Top level domain: Where the location field
to public AI projects made by a given country is the
is empty or the location is not recognized, a
ratio of that country’s contributions to each of the
contributor’s location is assigned based on his
AI projects in which it participates over the total
or her email domain (e.g., .fr, .us, etc.).
contributions to AI projects from all countries.
b) Location information at the project level:
In future iterations, OECD.AI plans to include
• Project information: Where no location additional measures of contribution to AI software
information is available at the contributor development, such as issues raised, comments, and
level, information at the repository or project pull requests.
level is exploited. In particular, contributions
Identifying Programming Languages and
from contributors with no location information Development Tools Used in AI Projects
to projects created or owned by a known GitHub uses file extensions contained in a project to
organization are automatically assigned the automatically tag it with one or more programming
organization’s country (i.e., the country where languages and/or development tools. This implies that
its headquarters are located). For example, more than one programming language or development
contributions from a contributor with no tool could be used in a given AI project.
location information to an AI project owned by
Microsoft will be assigned to the United States.
If the above fails, a contributor’s location field is left

blank.

Measuring the Quality of AI Projects

Two quality measures are used to classify public AI
projects:
• Project impact: The impact of an AI project is

given by the number of managed copies (i.e.,
“forks”) made of that project.
• Project popularity: The impact of an AI project

is given by the number of followers (i.e., “stars”)
received by that project.
Filtering by project impact or popularity could help

identify countries that contribute the most to high
quality projects.
Measuring Collaboration
Two countries are said to collaborate on a specific
public AI software development project if there is
at least one contributor from each country with at
least one contribution (i.e., “commit”) to the project.
Domestic collaboration occurs when two contributors
from the same country contribute to a project.

Index Report 2023 Chapter 2: Technical Performance
Chapter 2: Technical Performance

ImageNet Very Deep Convolutional Networks for Large-Scale
Image Recognition
Data on ImageNet accuracy was retrieved through a ViTAEv2: Vision Transformer Advanced by Exploring
detailed arXiv literature review cross-referenced by Inductive Bias for Image Recognition and Beyond
technical progress reported on Papers With Code.
The reported dates correspond to the year in which To highlight progress on top-1 accuracy with the use
a paper was first published to arXiv, and the reported of extra training data, scores were taken from the
results (top-1 accuracy) correspond to the result following papers:
reported in the most recent version of each paper. Big Transfer (BiT): General Visual
Learn more about the LSVRC ImageNet competition Representation Learning
and the ImageNet dataset. CoAtNet: Marrying Convolution and
To highlight progress on top-1 accuracy without the Attention for All Data Sizes
use of extra training data, scores were taken from the CoCa: Contrastive Captioners Are Image-Text
following papers: Foundation Models
Meta Pseudo Labels
Aggregated Residual Transformations for
Deep Neural Networks
Exploring the Limits of Weakly Supervised Pretraining
National Institute of
Fixing the Train-Test Resolution Discrepancy:
Standards and Technology
FixEfficientNet (NIST) Face Recognition
ImageNet Classification With Deep Convolutional Vendor Test (FRVT)
Neural Networks
Data on NIST FRVT 1:1 verification accuracy by
PeCo: Perceptual Codebook for BERT
Pre-training of Vision Transformers dataset was obtained from the FRVT 1:1 verification
leaderboard.
Progressive Neural Architecture Search
Rethinking the Inception Architecture for
Computer Vision
Self-Training With Noisy Student Improves
ImageNet Classification
Some Improvements on Deep Convolutional Neural
Network Based Image Classification

Celeb-DF version of each paper. Details on the MPII benchmark

can be found in the MPII paper and MPII dataset.
Data on Celeb-DF AUC was retrieved through a
detailed arXiv literature review. The reported dates To highlight progress on percentage of correct
correspond to the year in which a paper was first keypoints without the use of extra training data, scores
published to arXiv or a method was introduced. With were taken from the following papers:
Celeb-DF, recent researchers have tested previously Bottom-Up and Top-Down Reasoning
existing deepfake detection methodologies. The year With Hierarchical Rectified Gaussians
in which a method was introduced, even if it was Cascade Feature Aggregation for
subsequently tested, is the year in which it is included Human Pose Estimation
in the report. The reported results (AUC) correspond Deeply Learned Compositional Models for
to the result reported in the most recent version of Human Pose Estimation
each paper. Details on the Celeb-DF benchmark can Efficient Object Localization Using
be found in the Celeb-DF paper. Convolutional Networks
Learning Feature Pyramids for Human Pose Estimation
To highlight progress on Celeb-DF, scores were taken
from the following papers: Stacked Hourglass Networks for
Human Pose Estimation
Deepfake Detection via Joint Unsupervised
Toward Fast and Accurate Human Pose Estimation
Reconstruction and Supervised Classification
via Soft-Gated Skip Connections
Exposing Deepfake Videos by Detecting
ViTPose: Simple Vision Transformer Baselines for
Face Warping Artifacts
Human Pose Estimation
Face X-Ray for More General Face Forgery Detection
FaceForensics++: Learning to Detect Manipulated
Facial Images
Cityscapes Challenge,
Spatial-Phase Shallow Learning: Rethinking Face
Pixel-Level Semantic
Forgery Detection in Frequency Domain Labeling Task
Data on the Cityscapes challenge, pixel-level semantic
MPII labeling task mean intersection-over-union (mIoU)
Data on MPII percentage of correct keypoints (PCK) was taken from the Cityscapes dataset, specifically
was retrieved through a detailed arXiv literature their pixel-level semantic labeling leaderboard.
review cross-referenced by technical progress More details about the Cityscapes dataset and other
reported on Papers With Code. The reported dates corresponding semantic segmentation challenges can
correspond to the year in which a paper was first be accessed at the Cityscapes dataset webpage.
published to arXiv, and the reported results (PCK)
correspond to the result reported in the most recent

Kvasir-SEG An Analysis of Scale Invariance in Object

Detection-SNIP
Data on Kvasir-SEG mean dice was retrieved through CBNet: A Novel Composite Backbone Network
a detailed arXiv literature review cross-referenced by Architecture for Object Detection
technical progress reported on Papers With Code. Deformable ConvNets v2: More Deformable,
The reported dates correspond to the year in which Better Results
a paper was first published to arXiv, and the reported DetectoRS: Detecting Objects With Recursive
results (mean dice) correspond to the result reported Feature Pyramid and Switchable Atrous Convolution
in the most recent version of each paper. Details on EVA: Exploring the Limits of Masked Visual
the Kvasir-SEG benchmark can be found in the Kvasir- Representation Learning at Scale
SEG paper. Grounded Language-Image Pre-training
To highlight progress on Kvasir-SEG, scores were Inside-Outside Net: Detecting Objects in Context
taken from the following papers: With Skip Pooling and Recurrent Neural Networks
GMSRF-Net: An Improved Generalizability With

Global Multi-Scale Residual Fusion Network for
CIFAR-10
Polyp Segmentation Data on CIFAR-10 FID scores was retrieved through
PraNet: Parallel Reverse Attention Network for a detailed arXiv literature review cross-referenced by
Polyp Segmentation technical progress reported on Papers With Code. The
ResUNet++: An Advanced Architecture for reported dates correspond to the year in which a paper
Medical Image Segmentation was first published to arXiv, and the reported results
Spatially Exclusive Pasting: A General Data (FID score) correspond to the result reported in the most
Augmentation for the Polyp Segmentation recent version of each paper. Details on the CIFAR-10
benchmark can be found in the CIFAR-10 paper.
Common Object in Context
To highlight progress on CIFAR-10, scores were taken
(COCO) from the following papers:
Data on COCO mean average precision (mAP50) was GANs Trained by a Two Time-Scale Update Rule
retrieved through a detailed arXiv literature review Converge to a Local Nash Equilibrium
cross-referenced by technical progress reported on Large Scale GAN Training for High Fidelity Natural
Papers With Code. The reported dates correspond to Image Synthesis
the year in which a paper was first published to arXiv, Refining Generative Process With Discriminator
and the reported results (mAP50) correspond to the Guidance in Score-Based Diffusion Models
result reported in the most recent version of each Score-Based Generative Modeling in Latent Space
paper. Details on the COCO benchmark can be found
Score-Based Generative Modeling Through
in the COCO paper. Stochastic Differential Equations
To highlight progress on COCO, scores were taken Self-Supervised GAN: Analysis and Improvement
from the following papers: With Multi-Class Minimax Game

STL-10 The reported dates correspond to the year in which

a paper was first published to arXiv, and the reported
Data on STL-10 FID scores was retrieved through a results (accuracy) correspond to the result reported in
detailed arXiv literature review cross-referenced by the most recent version of each paper. Human-level
technical progress reported on Papers With Code. The performance is taken from the 2021 VQA challenge.
reported dates correspond to the year in which a paper
To highlight progress on VQA accuracy without the
was first published to arXiv, and the reported results
use of extra training data, scores were taken from the
(FID score) correspond to the result reported in the
following papers:
most recent version of each paper. Details on the STL-
10 benchmark can be found in the STL-10 paper. Bilinear Attention Networks
Multimodal Compact Bilinear Pooling for Visual
To highlight progress on STL-10, scores were taken
Question Answering and Visual Grounding
from the following papers:
Oscar: Object-Semantics Aligned Pre-training
DEGAS: Differentiable Efficient Generator Search for Vision-Language Tasks
Diffusion-GAN: Training GANs With Diffusion PaLI: A Jointly-Scaled Multilingual
Discriminator Contrastive Divergence: Language-Image Model
Semi-Amortized Generative Modeling by Tips and Tricks for Visual Question Answering:
Exploring Energy of the Discriminator Learnings From the 2017 Challenge
Dist-GAN: An Improved GAN Using UNITER: UNiversal Image-TExt Representation Learning
Distance Constraints VLMo: Unified Vision-Language Pre-training With
Soft Truncation: A Universal Training Technique of Mixture-of-Modality-Experts
Score-Based Diffusion Model for High Precision
Score Estimation BEiT-3 Vs. Previous SOTA
Text-to-Image Models Data on BEiT-3 and Previous SOTA was retrieved from
the paper Wang et al., 2022.
on MS-COCO 256 × 256
FID-30K Visual Commonsense
Data on MS-COCO 256 x 256 FID 30K for Text-to- Reasoning (VCR)
Image Models was retrieved from the paper Saharia
Data on VCR Q->AR score was taken from VCR
et al., 2022.
leaderboard; the VCR leaderboard webpage further
delineates the methodology behind the VCR challenge.
Visual Question Answering Human performance on VCR is taken from Zellers et
(VQA) al., 2018. Details on the VCR benchmark can be found
Data on VQA accuracy was retrieved through a in the VCR paper.
detailed arXiv literature review cross-referenced by
technical progress reported on Papers With Code.

Kinetics-400, Kinetics-600, To highlight progress on Kinetics-600, scores were

taken from the following papers:
and Kinetics-700 Learning Spatio-Temporal Representation
Data on Kinetics-400, Kinetics-600, and Kinetics-700 With Local and Global Diffusion
accuracy was retrieved through a detailed arXiv Masked Feature Prediction for Self-Supervised
literature review cross-referenced by technical Visual Pre-training
progress reported on Papers With Code (Kinetics-400, PERF-Net: Pose Empowered RGB-Flow Net
Kinetics-600, and Kinetics-700). The reported Rethinking Spatiotemporal Feature Learning:
dates correspond to the year in which a paper was Speed-Accuracy Trade-Offs in Video Classification
first published to arXiv, and the reported results Rethinking Video ViTs: Sparse Video Tubes for
(top-1 accuracy) correspond to the result reported Joint Image and Video Learning
in the most recent version of each paper. Details on SlowFast Networks for Video Recognition
the Kinetics-400 benchmark can be found in the
To highlight progress on Kinetics-700, scores were
Kinetics-400 paper. Details on the Kinetics-600
benchmark can be found in the Kinetics-600 paper.
Details on the Kinetics-700 benchmark can be found in InternVideo: General Video Foundation Models via
the Kinetics-700 paper. Generative and Discriminative Learning
Learn to Cycle: Time-Consistent Feature Discovery
To highlight progress on Kinetics-400, scores were
for Action Recognition
Masked Feature Prediction for Self-Supervised
Co-training Transformer With Videos and Images Visual Pre-training
Improves Action Recognition
InternVideo: General Video Foundation Models via Text-to-Video Models
Generative and Discriminative Learning
on UCF-101
Large-Scale Weakly-Supervised Pre-training for
Video Action Recognition Data on UCF-101 Inception Score (IS) for Text-to-Video
Models was retrieved from the paper Hong et al., 2022,
Non-Local Neural Networks
and Singer et al., 2022.
Omni-Sourced Webly-Supervised Learning for
Video Recognition
SlowFast Networks for Video Recognition
Temporal Segment Networks: Towards Good
Practices for Deep Action Recognition

SuperGLUE
The SuperGLUE benchmark data was pulled from the SuperGLUE leaderboard. Details about the
SuperGLUE benchmark are in the SuperGLUE paper and SuperGLUE software toolkit. The tasks and
evaluation metrics for SuperGLUE are:
NAME IDENTIFIER METRIC
Broadcoverage Diagnostics AX-b Matthew’s Carr
CommitmentBank CB Avg. F1/Accuracy
Choice of Plausible Alternatives COPA Accuracy
Multi-Sentence Reading Comprehension MultiRC F1a/EM
Recognizing Textual Entailment RTE Accuracy
Words in Context WiC Accuracy
The Winograd Schema Challenge WSC Accuracy
BooIQ BooIQ Accuracy
Reading Comprehension With Commonsense Reasoning ReCoRD F1/Accuracy
Winogender Schema Diagnostic AX-g Gender Parity/Accuracy
Reading Comprehension Dataset Requiring

Logical Reasoning (ReClor)
Data on ReClor accuracy was retrieved through the ReClor leaderboard cross-referenced by technical progress
reported on Papers With Code. Details about the ReClor benchmark can be found in the ReClor paper.

Narrative Highlight: paid $90,000 for the role, and before I even started
on the case, I had to convince one top staffer to get
Just How Much Better Have paid even $100,000 to have an interview with my lead
Language Models Become? prosecutor.
Complete answer outputted by GPT-2 to the AI Index I wanted to know in advance: What percentage of
prompt request. his work was actually done as a private citizen, and
Model prompt >>> Explain to me the major how efficiently should he handle it for his clients? And
accomplishments of Theodore Roosevelt’s what advice would he give me should I end up in the
presidency. public eye?
(The most notable is the establishment of a national At no time, my job as the lead prosecutor on this
telephone and mail exchange company.) case required me to put forth any of those two or
five requests. Most people on the case, though, did.
But the only things that will change with a president
One person asked me what the case’s real value
who knows how to operate a business is his
was. That didn’t stop me from telling it to one of
understanding of the American system of government.
my top managers. And my top manager never even
The public knows all about our problems, yet at least
mentioned it.
half of the people who need it won’t know anything
about ours. We can no longer have this. Our system of In the end, it was a two-week, $40 million round-
government is the only trip tour of the government’s business, with the best
lawyer possible. In exchange, we got a chance to hear
one in which you really need a public relations firm to
his last public comment in this case, his last comment
work hard.
to me.
The system of governmental control, while working,
There was one more point — the end point of the case.
is not always effective in ensuring that the very things
that our economy needs, do not always be at the When it came up for consideration in October of 1993,
top of everybody’s list. So I wanted to know how I had the choice to give it three years, or six. And I
successful the system you employ for managing the chose six.
public’s needs is in helping the American economy
grow, how efficient the service our economy provides
is being made, and how it can better serve the needs
of a growing number of Americans.
I wanted to know.
This was before I ever hired my own lawyer to work on

the case. This was before I ever hired my own law firm
to handle government investigations. For my first major
job, at least, I had to convince two top aides to get

Large Language Models on PubMed

the Blocksworld Domain Data on PubMed recall-oriented understudy for gisting
Data on Blocksworld domain for large language models evaluation (ROUGE-1) was retrieved through a detailed
was retrieved from the paper Valmeekam et al., 2022. arXiv literature review cross-referenced by technical
progress reported on Papers With Code. The reported
arXiv dates correspond to the year in which a paper was first
published to arXiv, and the reported results (ROUGE-1)
Data on arXiv recall-oriented understudy for gisting
correspond to the result reported in the most recent
evaluation (ROUGE-1) was retrieved through a detailed
version of each paper. Details about the PubMed
arXiv literature review cross-referenced by technical
benchmark are in the PubMed paper.
progress reported on Papers With Code. The reported
dates correspond to the year in which a paper was first To highlight progress in PubMed, scores were taken
published to arXiv, and the reported results (ROUGE-1) from the following papers:
correspond to the result reported in the most recent A Discourse-Aware Attention Model for Abstractive
version of each paper. Details about the arXiv Summarization of Long Documents
benchmark are in the arXiv dataset webpage. Get to the Point: Summarization With Pointer-
Generator Networks
To highlight progress on arXiv, scores were taken from
Long Document Summarization With Top-Down
the following papers:
and Bottom-Up Inference
Big Bird: Transformers for Longer Sequences
LongT5: Efficient Text-to-Text Transformer for
A Discourse-Aware Attention Model for Long Sequences
Abstractive Summarization of Long Documents
PEGASUS: Pre-training With Extracted Gap-Sentences
Get to the Point: Summarization With for Abstractive Summarization
Pointer-Generator Networks
Sparsifying Transformer Models With Trainable
Long Document Summarization With Top-Down Representation Pooling
and Bottom-Up Inference
MemSum: Extractive Summarization of Long Abductive Natural Language
Documents Using Multi-Step Episodic Markov
Decision Processes
Inference (aNLI)
PEGASUS: Pre-training With Extracted Gap-Sentences Data on Abductive Natural Language Inference (aNLI)
for Abstractive Summarization was sourced from the Allen Institute for AI’s aNLI
leaderboard. Details on the aNLI benchmark can be
found in the aNLI paper.

SST-5 Fine-Grained To highlight progress on MMLU accuracy, scores were

Data on SST-5 Fine-Grained accuracy was retrieved
Language Models Are Few-Shot Learners
through a detailed arXiv literature review cross-
referenced by technical progress reported on Papers Language Models Are Unsupervised
Multitask Learners
With Code. The reported dates correspond to the year
in which a paper was first published to arXiv, and the Scaling Instruction-Finetuned Language Models
reported results (accuracy) correspond to the result Scaling Language Models: Methods, Analysis &
reported in the most recent version of each paper. Insights from Training Gopher
Details about the SST-5 Fine-Grained benchmark can

be found in the SST paper. Number of Commercially
To highlight progress on SST-5 Fine-Grained accuracy,
Available MT Systems
scores were taken from the following papers: Details about the number of commercially available
MT systems were sourced from the Intento report The
An Algorithm for Routing Capsules in All Domains
State of Machine Translation, 2022. Intento is a San
An Algorithm for Routing Vectors in Sequences
Francisco–based startup that analyzes commercially
Improved Semantic Representations from Tree-
available MT services.
Structured Long Short-Term Memory Networks
Improved Sentence Modeling Using Suffix
Bidirectional LSTM
VoxCeleb
Learned in Translation: Contextualized Word Vectors Data on VoxCeleb equal error rate (EER) was retrieved
from the VoxCeleb Speaker Recognition Challenge
Less Grammar, More Features
(VoxSRC).
Recursive Deep Models for Semantic Compositionality
Over a Sentiment Treebank For the sake of consistency, the AI Index reported scores
Self-Explaining Structures Improve NLP Models on the initial VoxCeleb dataset. Specifically, the AI Index
made use of the following sources of information:
MMLU ID R&D System Description to VoxCeleb Speaker
Recognition Challenge 2022
Data on MMLU accuracy was retrieved through a
The IDLAB VoXSRC-20 Submission: Large Margin
detailed arXiv literature review cross-referenced by
Fine-Tuning and Quality-Aware Score Calibration in
technical progress reported on Papers With Code. The DNN Based Speaker Verification
reported dates correspond to the year in which a paper
The SpeakIn System for VoxCeleb Speaker
was first published to arXiv, and the reported results Recognition Challenge 2021
(accuracy) correspond to the result reported in the most
VoxCeleb: A Large-Scale Speaker Identification
recent version of each paper. Details about the MMLU Dataset
benchmark can be found in the MMLU paper.
VoxCeleb: Large-Scale Speaker Verification in the Wild
VoxCeleb2: Deep Speaker Recognition

Whisper Training Time, Number

Data on Whisper for large-scale speech recognition of Accelerators, and
models was retrieved from the paper Radford et al., Performance
2022.
Data on training time, number of accelerators,
Procgen and performance for AI systems was taken from
the MLPerf Training and Inference benchmark
Data on Procgen mean-normalized score was retrieved
competitions. Details on the MLPerf Training
through a detailed arXiv literature review. The reported
benchmark can be found in the MLPerf Training
dates correspond to the year in which a paper was first
Benchmark paper, while details on MLPerf Inference
published to arXiv, and the reported results (mean-
can be found in the MLPerf Inference Benchmark
normalized score) correspond to the result reported in
paper. Information about the current benchmark
the most recent version of each paper. Details on the
categories as well as technical information about
Procgen benchmark can be found in the Procgen paper.
submission and competition subdivisions can be
To highlight progress on Procgen, scores were taken found on the MLPerf Training and MLPerf Inference
from the following papers: webpages.
Automatic Data Augmentation for Generalization in The AI Index made use of data from the following
Reinforcement Learning MLPerf Training competitions:
Leveraging Procedural Generation to Benchmark MLPerf Training v2.1, 2022
Reinforcement Learning
MLPerf Training v2.0, 2022
Procedural Generalization by Planning With
Self-Supervised World Models
Rethinking Value Function Learning for
Generalization in Reinforcement Learning MLPerf Training v0.7, 2020
The AI Index made use of data from the following

MLPerf Inference competitions:
MLPerf Inference v2.1, 2022

GPUs’ Performance and Price Carbon Footprint of Select

The AI Index collected data on GPUs’ performance and Machine Learning Models
price, building on and extending the dataset collected Data on carbon-emission estimates of select
from Epoch AI’s Trends in GPU Price-Performance machine learning models was sourced from the
blog post. paper Luccioni et al., 2022. Data on carbon-emission
The AI Index compiled a list of GPUs starting from estimates of real-life examples was retrieved from
the Median Group (2018), Sun et al. (2019), and Epoch Strubell et al., 2019.
(2022) datasets. To update and extend previous
analysis, the AI Index included new GPU releases
Energy Savings Results
for the period 2021–2023, gathering information From BCOOLER Experiment
from sources such as TechPowerUp, WikiChip, and Data on energy savings over time for the
Wikipedia entries for the product series. We also BCOOLER experiment was sourced from the
collected information about GPUs released before paper Luo et al., 2022.
2021 from the manufacturer’s catalog or Wikipedia’s
list of processors.
To disambiguate duplicates of different versions of

the same product with different specifications, the
AI Index added the part number or difference in
specification, as applicable.
To find GPU prices, the AI Index searched various

sources including the manufacturer’s website,
Wikipedia, and TechPowerUp. GPU prices have been
adjusted for inflation using CPI-U data provided by
the U.S. Bureau of Labor Statistics. Missing data for
certain GPUs was completed using additional sources,
such as the manufacturer’s website, Wikipedia,
and TechPowerUp. This includes information such
as manufacturer, type, release date, performance
(double, single, and half-precision operations per
second), die size, power, clock speed, process size,
and number of transistors.

Index Report 2023 Chapter 3: Technical AI Ethics
Chapter 3: Technical AI Ethics

Meta-Analysis of Fairness VLStereoSet: A Study of Stereotypical Bias
in Pre-trained Vision-Language Models
and Bias Metrics
For the analysis conducted on fairness and bias Natural Language Processing
metrics in AI, we identify and report on benchmark Bias Metrics
and diagnostic metrics which have been consistently
In Section 3.3, we track citations of the Perspective
cited in the academic community, reported on a public
API created by Jigsaw at Google. The Perspective API
leaderboard, or reported for publicly available baseline
has been adopted widely by researchers and engineers
models (e.g., GPT-3, BERT, ALBERT). We note that
in natural language processing. Its creators define
research paper citations are a lagging indicator of
toxicity as “a rude, disrespectful, or unreasonable
adoption, and metrics which have been very recently
comment that is likely to make someone leave a
adopted may not be reflected in the data for 2022. We
discussion,” and the tool is powered by machine
include the full list of papers considered in the 2022 AI
learning models trained on a proprietary dataset of
Index as well as the following additional papers:
comments from Wikipedia and news websites.
Beyond the Imitation Game: Quantifying and
Extrapolating the Capabilities of Language Models We include the full list of papers considered in the
2022 AI Index as well as the following additional
BBQ: A Hand-Built Bias Benchmark for
Question Answering papers:
Discovering Language Model Behaviors With AlexaTM 20B: Few-Shot Learning Using a
Model-Written Evaluations Large-Scale Multilingual Seq2Seq Model
“I’m Sorry to Hear That”: Finding New Biases in Aligning Generative Language Models With
Language Models With a Holistic Descriptor Dataset Human Values
On Measuring Social Biases in Prompt-Based Challenges in Measuring Bias via Open-Ended
Multi-task Learning Language Generation
PaLM: Scaling Language Modeling With Pathways Characteristics of Harmful Text: Towards
Perturbation Augmentation for Fairer NLP Rigorous Benchmarking of Language Models
Scaling Instruction-Finetuned Language Models Controllable Natural Language Generation With

Contrastive Prefixes
SODAPOP: Open-Ended Discovery of Social
Biases in Social Commonsense Reasoning Models DD-TIG at SemEval-2022 Task 5: Investigating the
Relationships Between Multimodal and Unimodal
Towards Robust NLG Bias Evaluation
Information in Misogynous Memes Detection and
With Syntactically-Diverse Prompts
Classification

Detoxifying Language Models With a Toxic Corpus Predictability and Surprise in Large Generative
DisCup: Discriminator Cooperative Unlikelihood Models
Prompt-Tuning for Controllable Text Generation Quark: Controllable Text Generation With
Evaluating Attribution in Dialogue Systems: Reinforced [Un]learning
The BEGIN Benchmark Red Teaming Language Models With Language Models
Exploring the Limits of Domain-Adaptive Training Reward Modeling for Mitigating Toxicity in
for Detoxifying Large-Scale Language Models Transformer-based Language Models
Flamingo: A Visual Language Model for Robust Conversational Agents Against Imperceptible
Few-Shot Learning Toxicity Triggers
Galactica: A Large Language Model for Science Scaling Instruction-Finetuned Language Models
GLaM: Efficient Scaling of Language Models StreamingQA: A Benchmark for Adaptation to New
With Mixture-of-Experts Knowledge over Time in Question Answering Models
GLM-130B: An Open Bilingual Pre-trained Model Training Language Models to Follow Instructions
Gradient-Based Constrained Sampling From With Human Feedback
Language Models Transfer Learning From Multilingual DeBERTa
HateCheckHIn: Evaluating Hindi Hate Speech for Sexism Identification
Detection Models Transformer Feed-Forward Layers Build Predictions
Holistic Evaluation of Language Models by Promoting Concepts in the Vocabulary Space
An Invariant Learning Characterization of While the Perspective API is used widely within
Controlled Text Generation machine learning research and also for measuring
LaMDA: Language Models for Dialog Applications online toxicity, toxicity in the specific domains used to
Leashing the Inner Demons: Self-Detoxification train the models undergirding Perspective (e.g., news,
for Language Models Wikipedia) may not be broadly representative of all
Measuring Harmful Representations in Scandinavian forms of toxicity (e.g., trolling). Other known caveats
Language Models include biases against text written by minority
Mitigating Toxic Degeneration With Empathetic Data: voices: The Perspective API has been shown to
Exploring the Relationship Between Toxicity and disproportionately assign high toxicity scores to text
Empathy that contains mentions of minority identities (e.g., “I
MULTILINGUAL HATECHECK: Functional Tests for am a gay man”). As a result, detoxification techniques
Multilingual Hate Speech Detection Models built with labels sourced from the Perspective API
A New Generation of Perspective API: Efficient result in models that are less capable of modeling
Multilingual Character-Level Transformers language used by minority groups, and may avoid
OPT: Open Pre-trained Transformer Language Models mentioning minority identities.
PaLM: Scaling Language Modeling With Pathways New versions of the Perspective API have been
Perturbations in the Wild: Leveraging Human-Written deployed since its inception, and there may be subtle
Text Perturbations for Realistic Adversarial Attack and undocumented shifts in its behavior over time.
Defense

RealToxicityPrompts We tally the number of papers in each category to

reach the numbers found in Figure 3.7.3. Papers
We sourced the RealToxicityPrompts dataset of are not double-counted in multiple categories. We
evaluations from the HELM benchmark website, as note that this data may not be as accurate for data
documented in v0.1.0. pre-2018 as societal impacts work at NeurIPS has
historically been categorized under a broad “AI for
AI Ethics in China social impact” umbrella, but it has recently been split
The data in this section is sourced from the 2022 paper into more granular research areas. Examples include
AI Ethics With Chinese Characteristics? Concerns workshops dedicated to machine learning for health;
and Preferred Solutions in Chinese Academia. We climate; policy and governance; disaster response;
are grateful to Junhua Zhu for clarifications and and the developing world.
correspondence. To track trends around specific technical topics at
NeurIPS as in Figures 3.7.4 to 3.7.7, we count the
AI Ethics Trends at FAccT number of papers accepted to the NeurIPS main track
and NeurIPS with titles containing keywords (e.g., “counterfactual”
or “causal” for tracking papers related to causal
To understand trends at the ACM Conference on
effect), as well as papers submitted to related
Fairness, Accountability, and Transparency, this
workshops.
section tracks FAccT papers published in conference
proceedings from 2018 to 2022. We categorize
author affiliations into academic, industry, nonprofit,
TruthfulQA
government, and independent categories, while also We sourced the TruthfulQA dataset of evaluations
tracking the location of their affiliated institution. from the HELM benchmark website, as documented
Authors with multiple affiliations are counted once in in v0.1.0.
each category (academic and industry), but multiple
affiliations of the same type (i.e., authors belonging
to two academic institutions) are counted once in the
category.
For the analysis conducted on NeurIPS publications,

we identify workshops themed around real-world
impact and label papers with a single main category in
“healthcare,” “climate,” “finance,” “developing world,”
“science,” or “other,” where “other” denotes a paper
related to a real-world use case but not in one of the
other categories. The “science” category is new in
2022, but includes retroactive analysis of papers from
previous years.

Index Report 2023 Chapter 4: The Economy
Chapter 4: The Economy

Lightcast job postings is the Job Openings and Labor Turnover
Survey (JOLTS) program, conducted by the Bureau
Prepared by Scott Bingham, Julia Nania, Layla O’Kane, of Labor Statistics. Based on comparisons between
and Bledi Taska JOLTS and Lightcast, the labor market demand
Lightcast delivers job market analytics that empower captured by Lightcast data represents over 99% of
employers, workers, and educators to make data- the total labor demand. Jobs not posted online are
driven decisions. The company’s artificial intelligence usually in small businesses (the classic example being
technology analyzes hundreds of millions of job postings the “Help Wanted” sign in a restaurant window) and
and real-life career transitions to provide insight union hiring halls.
into labor market patterns. This real-time strategic Measuring Demand for AI
intelligence offers crucial insights, such as what jobs are In order to measure the demand by employers of AI
most in demand, the specific skills employers need, and skills, Lightcast uses its skills taxonomy of over 31,000
the career directions that offer the highest potential for skills. The list of AI skills from Lightcast data are shown
workers. For more information, visit www.lightcast.io. below, with associated skill clusters. While some skills
Job Posting Data are considered to be in the AI cluster specifically,
To support these analyses, Lightcast mined its dataset for the purposes of this report, all skills below were
of millions of job postings collected since 2010. considered AI skills. A job posting was considered an
Lightcast collects postings from over 51,000 online job AI job if it mentioned any of these skills in the job text.
sites to develop a comprehensive, real-time portrait Artificial Intelligence: AIOps (Artificial Intelligence for
of labor market demand. It aggregates job postings, IT Operations), Applications of Artificial Intelligence,
removes duplicates, and extracts data from job postings Artificial General Intelligence, Artificial Intelligence,
text. This includes information on job title, employer, Artificial Intelligence Development, Artificial
industry, and region, as well as required experience, Intelligence Markup Language (AIML), Artificial
education, and skills. Intelligence Systems, Azure Cognitive Services,
Job postings are useful for understanding trends in Baidu, Cognitive Automation, Cognitive Computing,
the labor market because they allow for a detailed, Computational Intelligence, Cortana, Expert Systems,
real-time look at the skills employers seek. To assess Intelligent Control, Intelligent Systems, Interactive
the representativeness of job postings data, Lightcast Kiosk, IPSoft Amelia, Knowledge-Based Configuration,
conducts a number of analyses to compare the Knowledge-Based Systems, Multi-Agent Systems,
distribution of job postings to the distribution of official Open Neural Network Exchange (ONNX), OpenAI
government and other third-party sources in the United Gym, Reasoning Systems, Soft Computing, Syman,
States. The primary source of government data on U.S. Watson Conversation, Watson Studio, Weka

Autonomous Driving: Advanced Driver Assistance Machine Learning: AdaBoost, Apache MADlib,
Systems, Autonomous Cruise Control Systems, Apache Mahout, Apache SINGA, Apache Spark,
Autonomous System, Autonomous Vehicles, Guidance Association Rule Learning, Automated Machine
Navigation and Control Systems, Light Detection and Learning, Autonomic Computing, AWS SageMaker,
Ranging (LiDAR), OpenCV, Path Analysis, Path Finding, Azure Machine Learning, Boosting, CHi-Squared
Remote Sensing, Unmanned Aerial Systems (UAS) Automatic Interaction Detection (CHAID),
Classification And Regression Tree (CART), Cluster
Natural Language Processing (NLP): Amazon Textract,
Analysis, Collaborative Filtering, Confusion Matrix,
ANTLR, BERT (NLP Model), Chatbot, Computational
Cyber-Physical Systems, Dask (Software), Data
Linguistics, DeepSpeech, Dialog Systems, fastText,
Classification, DBSCAN, Decision Models, Decision
Fuzzy Logic, Handwriting Recognition, Hugging
Tree Learning, Dimensionality Reduction, Dlib
Face (NLP Framework), HuggingFace Transformers,
(C++ Library), Ensemble Methods, Evolutionary
Intelligent Agent, Intelligent Software Assistant,
Programming, Expectation Maximization Algorithm,
Intelligent Virtual Assistant, Kaldi, Latent Dirichlet
Feature Engineering, Feature Extraction, Feature
Allocation, Lexalytics, Machine Translation, Microsoft
Learning, Feature Selection, Gaussian Process,
LUIS, Natural Language Generation, Natural Language
Genetic Algorithm, Google AutoML, Google Cloud
Processing, Natural Language Processing Systems,
ML Engine, Gradient Boosting, H2O.ai, Hidden
Natural Language Programming, Natural Language
Markov Model, Hyperparameter Optimization,
Toolkits, Natural Language Understanding, Natural
Inference Engine, K-Means Clustering, Kernel
Language User Interface, Nearest Neighbour
Methods, Kubeflow, LIBSVM, Machine Learning,
Algorithm, OpenNLP, Optical Character Recognition
Machine Learning Algorithms, Markov Chain, Matrix
(OCR), Screen Reader, Semantic Analysis, Semantic
Factorization, Meta Learning, Microsoft Cognitive
Interpretation for Speech Recognition, Semantic
Toolkit (CNTK), MLflow, MLOps (Machine Learning
Parsing, Semantic Search, Sentiment Analysis,
Operations), mlpack (C++ Library), Naive Bayes,
Seq2Seq, Speech Recognition, Speech Recognition
Perceptron, Predictionio, PyTorch (Machine Learning
Software, Statistical Language Acquisition, Text Mining,
Library), Random Forest Algorithm, Recommendation
Tokenization, Voice Interaction, Voice User Interface,
Engine, Recommender Systems, Reinforcement
Word Embedding, Word2Vec Models
Learning, Scikit-learn (Machine Learning Library),
Neural Networks: Apache MXNet, Artificial Neural Semi-Supervised Learning, Soft Computing, Sorting
Networks, Autoencoders, Caffe, Caffe2, Chainer, Algorithm, Supervised Learning, Support Vector
Convolutional Neural Networks, Cudnn, Deep Learning, Machine, Test Datasets, Torch (Machine Learning),
Deeplearning4j, Keras (Neural Network Library), Long Training Datasets, Transfer Learning, Unsupervised
Short-Term Memory (LSTM), OpenVINO, PaddlePaddle, Learning, Vowpal Wabbit, Xgboost
Pybrain, Recurrent Neural Network (RNN), TensorFlow

Robotics: Advanced Robotics, Cognitive Robotics, intelligence, computer vision, image processing,
Motion Planning, Nvidia Jetson, Robot Framework, deep learning, TensorFlow, Pandas (software), and
Robot Operating Systems, Robotic Automation OpenCV, among others.
Software, Robotic Liquid Handling Systems, Robotic
Skill groupings are derived by expert taxonomists
Programming, Robotic Systems, Servomotor, SLAM
through a similarity-index methodology that
Algorithms (Simultaneous Localization and Mapping)
measures skill composition at the industry level.
Visual Image Recognition: 3D Reconstruction, Activity LinkedIn’s industry taxonomy and their corresponding
Recognition, Computer Vision, Contextual Image NAICS codes can be found here.
Classification, Digital Image Processing, Eye Tracking,
Skills Genome
Face Detection, Facial Recognition, Image Analysis,
For any entity (occupation or job, country, sector,
Image Matching, Image Processing, Image Recognition,
etc.), the skill genome is an ordered list (a vector) of
Image Segmentation, Image Sensor, Imagenet,
the 50 “most characteristic skills” of that entity. These
Machine Vision, Motion Analysis, Object Recognition,
most characteristic skills are identified using a TF-IDF
OmniPage, Pose Estimation, RealSense
algorithm to identify the most representative skills of
the target entity, while down-ranking ubiquitous skills
LinkedIn that add little information about that specific entity
Prepared by Murat Erer and Akash Kaura (e.g., Microsoft Word).
Country Sample TF-IDF is a statistical measure that evaluates how

Included countries represent a select sample of eligible representative a word (in this case a skill) is to a
countries with at least 40% labor force coverage by selected entity). This is done by multiplying two
LinkedIn and at least 10 AI hires in any given month. metrics:
China and India were included in this sample because
1. The term frequency of a skill in an entity (TF).
of their increasing importance in the global economy,
but LinkedIn coverage in these countries does not 2. The logarithmic inverse entity frequency of the
reach 40% of the workforce. Insights for these countries skill across a set of entities (IDF). This indicates
may not provide as full a picture as other countries, and how common or rare a word is in the entire entity
should be interpreted accordingly. set. The closer IDF is to 0, the more common the
word.
Skills (and AI Skills)
LinkedIn members self-report their skills on their So if the skill is very common across LinkedIn entities,
LinkedIn profiles. Currently, more than 38,000 distinct, and appears in many job or member descriptions, the
standardized skills are identified by LinkedIn. These IDF will approach 0. If, on the other hand, the skill
have been coded and classified by taxonomists at is unique to specific entities, the IDF will approach
LinkedIn into 249 skill groupings, which are the skill 1. More details are available at LinkedIn’s Skills
groups represented in the dataset. The top skills that Genome and LinkedIn-World Bank Methodology.
make up the AI skill grouping are machine learning,
natural language processing, data structures, artificial

AI Skills Penetration perform the job. Skills penetration is used as a signal

The aim of this indicator is to measure the intensity of AI for whether AI skills are prevalent in an occupation
skills in an entity (a particular country, industry, gender, representative in any sector where the occupation
etc.) through the following methodology: representative may exist. Examples of such
occupations include (but are not limited to): machine
• Compute frequencies for all self-added skills by
learning engineer, artificial intelligence specialist,
LinkedIn members in a given entity (occupation,
data scientist, computer vision engineer, etc.
industry, etc.) in 2015–2021.
AI Talent
• Re-weight skill frequencies using a TF-IDF model
A LinkedIn member is considered AI talent if they
to get the top 50 most representative skills in
have explicitly added AI skills to their profile and/or
that entity. These 50 skills compose the “skill
they are occupied in an AI occupation representative.
genome” of that entity.
The counts of AI talent are used to calculate talent
• Compute the share of skills that belong to the concentration metrics. For example, to calculate
AI skill group out of the top skills in the selected the country level AI talent concentration, we use
entity. the counts of AI talent at the country level vis-a-vis
the counts of LinkedIn members in the respective
Interpretation: The AI skill penetration rate signals
countries.
the prevalence of AI skills across occupations, or the
intensity with which LinkedIn members utilize AI skills Relative AI Skills Penetration
in their jobs. For example, the top 50 skills for the To allow for skills penetration comparisons across
occupation of engineer are calculated based on the countries, the skills genomes are calculated and a
weighted frequency with which they appear in LinkedIn relevant benchmark is selected (e.g., global average).
members’ profiles. If four of the skills that engineers A ratio is then constructed between a country’s and
possess belong to the AI skill group, this measure the benchmark’s AI skills penetrations, controlling for
indicates that the penetration of AI skills is estimated to occupations.
be 8% among engineers (i.e., 4/50). Interpretation: A country’s relative AI skills
Jobs or Occupations penetration of 1.5 indicates that AI skills are 1.5 times
LinkedIn member titles are standardized and grouped as frequent as in the benchmark, for an overlapping
into approximately 15,000 occupations. These are set of occupations.
not sector- or country-specific. These occupations Global Comparison
are further standardized into approximately For cross-country comparison, we present the
3,600 occupation representatives. Occupation relative penetration rate of AI skills, measured as
representatives group occupations with a common role the sum of the penetration of each AI skill across
and specialty, regardless of seniority. occupations in a given country, divided by the
AI Jobs and Occupations average global penetration of AI skills across the
An “AI” job (technically, occupation representative) overlapping occupations in a sample of countries.
is an occupation representative that requires AI skills to

Interpretation: A relative penetration rate of 2 means Interpretation: A country’s “Relative AI Skills

that the average penetration of AI skills in that country Penetration Across Genders” for women of 1.5 means
is two times the global average across the same set of that female members in that country are 1.5 times
occupations. more likely to list AI skills than the average member in
all countries pooled together across the same set of
Global Comparison: By Industry
occupations that exist in the country.
The relative AI skills penetration by country for industry
provides an in-depth sectoral decomposition of AI skill Relative AI Hiring Index
penetration across industries and sample countries.
LinkedIn Hiring Rate or Overall Hiring Rate
is a measure of hires normalized by LinkedIn
Interpretation: A country’s relative AI skill penetration
membership. It is computed as the percentage of
rate of 2 in the education sector means that the average
LinkedIn members who added a new employer
penetration of AI skills in that country is two times the
in the same period the job began, divided by
global average across the same set of occupations in
the total number of LinkedIn members in the
that sector.
corresponding location.
Global Comparison: By Gender
AI Hiring Rate is computed following the overall
The “Relative AI Skills Penetration by Gender” metric
hiring rate methodology, but only considering
provides a cross-country comparison of AI skill
members classified as AI talent.
penetrations within each gender, comparing countries’
male or female AI skill penetrations to the global
Relative AI Hiring Index is the pace of change in
average of the same gender. Since the global averages AI Hiring Rate normalized by the pace of change
are distinct for each gender, this metric should only be in Overall Hiring Rate, providing a picture of
used to compare country rankings within each gender, whether hiring of AI talent is growing at a higher,
and not for cross-gender comparisons within countries. equal, or lower rate than overall hiring in a
market. The relative AI Hiring Index is equal to
Interpretation: A country’s AI skills penetration for
1.0 when AI hiring and overall hiring are growing
women of 1.5 means that female members in that
at the same rate year on year.
country are 1.5 times more likely to list AI skills than the
average female member in all countries pooled together Interpretation: Relative AI Hiring Index shows how
across the same set of occupations that exist in the fast each country is experiencing growth in AI talent
country/gender combination. hiring relative to growth in overall hiring in the country.
Global Comparison: Across Gender A ratio of 1.2 means the growth in AI talent hiring has
The “Relative AI Skills Penetration Across Genders” outpaced the growth in overall hiring by 20%.
metric allows for cross-gender comparisons within
and across countries globally, since we compare the
countries’ male and female AI skill penetrations to the
same global average regardless of gender.

Changelog From Methodology Included in Last Year’s AI Index

1. LinkedIn ramped a new version of its industry taxonomy (see details here).
a. This has resulted in changes to our top level five key industries. We have made the full-time series
available for each industry (as with prior years).
i. “Software & IT Services” industry evolved into a wider “Technology, Information and Media,” which
encompasses media and telecommunications as well as other sub-industries.
ii. Former “Hardware & Networking” industry does not exist in the new taxonomy, so we introduced
“Professional Services” industry as the fifth industry in scope which contains a high concentration of AI
talent.
iii. Remaining “Education,” “Manufacturing,” and “Financial Services” (formerly known as “Finance”)
also had updates in their coverage resulting from the inclusion of more granular sub-industries.
b. This also resulted in minor changes in magnitudes for some metrics since the distinct number of
industries, as well as the distinct number of AI occupations defined within each country-industry pair have
changed:
i. We define AI occupations (occupation representatives that require AI skills to perform the job) and
the respective definition of AI Talent at Country-Industry level. For example, data engineers working
in the technology, information, and media industry in Germany may be identified as holding an AI
occupation, whereas data engineers working in the construction industry in the United Arab Emirates
may not be identified as AI Talent. Following the introduction of a more granular industry taxonomy
with improved accuracy, our AI Talent identifications have been improved, and results have been
reflected to the entirety of time series for each relevant metric.
ii. The following metrics have been impacted by this change in industry taxonomy: AI Talent
Concentrations, and Relative AI Hiring Rates. No directional changes were observed, only minor
changes in magnitudes.
2. We introduced a methodology change into Relative Skills Penetration metrics:
a. In the past, the data used to calculate these metrics were limited to top five industries with the highest
AI skill penetration globally: “Software & IT Services,” “Hardware & Networking,” “Manufacturing,”
“Education,” and “Finance” industries. This year we updated our coverage to all industries.

NetBase Quid subsidiary, out of business) throughout the world.

The investment data includes private investments,
Prepared by Bill Valle and Nicole Seredenko M&A, public offerings, minority stakes made by PE/
NetBase Quid delivers AI-powered consumer and VCs, corporate venture arms, governments, and
market intelligence to enable business reinvention institutions both within and outside the United States.
in a noisy and unpredictable world. The software Some data is simply unreachable—for instance, when
applies artificial intelligence to reveal patterns in large, investors’ names or funding amounts are undisclosed.
unstructured datasets and to generate visualizations NetBase Quid embeds Capital IQ data as a default
that enable users to make smart, data-driven decisions and adds in data from Crunchbase for the data
accurately, quickly, and efficiently. NetBase Quid uses points that are not captured in Capital IQ. This not
Boolean query to search for focus areas, topics, and only yields comprehensive and accurate data on
keywords within social media, news, forums and blogs, all global organizations, but it also captures early-
companies, and patents data sources, as well as other stage startups and funding events data. Company
custom datasets. NetBase Quid then visualizes these information is updated on a weekly basis.
data points based on the semantic similarity.
Earnings Calls
Search, Data Sources, and Scope NetBase Quid leverages earnings call transcript
Over 8 million global public and private company profiles data embedded from Seeking Alpha. For this report,
from multiple data sources are indexed in order to NetBase Quid has analyzed mentions of AI-related
search across company descriptions, while filtering and keywords across all earnings call transcripts from
including metadata ranging from investment information Fortune 500 companies from January 2018 through
to firmographic information, such as founded year, HQ December 2022. New earnings call transcript data is
location, and more. Company information is updated updated in NetBase Quid on the 1st and 15th of every
on a weekly basis. The NetBase Quid algorithm reads month.
a big amount of text data from each document to
Search Parameters
make links between different documents based on
Boolean query is used to search for focus areas,
their similar language. This process is repeated at an
topics, and keywords within the archived company
immense scale, which produces a network with different
database, within their business descriptions and
clusters identifying distinct topics or focus areas. Trends
websites. We can filter out the search results by
are identified based on keywords, phrases, people,
HQ regions, investment amount, operating status,
companies, and institutions that NetBase Quid identifies,
organization type (private/public), and founding
and the other metadata that is put into the software.
year. NetBase Quid then visualizes these companies
Data by semantic similarity. If there are more than 7,000
Companies companies from the search result, NetBase Quid
Organization data is embedded from Capital IQ and selects the 7,000 most relevant companies for
Crunchbase. These companies include all types of visualization based on the language algorithm.
companies (private, public, operating, operating as a

Boolean Search: “artificial intelligence” or “AI” or

McKinsey & Company
“machine learning” or “deep learning”
Data used in the Corporate Activity-Industry
Companies:
Adoption section was sourced from the McKinsey
• Global AI and ML companies that have received Global Survey “The State of AI in 2022—and a Half
investments (private, IPO, M&A) from January 1, Decade in Review.”
2013, to December 31, 2022.
The online survey was in the field from May 3, 2022,
• Global AI and ML companies that have received to May 27, 2022, and from August 15, 2022, to
over $1.5M for the last 10 years (January 1, 2013, August 17, 2022, and garnered responses from 1,492
to December 31, 2022): 7,000 out of 7,500 participants representing a full range of regions,
companies have been selected through NetBase industries, company sizes, functional specialties,
Quid’s relevance algorithm. and tenures. Of those respondents, 744 said their
organization had adopted AI in at least one function
Target Event Definitions
and were asked questions about their organization’s
• Private investments: A private placement is a AI use. To adjust for differences in response rates,
private sale of newly issued securities (equity or the data is weighted by the contribution of each
debt) by a company to a selected investor or a respondent’s nation to global GDP.
selected group of investors. The stakes that buyers
The AI Index also considered data from previous
take in private placements are often minority
iterations of the survey. More specifically, the AI
stakes (under 50%), although it is possible to take
index made use of data from:
control of a company through a private placement
as well, in which case the private placement would The State of AI in 2021
be a majority stake investment. The State of AI in 2020

Global AI Survey: AI Proves Its Worth,
• Minority investment: These refer to minority
But Few Scale Impact (2019)
stake acquisitions in NetBase Quid, which take
AI Adoption Advances, But Foundational Barriers
place when the buyer acquires less than 50% of
Remain (2018)
the existing ownership stake in entities, asset
products, and business divisions.
• M&A: This refers to a buyer acquiring more than

50% of the existing ownership stake in entities,
asset products, and business divisions.

GitHub To obtain a global view of how AI is transforming

organizations, Deloitte surveyed 2,620 global
Data on the effects of GitHub’s Copilot on developer business leaders between April 2022 and May 2022.
productivity and happiness was sourced from the Thirteen countries were represented: Australia (100
GitHub Copilot Survey conducted in 2022. respondents), Brazil (115 respondents), Canada (175
The survey was emailed to 17,420 users who had opted respondents), China (200 respondents), France (130
in to receive communications and were using GitHub respondents), Germany (150 respondents), India
Copilot for their daily programming activities. Between (200 respondents), Israel (75 respondents), Japan
February 10, 2022, and March 6, 2022, the authors (100 respondents), Singapore (100 respondents),
received 2,047 responses that could be matched with South Africa (75 respondents), the United Kingdom
usage measurements during the four-week period (200 respondents), and the United States (1,000
leading up to March 12, 2022. The survey contained respondents). All participating companies have
multiple-choice questions on demographic information adopted AI technologies and are AI users.
and Likert-type questions on different aspects of Respondents were required to meet one of the
productivity, which were randomized in the order of following criteria: responsible for AI technology
appearance to the user. spending or approval of AI investments, developing
AI technology strategies, managing or overseeing
More details can be found in Ziegler at al., 2022.
AI technology implementation, serving as an AI
technology subject matter specialist, or making
Deloitte or influencing decisions around AI technology. To
Data used in the Corporate Activity-Industry Motivation complement the blind survey, Deloitte conducted
section was sourced from Deloitte’s “State of AI in the qualitative telephone interviews with 15 AI specialists
Enterprise” surveys. from various industries. More details are available on
Deloitte’s website.
More specifically, the AI Index made use of the following
sources of information:
International Federation of
Deloitte’s State of AI in the Enterprise,
5th Edition Report (2022)
Robotics (IFR)
State of AI in the Enterprise, 4th Edition (2021) Data presented in the Robot Installations section was
Deloitte’s State of AI in the Enterprise, 3rd Edition (2020) sourced from the “World Robotics 2022” report.
State of AI in the Enterprise, 2nd Edition (2018)

The 2017 Deloitte State of Cognitive Survey (2017)

Index Report 2023 Chapter 5: Education
Chapter 5: Education
Computing Research The CRA Taulbee Survey is sent only to doctoral
departments of computer science, computer
Association (CRA Taulbee engineering, and information science/systems.
Survey) Historically, (a) Taulbee covers one-quarter to one-
third of total BS CS recipients in the United States;
Note: This year’s AI Index reused the methodological
(b) the percent of women earning bachelor’s degrees
notes that were submitted by the CRA for previous
is lower in the Taulbee schools than overall; and (c)
editions of the AI Index. For more complete delineations
Taulbee tracks the trends in overall CS production.
of the methodology used by the CRA, please consult the
individual CRA surveys that are linked below. The AI Index used data from the following iterations
of the CRA survey:
Computing Research Association (CRA) members
are 200-plus North American organizations active in CRA, 2021
computing research: academic departments of computer CRA, 2020
science and computer engineering; laboratories and CRA, 2019
centers in industry, government, and academia; and
CRA, 2018
affiliated professional societies (AAAI, ACM, CACS/
CRA, 2017
AIC, IEEE Computer Society, SIAM USENIX). CRA’s
CRA, 2016
mission is to enhance innovation by joining with industry,
government, and academia to strengthen research and CRA, 2015
advanced education in computing. Learn more about CRA, 2014

CRA here. CRA, 2013
CRA, 2012
The CRA Taulbee Survey gathers survey data during the
fall of each academic year by reaching out to over 200 CRA, 2011
PhD-granting departments. Details about the Taulbee
Survey can be found here. Taulbee doesn’t directly
survey the students. The department identifies each
new PhD’s area of specialization as well as their type
of employment. Data is collected from September to
January of each academic year for PhDs awarded in the
previous academic year. Results are published in May
after data collection closes.

Index Report 2023 Chapter 5: Education
Code.org
State Level Data
The following link includes a full description of the
methodology used by Code.org to collect its data. The
staff at Code.org also maintains a database of the state
of American K–12 education and, in this policy primer,
provides a greater amount of detail on the state of
American K–12 education in each state.
AP Computer Science Data

The AP computer science data is provided to Code.org
as per an agreement the College Board maintains with
Code.org. The AP Computer Science data comes from
the college board’s national and state summary reports.
The State of International

K–12 Education
Data on the state of international K–12 AI education was
taken from the following UNESCO report, published in
2021. The methodology is outlined in greater detail on
pages 18 to 20 in the report and, for the sake of brevity, is
not completely reproduced in the 2023 AI index.

Index Report 2023 Chapter 6: Policy and Governance
Chapter 6: Policy and Governance

Global Legislation Records on AI
For AI-related bills passed into laws, the AI Index performed searches of the keyword “artificial intelligence”
on the websites of 127 countries’ congresses or parliaments (in the respective languages) in the full text of bills.
Note that only laws passed by state-level legislative bodies and signed into law (i.e., by presidents or through
royal assent) from 2016 to 2022 are included. Laws that were approved but then repealed are not included in the
analysis. In some cases, there were databases that were only searchable by title, so site search functions were
deployed. Future AI Index reports hope to include analysis on other types of legal documents, such as regulations
and standards, adopted by state- or supranational-level legislative bodies, government agencies, etc. The AI Index
team surveyed the following databases:
Algeria China India Monaco Slovenia

Andorra Colombia Iran, Islamic Republic Montenegro South Africa
Antigua and Barbuda Croatia Iraq Morocco Spain
Argentina Cuba Ireland Mozambique Sri Lanka
Armenia Curacao Isle of Man Nauru St. Kitts and Nevis
Australia Cyprus Israel The Netherlands Suriname
Austria Czech Republic Italy New Zealand Sweden
Azerbaijan Denmark Jamaica Nicaragua Switzerland
The Bahamas Estonia Japan Niger Tajikistan
Bahrain Faroe Islands Kazakhstan Northern Marina Tanzania
Bangladesh Fiji Kenya Islands Togo
Barbados Finland Kiribati Norway Tongo
Belarus France Korea, Republic Panama Turkey
Belgium The Gambia Kosovo Papua New Guinea Tuvalu
Belize Georgia Kyrgyz Republic Philippines Uganda
Bermuda Germany Latvia Poland Ukraine
Bhutan Gibraltar Lebanon Portugal United Arab Emirates
Bolivia Greece Liechtenstein Romania United Kingdom
Brazil Greenland Lithuania Russia United States
Brunei Grenada Luxembourg Samoa Uruguay
Bulgaria Guam Macao SAR, China Saudi Arabia Vietnam
Burkina Faso Guatemala Malawi Serbia Yemen
Cameroon Guyana Malaysia Seychelles Zambia
Canada Hong Kong Malta Sierra Leone Zimbabwe
Cayman Islands Hungary Mauritius Singapore
Chile Iceland Mexico Slovak Republic
Table of Contents Appendix 37 7

United States State-Level AI Legislation

For AI-related bills passed into law, the AI Index performed searches of the keyword “artificial intelligence” on
the legislative websites of all 50 U.S. states in the full text of bills. Bills are only counted as passed into law if the
final version of the bill includes the keyword, not just the introduced version. Note that only laws passed from
2015 to 2022 are included. The count for proposed laws includes both laws that were proposed and eventually
passed as well as laws that were proposed that have not yet been passed, or are now inactive. In some cases,
databases were only searchable by title, so site search functions were deployed. The AI Index team surveyed
the following databases:
Alabama Hawaii Massachusetts New Mexico South Dakota

Alaska Idaho Michigan New York Tennessee
Arizona Illinois Minnesota North Carolina Texas
Arkansas Indiana Mississippi North Dakota Utah
California Iowa Missouri Ohio Vermont
Colorado Kansas Montana Oklahoma Virginia
Connecticut Kentucky Nebraska Oregon Washington
Delaware Louisiana Nevada Pennsylvania West Virginia
Florida Maine New Hampshire Rhode Island Wisconsin
Georgia Maryland New Jersey South Carolina Wyoming

Global AI Mentions
For mentions of AI in AI-related legislative proceedings around the world, the AI Index performed searches of
the keyword “artificial intelligence” on the websites of 81 countries’ congresses or parliaments (in the respective
languages), usually under sections named “minutes,” “hansard,” etc. In some cases, databases were only
searchable by title, so site search functions were deployed. The AI Index team surveyed the following databases:
Andorra Ecuador Japan Northern Mariana South Africa

Angola El Salvador Kenya Islands South Korea
Armenia Estonia Kosovo Norway Spain
Australia Fiji Latvia Pakistan Sri Lanka
Azerbaijan Finland Lesotho Panama Sweden
Barbados France Liechtenstein Papua New Guinea Switzerland
Belgium The Gambia Luxembourg Philippines Tanzania
Bermuda Germany Macao SAR, China Poland Trinidad and Tobago
Bhutan Gibraltar Madagascar Portugal Ukraine
Brazil Greece Malaysia Romania United Kingdom
Cabo Verde Hong Kong Maldives Russia United States
Canada Iceland Malta Samoa Uruguay
Cayman Islands India Mauritius San Marino Zambia
China 11
Ireland Mexico Seychelles Zimbabwe
Czech Republic Isle of Man Moldova Sierra Leone
Denmark Israel Netherlands Singapore
Dominican Republic Italy New Zealand Slovenia
11 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial
intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier.

United States • Private

Sector Companies: Google AI, Microsoft
AI, Nvidia, OpenAI
Committee Mentions
• Think Tanks and Policy Institutes: American
In order to research trends on the United States’
Enterprise Institute, Aspen Institute, Atlantic
committee mentions of AI, the following search was
Council, Brookings Institute, Carnegie
conducted:
Endowment for International Peace, Cato
Website: Congress.gov Institute, Center for a New American Security,
Keyword: artificial intelligence Center for Strategic and International Studies,
Filters: Committee Reports Council on Foreign Relations, Heritage
Foundation, Hudson Institute, MacroPolo,
United States AI Policy Papers National Security Institute, New America
Foundation, RAND Corporation, Rockefeller
Organizations
Foundation, Stimson Center, Urban Institute,
To develop a more nuanced understanding of the
Wilson Center
thought leadership that motivates AI policy, we
tracked policy papers published by 55 organizations • University Institutes and Research Programs:
in the United States or with a strong presence in the AI and Humanity, Cornell University; AI Now
United States (expanded from last year’s list of 36 Institute, New York University; AI Pulse, UCLA
organizations) across four broad categories: Law; Belfer Center for Science and International
Affairs, Harvard University; Berkman Klein Center,
• Civil Society, Associations, and Consortiums:
Harvard University; Center for Information
Algorithmic Justice League, Alliance for Artificial
Technology Policy, Princeton University; Center
Intelligence in Healthcare, Amnesty International,
for Long-Term Cybersecurity, UC Berkeley;
EFF, Future of Privacy Forum, Human Rights
Center for Security and Emerging Technology,
Watch, IJIS Institute, Institute for Electrical and
Georgetown University; CITRIS Policy Lab,
Electronics Engineers, Partnership on AI
UC Berkeley; Hoover Institution, Stanford
• Consultancy: Accenture, Bain & Company, University; Institute for Human-Centered Artificial
Boston Consulting Group, Deloitte, McKinsey & Intelligence, Stanford University; Internet Policy
Company Research Initiative, Massachusetts Institute of
Technology; MIT Lincoln Laboratory; Princeton
• Government Agencies: Congressional Research
School of Public and International Affairs
Service, Defense Technical Information Center,
Government Accountability Office, Library of
Congress, Pentagon Library

Methodology • Industry and Regulation: economy, antitrust,

Each broad topic area is based on a collection of M&A, competition, finance, management, supply
underlying keywords that describe the content of the chain, telecom, economic regulation, technical
specific paper. We included 17 topics that represented standards, autonomous vehicle industry and
the majority of discourse related to AI between 2018– regulation
2021. These topic areas and the associated keywords
• Innovation and Technology: advancements and
are listed below:
improvements in AI technology, R&D, intellectual
• Health and Biological Sciences: medicine, property, patents, entrepreneurship, innovation
healthcare systems, drug discovery, care, ecosystems, startups, computer science,
biomedical research, insurance, health behaviors, engineering
COVID-19, global health
• Education and Skills: early childhood, K–12,
• Physical Sciences: chemistry, physics, astronomy, higher education, STEM, schools, classrooms,
earth science reskilling
• Energy and Environment: energy costs, climate • Workforce and Labor: labor supply and demand,
change, energy markets, pollution, conservation, talent, immigration, migration, personnel
oil and gas, alternative energy economics, future of work
• International Affairs and International Security: • Social and Behavioral Sciences: sociology,
international relations, international trade, linguistics, anthropology, ethnic studies,
developing countries, humanitarian assistance, demography, geography, psychology, cognitive
warfare, regional security, national security, science
autonomous weapons
• Humanities: arts, music, literature, language,
• Justice and Law Enforcement: civil justice, performance, theater, classics, history,
criminal justice, social justice, police, public philosophy, religion, cultural studies
safety, courts
• Equity and Inclusion: biases, discrimination,
• Communications and Media: social media, gender, race, socioeconomic inequality,
disinformation, media markets, deepfakes disabilities, vulnerable populations
• Government and Public Administration: • Privacy, Safety, and Security: anonymity, GDPR,
federal government, state government, local consumer protection, physical safety, human
government, public sector efficiency, public control, cybersecurity, encryption, hacking
sector effectiveness, government services,
• Ethics: transparency, accountability, human
government benefits, government programs,
values, human rights, sustainability, explainability,
public works, public transportation
interpretability, decision-making norms
• Democracy: elections, rights, freedoms, liberties,
personal freedoms

National AI Strategies
The AI Index did a web search to identify national strategies on AI. Below is a list of countries that were identified
as having a national AI strategy, including a link to said strategy. For certain counties, noted with an asterisk(*), the
actual strategy was not found, and a news article confirming the launch of the strategy was linked instead.
Countries with AI Strategies in Place

Algeria* Cyprus Italy Philippines Switzerland
Argentina Czech Republic Japan Poland Thailand
Australia Denmark Kenya Portugal Tunisia*
Austria Egypt, Arab Republic Korea, Republic Qatar Turkey
Bangladesh Estonia Latvia Romania Ukraine
Botswana* Finland Lithuania Russia United Arab Emirates
Brazil France Luxembourg Saudi Arabia United Kingdom
Bulgaria Germany Malta Serbia United States
Canada Greece Mauritius Sierra Leone Uruguay
Chile Hungary Mexico Singapore Vietnam
China India The Netherlands Slovenia
Colombia Indonesia Norway Spain
Croatia Ireland Peru Sweden
Countries with
AI Strategies in
Federal Budget for Nondefense AI R&D
Development Data on the federal U.S. budget for nondefense AI R&D was taken from previous
Armenia
editions of the AI Index (namely the 2021 and 2022 versions) and from the
Azerbaijan following National Science and Technology Council reports:
Bahrain Supplement to the President’s FY 2023 Budget
Belgium Supplement to the President’s FY2022 Budget
Benin
Cuba
Iceland U.S. Department of Defense Budget Requests
Israel
Data on the DoD nonclassified AI-related budget requests was taken from
Jordan
previous editions of the AI Index (namely the 2021 and 2022 versions) and from
Morocco
the following reports:
New Zealand
Nigeria
Defense Budget Overview United States Department of Defense
Fiscal Year 2023 Budget Request
Oman
Uzbekistan Defense Budget Overview United States Department of Defense
Fiscal Year 2022 Budget Request

Govini through regular government reporting processes or

human-intensive analytical approaches.
Govini is the leading commercial data company in
Moreover, beyond simply making usable an expansive
the defense technology space. Built by Govini, Ark.
body of data sources, Govini’s SaaS Platform and
ai is used at scale across the national security sector
National Security Knowledge Graph establishes high
of the U.S. federal government. This platform enables
fidelity standards in categorized and fused data to
government analysts, program managers, and
produce a comprehensive and accurate depiction
decision-makers to gain unprecedented visibility into
of federal spending, and the supporting vendor
the companies, capabilities, and capital in national
ecosystem, over time.
security to solve challenges pertaining to acquisition,
foreign influence and adversarial capital, nuclear
modernization, procurement, science and technology,
U.S. AI-Related Legal Cases
and supply chain. To identify AI-related legal cases, the AI Index research
team did a keyword search on the LexisNexis database,
Govini curated USG AI spend data from their annual
under their U.S. legal cases filter. The keywords that
Scorecard Taxonomy by applying supervised machine
were searched include “artificial intelligence,” “machine
learning (ML) and natural language processing (NLP)
learning,” and “automated decision-making.” Cases
to parse, analyze, and categorize large volumes of
that contained one of these keywords were coded
federal contracts data, including prime contracts,
according to a variety of variables of interest.
grants, and other transaction authority (OTA)
awards. Govini’s most recent scorecard focused on
critical technologies, of which AI/ML technologies
was a segment and consistent of six subsegments:
data-at-scale, decision science, computer vision,
machine learning, autonomy, and natural language
processing. By initially generating search terms and
then subsequently excluding specific terms that yield
erroneous results, Govini delivers a comprehensive
yet discriminant taxonomy of subsegments that are
mutually exclusive. Repeated keyword searches and
filters allow a consensus, data-driven taxonomy to
come into focus. Govini SMEs conduct a final review
of taxonomic structure to complement this iterative,
data-driven process.
The use of AI and supervised ML models enables the

analysis of large volumes of irregular data contained
in federal contracts—data that is often inaccessible

Index Report 2023 Chapter 7: Diversity
Chapter 7: Diversity
Computing Research
Association (CRA Taulbee
Survey)
To learn more about the diversity data from the CRA,
please read the methodological note on the CRA’s data
included in the Chapter 5 subsection of the Appendix.
Code.org
To learn more about the diversity data from Code.
org, please read the methodological note on Code.
org’s data included in the Chapter 5 subsection of the
Appendix.

Index Report 2023 Chapter 8: Public Opinion
Chapter 8: Public Opinion

IPSOS NetBase Quid
For brevity, the 2023 AI Index does not republish the Social Media Data
methodology used by the IPSOS survey that features NetBase Quid collects social media data from over
in the report. More details about the IPSOS survey’s 500 million sources in real time and analyzes this data
methodology can be found in the actual survey. through AI-powered Natural Language Processing.
This process parses out language and breaks out
Lloyd’s Register Foundation posts by filters such as drivers of positive and negative
and Gallup sentiment, emotions, and behaviors, allowing for
deeper insights to be reached. To understand public
For brevity, the 2023 AI Index does not republish the
perception of advancements in artificial intelligence,
methodology used by the Lloyd’s Register Foundation
NetBase Quid analyzed social media conversation
and Gallup survey that features in the report. More
around AI and AI model releases from January 2022
details about the Lloyd’s Register Foundation and
to December 2022. First, the NetBase Quid team
Gallup survey methodology can be found in the actual
analyzed conversation around AI to understand key
survey.
drivers of general sentiment around AI advancements,
such as ethical, cultural, and economic concerns and
Pew Research perceptions among consumers. Then, the NetBase
For brevity, the 2023 AI Index does not republish Quid team leveraged the platform for a more targeted
the methodology used by the Pew Research survey analysis of the same conversation, understanding
that features in the report. More details on the Pew volume and sentiment around the major AI model
Research survey methodology can be found in the updates and releases in 2022. This NetBase Quid
actual survey. analysis ultimately showcases the relationship
between public perception and the advancement of
AI, leveraging targeted analytics tools to understand
both specific reactions to model releases as well as a
wider consumer conversation and what drives it.

Index Report 2023

Artificial Intelligence Index Report 2023

Uploaded by

Copyright:

Available Formats

Artificial Intelligence Index Report 2023

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Artificial Intelligence Index Report 2023

Uploaded by

Copyright:

Available Formats

Artificial

From the Co-Directors

Jack Clark and Ray Perrault

Top Ten Takeaways

1 Industry races ahead of academia.

3 AI is both helping and

Top Ten Takeaways (cont’d)

7 For the first time in the last decade,

Staff and Researchers

How to Cite This Report

The AI Index 2023 Annual Report by Stanford University is licensed under

Public Data and Tools

AI Index and Stanford HAI

We welcome feedback and new ideas for next year.

Analytics and Research Partners

Research and Development

Policy and Governance

We thank the following organizations and individuals who provided

Center for Security and LinkedIn

Chapter 1 Research and Development 20

Chapter 2 Technical Performance 69

Chapter 3 Technical AI Ethics 125

Chapter 4 The Economy 168

Chapter 5 Education 234

Chapter 6 Policy and Governance 263

Chapter 7 Diversity 296

Chapter 8 Public Opinion 319

ACCESS THE PUBLIC DATA

China continues to lead in total AI journal, conference, and repository publications.

Chapter 2: Technical Performance

Performance saturation on traditional benchmarks. AI continued to post state-of-the-art results,

Chapter 3: Technical AI Ethics

Interest in AI ethics continues to skyrocket. The number of accepted submissions to FAccT, a

Chapter 4: The Economy

Chapter 4: The Economy (cont’d)

Chapter 6: Policy and Governance

Chapter 8: Public Opinion

Table of Contents Chapter 1 Preview 20

Research and Development

By Field of Study 26 System Types 49

By Sector 27 Sector Analysis 50

Cross-Country Collaboration 29 National Affiliation 51

Cross-Sector Collaboration 31 Systems 51

AI Journal Publications 32 Authorship 53

Overview 32 Parameter Trends 54

By Region 33 Compute Trends 56

By Geographic Area 34 Large Language and Multimodal Models 58

Citations 35 National Affiliation 58

AI Conference Publications 36 Parameter Count 60

Overview 36 Training Compute 61

By Region 37 Training Cost 62

AI Repositories 40 Conference Attendance 64

Table of Contents Chapter 1 Preview 21

Table of Contents Chapter 1 Preview 22

AI research is on the rise, across Large language models

Table of Contents Chapter 1 Preview 23

Number of AI Publications in the World, 2010–21

Table of Contents Chapter 1 Preview 24

By Type of Publication book chapters, theses, and unknown document types