Key Terms for AI Governance
The field of artificial intelligence is rapidly evolving across different sectors and disparate industries, leaving business,
technology and government professionals without a common lexicon and shared understanding of terms and phrases
used in AI governance. Even a search to define "artificial intelligence" returns a range of definitions and examples. From
the cinematic, like HAL 9000 from "2001: A Space Odyssey," to the creative, like Midjourney and DALL-E generative
art, to the common, like email autocorrect and mobile maps, the use cases and applications of AI continue to grow and
expand into all aspects of life.
This glossary was developed with reference to numerous materials and designed to provide succinct, but nuanced,
definitions and explanations for some of the most common terms related to AI today. The explanations aim to present
both policy and technical perspectives and add to the robust discourse on AI governance. Although there are some
shared terms and definitions, this glossary is separate from the official IAPP Glossary of Privacy Terms.
Key terms
Accountability[1]: The obligation and responsibility of the creators, operators and regulators of an AI system to ensure the system operates in a manner that is ethical, fair, transparent and compliant with applicable rules and regulations (see fairness and transparency). Accountability ensures that the actions, decisions and outcomes of an AI system can be traced back to the entity responsible for them.

Active learning: A subfield of AI and machine learning where an algorithm can select some of the data it learns from. Instead of learning from all the data it is given, an active learning model requests additional data points that will help it learn best.
→ Also called query learning.
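For illustration, the query-selection step can be sketched with uncertainty sampling, one common active learning strategy (the data points and probability model below are hypothetical stand-ins):

```python
def select_query(unlabeled, predict_proba):
    """Pick the unlabeled point whose predicted probability is
    closest to 0.5, i.e., where the model is least certain."""
    return min(unlabeled, key=lambda x: abs(predict_proba(x) - 0.5))

# Hypothetical probability model: the point 10 is most ambiguous.
probs = {2: 0.05, 7: 0.35, 10: 0.52, 15: 0.95}
query = select_query([2, 7, 10, 15], lambda x: probs[x])
print(query)  # 10 -- the model requests a label for its most uncertain point
```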
Automated decision-making: The process of making a decision by technological means without human involvement.
Bias: There are several types of bias within the AI field. Computational bias is a systematic error or deviation from the true value of a prediction that originates from a model's assumptions or the data itself (see input data). Cognitive bias refers to inaccurate individual judgment or distorted thinking, while societal bias leads to systemic prejudice, favoritism and/or discrimination in favor of or against an individual or group. Bias can impact outcomes and pose a risk to individual rights and liberties.
Bootstrap aggregating: A machine learning method that aggregates multiple versions of a model (see machine learning model) trained on random subsets of a data set. This method aims to make a model more stable and accurate.
→ Sometimes referred to as bagging.
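A minimal sketch of the idea, using the mean of each bootstrap sample as a stand-in for a trained model (a real ensemble would train full models on each sample):

```python
import random

def bootstrap_sample(data, rng):
    """Draw a sample of the same size, with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_models=100, seed=0):
    """Bagging sketch: each 'model' is just the mean of its bootstrap
    sample; the ensemble averages those per-sample estimates."""
    rng = random.Random(seed)
    estimates = [sum(s) / len(s)
                 for s in (bootstrap_sample(data, rng) for _ in range(n_models))]
    return sum(estimates) / len(estimates)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(bagged_mean(data), 2))  # close to the plain mean of 3.0
```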
Classification model (classifiers): A type of model (see machine learning model) used in machine learning that is designed to take input data and sort it into different categories or classes.
Clustering (or clustering algorithms): An unsupervised machine learning method where patterns in the data are identified and evaluated, and data points are grouped accordingly into clusters based on their similarity.
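One classic clustering algorithm, k-means, can be sketched in one dimension (the points and starting centers here are invented for illustration):

```python
def kmeans_1d(points, centers, iters=10):
    """Minimal 1-D k-means sketch: assign each point to its nearest
    center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
print(kmeans_1d(points, [0.0, 5.0]))  # two centers, near 1.0 and 9.5
```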
Computer vision: A field of AI that enables computers to process and analyze images, videos and other visual inputs.
Contestability: The principle of ensuring that AI systems and their decision-making processes can be questioned or challenged. This ability to contest or challenge the outcomes, outputs and/or actions of AI systems can help promote transparency and accountability within AI governance.
→ Also called redress.
Corpus: A large collection of texts or data that a computer uses to find patterns, make predictions or generate specific outcomes. The corpus may include structured or unstructured data and cover a specific topic or a variety of topics.
Decision tree: A type of supervised learning model used in machine learning (see machine learning model) that represents decisions and their potential consequences in a branching structure.
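A hand-built toy example of such a branching structure (the features, thresholds and labels are invented for illustration):

```python
def classify(sample):
    """Walk the tree: each node tests one feature and branches
    toward a leaf decision."""
    if sample["outlook"] == "rainy":   # root node test
        return "stay in"               # leaf
    if sample["humidity"] > 70:        # second-level test
        return "stay in"               # leaf
    return "play outside"              # leaf

print(classify({"outlook": "sunny", "humidity": 40}))  # play outside
print(classify({"outlook": "sunny", "humidity": 85}))  # stay in
print(classify({"outlook": "rainy", "humidity": 40}))  # stay in
```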
Deep learning: A subfield of AI and machine learning that uses artificial neural networks. Deep learning is especially useful in fields where raw data needs to be processed, like image recognition, natural language processing and speech recognition.
Discriminative model: A type of model (see machine learning model) used in machine learning that directly maps input features to class labels and analyzes for patterns that can help distinguish between different classes. It is often used for text classification tasks, like identifying the language of a piece of text. Examples are traditional neural networks, decision trees and random forests.
Expert system: A form of AI that draws inferences from a knowledge base to replicate the decision-making abilities of a human expert within a specific field, like medical diagnosis.
Explainability: The ability to describe or provide sufficient information about how an AI system generates a specific output or arrives at a decision in a specific context to a predetermined addressee. Explainability is important in maintaining transparency and trust in AI.
→ Acronym: XAI
Exploratory data analysis: Data discovery techniques performed before training a machine learning model to gain preliminary insights into a data set, such as identifying patterns, outliers and anomalies, and finding relationships among variables.
Federated learning: A machine learning method that allows models (see machine learning model) to be trained on the local data of multiple edge devices or servers. Only the updates of the local model, not the training data itself, are sent to a central location, where they are aggregated into a global model; the process is iterated until the global model is fully trained.
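The central aggregation step can be sketched as coordinate-wise averaging of local model weights (the weight vectors below are hypothetical, and equal weighting across devices is assumed):

```python
def federated_average(local_updates):
    """Aggregate per-device model weights by coordinate-wise
    averaging; only these updates, never the raw training data,
    reach the central server."""
    n = len(local_updates)
    return [sum(w) / n for w in zip(*local_updates)]

# Hypothetical weight vectors trained locally on three devices.
device_updates = [
    [0.2, 1.0, -0.4],
    [0.4, 0.8, -0.2],
    [0.6, 1.2, -0.6],
]
print([round(w, 2) for w in federated_average(device_updates)])  # [0.4, 1.0, -0.4]
```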
Foundation model: A large-scale, pretrained model for AI capabilities, such as language (see large language model), vision, robotics, reasoning, search or human interaction, that can function as the base for other applications. The model is trained on extensive and diverse data sets.
Generalization: The ability of a model (see machine learning model) to understand the underlying patterns and trends in its training data and apply what it has learned to make predictions or decisions about new, unseen data.
Generative AI: A field of AI that uses machine learning models trained on large data sets to create new content, such as written text, code, images, music, simulations and videos. These models are capable of generating novel outputs based on input data or user prompts.
Greedy algorithms: A type of algorithm that makes the optimal choice to achieve an immediate objective at a particular step or decision point, based on the available information and without regard for the longer-term optimal solution.
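The classic coin-change problem illustrates both the strategy and its blind spot:

```python
def greedy_change(amount, coins):
    """At each step, take the largest coin that still fits --
    locally optimal, with no lookahead."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result

print(greedy_change(63, [25, 10, 5, 1]))  # [25, 25, 10, 1, 1, 1]
# Greedy is not always globally optimal: for coins [4, 3, 1] and
# amount 6, it yields [4, 1, 1] (3 coins) instead of [3, 3] (2 coins).
print(greedy_change(6, [4, 3, 1]))        # [4, 1, 1]
```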
Hallucinations: Instances where a generative AI model creates content that either contradicts the source or creates factually incorrect outputs under the appearance of fact.
Inference: A type of machine learning process where a trained model (see machine learning model) is used to make predictions or decisions based on input data.
Input data: Data provided to or directly acquired by a learning algorithm or model (see machine learning model) for the purpose of producing an output. It forms the basis upon which the machine learning model will learn, make predictions and/or carry out tasks.
Large language model: A form of AI that utilizes deep learning algorithms to create models (see machine learning model) trained on massive text data sets to analyze and learn patterns and relationships among characters, words and phrases. There are generally two types of LLMs: generative models, which make text predictions based on the probabilities of word sequences learned from their training data (see generative AI), and discriminative models, which make classification predictions based on probabilities of data features and weights learned from their training data (see discriminative model). The term "large" generally refers to the model's capacity as measured by the number of parameters.
→ Acronym: LLM
Machine learning: A subfield of AI involving algorithms that enable computer systems to iteratively learn from data (see input data) and then make decisions, inferences or predictions based on it. These algorithms build a model from training data to perform a specific task on new data without being explicitly programmed to do so. Machine learning implements various algorithms that learn and improve by experience in a problem-solving process that includes data cleansing, feature selection, training, testing and validation. Companies and government agencies deploy machine learning algorithms for tasks such as fraud detection, recommender systems, customer inquiries, natural language processing, health care, and transport and logistics.
→ Acronym: ML
Multimodal models: A type of model used in machine learning (see machine learning model) that can process more than one type of input or output data, or "modality," at the same time. For example, a multimodal model can take both an image and a text caption as input and then produce a unimodal output in the form of a score indicating how well the text caption describes the image. These models are highly versatile and useful in a variety of tasks, like image captioning and speech recognition.
Natural language processing: A subfield of AI that helps computers understand, interpret and manipulate human language by transforming information into content. It enables machines to read text or spoken language, interpret its meaning, measure sentiment, and determine which parts are important for understanding.
Neural networks: A type of model (see machine learning model) used in machine learning that mimics the way neurons in the brain interact, using multiple processing layers, including at least one hidden layer. This layered approach enables neural networks to model complex nonlinear relationships and patterns within data. Artificial neural networks have a range of applications, such as image recognition and medical diagnosis.
Overfitting: A concept in machine learning in which a model (see machine learning model) becomes too specific to the training data and cannot generalize to unseen data, which means it can fail to make accurate predictions on new data sets.
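An extreme illustration of the concept: a model that memorizes its training pairs fits the training data perfectly yet cannot generalize at all (the data and the y = 2x rule are invented):

```python
train = {1: 2, 2: 4, 3: 6}  # underlying rule: y = 2x

def memorizer(x):
    """Pure memorization: perfect on training data, useless on new
    data, because it captures no notion of the underlying pattern."""
    return train.get(x)

print([memorizer(x) for x in train])  # [2, 4, 6] -- 100% training accuracy
print(memorizer(4))                   # None -- fails on an unseen input
```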
Oversight: The process of effectively monitoring and supervising an AI system to minimize risks, ensure regulatory compliance and uphold responsible practices. Oversight is important for effective AI governance, and mechanisms may include certification processes, conformity assessments and regulatory authorities responsible for enforcement.
Post processing: Steps performed after a machine learning model has been run to adjust its output. This can include adjusting a model's outputs and/or using a holdout data set (data not used in the training of the model) to create a function that runs on the model's predictions to improve fairness or meet business requirements.
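One simple post-processing step is applying a decision threshold to a model's raw scores after training (the scores and the 0.7 business threshold below are hypothetical):

```python
def postprocess(scores, threshold=0.5):
    """Adjust a model's raw scores after the model has run:
    apply a decision threshold, e.g., one chosen on holdout data."""
    return ["approve" if s >= threshold else "review" for s in scores]

raw_scores = [0.91, 0.42, 0.67, 0.12]
print(postprocess(raw_scores))       # default 0.5 threshold
print(postprocess(raw_scores, 0.7))  # stricter hypothetical requirement
```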
Preprocessing: Steps taken to prepare data for a machine learning model, which can include cleaning the data, handling missing values, normalization, feature extraction and encoding categorical variables. Data preprocessing can play a crucial role in improving data quality, mitigating bias, addressing algorithmic fairness concerns, and enhancing the performance and reliability of machine learning algorithms.
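Normalization, one of the steps listed above, can be sketched with min-max scaling of a feature column (the example values are invented):

```python
def min_max_normalize(values):
    """Rescale a feature column to the [0, 1] range, a common
    preprocessing step before training."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```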
Random forest: A supervised machine learning (see supervised learning) algorithm that builds multiple decision trees and merges them to get a more accurate and stable prediction. Each decision tree is built with a random subset of the training data (see bootstrap aggregating), hence the name "random forest." Random forests are helpful with data sets that have missing values or are very complex.
Reinforcement learning: A machine learning method that trains a model to optimize its actions within a given environment to achieve a specific goal, guided by feedback mechanisms of rewards and penalties. This training is often conducted through trial-and-error interactions or simulated experiences that do not require external data. For example, an algorithm can be trained to earn a high score in a video game by having its efforts evaluated and rated according to success toward the goal.
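The reward-feedback loop can be sketched with a tiny deterministic bandit problem: the learner tries each action in turn and nudges its value estimate toward the observed reward, with no external data set (the rewards and learning rate are invented):

```python
def train_bandit(rewards, episodes=200, alpha=0.1):
    """Reinforcement learning sketch: estimate each action's value
    from reward feedback alone, moving each estimate a step of size
    alpha toward the reward actually received."""
    values = [0.0] * len(rewards)
    for episode in range(episodes):
        action = episode % len(rewards)  # explore every action in turn
        values[action] += alpha * (rewards[action] - values[action])
    return values

# Deterministic rewards for three actions; action 2 pays best.
values = train_bandit([1.0, 5.0, 10.0])
print(max(range(3), key=lambda a: values[a]))  # 2 -- the learned best action
```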
Reliability: An attribute of an AI system that ensures it behaves as expected and performs its intended function consistently and accurately, even with new data that it has not been trained on.
Robustness: An attribute of an AI system that ensures it remains resilient, maintaining its functionality and performing accurately in a variety of environments and circumstances, even when faced with changed inputs or an adversarial attack.
Safety: The development of AI systems that are designed to minimize potential harm to individuals, society, property and the environment.
Supervised learning: A subset of machine learning where the model (see machine learning model) is trained on input data with known desired outputs. These two groups of data are sometimes called predictors and targets, or independent and dependent variables, respectively. This type of learning is useful for training an AI to group data into specific categories or to make predictions by understanding the relationship between two variables.
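A minimal supervised learner built from predictors and targets is the 1-nearest-neighbor classifier (the hours-studied data and pass/fail labels are invented):

```python
def nearest_neighbor(train_pairs, x):
    """Tiny supervised learner: predict the target of the closest
    training predictor (1-nearest-neighbor)."""
    _, label = min(train_pairs, key=lambda pair: abs(pair[0] - x))
    return label

# Predictors (hours studied) with known targets (pass/fail).
train_pairs = [(1, "fail"), (2, "fail"), (8, "pass"), (9, "pass")]
print(nearest_neighbor(train_pairs, 7))  # pass
print(nearest_neighbor(train_pairs, 3))  # fail
```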
Synthetic data: Data generated by a system or model (see machine learning model) that can mimic and resemble the structure and statistical properties of real data. It is often used for testing or training machine learning models, particularly in cases where real-world data is limited, unavailable or too sensitive to use.
Testing data: A subset of the data set used to provide an unbiased evaluation of a final model (see machine learning model). It is used to test the performance of the machine learning model with new data at the very end of the model development process.
Training data: A subset of the data set that is used to train a model (see machine learning model) until it can accurately predict outcomes, find patterns or identify structures within the training data.
Transfer learning: A type of model (see machine learning model) used in machine learning in which an algorithm learns to perform one task, such as recognizing cats, and then uses that learned knowledge as a basis when learning a different but related task, such as recognizing dogs.
Trustworthy AI: In most cases, this term is used interchangeably with responsible AI and ethical AI, all of which refer to principle-based AI development and governance (see AI governance), including the principles of security, safety, transparency, explainability, accountability, privacy and nondiscrimination/nonbias (see bias), among others.
Underfitting: A concept in machine learning in which a model (see machine learning model) fails to fully capture the complexity of the training data. This may result in poor predictive ability and/or inaccurate outputs. Factors leading to underfitting may include too few model parameters or epochs, too high a regularization rate, or an inappropriate or insufficient set of features in the training data.
Unsupervised learning: A subset of machine learning where the model is trained by looking for patterns in an unclassified data set with minimal human supervision. The AI is provided with preexisting data sets and then analyzes them for patterns. This type of learning is useful for training an AI in techniques such as clustering data (outlier detection, etc.) and dimensionality reduction (feature learning, principal component analysis, etc.).
Validation data: A subset of the data set used to assess the performance of the model (see machine learning model) during the training phase. Validation data is used to fine-tune the parameters of a model and prevent overfitting before the final evaluation using the test data set.
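The three-way division into training, validation and testing data can be sketched as a shuffled split (the 60/20/20 fractions are a common convention, not a rule from this glossary):

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then carve out training, validation and testing
    subsets; the test set is held back until final evaluation."""
    data = data[:]
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split_dataset(list(range(10)))
print(len(train), len(val), len(test))  # 6 2 2
```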
Variance: A statistical measure that reflects how far a set of numbers is spread out from its average value in a data set. A high variance indicates the data points are spread widely around the mean; a low variance indicates they are close to the mean. In machine learning, higher variance can lead to overfitting. The trade-off between variance and bias is a fundamental concept in machine learning: increasing model complexity tends to reduce bias but increase variance, while decreasing complexity reduces variance but increases bias.
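The measure itself is the mean squared deviation from the mean, computed here for three invented data sets with increasing spread:

```python
def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(variance([4, 4, 4, 4]))  # 0.0 -- no spread around the mean
print(variance([2, 4, 4, 6]))  # 2.0 -- modest spread
print(variance([0, 0, 8, 8]))  # 16.0 -- wide spread around the mean
```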
Notes
[1] Different definition than the IAPP privacy training glossary.
More resources
- AI Governance Center
- AI topic page
- AI Body of Knowledge
Key Terms for AI Governance. Published June 8, 2023.

IAPP disclaims all warranties, expressed or implied, with respect to the contents of this material, including any warranties of accuracy, merchantability, or fitness for a particular purpose. Nothing herein should be construed as legal advice.

© 2023 International Association of Privacy Professionals. All rights reserved. Find the latest version at iapp.org/resources