
TACN-VD-1-4


COURSE INTRODUCTION

Welcome to this course on generative AI with large language models. Large language models, or LLMs, are a very exciting technology. But despite all the buzz and hype, one of the things that is still underestimated by many people is their power as a developer tool. Specifically, there are many machine learning
and AI applications that used to take me many months to build that you can
now build in days or maybe even small numbers of weeks. This course will
take a deep dive with you into how LLM technology actually works including
going through many of the technical details, like model training, instruction
tuning, fine-tuning, the generative AI project life cycle framework to help you
plan and execute your projects and so on. Generative AI and LLMs specifically
are a general purpose technology. That means that similar to other general
purpose technologies like deep learning and electricity, it's useful not just for a
single application, but for a lot of different applications that span many
corners of the economy. Similar to the rise of deep learning that started
maybe 15 years ago or so, there's a lot of important work that lies ahead of us that needs to be done over many years by many people, I hope including you, to identify use cases and build specific applications. Because this technology is so new and so few people really know how to use it, many
companies are also right now scrambling to try to find and hire people that
actually know how to build applications with LLMs. I hope that this course will
also help you, if you wish, better position yourself to get one of those jobs. I'm
thrilled to bring you this course along with a group of fantastic instructors
from the AWS team, Antje Barth, Mike Chambers, Shelbee Eigenbrode who are
here with me today, as well as a fourth instructor Chris Fregly, who will be
presenting the labs. Antje and Mike are both generative AI developer
advocates. Shelbee and Chris are both generative AI solutions architects. All of
them have a lot of experience helping many different companies build many,
many creative applications using LLMs. I look forward to all of them sharing
this rich hands-on experience in this course. We've developed the content for
this course with inputs from many industry experts and applied scientists at
Amazon, AWS, Hugging Face and many top universities around the world.
Antje, perhaps you can say a bit more about this course. Sure. Thanks,
Andrew. It's a pleasure to work with you again on this course and the exciting
area of generative AI. With this course on generative AI with large language
models, we've created a series of lessons meant for AI enthusiasts, engineers, or data scientists looking to learn the technical foundations of how LLMs
work, as well as the best practices behind training, tuning, and deploying
them. In terms of prerequisites, we assume you are already familiar with
Python programming and at least basic data science and machine learning
concepts. If you have some experience with either Python or TensorFlow, that
should be enough. In this course, you will explore in detail the steps that make
up a typical generative AI project lifecycle, from scoping the problem and
selecting a language model to optimizing a model for deployment and
integrating it into your applications. This course covers all of these topics, not just
at a shallow level, but we'll take the time to make sure you come away with a
deep technical understanding of all of these technologies and be well-
positioned to really know what you're doing when you build your own
generative AI projects. Mike, why don't you tell us a little bit more about what the learners will see in each week? Absolutely, Antje. Thank you.
In Week 1, you will examine the transformer architecture that powers large
language models, explore how these models are trained, and understand the
compute resources required to develop these powerful LLMs. You'll also learn
about a technique called in-context learning: how to guide the model's output at inference time with prompt engineering, and how to tune the most important generation parameters of LLMs to shape your model's output. In
Week 2, you'll explore options for adapting pre-trained models to specific
tasks and datasets via a process called instruction fine tuning. Then in Week
3, you'll see how to align the output of language models with human values in
order to increase helpfulness and decrease potential harm and toxicity.
Though we don't stop at the theory. Each week includes a hands-on lab where
you'll be able to try out these techniques for yourself in an AWS environment
that includes all the resources you need for working with large models at no
cost to you. Shelbee, can you tell us a little bit more about the hands-on labs?
Sure thing, Mike. In the first hands-on lab, you'll construct and compare different
prompts and inputs for a given generative task, in this case, dialogue
summarization. You'll also explore different inference parameters and
sampling strategies to gain intuition on how to further improve the model's generated responses. In the second hands-on lab, you'll fine-tune an existing
large language model from Hugging Face, a popular open-source model hub.
You'll play with both full fine-tuning and parameter efficient fine tuning or
PEFT for short. You'll see how PEFT lets you make your workflow much more
efficient. In the third lab, you get hands-on with reinforcement learning from
human feedback, or RLHF. You'll build a reward model classifier to label model
responses as either toxic or non-toxic. Don't worry if you don't understand all
these terms and concepts just yet. You'll dive into each of these topics in
much more detail throughout this course. I'm thrilled to have Antje, Mike, and Shelbee, as well as Chris, presenting this course to you, a course that takes a deep technical dive into LLMs. You'll come away from this course having practiced with
many different concrete code examples for how to build or use LLMs. I'm sure
that many of the code snippets will end up being directly useful in your own
work. I hope you enjoy the course and use what you learn to build some really
exciting applications. With that, let's go on to the next video where we start
diving into how LLMs are being used to build applications.
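
The labs use Python with the Hugging Face transformers and peft libraries. As a rough preview of the second lab's topic, here is a minimal sketch of what parameter-efficient fine-tuning with LoRA could look like; the model checkpoint and LoRA settings below are illustrative assumptions, not the exact lab configuration.

```python
# A minimal sketch of parameter-efficient fine-tuning (PEFT) with LoRA,
# using the Hugging Face transformers and peft libraries. The checkpoint
# and hyperparameters are illustrative, not the exact lab configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

model_name = "google/flan-t5-base"  # a small open-source model, as used in the labs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA adds small trainable low-rank matrices to selected layers, so only a
# tiny fraction of the weights are updated during fine-tuning.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's attention query and value projections
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # shows how few parameters are trainable
```

From here the peft_model can be trained with the usual Hugging Face training loop or Trainer, which is the workflow the second lab explores in detail.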

INTRODUCTION
Welcome back. There's a lot of exciting material to go over this week, and one
of the first topics that Mike will share with you in a little bit is a deep dive into
how transformer networks actually work. >> Yeah, so look, it's a complicated
topic, right? In 2017, the paper came out, Attention is all You Need, and it laid
out all of these fairly complex data processes which are going to happen
inside the transformer architecture. So we take a little bit of a high level view,
but we do go down into some depths. We talk about things like self-attention
and the multi-headed self-attention mechanism. So we can see why it is that
these models actually work, how it is that they actually gain an understanding
of language. >> And it's amazing how long the transformer architecture has
been around and it's still state of the art for many models. >> I remember
after I saw the transformer paper when it first came out, I thought, yep, I get
this equation. I acknowledge this is a math equation. But what's it actually
doing? And it always seemed a little bit magical. It took me a long time
playing with it to finally go, okay, this is why it works. And so I think in this
first week, you learn the intuitions behind some of these terms you may have
heard before, like multi-headed attention. What is that and why does it make
sense? And why did the transformer architecture really take off? I think
attention had been around for a long time, but one of the things that really made it take off was that it allowed attention to work in a massively parallel way, so it could run on modern GPUs and scale up. I think these nuances around transformers are not well-understood by
many, so I'm looking forward to when you deep dive into that. >> Absolutely, I
mean, the scale is part of it and how it's able to take in all that data. I just
want to say as well, though, that we're not going to go into this at such a level
which is going to make people's heads explode. If they want to do that, then
they can go ahead and read that paper too. What we're going to do is we're
going to look at the really important parts of that transformer architecture
that gives you the intuition you need so that you can actually make practical
use out of these models. >> One thing I've been surprised and delighted by is
how transformers, even though this course focuses on text, it's been really
interesting to see how that basic transformer architecture is creating a
foundation for vision transformers as well. So even though in this course you
learn mostly about large language models, models about text, I think
understanding transformers is also helping people understand this really
exciting vision transformer and other modalities as well. It's going to be a
really critical building block for a lot of machine learning. >> Absolutely. >>
And then beyond transformers, there's a second major topic I'm looking forward to having this first week cover, which is the Generative AI project Lifecycle. I know a lot of people are thinking, boy, with all this LLM stuff, what do I do with it? And the Generative AI project Lifecycle, which we'll talk about in a little
bit, helps you plan out how to think about building your own Generative AI
project. >> That's right, and the Generative AI project Lifecycle walks you
through the individual stages and decisions you have to make when you're
developing Generative AI applications. So one of the first things you have to
decide is whether you're taking a foundation model off the shelf or you're
actually pre-training your own model and then as a follow up, whether you
want to fine tune and customize that model maybe for your specific data. >>
Yeah, in fact, there's so many large language model options out there, some
open source, some not open source, that I see many developers wondering,
which of these models do I want to use? And so it helps to have a way to evaluate them and then also choose the right model size. I know in your other work, you've
talked about when do you need a giant model, 100 billion or even much bigger
versus when can a 1 to 30 billion parameter model or even sub 1 billion
parameter model be just fantastic for a specific application? >> Exactly, so
there might be use cases where you really need the model to be very
comprehensive and able to generalize to a lot of different tasks. And there
might be use cases where you're just optimizing for a single use case, right? And you can potentially work with a smaller model and achieve similar or
even very good results. >> Yeah, I think that might be one of the really
surprising things for some people to learn is that you can actually use quite
small models and still get quite a lot of capability out of them. >> Yeah, I
think when you want your large language model to have a lot of general
knowledge about the world, when you want it to know about history and philosophy and the sciences and how to write Python code and so on and so on, it helps to have a giant model with hundreds of billions of parameters. But for a
single task like summarizing dialogue or acting as a customer service agent
for one company, for applications like that, sometimes you can use models with hundreds of billions of parameters. But that's not always necessary. So lots of
really exciting material to get into this week. With that, let's go on to the next
video where Mike will kick things off with a deep dive into many different use
cases of large language models.
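
Before that deep dive, it may help to see the core operation this discussion keeps returning to. Below is a toy sketch of scaled dot-product self-attention, the mechanism from "Attention is All You Need"; the sequence length, dimensions, and random weights are purely illustrative, and real models use learned projections across many attention heads.

```python
# A toy sketch of scaled dot-product self-attention.
# Dimensions and weights are arbitrary; real transformers learn the
# projection matrices and repeat this computation in many attention heads.
import numpy as np

seq_len, d_model = 5, 16                  # 5 tokens, 16-dimensional embeddings
x = np.random.randn(seq_len, d_model)     # stand-in for token embeddings

# Projection matrices map embeddings to queries, keys, and values.
W_q, W_k, W_v = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token is compared with every other token in one matrix product,
# which is why the computation parallelizes so well on modern GPUs.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                      # each token's output mixes all value vectors

print(weights.shape)                      # (5, 5): one attention weight per token pair
```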

GENERATIVE AI & LLMS


Okay, let's get started. In this lesson, we're going to set the scene. We'll talk
about large language models, their use cases, how the models work, prompt
engineering, how to make creative text outputs, and outline a project lifecycle
for generative AI projects. Given your interest in this course, it's probably safe
to say that you've had a chance to try out a generative AI tool or would like to.
Whether it be a chat bot, generating images from text, or using a plugin to
help you develop code, what you see in these tools is a machine that is
capable of creating content that mimics or approximates human ability.
Generative AI is a subset of traditional machine learning. And the machine
learning models that underpin generative AI have learned these abilities by
finding statistical patterns in massive datasets of content that was originally
generated by humans. Large language models have been trained on trillions
of words over many weeks and months, and with large amounts of compute
power. These foundation models, as we call them, with billions of parameters,
exhibit emergent properties beyond language alone, and researchers are
unlocking their ability to break down complex tasks, reason, and problem
solve. Here are a collection of foundation models, sometimes called base
models, and their relative size in terms of their parameters. You'll cover these
parameters in a little more detail later on, but for now, think of them as the
model's memory. And the more parameters a model has, the more memory,
and as it turns out, the more sophisticated the tasks it can perform.
Throughout this course, we'll represent LLMs with these purple circles, and in
the labs, you'll make use of a specific open source model, flan-T5, to carry out
language tasks. By either using these models as they are or by applying fine
tuning techniques to adapt them to your specific use case, you can rapidly
build customized solutions without the need to train a new model from
scratch. Now, while generative AI models are being created for multiple
modalities, including images, video, audio, and speech, in this course you'll
focus on large language models and their uses in natural language generation.
You will see how they are built and trained, how you can interact with them
via text known as prompts. And how to fine tune models for your use case and
data, and how you can deploy them with applications to solve your business
and social tasks. The way you interact with language models is quite different
than other machine learning and programming paradigms. In those cases, you
write computer code with formalized syntax to interact with libraries and APIs.
In contrast, large language models are able to take natural language or
human written instructions and perform tasks much as a human would. The
text that you pass to an LLM is known as a prompt. The space or memory that
is available to the prompt is called the context window, and this is typically
large enough for a few thousand words, but differs from model to model. In
this example, you ask the model to determine where Ganymede is located in
the solar system. The prompt is passed to the model, the model then predicts
the next words, and because your prompt contained a question, this model
generates an answer. The output of the model is called a completion, and the
act of using the model to generate text is known as inference. The completion comprises the text contained in the original prompt, followed by the
generated text. You can see that this model did a good job of answering your
question. It correctly identifies that Ganymede is a moon of Jupiter and
generates a reasonable answer to your question stating that the moon is
located within Jupiter's orbit. You'll see lots of examples of prompts and
completions in this style throughout the course.
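
To make the prompt, inference, and completion vocabulary concrete, here is a minimal sketch of running a question like the Ganymede example through an open-source flan-T5 checkpoint with the Hugging Face transformers library; the exact checkpoint name and generation settings are illustrative assumptions rather than the lab's precise setup.

```python
# A minimal sketch of prompting an LLM and generating a completion (inference)
# with the Hugging Face transformers library. Checkpoint and generation
# settings are illustrative; the labs use a flan-T5 variant.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "Where is Ganymede located in the solar system?"
inputs = tokenizer(prompt, return_tensors="pt")

# Inference: the model predicts the next tokens that form the completion.
output_ids = model.generate(**inputs, max_new_tokens=50)
completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(completion)
```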
LLM use cases and tasks
You could be forgiven for thinking that LLMs and generative AI are focused on
chat tasks. After all, chatbots are highly visible and getting a lot of attention.
Next word prediction is the base concept behind a number of different
capabilities, starting with a basic chatbot. However, you can use this
conceptually simple technique for a variety of other tasks within text
generation. For example, you can ask a model to write an essay based on a
prompt, to summarize conversations where you provide the dialogue as part
of your prompt and the model uses this data along with its understanding of
natural language to generate a summary. You can use models for a variety of
translation tasks from traditional translation between two different languages,
such as French and German, or English and Spanish. Or to translate natural
language to machine code. For example, you could ask a model to write some
Python code that will return the mean of every column in a DataFrame and the
model will generate code that you can pass to an interpreter. You can use
LLMs to carry out smaller, focused tasks like information retrieval. In this
example, you ask the model to identify all of the people and places mentioned in a news article. This is known as named entity recognition, a form of word classification. The understanding of knowledge encoded in the model's
parameters allows it to correctly carry out this task and return the requested
information to you. Finally, an area of active development is augmenting LLMs
by connecting them to external data sources or using them to invoke external
APIs. You can use this ability to provide the model with information it doesn't
know from its pre-training and to enable your model to power interactions with
the real-world. You'll learn much more about how to do this in week 3 of the
course. Developers have discovered that as the scale of foundation models
grows from hundreds of millions of parameters to billions, even hundreds of
billions, the subjective understanding of language that a model possesses also
increases. This language understanding stored within the parameters of the
model is what processes, reasons, and ultimately solves the tasks you give it,
but it's also true that smaller models can be fine tuned to perform well on
specific focused tasks. You'll learn more about how to do this in week 2 of the
course. The rapid increase in capability that LLMs have exhibited in the past
few years is largely due to the architecture that powers them. Let's move on
to the next video to take a closer look.
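
Because all of these tasks reduce to next-word prediction, the same inference code can serve them and only the prompt changes. The sketch below reuses the flan-T5 model and tokenizer loaded in the earlier snippet; the prompts are simply illustrative versions of the tasks mentioned above, and a small model's answers may well be imperfect.

```python
# Different tasks, same next-word-prediction machinery: only the prompt changes.
# Reuses the flan-T5 `model` and `tokenizer` from the previous sketch;
# the prompts themselves are just illustrative examples of the task types above.
prompts = [
    "Summarize the following conversation:\nA: The server is down again.\nB: I'll restart it now.",
    "Translate English to German: The weather is nice today.",
    "Write Python code that returns the mean of every column in a pandas DataFrame.",
    "List all the people and places mentioned in this text: Ada Lovelace met Charles Babbage in London.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=80)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```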
Text generation before transformers
It's important to note that generative algorithms are not new. Previous generations of language
models made use of an architecture called recurrent neural networks or RNNs. RNNs while
powerful for their time, were limited by the amount of compute and memory needed to perform
well at generative tasks. Let's look at an example of an RNN carrying out a simple next-word
prediction generative task. With just one previous word seen by the model, the prediction can't
be very good. As you scale the RNN implementation to be able to see more of the preceding
words in the text, you have to significantly scale the resources that the model uses. As for the
prediction, well, the model failed here. Even though you scale the model, it still hasn't seen
enough of the input to make a good prediction. To successfully predict the next word, models
need to see more than just the previous few words. Models need to have an understanding of the
whole sentence or even the whole document. The problem here is that language is complex. In
many languages, one word can have multiple meanings. These are homonyms. In this case, it's
only with the context of the sentence that we can see what kind of bank is meant. Words within a sentence structure can be ambiguous or have what we might call syntactic ambiguity. Take for
example this sentence, "The teacher taught the students with the book." Did the teacher teach
using the book, or did the students have the book, or was it both? How can an algorithm make
sense of human language if sometimes we can't? Well in 2017, after the publication of this paper,
Attention is All You Need, from Google and the University of Toronto, everything changed. The
transformer architecture had arrived. This novel approach unlocked the progress in generative AI
that we see today. It can be scaled efficiently to use multi-core GPUs, it can parallel process
input data, making use of much larger training datasets, and crucially, it's able to learn to pay
attention to the meaning of the words it's processing. And attention is all you need. It's in the
title.
