
LLM Training - A simple visual guide for beginners

Uploaded by

vishu sablok
Copyright: © All Rights Reserved

MASTERING LLM PRESENTS:
COFFEE BREAK CONCEPTS

How are LLMs trained?

A simple guide to understanding LLM training

@MASTERING-LLM-LARGE-LANGUAGE-MODEL
Step 1 : Pre-training
Step 1 is to train a model on a massive dataset from the internet to predict the next word. The resulting model is usually called a language model.
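To make "predict the next word" concrete, here is a minimal sketch that counts which word follows which in a tiny corpus. It is only a stand-in: real LLMs learn a neural network over tokens, not a lookup table, and the corpus and the function names `train_bigram_model` and `predict_next` are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus standing in for "a massive dataset from the internet".
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next(model, "sat"))  # most common word seen after "sat"
print(predict_next(model, "the"))  # most common word seen after "the"
```

The same idea scales up in pre-training: the model sees a lot of text and is rewarded for guessing what comes next.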

Cool, so I can use this model?
Not yet.
In step 1, the model learns how to predict the next word, but it doesn't understand instructions.

The model just completes the text with the next words.


Step 2 : Supervised fine-tuning (SFT) or instruction tuning
We now need to teach the model to understand specific instructions. Step 2 helps the model learn to follow instructions.
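A rough sketch of what one SFT training example might look like: an instruction and its reference answer are joined into a single sequence, and a mask marks which tokens belong to the answer. The prompt template, the whitespace "tokenizer", and the name `build_sft_example` are assumptions made for illustration; scoring only the answer tokens is a common choice, not the only one.

```python
def build_sft_example(instruction, response):
    """Join an instruction and its reference answer into one training sequence.

    Returns the tokens and a mask that is 1 only on response tokens, so a
    training loss could score the answer and ignore the prompt.
    """
    # Hypothetical prompt template; real projects each define their own.
    prompt_tokens = ["### Instruction:"] + instruction.split() + ["### Response:"]
    # "<eos>" is a placeholder end-of-sequence marker so the model learns to stop.
    response_tokens = response.split() + ["<eos>"]
    tokens = prompt_tokens + response_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example(
    "Translate 'good morning' to French.",
    "Bonjour",
)
for token, flag in zip(tokens, loss_mask):
    print(flag, token)
```

The training objective is still next-word prediction, as in step 1. What changes is the data: curated instruction and answer pairs instead of raw internet text.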

So I have a model now? Wait, not yet. Let's look at the scenarios below.
Instruction-tuned (SFT) models are not necessarily helpful, honest, and harmless (HHH). We need to teach them this so that they learn to respond in an HHH way.


Step 3 : RLHF (reinforcement learning from human feedback)
We need to teach the model human preferences, with a focus on being helpful, honest, and harmless (HHH).
In this step, the model is asked to generate multiple outputs, and humans rank these outputs from best to worst.

The simple goal of RLHF is to replace the human raters with a model (the reward model) that understands human preferences.
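One common way to turn human rankings into a training signal for a reward model is sketched below: the ranking is split into (preferred, rejected) pairs, and a pairwise loss is small when the preferred output gets the higher score. The example outputs, the scores, and the function names are hypothetical; the slides do not say which loss is used, so treat this as one plausible option.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_outputs, 2))

def pairwise_loss(score_preferred, score_rejected):
    """Compute -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the preferred output is scored higher.
    """
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))

# Hypothetical human ranking, best first.
ranked = ["clear, correct answer", "partly correct answer", "off-topic answer"]
pairs = ranking_to_pairs(ranked)

# Hypothetical scores that a reward model might assign to each output.
scores = {ranked[0]: 2.0, ranked[1]: 0.5, ranked[2]: -1.0}
total_loss = sum(pairwise_loss(scores[a], scores[b]) for a, b in pairs)
print(len(pairs), "pairs, total loss:", round(total_loss, 3))
```

Training adjusts the reward model so that this loss goes down, which means its scores agree more and more with the human rankings.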

Final Model
In the final step:
The instruction model is used to generate an answer.
Once the answer is generated, the reward model (the replacement for the human raters) generates a score.
This score is used to improve the model's output, and the process repeats until the desired accuracy or number of iterations is reached.
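The loop described above can be sketched with a toy "model" that chooses between three canned answers. Everything here is an assumption for illustration: the reward scores are made up, the update is a plain policy-gradient step on three numbers, and production systems typically use algorithms such as PPO on a full neural network.

```python
import math

def softmax(logits):
    """Convert raw preference scores (logits) into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, rewards):
    """Average reward the policy would earn under its current probabilities."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def improve_policy(logits, rewards, learning_rate=0.5,
                   max_iterations=50, target_reward=0.9):
    """Shift probability toward answers that the reward model scores highly."""
    logits = list(logits)
    for _ in range(max_iterations):
        probs = softmax(logits)
        average = sum(p * r for p, r in zip(probs, rewards))
        if average >= target_reward:  # stop once the desired level is reached
            break
        # Exact gradient of the expected reward for a softmax policy:
        # d(average) / d(logit_i) = p_i * (r_i - average)
        logits = [
            logit + learning_rate * p * (r - average)
            for logit, p, r in zip(logits, probs, rewards)
        ]
    return logits

# Hypothetical candidate answers and hypothetical reward-model scores.
candidate_answers = ["helpful answer", "vague answer", "harmful answer"]
reward_scores = [1.0, 0.2, 0.0]
start_logits = [0.0, 0.0, 0.0]
final_logits = improve_policy(start_logits, reward_scores)
print("before:", [round(p, 2) for p in softmax(start_logits)])
print("after: ", [round(p, 2) for p in softmax(final_logits)])
```

The two stopping conditions mirror the slide: the loop ends when the target score is reached or when the iteration budget runs out.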

Summary
The language model just understands how to predict the next words.

SFT, or instruction tuning, teaches the model how to follow instructions across multiple different tasks.

RLHF further improves answers according to human preferences such as being helpful, honest, and harmless (HHH).

Check this paper to learn more about LLM alignment.

Newer alignment methods include DPO, which we will cover soon.
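As a small preview of DPO (Direct Preference Optimization), the sketch below computes its loss for a single preference pair. DPO trains directly on (preferred, rejected) pairs without a separate reward model. The log-probability values and the `beta` setting are hypothetical numbers picked for illustration.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the log-probability of a full answer under either the
    model being trained (policy) or a frozen reference model.
    """
    # How much more the policy favors each answer than the reference does.
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical log-probabilities: this policy leans toward the preferred answer.
good = dpo_loss(-4.0, -6.0, ref_logp_chosen=-5.0, ref_logp_rejected=-5.0)
# Hypothetical log-probabilities: this policy leans toward the rejected answer.
bad = dpo_loss(-6.0, -4.0, ref_logp_chosen=-5.0, ref_logp_rejected=-5.0)
print(round(good, 4), round(bad, 4))
```

The loss is lower when the policy raises the preferred answer relative to the reference model more than it raises the rejected one.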

Comment below on which topic you want to understand next in this "Coffee Break Concepts" series, and we will include those topics in the upcoming weeks.
www.masteringllm.com

LLM Interview Course
Want to prepare yourself for an LLM interview?
100+ questions spanning 14 categories
Curated 100+ assessments for each category
Well-researched real-world interview questions based on FAANG & Fortune 500 companies
Focus on visual learning
Real case studies & certification

Coupon code - LLM50
Coupon is valid till 31st Jan 2024

AgenticRAG with LlamaIndex
Want to learn why AgenticRAG is the future of RAG?
Master RAG fundamentals through practical case studies
Understand how to overcome limitations of RAG
Introduction to AgenticRAG & techniques like routing agents, query planning agents, structure planning agents, and ReAct agents with human in the loop
5 real-time case studies with code walkthroughs
