LLM
6. Limitations:
○ Bias and Fairness: LLMs can inherit biases from the data they
are trained on, leading to biased or unethical outputs.
○ Computational Costs: Training large models requires significant
computational power and energy resources.
○ Overfitting: LLMs may overfit to certain patterns in the data,
reducing their generalization ability.
7. Future Directions:
○ Multimodal Models: Models that can handle both text and other
data types (e.g., images, audio) for richer understanding and
generation.
○ Efficiency Improvements: Research is focused on making LLMs
more efficient in terms of computation, training data, and
fine-tuning techniques.
○ Ethical AI: Ensuring LLMs are fair, transparent, and free from
harmful biases.
Transformer Model
1. Self-Attention Mechanism:
○ Purpose: It allows the model to weigh the importance of different
words in a sentence relative to each other, regardless of their
position.
○ How it works:
■ The input is converted into three vectors for each word:
Query (Q), Key (K), and Value (V).
■ The attention score is computed by taking the dot product of
the Query and Key vectors, scaling by the square root of the key
dimension, and applying a softmax operation.
■ The attention scores determine how much focus each word
should have on other words when generating a word’s
representation.
○ Advantages: Allows the model to capture long-range
dependencies and relationships between words, unlike
RNNs/LSTMs which struggle with long sequences.
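A minimal sketch of scaled dot-product self-attention in NumPy; the projection matrices W_q, W_k, W_v, the token count, and the dimensions are illustrative assumptions, not part of any specific model:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # one Query, Key, Value vector per token
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # dot products, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)         # attention weights over the other tokens
    return weights @ V                         # weighted sum of Value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                               # 4 tokens, embedding dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)                    # shape (4, 8)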
2. Positional Encoding:
Architecture Diagram:
Advantages of Transformers:
Summary:
Components of an LSTM:
1. Forget Gate:
Workflow of an LSTM:
1. Input Data: The LSTM receives the input data at each time step.
2. Forget Gate: It decides what information to discard from the previous
time step.
3. Input Gate: It updates the cell state with new information.
4. Cell State Update: The memory cell updates its state based on the
forget and input gates.
5. Output Gate: It generates the output for the current time step and
passes the new hidden state to the next time step.
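As a rough illustration, the gate workflow above can be sketched as a single LSTM time step in NumPy; the weight and bias dictionaries W and b are illustrative placeholders, not a library API:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])    # forget gate: what to discard from the old cell state
    i = sigmoid(W["i"] @ z + b["i"])    # input gate: what new information to add
    g = np.tanh(W["g"] @ z + b["g"])    # candidate cell values
    c_t = f * c_prev + i * g            # cell state update
    o = sigmoid(W["o"] @ z + b["o"])    # output gate
    h_t = o * np.tanh(c_t)              # hidden state passed to the next time step
    return h_t, c_t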
Key Advantages of LSTM:
Applications of LSTM:
Limitations of LSTMs:
Variants of LSTM:
Summary:
1. Sequential Processing:
2. Parameter Sharing:
○ In RNNs, the same set of weights is used across all time steps,
which helps the network generalize across different parts of the
sequence and reduces the number of parameters compared to
fully connected layers for each time step.
3. Hidden State:
1. Input Sequence:
○ At each time step t, the RNN receives the current input x[t]
and the previous hidden state h[t−1]. The hidden state is
updated using a combination of the current input and the previous
hidden state.
○ The hidden state update is typically computed as:
h[t] = tanh(W_h · [h[t−1], x[t]] + b_h)
Where:
■ h[t] is the hidden state at time step t.
■ W_h is the weight matrix for the hidden layer.
■ b_h is the bias term.
■ tanh is a common activation function.
3. Output Layer:
○ At each time step, the RNN can produce an output y[t] based
on the hidden state:
y[t] = W_y · h[t] + b_y
Where:
■ y[t] is the output at time step t.
■ W_y is the weight matrix for the output layer.
■ b_y is the output bias.
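A minimal NumPy sketch of these two equations, with illustrative sizes (input 8, hidden 16, output 4):

import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(16, 16 + 8))    # applied to the concatenation [h[t-1], x[t]]
b_h = np.zeros(16)
W_y = rng.normal(size=(4, 16))
b_y = np.zeros(4)

def rnn_step(x_t, h_prev):
    h_t = np.tanh(W_h @ np.concatenate([h_prev, x_t]) + b_h)   # hidden state update
    y_t = W_y @ h_t + b_y                                       # output at this time step
    return h_t, y_t

h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):    # unroll over a sequence of 5 time steps
    h, y = rnn_step(x_t, h)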
4. Backpropagation Through Time (BPTT):
Advantages of RNNs:
○ The same weights are used across all time steps, reducing the
number of parameters compared to other neural network
architectures like fully connected networks, leading to better
generalization and less overfitting.
3. Learning Temporal Dependencies:
Limitations of RNNs:
1. Vanishing and Exploding Gradient Problems:
Variants of RNNs:
Applications of RNNs:
Summary:
1. Query Encoding
2. Document Retrieval
4. Generation of Response
5. Output Response
6. Feedback and Fine-Tuning:
● Input: The system’s response and any feedback (e.g., from users or
evaluations).
● Processing: This feedback can be used for fine-tuning the retrieval
and generation components of the RAG system. Feedback may involve
adjusting the retrieval model to improve document relevance or
fine-tuning the generative model to improve response quality.
● Purpose: To continually improve the system's performance over time,
making it more accurate and effective in responding to queries.
1. Classification
● Example Tasks:
○ Binary Classification: Predicting whether a patient has a
disease or not (Yes/No).
○ Multi-Class Classification: Predicting the type of an animal from
a set of options like "Cat," "Dog," or "Rabbit."
○ Multi-Label Classification: Predicting multiple categories for a
single input, e.g., categorizing a movie into genres like "Action"
and "Comedy."
● Algorithms: Common algorithms used in classification include:
○ Logistic Regression
○ Decision Trees
○ Random Forests
○ Support Vector Machines (SVM)
○ K-Nearest Neighbors (KNN)
○ Naive Bayes
○ Neural Networks
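A minimal binary-classification sketch using scikit-learn (assumed available); the synthetic dataset and hyperparameters are illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# synthetic "disease / no disease" style dataset: 500 samples, 10 features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)      # train on labeled examples
print(accuracy_score(y_test, clf.predict(X_test)))    # evaluate on held-out data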
● Evaluation Metrics:
2. Regression
Summary of Differences:
Both classification and regression are essential tasks in machine learning, and
selecting the appropriate one depends on whether the output is categorical or
continuous.
Generative AI Overview:
Key Concepts:
○ Used for creating new data points that resemble a given dataset
by learning a probabilistic mapping between input data and a
latent space. Commonly used in image and speech synthesis.
3. Transformers (GPT, T5, BERT):
Conclusion:
Prompt Engineering
1. Clarity and Specificity:
○ The prompt should be clear and specific about the task. Vague or
ambiguous instructions can lead to imprecise or irrelevant
responses.
○ Example: Instead of asking, "Tell me about climate change," you
might specify, "Explain the causes and effects of climate change
in simple terms."
2. Format and Structure:
○ The way you frame the task or define the goal can greatly
influence how the model responds. Different prompts can lead to
different styles, tones, or forms of response (e.g., conversational,
formal, concise, elaborate).
○ Example: For a task requiring explanation, you could say: "Explain
in simple terms," or for a more academic tone: "Discuss in-depth."
1. Few-Shot Learning:
1. Text Generation:
A CNN is a type of neural network that uses convolutional layers to extract features from images.
Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNNs) are a class of deep learning models primarily used for
analyzing visual data, such as images and videos. They are designed to automatically and
adaptively learn spatial hierarchies of features through convolutional layers.
3. Pooling Layer:
○ Pooling layers are used to reduce the spatial dimensions (width and height) of the
input, decreasing computational complexity while retaining important information.
○ Max Pooling is the most common type of pooling, where the maximum value is
selected from a patch of the image.
○ Average Pooling is another method, where the average value is computed.
○ Pooling helps the model become more invariant to small translations and distortions
in the input image.
4. Fully Connected Layer (Dense Layer):
○ After passing through multiple convolutional and pooling layers, the output is
flattened into a 1D vector and passed through one or more fully connected layers.
○ These layers connect every neuron to every other neuron in the previous layer,
enabling the model to learn complex relationships between the features.
○ The fully connected layer typically ends with a softmax or sigmoid activation,
depending on the task (classification or regression).
5. Softmax / Sigmoid Output Layer:
○ For classification tasks, the final layer often uses the Softmax activation for
multi-class problems or Sigmoid for binary classification. These layers produce
probabilities that sum to 1 (Softmax) or output a probability between 0 and 1
(Sigmoid) for the respective classes.
1. Input Image: The input is typically an image (e.g., 224x224 pixels with RGB channels).
2. Convolution: The input image is passed through several convolutional layers where filters
are applied to extract low-level features (edges, textures, etc.).
3. Activation (ReLU): After each convolution operation, the result is passed through the ReLU
activation function to introduce non-linearity.
4. Pooling: A pooling layer is applied to downsample the feature maps, reducing the spatial
dimensions.
5. Flattening: The 2D feature maps are flattened into a 1D vector to feed into fully connected
layers.
6. Fully Connected Layer: The flattened vector is passed through one or more dense layers to
learn high-level features and patterns.
7. Output: For classification tasks, the final layer uses Softmax or Sigmoid to output class
probabilities.
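A minimal sketch of this workflow as a small CNN in PyTorch (assumed available); the channel counts, layer sizes, and 10-class output are illustrative assumptions:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution: extract low-level features
    nn.ReLU(),                                    # non-linearity
    nn.MaxPool2d(2),                              # pooling: downsample feature maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                 # flatten 2D feature maps into a 1D vector
    nn.Linear(32 * 56 * 56, 10),                  # fully connected layer -> class scores
    nn.Softmax(dim=1),                            # class probabilities that sum to 1
)

x = torch.randn(1, 3, 224, 224)                   # one 224x224 RGB image
probs = model(x)                                  # shape (1, 10)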
Advantages of CNNs:
1. Automatic Feature Extraction:
○ CNNs automatically learn the important features of the data (such as edges,
textures, and shapes) without manual feature engineering.
2. Parameter Sharing:
○ The same filter (kernel) is used across the entire image, reducing the number of
parameters and computational complexity compared to traditional fully connected
neural networks.
3. Translation Invariance:
○ Through pooling layers, CNNs are less sensitive to the exact location of features,
making them robust to translations (shifts) in the image.
4. Scalability:
○ CNNs can scale well with large datasets, making them suitable for real-world tasks
like image and video analysis.
Applications of CNNs:
1. Image Classification:
○ CNNs are widely used for tasks like object classification (e.g., recognizing whether
an image contains a cat or a dog).
2. Object Detection:
○ CNNs can be used to detect specific objects within an image (e.g., finding faces in a
photo).
3. Semantic Segmentation:
○ CNNs are used to assign a class label to each pixel in an image, which is essential
for tasks like medical image analysis.
4. Image Generation (e.g., GANs):
Summary:
Convolutional Neural Networks (CNNs) are a powerful and efficient class of neural networks
designed to handle visual data by automatically extracting spatial features and reducing the
complexity of the model. Through convolutional layers, pooling layers, and fully connected layers,
CNNs can recognize complex patterns and are widely used in image recognition, object detection,
and video analysis.
RAG (Retrieval-Augmented Generation) Pipeline
The RAG (Retrieval-Augmented Generation) pipeline is a hybrid approach that combines the
power of retrieval-based and generation-based models to answer queries more effectively. It
enhances the capabilities of language models by retrieving relevant information from external
documents or databases and using that information to generate more accurate and contextually
relevant responses.
1. Query Input:
● The process begins when a user submits a query or prompt that requires a response. This
could be a question, a task, or any form of text input.
2. Retrieval Phase:
● Retriever Model: The query is passed through a retrieval model (typically a Dense
Retriever or TF-IDF based retriever) that searches an external database or corpus to find
relevant documents or passages that may contain the information needed to answer the
query.
● The retriever returns a ranked list of documents, passages, or snippets relevant to the query.
○ Dense Retrieval (using embeddings): The query and the documents in the database
are converted into vectors using pre-trained models (such as BERT or other
transformers). Cosine similarity or other distance metrics are used to rank documents
based on their similarity to the query.
○ Sparse Retrieval (using keyword matching): Keyword matching or term-weighting
ranking functions (such as BM25) are used for document retrieval.
3. Ranking and Selection:
● After the retrieval phase, the documents are ranked based on their relevance to the query.
● Often, the top-k most relevant documents (e.g., the top 5 or 10) are selected for use in the
next step.
4. Generation Phase:
● Generator Model: A generative language model (usually a model like GPT or BART)
takes the query and the retrieved documents as context.
● The model uses the relevant passages retrieved by the retriever to generate a more
accurate, fluent, and coherent response. This step involves combining the retrieved
information with the model's internal knowledge (learned during training) to generate a final
response.
5. Final Output:
● The output from the generator model is the final answer to the query, which incorporates
both the model's knowledge and the relevant external information retrieved during the first
phase.
● This answer is then returned to the user.
Key Components of a RAG System:
● Retriever: Responsible for fetching relevant documents from an external knowledge base.
● Generator: Generates a natural language response based on the retrieved documents and
the query.
● Knowledge Base: A large collection of documents or a database from which the retriever
fetches information. This could be anything from a search engine index to a more structured
database.
● Fusion: In some RAG implementations, the retriever and generator models can be jointly
trained to optimize the interaction between retrieval and generation, improving the quality of
the final output.
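A minimal sketch of the retrieve-then-generate flow described above; embed() and generate() are placeholders standing in for a real embedding encoder and a generative LLM, and the corpus, dimensions, and top-k value are illustrative:

import numpy as np

corpus = [
    "Quantum Computing Breakthrough in 2023 ...",
    "New Quantum Algorithms ...",
    "Recent Advances in Quantum Hardware ...",
]

def embed(text):
    # placeholder: a real system would use a pre-trained encoder (e.g., a BERT-style model)
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=128)

def retrieve(query, docs, top_k=2):
    q = embed(query)
    scores = [q @ embed(d) / (np.linalg.norm(q) * np.linalg.norm(embed(d))) for d in docs]
    top = np.argsort(scores)[::-1][:top_k]        # keep the top-k most similar documents
    return [docs[i] for i in top]

def generate(prompt):
    # placeholder: a real system would call a generative model (e.g., GPT or BART)
    return "..."

query = "What is the latest breakthrough in quantum computing?"
context = "\n".join(retrieve(query, corpus))      # retrieved passages used as context
answer = generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")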
Let’s say you want to ask a question about a recent scientific discovery:
1. Query:
○ The user asks a question such as: "What is the latest breakthrough in quantum computing?"
2. Retrieval:
○ The retriever model searches a large corpus (e.g., research papers, articles, or
scientific journals) to find documents related to quantum computing advancements.
○ It might return articles such as "Quantum Computing Breakthrough in 2023," "New
Quantum Algorithms" and "Recent Advances in Quantum Hardware."
3. Generation:
○ The generator model takes the retrieved documents and the query and generates a
response like: "The latest breakthrough in quantum computing involves the
development of a new quantum error-correcting code that promises to improve the
reliability of quantum computers, demonstrated by researchers at MIT in 2023."
4. Output: The generated response is returned to the user.
Variants of RAG:
● RAG-Token: A token-level version where the retrieval process is integrated into the
generation process at the token level. Each token generation step may depend on the
retrieved documents.
● RAG-Sequence: A sequence-level version where entire documents are retrieved and
provided as context for generating a sequence of tokens.
Conclusion:
The RAG pipeline is powerful for tasks that require answering questions or generating content based
on up-to-date or extensive external knowledge. By combining retrieval and generation, it enhances
the ability of language models to provide accurate and relevant answers while reducing the
dependency on vast internal knowledge storage.
Concept of "Context Window" in LLMs
The context window in large language models (LLMs) refers to the portion of the input text that the
model considers at any given time when generating predictions or responses. It represents the span
of text (such as words, tokens, or characters) that the model can "see" or "attend to" during the
process of understanding or generating language.
1. Fixed Size: The context window has a fixed size, typically measured in terms of the number
of tokens (words or subwords) the model can process simultaneously. For example, GPT-3
has a context window of 2048 tokens, while GPT-4 has a much larger one (up to 32,768
tokens in some cases). Once this window is exceeded, the model can no longer attend to
earlier tokens unless they are within the current window.
2. Token Representation: Each token in the context window corresponds to a unit of meaning
(e.g., a word or subword), and the model processes these tokens in parallel to understand
and generate text. The context window ensures the model can take in surrounding tokens to
generate coherent responses.
3. Sliding Window: In some models, the context window can slide as new tokens are
processed. Once a certain number of tokens are consumed or generated, older tokens fall
outside the window, and new tokens are incorporated.
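A minimal sketch of the sliding-window idea: once the fixed window is exceeded, only the most recent tokens remain visible. The window size and token stream here are illustrative:

def truncate_to_window(tokens, window_size=8):
    # keep only the last `window_size` tokens; older tokens fall outside the window
    return tokens[-window_size:]

history = []
for token in "the quick brown fox jumps over the lazy dog again".split():
    history.append(token)
    visible = truncate_to_window(history)   # what the model can still "attend to"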
○ The size of the context window directly limits the amount of information the model
can process at once. If the context window is too small, the model may miss
important dependencies from earlier parts of the text. If the window is large, the
computational cost increases.
○ This limitation becomes evident in tasks that require understanding long documents,
maintaining coherence over long dialogues, or recalling information from earlier in a
conversation or text.
2. Handling Long-Range Dependencies:
○ When fine-tuning an LLM on a specific task, the context window can affect how the
model generalizes to tasks requiring long-form reasoning. A task involving multiple
steps or long conversations can benefit from a large context window, ensuring that
the model retains relevant information throughout the process.
5. Context Window in Chatbots and Conversational Models:
○ In conversational models, the context window is vital for keeping track of the
conversation history. A larger context window allows the model to consider earlier
parts of the conversation when generating the next response, leading to more
coherent and contextually relevant interactions.
1. Out-of-Window Information:
○ Researchers are exploring ways to extend the concept of the context window through
memory-augmented networks or retrieval-augmented generation (RAG) models.
These approaches try to address the limitations of fixed context windows by
introducing external memory or external retrieval systems to maintain access to more
extensive information.
Conclusion:
The context window in LLMs plays a critical role in determining how much of the input text the model
can "remember" and use for generating accurate and coherent outputs. Larger context windows
allow models to handle more information at once, improving performance in tasks that require
long-term dependencies, but at the cost of greater computational demands. Balancing window size
with efficiency and accuracy is crucial for optimizing the performance of LLMs.
Fine-tuning in the context of LLMs involves taking a pre-trained model and further training it on a
smaller, task-specific dataset. This process helps the model adapt its general language
understanding to the nuances of the specific application, thereby improving performance.
This is an important technique because it leverages the broad language knowledge acquired during
pre-training while modifying the model to perform well on specific applications, such as sentiment
analysis, text summarization, or question-answering.
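A minimal sketch of the fine-tuning idea in PyTorch (assumed available): start from pre-trained weights and continue training on a small task-specific dataset. The stand-in model, data, and hyperparameters are illustrative, not a specific library's fine-tuning API:

import torch
import torch.nn as nn

pretrained = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))   # stands in for a pre-trained model
optimizer = torch.optim.AdamW(pretrained.parameters(), lr=1e-5)               # small learning rate to limit forgetting
loss_fn = nn.CrossEntropyLoss()

task_inputs = torch.randn(32, 128)            # task-specific examples (placeholder features)
task_labels = torch.randint(0, 2, (32,))      # e.g., sentiment labels (0 = negative, 1 = positive)

for epoch in range(3):                        # a few passes over the small dataset
    optimizer.zero_grad()
    loss = loss_fn(pretrained(task_inputs), task_labels)
    loss.backward()
    optimizer.step()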
LLMs handle out-of-vocabulary (OOV) words or tokens using techniques like subword tokenization
(e.g., Byte Pair Encoding or BPE, and WordPiece). These techniques break down unknown words
into smaller, known subword units that the model can process.
This approach ensures that even if a word is not seen during training, the model can still understand
and generate text based on its constituent parts, improving flexibility and robustness.
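For example, a BPE-based tokenizer splits a rare word into known subword units; a minimal sketch using the Hugging Face transformers library (assumed available), with an illustrative word and model choice:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # GPT-2 uses Byte Pair Encoding (BPE)
print(tokenizer.tokenize("unbelievability"))           # rare word broken into known subword pieces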
What are embedding layers, and why are they important in LLMs?
Embedding layers are a significant component in LLMs used to convert categorical data, such as
words, into dense vector representations. These embeddings capture semantic relationships
between words by representing them in a continuous vector space where similar words are
close to one another. The importance of embedding layers in LLMs includes:
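A minimal sketch of an embedding layer in PyTorch (assumed available); the vocabulary size, embedding dimension, and token IDs are illustrative:

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10000, embedding_dim=256)   # 10k-token vocabulary, 256-d vectors
token_ids = torch.tensor([[12, 847, 5, 990]])                       # one sequence of 4 token IDs
vectors = embedding(token_ids)                                      # dense representations, shape (1, 4, 256)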
Researchers and practitioners have developed numerous evaluation metrics to gauge the
performance of an LLM. Common metrics include:
Incorporating external knowledge into an LLM can be achieved through several methods: