Prompt Design
04: Structured Data, Assistants, & RAG
Goals
1. Importance of Structured Data
2. How to Generate Structured Data from LLMs
3. Importance of Consistency in LLM Outputs
4. How to Generate Consistent Responses
5. Vector Databases and Semantic Search
6. Retrieval Augmented Generation
7. Assistants
John went to Paris on 1 August 2023.
Named Entity Recognition
John went to Paris on 1 August 2023.
● John => PERSON
● Paris => LOCATION
● 1 August 2023 => DATE
Traditional Approaches
● Rules-Based
● Task-Specific Machine Learning Model
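As a rough illustration of the rules-based approach, the sketch below tags the example sentence with a hand-written date pattern and two tiny lookup lists. The lists, the pattern, and the function name `rule_based_ner` are assumptions made up for this example; they do not come from the slides.

```python
import re

# Hypothetical lookup lists: a real rules-based system would need far larger ones.
PEOPLE = {"John"}
LOCATIONS = {"Paris"}

# Matches dates written like "1 August 2023".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)


def rule_based_ner(text):
    """Return (entity, label) pairs found by the hand-written rules."""
    entities = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in PEOPLE:
            entities.append((token, "PERSON"))
        elif token in LOCATIONS:
            entities.append((token, "LOCATION"))
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE"))
    return entities


result = rule_based_ner("John went to Paris on 1 August 2023.")
print(result)
```

The weakness is visible in the lookup lists: any name or place missing from them is silently skipped.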
Zero Shot NER with LLMs
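A zero-shot prompt gives the model instructions but no worked examples. The sketch below only builds such a prompt as a string. The wording, the label set, and the function name `build_zero_shot_prompt` are assumptions for illustration, and sending the prompt to a model is left out on purpose.

```python
def build_zero_shot_prompt(text):
    """Build a zero-shot NER prompt: instructions only, no worked examples."""
    return (
        "Extract the named entities from the text below.\n"
        "Use only these labels: PERSON, LOCATION, DATE.\n"
        'Answer with a JSON list of objects with the keys "text" and "label".\n\n'
        f"Text: {text}"
    )


prompt = build_zero_shot_prompt("John went to Paris on 1 August 2023.")
print(prompt)
```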
Structured Data
Types of Data
Important Structures
● CSV
● JSON
● HTML/XML
Important Questions:
1. Should the data be hierarchical (nested)?
2. Do I want to preserve the input data? If so, how?
3. What is the intended usage of the data?
4. How much data will I have (scalability)?
CSV (Comma-Separated Values)
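To make the flat, row-based shape of CSV concrete, here is a minimal sketch that writes the entities from the example sentence with Python's standard `csv` module and reads them back. The column names `entity` and `label` are assumptions chosen for this example.

```python
import csv
import io

rows = [
    {"entity": "John", "label": "PERSON"},
    {"entity": "Paris", "label": "LOCATION"},
    {"entity": "1 August 2023", "label": "DATE"},
]

# Write to an in-memory buffer so the example needs no file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["entity", "label"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back gives the same flat rows; there is no nesting in CSV.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```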
JSON (JavaScript Object Notation)
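Unlike CSV, JSON can nest data, so one record can keep the input text together with its entities. The sketch below uses Python's standard `json` module. The key names `text`, `entities`, and `label` are assumptions chosen for this example.

```python
import json

record = {
    "text": "John went to Paris on 1 August 2023.",
    "entities": [
        {"text": "John", "label": "PERSON"},
        {"text": "Paris", "label": "LOCATION"},
        {"text": "1 August 2023", "label": "DATE"},
    ],
}

# Serialize to a JSON string, then parse it back into Python objects.
json_text = json.dumps(record, indent=2)
print(json_text)

restored = json.loads(json_text)
labels = [entity["label"] for entity in restored["entities"]]
```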
HTML (HyperText Markup Language)
<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
<span class="person">Bungo</span>, that was
<span class="person">Bilbo</span>’s father, built
the most luxurious hobbit-hole for her
(and partly with her money) that was to be found
either under <span class="place">The Hill</span>
or over <span class="place">The Hill</span>
or across <span class="place">The Water</span>,
and there they remained to the end of their days.
</p>
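Markup like this keeps the original text intact while labelling entities inline, and the labels can be read back with a parser. The sketch below uses Python's standard `html.parser` on a shortened copy of the passage. The class name `EntityCollector` is an assumption for this example.

```python
from html.parser import HTMLParser


class EntityCollector(HTMLParser):
    """Collect (class, text) pairs from <span class="..."> elements."""

    def __init__(self):
        super().__init__()
        self.current_label = None
        self.entities = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current_label = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current_label is not None:
            # Collapse line breaks inside a name into single spaces.
            self.entities.append((self.current_label, " ".join(data.split())))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_label = None


# Shortened copy of the passage above, including its mid-name line breaks.
snippet = (
    '<p>Not that <span class="person">Belladonna\nTook</span> ever had any '
    'adventures after she became Mrs. <span class="person">Bungo\nBaggins</span>.</p>'
)
collector = EntityCollector()
collector.feed(snippet)
collector.close()
print(collector.entities)
```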
XML (eXtensible Markup Language)
<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>
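With XML the tag name itself carries the label, and sentences can nest entities. A minimal sketch of reading the passage above with Python's standard `xml.etree.ElementTree` follows. The variable names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

xml_text = """<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>"""

root = ET.fromstring(xml_text)

# Each child element of a sentence is one tagged entity.
entities = [
    (element.tag, element.text)
    for sentence in root.iter("sentence")
    for element in sentence
]
places = [name for tag, name in entities if tag == "place"]
print(entities)
```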
Exercise 1 (10 min): Generate Structured Data Output for “John went to Paris on 1 August 2023.”
Importance of Structured Output
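One practical reason structured output matters is that a program can check it before using it. The sketch below parses a model reply as JSON and rejects anything that does not match the expected shape. The reply string is written by hand for this example, and the label set and the function name `parse_entities` are assumptions.

```python
import json

ALLOWED_LABELS = {"PERSON", "LOCATION", "DATE"}


def parse_entities(raw_output):
    """Parse model output and reject anything that does not match the expected shape."""
    data = json.loads(raw_output)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    for item in data:
        if not isinstance(item, dict) or set(item) != {"text", "label"}:
            raise ValueError(f"unexpected item: {item!r}")
        if item["label"] not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {item['label']!r}")
    return data


# Hypothetical model reply, written by hand for this example.
raw = '[{"text": "John", "label": "PERSON"}, {"text": "Paris", "label": "LOCATION"}]'
entities = parse_entities(raw)
print(entities)
```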
Exercise 2 (10 min): Create your Own Texts and Try to get the Same Output each time, first in the same chat, then in different chats.
Few-Shot NER.
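A few-shot prompt adds worked examples ahead of the new input, which tends to pin down the output format. The sketch below only assembles the prompt text. The demonstration sentence about Mary and Berlin is invented for this example, and the function name `build_few_shot_prompt` is an assumption.

```python
import json

# Hand-written demonstration pair, invented for illustration.
EXAMPLES = [
    (
        "Mary flew to Berlin on 3 May 2021.",
        [
            {"text": "Mary", "label": "PERSON"},
            {"text": "Berlin", "label": "LOCATION"},
            {"text": "3 May 2021", "label": "DATE"},
        ],
    ),
]


def build_few_shot_prompt(text):
    """Put worked examples before the new text so the model can copy the format."""
    parts = ['Extract named entities as a JSON list with the keys "text" and "label".', ""]
    for example_text, example_entities in EXAMPLES:
        parts.append(f"Text: {example_text}")
        parts.append(f"Entities: {json.dumps(example_entities)}")
        parts.append("")
    parts.append(f"Text: {text}")
    parts.append("Entities:")
    return "\n".join(parts)


prompt = build_few_shot_prompt("John went to Paris on 1 August 2023.")
print(prompt)
```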
Practical Applications with Real World Data
An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor.
Assistants
Vector Databases
Representing Texts Digitally
Embeddings
● The apple is in the tree.
○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○ 2-different vector
○ 3-different vector
○ 4-different vector
○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...]
○ 5-different vector
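The list above appears to assign one vector per word, with the repeated word "the" reusing vector 1. The toy sketch below reproduces only that lookup behaviour: each word maps to a fixed list of numbers derived from a hash. This is an assumption-laden stand-in, since real embeddings come from a trained model and encode meaning, which these numbers do not. The function name `toy_word_vector` is hypothetical.

```python
import hashlib


def toy_word_vector(word, dimensions=4):
    """Derive a deterministic pseudo-vector from a word's hash (no meaning encoded)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Map each of the first `dimensions` bytes from 0..255 to roughly -1..1.
    return [round(byte / 127.5 - 1.0, 3) for byte in digest[:dimensions]]


sentence = "The apple is in the tree."
words = [word.strip(".").lower() for word in sentence.split()]
vectors = [toy_word_vector(word) for word in words]

for word, vector in zip(words, vectors):
    print(word, vector)
```

Both occurrences of "the" print the same vector, matching the repeated entry in the list.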
Vector Database
What is it?
● It stores vectors in a database.
● Similar vectors sit closer together in the vector space, so they can be found by distance.
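"Closer" needs a distance or similarity measure. Cosine similarity is one common choice, sketched below with made-up three-dimensional vectors. The slides do not say which measure is used, so treat this as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, chosen by hand for illustration.
apple = [0.9, 0.1, 0.0]
pear = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

apple_pear = cosine_similarity(apple, pear)
apple_car = cosine_similarity(apple, car)
print(apple_pear, apple_car)
```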
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Vector Database
How do we use a vector database?
● We populate a vector database by using a machine learning model to vectorize data and sending the resulting vectors to the database.
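The populate step can be sketched with a plain Python list standing in for the database. The `embed` function below is a hypothetical placeholder that counts hashed words; in practice a trained embedding model would take its place. The class name `InMemoryVectorStore` is also an assumption.

```python
def embed(text, dimensions=8):
    """Hypothetical stand-in for an embedding model: counts words per hash bucket."""
    vector = [0.0] * dimensions
    for word in text.lower().split():
        vector[sum(ord(ch) for ch in word) % dimensions] += 1.0
    return vector


class InMemoryVectorStore:
    """A list-backed stand-in for a vector database."""

    def __init__(self):
        self.records = []

    def add(self, text):
        # Vectorize the text, then store the vector with the original text.
        self.records.append({"text": text, "vector": embed(text)})


store = InMemoryVectorStore()
for document in ["The apple is in the tree.", "John went to Paris."]:
    store.add(document)

print(len(store.records), store.records[0]["vector"])
```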
Vector Database
Why use a vector database?
● Vector databases let users store vector data so that they can query it and find matches based on vector-level similarity rather than explicit, human-defined similarity.
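Querying by vector-level similarity can be sketched as a brute-force search: score every stored vector against the query vector and return the best matches. All vectors below are made up by hand, the second document is invented for this example, and the query vector stands in for the embedding of a question such as "fruit on a tree".

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for model embeddings.
database = [
    ("The apple is in the tree.", [0.9, 0.1, 0.0]),
    ("A pear fell from the branch.", [0.6, 0.5, 0.1]),
    ("John went to Paris.", [0.0, 0.2, 0.9]),
]


def search(query_vector, top_k=2):
    """Return the texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in database]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


results = search([0.85, 0.2, 0.05])
print(results)
```

No keyword rule links the query to the results. The ranking comes only from how close the vectors are.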
Vector Database
What is it?
● A vector database holds numerous vectors, or embeddings, of data. Sometimes the database also stores the original data alongside these vectors.
Vector Database Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, and great for smaller datasets, but requires a little knowledge to build from scratch
○ Best for smaller databases (under 10,000 items)
● Python, txtai
○ Cheap and easy to use; more resource intensive but easy to deploy
○ Allows for easy interpretability (via highlighting)
Multi-Modal
How does it work?
Retrieval-Augmented Generation
How tall is Wookie?
RAG
What is it?
● RAG lets you combine the strengths of large language models (LLMs) with vector databases
● It reduces the chance that an LLM hallucinates (generates false information)
● It uses a vector database to find material relevant to a query
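The three points above can be sketched as two steps: retrieve the passages closest to the query, then place them in the prompt so the model answers from them. Everything below is an assumption for illustration: the passages are placeholders, the vectors are made up, and the call to an LLM is omitted.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder passages with made-up vectors; a real system would embed real text.
KNOWLEDGE_BASE = [
    ("[placeholder passage about Wookiees]", [0.9, 0.1]),
    ("[placeholder passage about Paris]", [0.1, 0.9]),
]


def retrieve(query_vector, top_k=1):
    """Step 1: find the stored passages most similar to the query."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_rag_prompt(question, passages):
    """Step 2: ground the model by putting the retrieved passages in the prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How tall is Wookie?"
query_vector = [0.95, 0.05]  # pretend embedding of the question
prompt = build_rag_prompt(question, retrieve(query_vector))
print(prompt)
```

Restricting the model to the retrieved context is what reduces the room for hallucination.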