8 Data Modeling Patterns in Redis
© 2022 Redis
Contents
Introduction
By the end of the book you will have...
SQL versus NoSQL
Modeling 1-to-1 Relationships
1-to-1 Relationships using SQL
1-to-1 Relationships using Redis
1-to-1 Relationships using Redis with the Partial Embed Pattern
Modeling Many-to-Many Relationships
Pattern 1: Relationship with Bounded Sides
Pattern 2: Relationship with One Unbounded Side
The Aggregate Pattern
Without the Aggregate Pattern
With the Aggregate Pattern
The Polymorphic Pattern
The Bucket Pattern
Working with Time-series Data in Redis
Aggregating with Time-series Data with Redis
The Revision Pattern
The Tree and Graph Pattern
The Schema Version Pattern
The Embedded Pattern
The Partial Embed Pattern
The Aggregate Pattern
The Polymorphic Pattern
The Bucket Pattern
The Revision Pattern
The Tree and Graph Pattern
The Schema Version Pattern
Conclusion
These questions are only a small subset of what you need to ask yourself when you start working with NoSQL. A common
misconception with NoSQL databases is that since they are “schemaless” you don’t need to worry about your schema.
In reality, your schema is incredibly important regardless of what database you choose. You also need to ensure that the
schema you choose will scale well with the database you plan to use.
In this e-book you will learn how to approach data modeling in NoSQL, specifically within the context of Redis. Redis is a
great database for demonstrating several NoSQL patterns and practices. Not only is Redis commonly used and loved by
developers, it is also a multi-model database. This means that while many of the patterns covered in this e-book apply to
different types of databases (e.g. document, graph, time series, etc.), with Redis you can apply all of the patterns in
a single database.
When building applications you are probably using an object-oriented language like JavaScript, Java, C#, or others.
Your data is represented as strings, lists, sets, hashes, JSON, and so on. However, if you store data in a SQL database
or a document database, you need to squeeze and transform the data into several tables or collections. You also need
complex queries (such as SQL queries) to get the data out. This is called impedance mismatch and is the fundamental
reason why NoSQL exists.
A large application might use other systems for data storage such as Neo4j for graph data, MongoDB for document
data, InfluxDB for time series, etc. Using separate databases turns an impedance mismatch problem into a database
orchestration problem. You have to juggle multiple connections to different databases, as well as learn the different client
libraries used.
With Redis, in addition to the basic data structures such as strings, lists, sets, and hashes, you can also use modules that
add advanced data structures: RedisJSON for documents, RediSearch for secondary indexing, RedisGraph for graph data,
RedisTimeSeries for time-series data, and RedisBloom for probabilistic data (think Bloom filters, or Top-K lists for leaderboards).
This reduces impedance mismatch because your data is stored in one of 15 structures with little or no transformation.
You can also use a single connection (or connection pool) and client library to access your data. What you end up with is
a simplified architecture with purpose-built models that are blazing fast and simple to manage. For this reason, this e-book
will use Redis to explain several of the NoSQL data modeling patterns.
Most developers have at least a little understanding of SQL and how to model data in it. This is because SQL is widely
used and there are several incredible books and even full courses devoted to it. NoSQL is quickly growing and becoming
more popular, but NoSQL covers far more than document stores, so there is a lot of ground to cover. That’s why, for
several of the NoSQL data modeling patterns in this e-book, you will also see what it might look like to model the same
data in SQL.
When you approach data modeling in SQL you are typically focused on relationships, as SQL is meant for set-based
operations on relational data. NoSQL doesn’t have this constraint and is more flexible in the way you model data.
However, this can lead to schemas that are overly complex. When considering NoSQL schema design, always think about
performance and try to keep things simple.
So to kick things off, let’s start by looking at something that is very near and dear to a SQL developer’s heart: relationships.
Picture 1
Picture 2
1-to-1 Relationships using SQL
Picture 3
1-to-1 Relationships using Redis
In Redis, similar to a relational database, you can create a collection called products and another called product_details. But with RedisJSON you can improve this by simply embedding product_images and product_details directly into the Products collection. Then, when you query the Products collection, specify which fields you need based on which view you are trying to create.

This will allow you to easily keep all the data in one place. This is called the Embedded Pattern and is one of the most common patterns you will see in NoSQL document databases like RedisJSON. Code Example 2 uses Python and a client library called Redis OM (an ORM for Redis) to model Product and ProductDetail. Note that ProductDetail is embedded into Product directly, so all of the data for a product will be stored within the same document.

Code Example 2

from typing import List, Optional

from redis_om import EmbeddedJsonModel, Field, JsonModel


class ProductDetail(EmbeddedJsonModel):
    description: str
    manufacturer: str
    dimensions: str
    weight: str
    images: List[str]


class Product(JsonModel):
    name: str = Field(index=True)
    image: str = Field(index=True)
    price: int = Field(index=True)
    # Embedded: stored inside the product's own JSON document
    details: Optional[ProductDetail]
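To make the shape of an embedded document concrete, here is a minimal sketch in plain Python with no Redis connection, using dataclasses in place of the Redis OM models. The class names mirror Code Example 2, but the `to_json_document` helper and the sample values are hypothetical. The point is that a product and its details serialize to a single JSON document, which is what RedisJSON would store under one key.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProductDetail:
    # Fields mirror ProductDetail in Code Example 2.
    description: str
    manufacturer: str
    dimensions: str
    weight: str
    images: List[str] = field(default_factory=list)


@dataclass
class Product:
    name: str
    image: str
    price: int
    # The details are embedded in the product, not referenced by a foreign key.
    details: Optional[ProductDetail] = None


def to_json_document(product: Product) -> str:
    """Serialize the product, including its embedded details, as one JSON document."""
    return json.dumps(asdict(product))


product = Product(
    name="Game console",
    image="console.png",
    price=499,
    details=ProductDetail(
        description="A hypothetical console",
        manufacturer="Example Corp",
        dimensions="30 x 26 x 10 cm",
        weight="4.5 kg",
        images=["front.png", "back.png"],
    ),
)

doc = json.loads(to_json_document(product))
print(doc["details"]["manufacturer"])  # Example Corp
```

With RedisJSON, a document like this would be written under a single key (for example with `JSON.SET <key> $ '<json>'`), so one read returns the product together with all of its details.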
1-to-Many Relationships using SQL
In a relational database, you would have a table called products and another table called product_reviews. Picture 5 shows the entity relationship diagram for products and product_reviews.
Picture 5
Using the entity relationship in Picture 5, you would need two SQL statements to get a product and its reviews. Code Example 5 demonstrates what the SQL might look like. Your API would need to combine the results of the two queries before sending the data to the client.
Code Example 5
SELECT
  id, `name`, `image`, price
FROM
  products
WHERE
  id = 1;

SELECT
  `name`, rating, publish_date, comment
FROM
  product_reviews
WHERE
  product_id = 1;
1-to-Many Relationships using Redis
In Redis, similar to a relational database, you could create two collections called products and product_reviews, exactly like the entities above. This strategy (having two separate collections) works well for documents that are unbounded and can keep growing. For example, a product could have hundreds of thousands of reviews, but it might only have a few related videos. Reviews in this case are unbounded, but videos are bounded. If you have a 1-to-many relationship where the “many” is limited to just a few documents, then you can simply embed it directly in the parent document.
Let’s say a product can have up to three videos. We still have a 1-to-many relationship between products and videos, but since the number of videos is limited, we can model this by embedding a list of video URLs, shown in Picture 6, into our products collection.
Picture 6
Code Example 6 shows how you would embed a list of videos directly into your products collection using RedisJSON and Redis OM for Python. When making a query, if you don’t want to show the videos, you can leave them out of your FT.SEARCH query (see Code Example 7).
Code Example 6
class Product(JsonModel):
    name: str = Field(index=True)
    image: str = Field(index=True)
    price: int = Field(index=True)
    # Bounded list (up to three video URLs), embedded in the product
    videos: Optional[List[str]]
Code Example 7
async def get_products():
    # Return only name, image, and price; the embedded videos are left out
    results = await connections \
        .get_redis_connection() \
        .execute_command(
            f'FT.SEARCH {Product.Meta.index_name} * LIMIT 0 10 RETURN 3 name image price'
        )
    return Product.from_redis(results)
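Leaving fields out of a result is essentially a projection. The following plain-Python sketch (no Redis involved) mimics what `RETURN 3 name image price` does in Code Example 7: it keeps only the requested fields of each stored document. The `project` helper and the sample documents are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical product documents, each with an embedded, bounded list of videos.
products: List[Dict[str, Any]] = [
    {"name": "Earbuds", "image": "earbuds.png", "price": 99,
     "videos": ["https://example.com/v1.mp4"]},
    {"name": "Console", "image": "console.png", "price": 499,
     "videos": ["https://example.com/v2.mp4", "https://example.com/v3.mp4"]},
]


def project(docs: Iterable[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Return copies of docs that contain only the requested fields."""
    return [{f: doc[f] for f in fields if f in doc} for doc in docs]


# A list view that does not need the videos asks only for three fields.
list_view = project(products, ["name", "image", "price"])
print(list_view[0])  # {'name': 'Earbuds', 'image': 'earbuds.png', 'price': 99}
```

The stored documents still contain the videos; only the view that needs them has to ask for them.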
1-to-Many Relationships using Redis with the Partial Embed Pattern
Picture 7
You can also combine these techniques if it makes sense for the application you are building. For example, let’s say that even though your product reviews are unbounded, you want to always show the most recent reviews quickly. Instead of doing two different queries, you can simply embed the recent reviews directly into the parent document and still keep the rest of the reviews in a different collection. This is called the partial embed pattern. Picture 7 shows the entity relationship diagram for partially embedding product_reviews.
Code Example 8 shows the data model for products with embedded recent reviews.
Code Example 8
import datetime
from typing import List, Optional

from redis_om import Field, JsonModel


class ProductReview(JsonModel):
    product_id: str = Field(index=True)
    reviewer: str
    rating: str
    published_date: datetime.date
    comment: str


class Product(JsonModel):
    name: str = Field(index=True)
    image: str = Field(index=True)
    price: int = Field(index=True)
    videos: Optional[List[str]]
    # Only the most recent reviews are embedded; every review also
    # lives in its own ProductReview collection.
    recent_reviews: Optional[List[ProductReview]]


# When a review is added (excerpt): save it to the reviews collection,
# then save the product with its updated recent_reviews list.
await review.save()
await product.save()
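The bookkeeping behind the partial embed pattern can be sketched in plain Python. This is an illustration, not Redis OM code: a dict stands in for the product_reviews collection, ratings are integers for simplicity, and the limit of three embedded reviews (`MAX_RECENT_REVIEWS`) is an assumption, since the number of recent reviews to keep is an application choice.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

MAX_RECENT_REVIEWS = 3  # assumption: how many reviews to embed in the product


@dataclass
class ProductReview:
    product_id: str
    reviewer: str
    rating: int
    published_date: date
    comment: str


@dataclass
class Product:
    id: str
    name: str
    recent_reviews: List[ProductReview] = field(default_factory=list)


# Stand-in for the separate, unbounded product_reviews collection.
reviews_collection: Dict[str, List[ProductReview]] = {}


def add_review(product: Product, review: ProductReview) -> None:
    # 1. Every review is stored in its own collection.
    reviews_collection.setdefault(product.id, []).append(review)
    # 2. The newest reviews are also embedded in the product (newest first),
    #    trimmed so the embedded list stays bounded.
    product.recent_reviews.insert(0, review)
    del product.recent_reviews[MAX_RECENT_REVIEWS:]


p = Product(id="p1", name="Console")
for day in range(1, 6):
    add_review(p, ProductReview("p1", f"user{day}", 5, date(2022, 1, day), "Great"))

print(len(reviews_collection["p1"]), len(p.recent_reviews))  # 5 3
print(p.recent_reviews[0].reviewer)  # user5
```

A product page can then render from the product document alone, and only a "see all reviews" view needs to query the separate collection.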
Modeling Many-to-Many
Relationships
Many-to-many relationships are very common and can be modeled in several ways with NoSQL databases. Here are
the two most common data modeling patterns for many-to-many relationships.
Pattern 1: Many-to-Many Relationship with Bounded Sides
Imagine you are creating an app for an online school that has courses and instructors. There is a many-to-many relationship between courses and instructors, and the relationship is bounded on both sides, meaning there will be a limited number of instructors teaching a course and a limited number of courses taught by an instructor.
Picture 9
In a relational database, you might have a table called courses and another table called instructors. Then you would have a junction table called courses_instructors that would store the relationship between courses and instructors. This can be seen in Picture 9.
Picture 10
Code Example 10 uses Redis OM for Python to model Courses with a name field and an instructors field that is a list of strings representing the unique keys for instructors. There is also an Instructors collection with a name field and a courses field that is a list of strings representing the unique keys for the courses. Note that in the code, “Field(index=True)” is used to enable searching for instructors and courses using RediSearch; Redis OM will automatically create an index for the specified fields. The “get_courses_with_instructor” function takes in an instructor key and returns all of the courses that contain that instructor. The “get_instructors_with_course” function does the opposite, returning instructors for a given course. The two-way embedding pattern works well when both sides of the relationship are bounded. But what about when one side is unbounded?
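As an illustration of the two-way embedding described for Code Example 10, here is an in-memory sketch in plain Python. The two lookup functions follow the names the text describes, but the dict-based storage, the sample keys, and the `assign_instructor` helper are hypothetical; with Redis OM the lookups would be RediSearch queries on the indexed list fields rather than Python loops.

```python
from typing import Dict, List

# Each side embeds a bounded list of keys that point at the other side.
courses: Dict[str, dict] = {
    "course:1": {"name": "Intro to Redis", "instructors": ["instructor:1", "instructor:2"]},
    "course:2": {"name": "Data Modeling", "instructors": ["instructor:2"]},
}
instructors: Dict[str, dict] = {
    "instructor:1": {"name": "Ada", "courses": ["course:1"]},
    "instructor:2": {"name": "Grace", "courses": ["course:1", "course:2"]},
}


def get_courses_with_instructor(instructor_key: str) -> List[str]:
    """Keys of courses whose embedded instructors list contains instructor_key."""
    return [key for key, c in courses.items() if instructor_key in c["instructors"]]


def get_instructors_with_course(course_key: str) -> List[str]:
    """Keys of instructors whose embedded courses list contains course_key."""
    return [key for key, i in instructors.items() if course_key in i["courses"]]


def assign_instructor(course_key: str, instructor_key: str) -> None:
    """Record the relationship on both sides, since both documents embed it."""
    if instructor_key not in courses[course_key]["instructors"]:
        courses[course_key]["instructors"].append(instructor_key)
    if course_key not in instructors[instructor_key]["courses"]:
        instructors[instructor_key]["courses"].append(course_key)


print(get_courses_with_instructor("instructor:2"))  # ['course:1', 'course:2']
print(get_instructors_with_course("course:1"))      # ['instructor:1', 'instructor:2']
```

The trade-off shows up in `assign_instructor`: because the relationship is stored twice, every change has to update both documents.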
Pattern 2: Many-to-Many Relationship with One Unbounded Side
Now consider the relationship between courses and students. Let’s assume this is an online school, and there could be any number of students enrolled in a course. This many-to-many relationship is unbounded on the course side. However, the student side is bounded because a student will only enroll in a limited number of courses.
Picture 11
Code Example 11
from typing import List, Optional

from redis_om import Field, JsonModel


class Course(JsonModel):
    name: str
    instructors: Optional[List[str]] = Field(index=True)


class Student(JsonModel):
    name: str = Field(index=True)
    # Bounded side: each student embeds the keys of their courses
    courses: Optional[List[str]] = Field(index=True)

Code Example 11 shows Courses with the name and instructors fields. Since the number of students in a course is unbounded, you don’t store a list of students in each course document. Instead, Students has a name field and a courses field that is a list of strings representing the unique keys for the courses in which a student is enrolled. You also see two functions, one for finding students in a course and the other for finding courses that have a specific student enrolled. This is how you model many-to-many relationships when one side of the relationship is unbounded and the other is bounded.
To recap, many-to-many relationships can be modeled by embedding one or both sides of the relationship, depending on whether each side is bounded or unbounded. If both sides are bounded, then you can embed the relationship on both sides. If only one side is bounded, then you should avoid embedding the unbounded side. You should also favor embedding references unless you have information that is primarily static and won’t change over time.
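To make the asymmetry concrete, here is a plain-Python sketch in which only students embed course keys and course documents never hold a student list. The helper names `get_students_in_course` and `get_courses_for_student` and the sample data are hypothetical. With Redis OM and RediSearch, the first lookup would typically be an indexed query on the students' courses field instead of the linear scan shown here.

```python
from typing import Dict, List

courses: Dict[str, dict] = {
    "course:1": {"name": "Intro to Redis", "instructors": ["instructor:1"]},
    "course:2": {"name": "Data Modeling", "instructors": ["instructor:2"]},
}
# Only the bounded side (students) embeds the relationship.
students: Dict[str, dict] = {
    "student:1": {"name": "Sam", "courses": ["course:1", "course:2"]},
    "student:2": {"name": "Lee", "courses": ["course:2"]},
}


def get_students_in_course(course_key: str) -> List[str]:
    """Find students by searching the bounded side for the course key."""
    return [key for key, s in students.items() if course_key in s["courses"]]


def get_courses_for_student(student_key: str) -> List[str]:
    """The student's courses are read straight from the embedded list."""
    return list(students[student_key]["courses"])


print(get_students_in_course("course:2"))    # ['student:1', 'student:2']
print(get_courses_for_student("student:2"))  # ['course:2']
```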
Picture 12
This is called the Aggregate Pattern, also known as the Computed Pattern. In our e-commerce example, you can store the number of reviews as well as the sum of ratings on each product’s JSON document. This can be seen in Picture 13.
Picture 13
When a new review is added, you can increment the count and add the new rating to the existing ratings sum. Then, when a customer searches for a product, you read the sum and the count of ratings and calculate the average on the front end by dividing the sum by the count. This way, every time a customer visits the page, the server and the database just need to return the pre-calculated values, resulting in improved performance.
Let’s look at a before and after using the Aggregate Pattern in code.
// Update the pre-computed aggregates stored on the product document
productEntity.entityData.numReviews += 1;
productEntity.entityData.sumRatings += rating;
return Promise.all([
  productRepo.save(productEntity),
  productReviewRepo.createAndSave({
    productId,
    author,
    rating,
    // ...the remaining review fields are cut off in this excerpt
  }),
]);
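The arithmetic behind the Aggregate Pattern is small enough to show in plain Python. This sketch mirrors the numReviews and sumRatings fields from the snippet above (named `num_reviews` and `sum_ratings` here), assumes integer ratings, and is an illustration rather than the Redis OM for Node code.

```python
from dataclasses import dataclass


@dataclass
class ProductAggregate:
    num_reviews: int = 0
    sum_ratings: int = 0

    def add_review(self, rating: int) -> None:
        # Update the pre-computed values once, at write time...
        self.num_reviews += 1
        self.sum_ratings += rating

    def average_rating(self) -> float:
        # ...so a read is a single division instead of a scan over all reviews.
        return self.sum_ratings / self.num_reviews if self.num_reviews else 0.0


agg = ProductAggregate()
for rating in (5, 4, 3):
    agg.add_review(rating)

print(agg.num_reviews, agg.sum_ratings, agg.average_rating())  # 3 12 4.0
```

With RedisJSON, the two counters could also be updated in place on the stored document with `JSON.NUMINCRBY`; whether that is convenient depends on the client library you use.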
For example, a game console and a pair of earbuds might have similar fields such as the product name, brand, model
number, SKU, and reviews. However, the game console has some unique properties such as storage type, number of
HDMI ports, etc. The pair of earbuds also has unique fields such as battery life, connection type, fit, etc.
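One way to picture polymorphic documents is a single products collection in which every document shares a set of common fields and carries its own type-specific fields. The sketch below is plain Python with hypothetical field names such as `type` and `hdmi_ports`; it shows that code working only with the shared fields can treat every product the same way, while type-specific code filters on the discriminator.

```python
from typing import Any, Dict, List

# Fields every product shares, regardless of its type.
COMMON_FIELDS = ("type", "name", "brand", "model_number", "sku")

products: List[Dict[str, Any]] = [
    {"type": "game_console", "name": "Console X", "brand": "Acme",
     "model_number": "CX-1", "sku": "SKU-001",
     "storage_type": "SSD", "hdmi_ports": 2},
    {"type": "earbuds", "name": "Buds Pro", "brand": "Acme",
     "model_number": "BP-2", "sku": "SKU-002",
     "battery_life_hours": 8, "connection_type": "Bluetooth", "fit": "in-ear"},
]


def common_view(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Project any product onto the fields all product types share."""
    return {f: doc[f] for f in COMMON_FIELDS}


def by_type(docs: List[Dict[str, Any]], product_type: str) -> List[Dict[str, Any]]:
    """Select products of one type, including their type-specific fields."""
    return [d for d in docs if d["type"] == product_type]


print([common_view(p)["name"] for p in products])  # ['Console X', 'Buds Pro']
print(by_type(products, "earbuds")[0]["fit"])      # in-ear
```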
Picture 16
Aggregating Time-series Data with Redis
While it is nice to get a view of all the data, it is also nice to be able to see the average temperature over a period of time. You can do this using the TS.RANGE command with the AGGREGATION option, specifying twa (time-weighted average) as the aggregator along with a bucket duration. Let’s specify a bucket duration of the number of milliseconds in a month so we can see the average monthly temperature in our time series.
This works, but TS.RANGE computes the average at query time, so as you store more measurements, the time it takes to calculate the average will increase. There is a better way to handle this using the Bucket Pattern.
With the Bucket Pattern and Redis, you can automatically aggregate your data as you go along. Say, for example, you want to keep track of the average daily and monthly temperature readings. Redis can do this automatically for you with the TS.CREATERULE command. Let’s see what this looks like in code.
Code Example 17 shows two new time series added, temperature:daily and temperature:monthly. It also shows two rules created using TS.CREATERULE to take the time-weighted average of the temperatures as they are added to temperature:raw and store them in the respective daily and monthly time series.
Code Example 17
async function createTimeSeries() {
  const client = await getClient();
  // Skip setup if the raw series already exists
  const exists = await client.execute('EXISTS temperature:raw'.split(' '));

  if (exists === 1) {
    return;
  }

  const commands = [
    'TS.CREATE temperature:raw DUPLICATE_POLICY LAST',
    'TS.CREATE temperature:daily DUPLICATE_POLICY LAST',
    'TS.CREATE temperature:monthly DUPLICATE_POLICY LAST',
    // 86400000 ms = 1 day
    'TS.CREATERULE temperature:raw temperature:daily AGGREGATION twa 86400000',
    // 2629800000 ms = 1 average month (365.25 days / 12)
    'TS.CREATERULE temperature:raw temperature:monthly AGGREGATION twa 2629800000',
  ];

  for (const command of commands) {
    await client.execute(command.split(' '));
  }
}
The TS.CREATERULE command takes a sourceKey, a destinationKey, the AGGREGATION keyword followed by an aggregator function, and a
bucketDuration. The sourceKey is the key of the source time series where you are storing your raw data. The destinationKey is
where you want to store the new, bucketed time series. The aggregator is the function you want to use for your buckets. In our
case, we use twa to store the time-weighted average. Finally, the bucketDuration is the timespan in milliseconds for your buckets.
Note that you should never explicitly add to the bucketed time series as it will be done automatically for you. Also, the rule
does not retroactively apply to an existing time series. Only new samples that are added to the source time series will be
aggregated. So if you look at the differences between Code Example 16 and Code Example 17, you will see that we only
needed to add the two new time-series keys and the two rules. Redis takes care of the rest!
Now if we want to get the average monthly temperature, we can simply query the monthly time series with the following
TS.RANGE command, where - and + stand for the earliest and latest timestamps in the series.
TS.RANGE temperature:monthly - +
Note that you don’t have to specify any aggregator function because it’s already done for you using TS.CREATERULE. That
not only makes the command more readable than the aggregate command we had to run previously, but it also runs
much faster.
While Redis is incredibly fast when performing aggregate queries, using the bucket pattern to keep track of aggregate
values as you go is much faster.
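The idea behind a compaction rule can be sketched in plain Python: keep one running aggregate per bucket and update it as each sample arrives, so reading an average never rescans the raw data. For simplicity this sketch uses a plain arithmetic mean rather than the time-weighted average (twa) that the rules above use, and the `BucketedSeries` class is hypothetical. The bucket durations match the ones in Code Example 17.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000        # 86400000, the daily bucket
MONTH_MS = DAY_MS * 36525 // 1200   # 2629800000, an average month (365.25 days / 12)


class BucketedSeries:
    """Running (sum, count) per bucket; a simplified stand-in for a compaction rule."""

    def __init__(self, bucket_ms: int) -> None:
        self.bucket_ms = bucket_ms
        self._buckets: Dict[int, list] = defaultdict(lambda: [0.0, 0])

    def add(self, timestamp_ms: int, value: float) -> None:
        # Align the sample to the start of its bucket (buckets start at multiples of bucket_ms).
        start = timestamp_ms - timestamp_ms % self.bucket_ms
        bucket = self._buckets[start]
        bucket[0] += value
        bucket[1] += 1

    def averages(self) -> List[Tuple[int, float]]:
        # Reading is cheap: one division per bucket, no rescan of raw samples.
        return [(start, s / n) for start, (s, n) in sorted(self._buckets.items())]


daily = BucketedSeries(DAY_MS)
daily.add(0, 20.0)
daily.add(3_600_000, 22.0)   # one hour later, same day
daily.add(DAY_MS + 1, 18.0)  # first millisecond of the next day
print(daily.averages())  # [(0, 21.0), (86400000, 18.0)]
```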
As seen in Picture 18, in SQL you might store all posts in a table and have the revisions in a separate table. Then,
when you want to view the revisions for a specific post you need to query the latest version from the posts table and
join all the revisions from the revisions table that match that post.
Picture 18
return Post.from_redis(results)
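By contrast with the SQL approach, a document store can keep past versions inside the post itself. The plain-Python sketch below illustrates that idea under stated assumptions: each update copies the current title and body into an embedded `revisions` list before overwriting them. The field names and the `update_post` helper are hypothetical, and a real application might cap or offload old revisions if the list could grow without bound.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    version: int
    title: str
    body: str


@dataclass
class Post:
    title: str
    body: str
    version: int = 1
    # Past versions travel with the post, so no join is needed to read them.
    revisions: List[Revision] = field(default_factory=list)


def update_post(post: Post, title: str, body: str) -> None:
    # Snapshot the current state before overwriting it.
    post.revisions.append(Revision(post.version, post.title, post.body))
    post.title = title
    post.body = body
    post.version += 1


p = Post("Hello", "First draft")
update_post(p, "Hello, world", "Second draft")
print(p.version, [r.version for r in p.revisions])  # 2 [1]
```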
For example, imagine you are building an enterprise resource planning (ERP) system. One of the most important parts
of the system is the org chart. At the very least, you need to be able to show details about each employee, where
they are located, and who they report to (or who reports to them). The most logical way to store this data is in a tree.
Let’s look at how you might do this in traditional SQL as well as NoSQL with Redis using the Tree and Graph Pattern.
Storing trees in SQL is straightforward, as SQL is designed specifically for relationship modeling. To model your org
chart, you might have two tables (shown in Picture 20): employees and locations.
Picture 20
Using Picture 20, let’s look at two potential SQL queries you might want to make. You need one query for getting
employees who work at a specific location and another query for getting employees who work for a specific
manager.
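The two queries described above can be illustrated without SQL. This is a plain-Python sketch over a hypothetical in-memory employee list, where each employee stores a `location` and a `manager_id` (a parent pointer, a common way to store a tree); the last helper walks the tree to find everyone under a manager, directly or indirectly.

```python
from typing import Dict, List

# Hypothetical org chart: manager_id points at the parent node (None for the root).
employees: List[Dict] = [
    {"id": 1, "name": "Alex", "location": "London", "manager_id": None},
    {"id": 2, "name": "Bo", "location": "London", "manager_id": 1},
    {"id": 3, "name": "Cy", "location": "Austin", "manager_id": 1},
    {"id": 4, "name": "Di", "location": "Austin", "manager_id": 3},
]


def employees_at_location(location: str) -> List[str]:
    return [e["name"] for e in employees if e["location"] == location]


def direct_reports(manager_id: int) -> List[str]:
    return [e["name"] for e in employees if e["manager_id"] == manager_id]


def all_reports(manager_id: int) -> List[str]:
    """Depth-first walk down the tree from manager_id."""
    result: List[str] = []
    stack = [manager_id]
    while stack:
        current = stack.pop()
        for e in employees:
            if e["manager_id"] == current:
                result.append(e["name"])
                stack.append(e["id"])
    return result


print(employees_at_location("Austin"))  # ['Cy', 'Di']
print(direct_reports(1))                # ['Bo', 'Cy']
print(all_reports(1))                   # ['Bo', 'Cy', 'Di']
```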
For example, when your app started maybe you were storing one email address per user but now you need to store
multiple email addresses. You could add additional columns such as “email2”, “email3”, etc. However, a better way is
to use a list of email addresses.
While the most future-proof way is to use an embedded list, the problem is all your existing users are stored with
email address fields directly in their document rather than in the “email addresses” list. In addition, all of your existing
code is using the email address fields on a user, not from within the email address list. So what do you do? This is
where you can use the Schema Version pattern to your advantage.
The Schema Version pattern is a way of assigning a version to your data model. This is usually done at the document
level, but you may also choose to version all of your data as part of an API version. It is recommended that you
always assign a version to your documents so that you can change them in the future without having to worry about
immediately migrating all of your data and code. If you’re using Redis, your schema is flexible, and you can make
changes to your existing schema without any downtime.
If you aren’t already using the Schema Version Pattern, the good news is you can start today without making any
significant changes to your application logic. Let’s dive into some code to see how you might introduce the Schema
Version Pattern into an existing system.
return user.toJSON();
}
return user.toJSON();
}
user.name = data.name;
user.email = data.email;
await repo.save(user);
return user.toJSON();
This is working well for now, but what happens when we need to change users to have a list of email addresses? We need to change the existing schema and write some code to incrementally migrate old users to the new schema.
The best way to do this is to create a translation function that you run whenever a new user is created or an old user is updated. The reason you want to do this is twofold. First, you only want to migrate a user document one time, so you don’t want to translate it during read time. Second, you want to allow your existing applications to continue to use older schemas. Let’s see how we might incrementally migrate old documents to use the new schema while supporting existing application logic.
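A minimal sketch of such a translation function, in plain Python on dict documents, follows. The `schema_version` field name, the version numbers, the `emails` list, and the `save_user` write path are assumptions for illustration, not the book's JavaScript code. Old documents are translated only when they are written; reads leave them untouched, and the old `email` field is kept so existing code that reads it keeps working.

```python
from typing import Any, Dict

CURRENT_SCHEMA_VERSION = 2  # assumption: version 2 adds the emails list


def translate_user(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Upgrade a user document to the current schema; run on create/update, not on read."""
    user = dict(doc)  # shallow copy, so the caller's document is not mutated
    version = user.get("schema_version", 1)  # unversioned documents count as version 1
    if version < 2:
        email = user.get("email")
        user["emails"] = [email] if email else []
        # The old "email" field is kept so existing code can still read it.
    user["schema_version"] = CURRENT_SCHEMA_VERSION
    return user


def save_user(store: Dict[str, Dict[str, Any]], key: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical write path: merge the changes, translate once, then store."""
    existing = store.get(key, {})
    user = translate_user({**existing, **data})
    store[key] = user
    return user


store = {"user:1": {"name": "Ann", "email": "ann@example.com"}}  # an old, unversioned document
user = save_user(store, "user:1", {"name": "Ann B"})
print(user["schema_version"], user["emails"])  # 2 ['ann@example.com']
```

Because translation is idempotent for documents already at the current version, running it on every write is safe, and documents that are never updated simply stay on the old schema until they are.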
Conclusion
I hope you enjoyed this e-book. Let it serve as a reference for you as you go out and build amazing applications
using NoSQL and Redis! Remember, even though all of the examples in this e-book use Redis, the same patterns and
principles apply to other NoSQL databases.
Also, keep in mind that all of the patterns mentioned can be used together in your application. When you are
approaching building an application, have these patterns in the back of your mind, and figure out which pattern applies
best to the problem you are trying to solve. You have been given the tools and knowledge needed to build applications
using NoSQL. Now you just need to get started!