Lecture 4 - CS50's Introduction To Databases With SQL
Lecture 4 - CS50's Introduction To Databases With SQL
OpenCourseWare
Donate (https://cs50.harvard.edu/donate)
Lecture 4
• Introduction
• Views
• Simplifying
◦ Questions
• Aggregating
◦ Questions
• Common Table Expression (CTE)
• Partitioning
◦ Questions
• Securing
• Soft Deletions
• Fin
Introduction
Thus far, we have learned about concepts that allow us to design complex databases and write data
into them. Now, we will explore ways in which to obtain views from these databases.
Let’s go back to the database containing books longlisted for the International Booker Prize. Here is a
snapshot of tables from this database.
To �nd a book written by the author Han Kang, we would need to go each of through the three table
above — �rst �nding the author’s ID, then the corresponding book IDs and then the book titles.
Instead, is there a way to put together related information from the three tables in a single view?
Yes, we can use the JOIN command in SQL to combine rows from two or more tables based on a
related column between them. Here is a visual representation of how these tables could be joined in
order to line up authors and their books.
This makes it simple to observe that Han Kang authored The White Book.
One can also imagine removing the ID columns here, such that our view looks like the following.
Views
A view is a virtual table de�ned by a query.
Say we wrote a query to join three tables, as in the previous example, and then select the relevant
columns. The new table created by this query can be saved as a view, to be further queried later on.
Views are useful for:
simplifying: putting together data from different tables to be queried more simply,
aggregating: running aggregate functions, like �nding the sum, and storing the results,
partitioning: dividing data into logical pieces,
securing: hiding columns that should be kept secure. While there are other ways in which
views can be useful, in this lecture we will focus on the above four.
Simplifying
Let us open up longlist.db on SQLite and run the .schema command to verify that the three
tables we saw in the previous example are created: authors , authored and books .
To select the books written by Fernanda Melchor, we would write this nested query.
The above query is complex — there are three SELECT queries in the nested query. To simplify this,
let us �rst use JOIN to create a view containing authors and their books.
In a new terminal, let us connect to longlist.db again, and run the following query.
Observe that it is important to specify how two tables are joined, or the columns they are
joined on.
Tip: The primary key column of one table is usually joined to the corresponding foreign key
column of the other table!
Running this will pull up a table containing all the author names next to the titles of the books
they have authored.
To save the virtual table created in the previous step as a view, we need to change the query.
The view created here is called longlist . This view can now be used exactly as we would use a
table in SQL.
Let us write a query to see all the data within this view.
Using this view, we can considerably simplify the query needed to �nd the books written by
Fernanda Melchor.
A view, being a virtual table, does not consume much more disk space to create. The data within a
view is still stored in the underlying tables, but still accessible through this simpl�ed view.
Questions
Can we manipulate views to be ordered, or displayed differently?
Yes, we can order books in a view in much the same way as we can in a table.
As an example, let us display the data within the longlist view, ordered by the book titles.
We could also have the view itself be ordered. We can do this by including an ORDER BY clause
in the query used to create the view.
Aggregating
In longlist.db we have a table containing individual ratings given to each book. In previous weeks,
we saw how to �nd the average rating of every book, rounded to 2 decimal places.
The results of the above query can be made more useful by displaying the title of every book, and
perhaps the year in which each book was longlisted. This information is present in the books table.
Here, we use a JOIN to combine information from the ratings and books tables, joining on
the book ID column.
Notice the order of operations in this query — in particular, the placement of the GROUP BY
operation at the end of the query after the two tables are joined.
This aggregated data can be stored in a view.
On adding more data to the ratings table, to obtain an up-to-date aggregate, we need to simply
requery the view using a SELECT command like the above!
Each time a view is created, it gets added to the schema. We can verify this by running .schema to
observe that longlist and average_book_ratings are now part of this database’s schema.
To create temporary views that are not stored in the database schema, we can use CREATE
TEMPORARY VIEW . This command creates a view that exists only for the duration of our connection
with the database.
To �nd the average rating of books per year, we can use the view we already created.
Notice that we select the rating column from average_book_ratings , which already contains the
average ratings per book. Next, we group these by year and calculate the average ratings again,
which gives us the average rating per year!
We can store the results in a temporary view.
Questions
Can temporary views be used to test whether a query works or not?
Yes, this is a great use case for temporary views! To generalize a little, temporary views are used
when we want to organize data in some way without actually storing that organization long-term.
Next, we create a CTE containing the average ratings per book. We then use the average ratings per
book to calculate the average ratings per year, in much the same way as we did before.
WITH "average_book_ratings" AS (
SELECT "book_id", "title", "year", ROUND(AVG("rating"), 2) AS "rating" FROM "ratings"
JOIN "books" ON "ratings"."book_id" = "books"."id"
GROUP BY "book_id"
)
SELECT "year" ROUND(AVG("rating"), 2) AS "rating" FROM "average_book_ratings"
GROUP BY "year";
Partitioning
Views can be used to partition data, or to break it into smaller pieces that will be useful to us or an
application. For example, the website for the International Booker Prize has a page of longlisted
books for each year the prize was awarded. However, our database stores all the longlisted books in
a single table. For the sake of creating the website, or a different purpose, it might be useful to have
a different table (or view) of books for each year.
Let us create a view to store books longlisted in 2022.
Questions
Can views be updated?
No, because views do not have any data in the way that tables do. Views actually pull data from the
underlying tables each time they are queried. This means that when an underlying table is updated,
the next time the view is queried, it will display updated data from the table!
Securing
Views can be used to enhance database security by limiting access to certain data.
Consider a rideshare company’s database with a table rides that looks like the following.
If we were to give this data to an analyst, whose job is to �nd the most popular ride routes, it would
be irrelevant and indeed, not secure to give them the names of individual riders. Rider names are
likely categorized as Personally Identi�able Information (PII) which companies are not allowed to
share indiscriminately.
Views can be handy in this situation — we can share with the analyst a view containing the origin
and destination of rides, but not the rider names.
To try this out, let us open rideshare.db in our terminal. Running .schema should reveal one table
called rides in this database.
We can create a view with the relevant columns, while omitting the rider column altogether. But
we will go one step further here, and create a rider column to display an anonymous rider for each
row in the table. This will indicate to the analyst that while we have rider names in the database, the
names have been anonymized for security.
CREATE VIEW "analysis" AS
SELECT "id", "origin", "destination", 'Anonymous' AS "rider"
FROM "rides";
Although we can create a view that anonymizes data, SQLite does not allow access control. This
means that our analyst could simply query the original rides table and see all the rider names we
went to great lengths to omit in the analysis view.
Soft Deletions
As we saw in previous weeks, a soft deletion involves marking a row as deleted instead of removing
it from the table.
For example, a piece of art called “Farmers working at dawn” is marked as deleted from the
collections table by changing the value in the deleted column from 0 to 1.
We can imagine creating a view to display only the art that is not deleted.
To try this, let us open mfa.db in our terminal. The collections table does not have a deleted
column yet, so we need to add it. The default value here will be 0, to indicate that the row is not
deleted.
Now, let us perform a soft delete on the artwork “Farmers working at dawn”, by updating it to have 1
in the deleted column.
UPDATE "collections"
SET "deleted" = 1
WHERE "title" = 'Farmers working at dawn';
We can create a view to display information about the rows that are not deleted.
We can display the data in this view to verify that “Farmers working at dawn” is not present.
We already know that it is not possible to insert data into or delete data from a view. However, we
can set up a trigger that inserts into or deletes from the underlying table! The INSTEAD OF trigger
allows us to do this.
Every time we try to delete rows from the view, this trigger will instead update the deleted
column of the row in the underlying table collections , thus completing the soft deletion.
We use the keyword OLD within our update clause to indicate that the ID of the row updated
in collections should be the same as the ID of the row we are trying to delete from
current_collections .
Similarly, we can create a trigger that inserts data into the underlying table when we try to insert it
into a view.
There are two situations to consider here. We could be trying to insert into a view a row that already
exists in the underlying table, but was soft deleted. We can write the following trigger to handle this
situation.
The WHEN keyword is used to check if the accession number of the artwork already exists in
the collections table. This works because an accession number, as we know from previous
weeks, uniquely identi�es every piece of art in this table.
If the artwork does exist in the underlying table, we set its deleted value to 0, indicating a
reversal of the soft deletion.
The second situation occurs when we are trying to insert a row that does not exist in the underlying
table. The following trigger handles this situation.
CREATE TRIGGER "insert_when_new"
INSTEAD OF INSERT ON "current_collections"
FOR EACH ROW
WHEN NEW."accession_number" NOT IN (
SELECT "accession_number" FROM "collections"
)
BEGIN
INSERT INTO "collections" ("title", "accession_number", "acquired")
VALUES (NEW."title", NEW."accession_number", NEW."acquired");
END;
When the accession number of the inserted data is not already present within collections , it
inserts the row into the table.
Fin
This brings us to the conclusion of Lecture 4 about Viewing in SQL!