Database Systems CMP3101: Lecture #1 October 4, 2011
[2] Jeffrey D. Ullman, Jennifer Widom, 2007. A First Course in Database Systems. Prentice Hall.
[3] Rebecca Riordan, 2005. Designing Effective Database Systems. Addison-Wesley. ISBN 0321290933.
Course Format
Lectures: Tuesday, 10:00 am - 12:00 pm
Tutorials: Friday, 08:00 am - 10:00 am
Software Tools
MySQL DBMS
Free downloadable Community Edition from Oracle
Best installed as part of a bundle: LAMP, XAMPP or WAMP
Course Outline
History and Overview of Database Systems
(Assignment I due 11th October, 2011; 2-3 pages)
Fundamentals of Database Systems
Database Query Languages
Data Modelling
Relational Database Design
Transaction Processing
Distributed Databases
Database
What is a database?
A collection of files storing related data

Give examples of databases:
Accounts database; payroll database; Makerere students database; Amazon's products database; airline reservation database
An Example
The Internet Movie Database, http://www.imdb.com
Entities: Actors (800k), Movies (400k), Directors, ...
Relationships: who acted where, who directed what, ...
Tables
Actor:
id      fName  lName  gender
195428  Tom    Hanks  M
...     ...    Hanks  F

Cast:
pid     mid
195428  337166
...     ...

Movie:
id      Name  year
337166  ...   1995
...     ...   ...
SQL
SELECT *
FROM Actor
SQL
SELECT count(*)
FROM Actor
SQL
SELECT *
FROM Actor, Casts, Movie
WHERE lname = 'Hanks' AND Actor.id = Casts.pid
  AND Casts.mid = Movie.id AND Movie.year = 1995
[Figure: the Actor, Cast and Movie tables, showing the rows with lName = Hanks and year = 1995 that the query matches.]
Statistics
Recovery
Transfer 1,000,000 from account #4662 to #7199:
X = Read(Account_1);
X.amount = X.amount - 1,000,000;
Write(Account_1, X);
Y = Read(Account_2);
Y.amount = Y.amount + 1,000,000;
Write(Account_2, Y);
Recovery
Transfer 1,000,000 from account #4662 to #7199:
X = Read(Account_1);
X.amount = X.amount - 1,000,000;
Write(Account_1, X);
CRASH !
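The crash scenario above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical Account(id, amount) table with made-up balances; a raised exception only stands in for a real crash. The point is that wrapping both writes in one transaction means the first write is undone when the second never happens.

```python
import sqlite3

# Hypothetical schema and balances, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO Account VALUES (4662, 5000000), (7199, 0)")
conn.commit()

def transfer(conn, src, dst, amount, crash=False):
    # `with conn` commits if the block finishes and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE Account SET amount = amount - ? WHERE id = ?", (amount, src))
        if crash:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE Account SET amount = amount + ? WHERE id = ?", (amount, dst))

try:
    transfer(conn, 4662, 7199, 1000000, crash=True)
except RuntimeError:
    pass
balances_after_crash = dict(conn.execute("SELECT id, amount FROM Account"))

transfer(conn, 4662, 7199, 1000000)
balances_after_ok = dict(conn.execute("SELECT id, amount FROM Account"))
```

After the simulated crash both balances are unchanged; after the successful call the money has moved exactly once.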
Concurrency Control
How to overdraft your account:
User 1:
X = Read(Account);
if (X.amount > 100) {
    dispense_money();
    X.amount = X.amount - 100;
} else error("Insufficient funds");

User 2:
X = Read(Account);
if (X.amount > 100) {
    dispense_money();
    X.amount = X.amount - 100;
} else error("Insufficient funds");
Transactions
Recovery
Concurrency control
Decision-Support
Many aggregate/group-by queries.
Sometimes called a data warehouse
Data Management
Data Management is more than databases!
Here is an example of a problem:
Alice sends Bob, in random order, all the numbers 1, 2, 3, ..., 100000000000000000000
She does not repeat any number
But she misses exactly one
Help Bob find out which one is missing!

After you solve it, make it a bit harder:
Alice misses exactly ten numbers
Help Bob find out which ones are missing!
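One possible approach to the first version of the puzzle, sketched under the assumption that Bob can keep a single running total: the sum of 1..n is known in advance, so the missing number is the known sum minus the sum of what arrives. The function name find_missing and the small test stream are made up for illustration; the ten-number variant needs more than one running total and is left open here.

```python
import random

def find_missing(numbers, n):
    """Return the one value in 1..n that is absent from `numbers`.

    Assumes every other value in 1..n appears exactly once.
    """
    expected = n * (n + 1) // 2      # sum of 1..n
    seen = 0
    for x in numbers:                # one pass, constant extra state
        seen += x
    return expected - seen

# Small illustrative stream: 1..1000 in random order with 617 left out.
n = 1000
stream = [x for x in range(1, n + 1) if x != 617]
random.shuffle(stream)
missing = find_missing(stream, n)
```

Python integers are unbounded, so the same code works for n as large as in the puzzle, given enough time to read the stream.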
SQL
Data Definition Language (DDL)
Create/alter/delete tables and their attributes
Covered in the following lectures
Tables in SQL

Table name: Product
Attribute names: PName (key), Price, Category, Manufacturer

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Each line of the table is a tuple (row).
Record (tuple)
Has atomic attributes
Table (relation)
A set of tuples
[Figure: selection applied to the rows of the Product table.]

SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
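To see what the query returns, here is a minimal sketch that loads the Product rows from the slides into an in-memory SQLite database (an assumption; the course itself uses MySQL) and runs the query. Prices are stored as plain numbers without the dollar sign, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (PName TEXT PRIMARY KEY, Price REAL, "
    "Category TEXT, Manufacturer TEXT)"
)
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# WHERE keeps some rows (selection); the SELECT list keeps some columns (projection).
rows = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product "
    "WHERE Price > 100 ORDER BY PName"
).fetchall()
```

Only SingleTouch and MultiTouch cost more than 100, and the Category column is dropped from the result.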
Details
Case insensitive:
SELECT = Select = select
Product = product
Constants (use single quotes):
'abc' - yes
"abc" - no
Eliminating Duplicates
SELECT DISTINCT category
FROM Product

Category
Gadgets
Photography
Household

Compare to:
SELECT category
FROM Product

Category
Gadgets
Gadgets
Photography
Household
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

What does each query return?

SELECT DISTINCT category
FROM Product
ORDER BY category

SELECT Category
FROM Product
ORDER BY PName

SELECT DISTINCT category
FROM Product
ORDER BY PName
Product
Key: PName
Foreign key: Manufacturer (refers to Company)

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Joins
Product(PName, Price, Category, Manufacturer)
Company(CName, stockPrice, Country)
Find all products under $200 manufactured in Japan; return their names and prices.

Join between Product and Company:
SELECT PName, Price
FROM Product, Company
WHERE Manufacturer = CName AND Country = 'Japan'
  AND Price <= 200
Joins
Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

SELECT PName, Price
FROM Product, Company
WHERE Manufacturer = CName AND Country = 'Japan'
  AND Price <= 200

PName        Price
SingleTouch  $149.99
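The join result can be checked with a short sketch. It assumes an in-memory SQLite database loaded with the two tables from the slides (prices as plain numbers) and runs the same join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CName TEXT PRIMARY KEY, StockPrice REAL, Country TEXT);
    CREATE TABLE Product (PName TEXT PRIMARY KEY, Price REAL,
                          Category TEXT, Manufacturer TEXT);
    INSERT INTO Company VALUES
        ('GizmoWorks', 25, 'USA'), ('Canon', 65, 'Japan'), ('Hitachi', 15, 'Japan');
    INSERT INTO Product VALUES
        ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
        ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
        ('SingleTouch', 149.99, 'Photography', 'Canon'),
        ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Each Product row is paired with the Company row of its manufacturer,
# then filtered on country and price.
join_rows = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName AND Country = 'Japan' AND Price <= 200
""").fetchall()
```

MultiTouch is made in Japan but costs more than 200, so only SingleTouch survives.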
Tuple Variables
Person(pname, address, worksfor)
Company(cname, address)

SELECT DISTINCT pname, address
FROM Person, Company
WHERE worksfor = cname
Which address? Both tables have an address attribute, so the reference is ambiguous.

Qualify the attributes with the table name:
SELECT DISTINCT Person.pname, Company.address
FROM Person, Company
WHERE Person.worksfor = Company.cname

Or use tuple variables:
SELECT DISTINCT x.pname, y.address
FROM Person AS x, Company AS y
WHERE x.worksfor = y.cname
In Class
Product(pname, price, category, manufacturer)
Company(cname, stockPrice, country)
Find all Chinese companies that manufacture products in the toy category

SELECT cname
FROM
WHERE
In Class
Product(pname, price, category, manufacturer)
Company(cname, stockPrice, country)
Find all Chinese companies that manufacture products both in the electronic and toy categories

SELECT cname
FROM
WHERE
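One way to attempt the exercise, offered as a sketch rather than the intended answer: use two tuple variables over Product, one for the electronic product and one for the toy. The company names, products and the category spellings 'electronic' and 'toy' below are invented test data in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (cname TEXT PRIMARY KEY, stockPrice REAL, country TEXT);
    CREATE TABLE Product (pname TEXT PRIMARY KEY, price REAL,
                          category TEXT, manufacturer TEXT);
    -- Hypothetical rows, for illustration only.
    INSERT INTO Company VALUES
        ('AlphaCo', 10, 'China'), ('BetaCo', 20, 'China'), ('GammaCo', 30, 'USA');
    INSERT INTO Product VALUES
        ('robot', 15, 'toy', 'AlphaCo'),
        ('phone', 99, 'electronic', 'AlphaCo'),
        ('doll', 5, 'toy', 'BetaCo'),
        ('radio', 40, 'electronic', 'GammaCo'),
        ('kite', 8, 'toy', 'GammaCo');
""")

# p1 and p2 range over Product independently, so one company can be
# matched with an electronic product and a toy at the same time.
both = conn.execute("""
    SELECT DISTINCT c.cname
    FROM Company c, Product p1, Product p2
    WHERE c.country = 'China'
      AND p1.manufacturer = c.cname AND p1.category = 'electronic'
      AND p2.manufacturer = c.cname AND p2.category = 'toy'
""").fetchall()
```

A single tuple variable would not work here, because no single Product row can be in both categories.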
Dan Suciu -- p544 Fall 2010
Company
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Country
USA
USA
Subqueries
A subquery is another SQL query nested inside a larger query
Such inner-outer queries are called nested queries
A subquery may occur in:
1. A SELECT clause
2. A FROM clause
3. A WHERE clause
Rule of thumb: avoid writing nested queries when possible; keep in mind that sometimes it's impossible
1. Subqueries in SELECT
Product(pname, price, company)
Company(cname, city)

For each product return the city where it is manufactured

SELECT X.pname, (SELECT Y.city
                 FROM Company Y
                 WHERE Y.cname = X.company)
FROM Product X
1. Subqueries in SELECT
Product(pname, price, company)
Company(cname, city)

Whenever possible, don't use a nested query:
SELECT pname, (SELECT city FROM Company WHERE cname = company)
FROM Product
=
SELECT pname, city
FROM Product, Company
WHERE cname = company
1. Subqueries in SELECT
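Product(pname, price, company)
Company(cname, city)

Compute the number of products made in each city

SELECT DISTINCT city, (SELECT count(*)
                       FROM Product
                       WHERE cname = company)
FROM Company

Note: the subquery counts the products of one company, so a city with several companies gets one row per distinct count rather than a single total.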
2. Subqueries in FROM
Product(pname, price, company)
Company(cname, city)

Find all products whose price is > 20 and < 30

SELECT X.pname
FROM (SELECT * FROM Product AS Y WHERE Y.price > 20) AS X
WHERE X.price < 30
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Existential quantifiers
Find all cities that make some products with price < 100

Using EXISTS:
SELECT DISTINCT Company.city
FROM Company
WHERE EXISTS (SELECT *
              FROM Product
              WHERE company = cname AND Product.price < 100)
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Existential quantifiers
Find all cities that make some products with price < 100

Predicate Calculus (a.k.a. First Order Logic):
$\{\, y \mid \exists x.\ \mathrm{Company}(x,y) \land (\exists z.\, \exists p.\ \mathrm{Product}(z,p,x) \land p < 100) \,\}$
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Existential quantifiers
Find all cities that make some products with price < 100

Using IN:
SELECT DISTINCT Company.city
FROM Company
WHERE Company.cname IN (SELECT Product.company
                        FROM Product
                        WHERE Product.price < 100)
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Existential quantifiers
Find all cities that make some products with price < 100

Using ANY:
SELECT DISTINCT Company.city
FROM Company
WHERE 100 > ANY (SELECT price
                 FROM Product
                 WHERE company = cname)
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Existential quantifiers
Find all cities that make some products with price < 100

Now let's unnest it:
SELECT DISTINCT Company.city
FROM Company, Product
WHERE Company.cname = Product.company AND Product.price < 100
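The existential variants can be compared side by side. This sketch assumes an in-memory SQLite database with invented companies, cities and prices, and runs the EXISTS, IN and unnested forms; the ANY form is left out because SQLite does not support ANY. The helper name cities is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (cname TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Product (pname TEXT PRIMARY KEY, price REAL, company TEXT);
    -- Hypothetical rows, for illustration only.
    INSERT INTO Company VALUES
        ('C1', 'Kampala'), ('C2', 'Nairobi'), ('C3', 'Kampala'), ('C4', 'Arusha');
    INSERT INTO Product VALUES
        ('p1', 50, 'C1'), ('p2', 150, 'C1'), ('p3', 300, 'C2'),
        ('p4', 80, 'C3'), ('p5', 20, 'C4');
""")

def cities(sql):
    """Run a one-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

with_exists = cities("""
    SELECT DISTINCT Company.city FROM Company
    WHERE EXISTS (SELECT * FROM Product
                  WHERE company = cname AND Product.price < 100)
""")
with_in = cities("""
    SELECT DISTINCT Company.city FROM Company
    WHERE Company.cname IN (SELECT Product.company FROM Product
                            WHERE Product.price < 100)
""")
unnested = cities("""
    SELECT DISTINCT Company.city FROM Company, Product
    WHERE Company.cname = Product.company AND Product.price < 100
""")
```

On this data all three forms return the same two cities; Nairobi is absent because its only company sells nothing under 100.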
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Universal quantifiers

Find all cities with companies that make only products with price < 100

Predicate Calculus (First Order Logic):
$\{\, y \mid \exists x.\ \mathrm{Company}(x,y) \land (\forall z.\, \forall p.\ \mathrm{Product}(z,p,x) \Rightarrow p < 100) \,\}$
3. Subqueries in WHERE
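De Morgan's Laws:
$\neg(A \land B) = \neg A \lor \neg B$
$\neg(A \lor B) = \neg A \land \neg B$
$\neg\forall x.\, P(x) = \exists x.\, \neg P(x)$
$\neg\exists x.\, P(x) = \forall x.\, \neg P(x)$
$\neg(A \Rightarrow B) = A \land \neg B$

$\{\, y \mid \exists x.\ \mathrm{Company}(x,y) \land (\forall z.\, \forall p.\ \mathrm{Product}(z,p,x) \Rightarrow p < 100) \,\}$
$= \{\, y \mid \exists x.\ \mathrm{Company}(x,y) \land \neg(\exists z\, \exists p.\ \mathrm{Product}(z,p,x) \land p \ge 100) \,\}$
$= \{\, y \mid \exists x.\ \mathrm{Company}(x,y) \,\} - \{\, y \mid \exists x.\ \mathrm{Company}(x,y) \land (\exists z\, \exists p.\ \mathrm{Product}(z,p,x) \land p \ge 100) \,\}$

The difference in the last line is meant per company x, which is how the NOT IN query computes it; the surviving companies are then projected onto their city y.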
3. Subqueries in WHERE
1. Find the other companies, i.e. those with some product of price >= 100

SELECT DISTINCT Company.city
FROM Company
WHERE Company.cname IN (SELECT Product.company
                        FROM Product
                        WHERE Product.price >= 100)

2. Find all companies s.t. all their products have price < 100

SELECT DISTINCT Company.city
FROM Company
WHERE Company.cname NOT IN (SELECT Product.company
                            FROM Product
                            WHERE Product.price >= 100)
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Universal quantifiers

Find all cities with companies that make only products with price < 100

Using EXISTS:
SELECT DISTINCT Company.city
FROM Company
WHERE NOT EXISTS (SELECT *
                  FROM Product
                  WHERE company = cname AND Product.price >= 100)
3. Subqueries in WHERE
Product(pname, price, company)
Company(cname, city)
Universal quantifiers

Find all cities with companies that make only products with price < 100

Using ALL:
SELECT DISTINCT Company.city
FROM Company
WHERE 100 > ALL (SELECT price
                 FROM Product
                 WHERE company = cname)
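A sketch of the universal query on invented data, assuming an in-memory SQLite database. SQLite has no ALL operator, so only the NOT IN and NOT EXISTS forms are run. The data includes a company with no products at all (C5) and a city with both kinds of company (Kampala), which are the two cases worth checking by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (cname TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Product (pname TEXT PRIMARY KEY, price REAL, company TEXT);
    -- Hypothetical rows, for illustration only. C5 makes no products.
    INSERT INTO Company VALUES
        ('C1', 'Kampala'), ('C2', 'Nairobi'), ('C3', 'Kampala'),
        ('C4', 'Arusha'), ('C5', 'Gulu');
    INSERT INTO Product VALUES
        ('p1', 50, 'C1'), ('p2', 150, 'C1'), ('p3', 300, 'C2'),
        ('p4', 80, 'C3'), ('p5', 20, 'C4');
""")

def cities(sql):
    """Run a one-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

with_not_in = cities("""
    SELECT DISTINCT Company.city FROM Company
    WHERE Company.cname NOT IN (SELECT Product.company FROM Product
                                WHERE Product.price >= 100)
""")
with_not_exists = cities("""
    SELECT DISTINCT Company.city FROM Company
    WHERE NOT EXISTS (SELECT * FROM Product
                      WHERE company = cname AND Product.price >= 100)
""")
```

Gulu appears because a company with no products satisfies "all its products cost less than 100" vacuously. Kampala appears because of C3, even though C1 in the same city sells an expensive product.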
Monotone Queries
A query Q is monotone if:
Whenever we add tuples to one or more of the tables the answer to the query cannot contain fewer tuples
Find drinkers that frequent some bar that serves some beer they like.
$x:\ \exists y.\, \exists z.\ \mathrm{Frequents}(x,y) \land \mathrm{Serves}(y,z) \land \mathrm{Likes}(x,z)$

Find drinkers that frequent only bars that serve some beer they like.
$x:\ \forall y.\ \mathrm{Frequents}(x,y) \Rightarrow (\exists z.\ \mathrm{Serves}(y,z) \land \mathrm{Likes}(x,z))$

Find drinkers that frequent some bar that serves only beers they like.
$x:\ \exists y.\ \mathrm{Frequents}(x,y) \land \forall z.\, (\mathrm{Serves}(y,z) \Rightarrow \mathrm{Likes}(x,z))$

Find drinkers that frequent only bars that serve only beers they like.
$x:\ \forall y.\ \mathrm{Frequents}(x,y) \Rightarrow \forall z.\, (\mathrm{Serves}(y,z) \Rightarrow \mathrm{Likes}(x,z))$
Aggregation
SELECT avg(price)
FROM Product
WHERE maker = 'Toyota'

SELECT count(*)
FROM Product
WHERE year > 1995

SQL supports several aggregation operations: sum, count, min, max, avg
Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(category)    -- same as Count(*) when category is never NULL
FROM Product
WHERE year > 1995

We probably want:
SELECT Count(DISTINCT category)
FROM Product
WHERE year > 1995
More Examples
Purchase(product, date, price, quantity)
Simple Aggregations
Purchase
Product  Price  Quantity
Bagel    3      20
Bagel    1.50   20
Banana   0.5    50
Banana   2      10
Banana   4      10

Sum(price * quantity) over the Bagel rows: 90 (= 60+30)
1&2. FROM-WHERE-GROUPBY
Keep the Purchase rows with price > 1, then group them by product:

Product  Price  Quantity
Bagel    3      20
Bagel    1.50   20
Banana   0.5    50    (dropped: price is not > 1)
Banana   2      10
Banana   4      10

3. SELECT
Sum the quantity within each group:

Product  TotalSales
Bagel    40
Banana   20
SELECT DISTINCT x.product,
       (SELECT Sum(y.quantity)
        FROM Purchase y
        WHERE x.product = y.product AND y.price > 1) AS TotalSales
FROM Purchase x
WHERE x.price > 1

Why does the condition price > 1 appear twice?
Another Example
SELECT product,
       sum(quantity) AS SumSales,
       max(price) AS MaxQuantity
FROM Purchase
GROUP BY product
What does it mean ?
Rule of thumb: Every group in a GROUP BY is non-empty ! If we want to include empty groups in the output, then we need either a subquery, or a left outer join (see later)
HAVING Clause
Same query, except that we consider only products whose total quantity sold is more than 30.

SELECT product, Sum(quantity)
FROM Purchase
WHERE price > 1
GROUP BY product
HAVING Sum(quantity) > 30

The HAVING clause contains conditions on aggregates.
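The grouping steps can be replayed on the Purchase rows from the slides. This sketch assumes an in-memory SQLite database and leaves out the date column, since the slides give no date values. It runs the grouped query once without HAVING and once with it; an ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (product TEXT, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?)", [
    ("Bagel", 3, 20), ("Bagel", 1.50, 20),
    ("Banana", 0.5, 50), ("Banana", 2, 10), ("Banana", 4, 10),
])

# WHERE filters rows first, GROUP BY forms the groups, Sum runs per group.
grouped = conn.execute("""
    SELECT product, Sum(quantity) AS TotalSales
    FROM Purchase
    WHERE price > 1
    GROUP BY product
    ORDER BY product
""").fetchall()

# HAVING then filters whole groups using the aggregate value.
with_having = conn.execute("""
    SELECT product, Sum(quantity)
    FROM Purchase
    WHERE price > 1
    GROUP BY product
    HAVING Sum(quantity) > 30
    ORDER BY product
""").fetchall()
```

The 50 bananas sold at 0.5 are removed by WHERE before grouping, which is why the Banana group totals 20 and then fails the HAVING test.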
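General form of grouping and aggregation:
SELECT S
FROM R1, ..., Rn
WHERE C1
GROUP BY a1, ..., ak
HAVING C2

S = may contain attributes a1, ..., ak and/or any aggregates, but NO OTHER ATTRIBUTES (Why?)
C1 = any condition on the attributes in R1, ..., Rn
C2 = any condition on aggregate expressions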
Advanced SQLizing
1. Unnesting Aggregates
2. Finding witnesses
Unnesting Aggregates
Product ( pname, price, company) Company(cname, city)
Find the number of companies in each city
SELECT DISTINCT city, (SELECT count(*)
                       FROM Company Y
                       WHERE X.city = Y.city)
FROM Company X

SELECT city, count(*)
FROM Company
GROUP BY city

Equivalent queries
Unnesting Aggregates
Product ( pname, price, company) Company(cname, city)
Find the number of products made in each city
SELECT DISTINCT X.city, (SELECT count(*)
                         FROM Product Y, Company Z
                         WHERE Z.cname = Y.company
                           AND Z.city = X.city)
FROM Company X

SELECT X.city, count(*)
FROM Company X, Product Y
WHERE X.cname = Y.company
GROUP BY X.city
More Unnesting
Author(login, name)
Wrote(login, url)
Find all authors who wrote at least 10 documents:
Attempt 1: with nested queries (this is SQL by a novice)

SELECT DISTINCT Author.name
FROM Author
WHERE (SELECT count(Wrote.url)
       FROM Wrote
       WHERE Author.login = Wrote.login) >= 10
More Unnesting
Find all authors who wrote at least 10 documents:
Attempt 2: SQL style, with GROUP BY (this is SQL by an expert)

SELECT Author.name
FROM Author, Wrote
WHERE Author.login = Wrote.login
GROUP BY Author.name
HAVING count(Wrote.url) >= 10
Finding Witnesses
Store(sid, sname) Product(pid, pname, price, sid)
Finding Witnesses
Finding the maximum price is easy
SELECT Store.sid, max(Product.price)
FROM Store, Product
WHERE Store.sid = Product.sid
GROUP BY Store.sid
But we need the witnesses, i.e. the products with max price
Finding Witnesses
To find the witnesses, compute the maximum price in a subquery:

SELECT Store.sname, Product.pname
FROM Store, Product,
     (SELECT Store.sid AS sid, max(Product.price) AS p
      FROM Store, Product
      WHERE Store.sid = Product.sid
      GROUP BY Store.sid, Store.sname) X
WHERE Store.sid = Product.sid
  AND Store.sid = X.sid AND Product.price = X.p
Finding Witnesses
There is a more concise solution here:
SELECT Store.sname, x.pname
FROM Store, Product x
WHERE Store.sid = x.sid
  AND x.price >= ALL (SELECT y.price
                      FROM Product y
                      WHERE Store.sid = y.sid)
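A sketch of the witness queries on invented stores and products, assuming an in-memory SQLite database. The subquery-in-FROM version runs as written. SQLite has no ALL operator, so the concise version is approximated here by comparing the price with a correlated max(...) subquery, which is a substitute technique and not the query from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Product (pid INTEGER PRIMARY KEY, pname TEXT,
                          price REAL, sid INTEGER);
    -- Hypothetical rows; store 1 has two products tied for the top price.
    INSERT INTO Store VALUES (1, 'Wiz'), (2, 'Ritz');
    INSERT INTO Product VALUES
        (10, 'Gizmo', 20, 1), (11, 'Camera', 50, 1),
        (12, 'Lens', 50, 1), (13, 'OneClick', 30, 2);
""")

via_from_subquery = conn.execute("""
    SELECT Store.sname, Product.pname
    FROM Store, Product,
         (SELECT Store.sid AS sid, max(Product.price) AS p
          FROM Store, Product
          WHERE Store.sid = Product.sid
          GROUP BY Store.sid, Store.sname) X
    WHERE Store.sid = Product.sid
      AND Store.sid = X.sid AND Product.price = X.p
    ORDER BY Store.sname, Product.pname
""").fetchall()

# Stand-in for ">= ALL": equal to the maximum price within the same store.
via_correlated_max = conn.execute("""
    SELECT Store.sname, x.pname
    FROM Store, Product x
    WHERE Store.sid = x.sid
      AND x.price = (SELECT max(y.price) FROM Product y
                     WHERE y.sid = Store.sid)
    ORDER BY Store.sname, x.pname
""").fetchall()
```

Both forms return every product tied for the maximum, so a store can have more than one witness.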
NULLS in SQL
Whenever we don't have a value, we can put a NULL
Can mean many things:
Value does not exist
Value exists but is unknown
Value not applicable
Etc.
The schema specifies for each attribute whether it can be null (nullable attribute) or not
How does SQL cope with tables that have NULLs?
Null Values
If x = NULL then 4*(3-x)/7 is still NULL
If x = NULL then x = 'Joe' is UNKNOWN
In SQL there are three boolean values:
FALSE = 0
UNKNOWN = 0.5
TRUE = 1
Null Values
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 - C1

SELECT *
FROM Person
WHERE (age < 25) AND (height > 6 OR weight > 190)

E.g. age=20, height=NULL, weight=200: min(1, max(0.5, 1)) = 1, so the condition is TRUE and the tuple is returned
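The min/max rules can be written out directly. This is a small sketch of the three-valued logic described above, not of any database engine; the function names (and3, or3, not3, less, greater, where) are made up, and Python's None stands in for NULL.

```python
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def and3(a, b):
    return min(a, b)

def or3(a, b):
    return max(a, b)

def not3(a):
    return 1 - a

def less(x, c):
    """x < c under SQL rules: UNKNOWN when x is NULL (None)."""
    if x is None:
        return UNKNOWN
    return TRUE if x < c else FALSE

def greater(x, c):
    """x > c under SQL rules: UNKNOWN when x is NULL (None)."""
    if x is None:
        return UNKNOWN
    return TRUE if x > c else FALSE

def where(age, height, weight):
    # (age < 25) AND (height > 6 OR weight > 190)
    return and3(less(age, 25), or3(greater(height, 6), greater(weight, 190)))

example = where(20, None, 200)      # the case from the slide
borderline = where(20, None, 100)   # height unknown, weight too low
```

A row is returned only when the condition evaluates to TRUE, so the first person is returned and the second is not.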
Null Values
Unexpected behavior:
SELECT *
FROM Person
WHERE age < 25 OR age >= 25

Some Persons are not included! For a Person whose age is NULL, both comparisons are UNKNOWN.
Null Values
Can test for NULL explicitly:
x IS NULL
x IS NOT NULL

SELECT *
FROM Person
WHERE age < 25 OR age >= 25 OR age IS NULL
Outerjoins
Product(name, category)
Purchase(prodName, store)

An inner join:
SELECT Product.name, Purchase.store
FROM Product, Purchase
WHERE Product.name = Purchase.prodName

Same as:
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON Product.name = Purchase.prodName

But Products that never sold will be lost!
Outerjoins
Product(name, category) Purchase(prodName, store)
If we want the never-sold products, need an outerjoin:
SELECT Product.name, Purchase.store
FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName
Product
Name      Category
Gizmo     gadget
Camera    Photo
OneClick  Photo

Purchase
ProdName  Store
Gizmo     Wiz
Camera    Ritz
Camera    Wiz

Result of the left outer join
Name      Store
Gizmo     Wiz
Camera    Ritz
Camera    Wiz
OneClick  NULL
Application
Compute, for each product, the total number of sales in September
Product(name, category)
Purchase(prodName, month, store)

SELECT Product.name, count(*)
FROM Product, Purchase
WHERE Product.name = Purchase.prodName
  AND Purchase.month = 'September'
GROUP BY Product.name

What's wrong?
Application
Compute, for each product, the total number of sales in September
Product(name, category)
Purchase(prodName, month, store)

SELECT Product.name, count(store)
FROM Product LEFT OUTER JOIN Purchase
     ON Product.name = Purchase.prodName
    AND Purchase.month = 'September'
GROUP BY Product.name

Now we also get the products that sold in 0 quantity
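The difference between the two September queries shows up on a small data set. This sketch assumes an in-memory SQLite database; the products and stores follow the earlier slide, while the months (including one October sale of OneClick) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE Purchase (prodName TEXT, month TEXT, store TEXT);
    INSERT INTO Product VALUES
        ('Gizmo', 'gadget'), ('Camera', 'Photo'), ('OneClick', 'Photo');
    -- Months are hypothetical; OneClick sold only in October.
    INSERT INTO Purchase VALUES
        ('Gizmo', 'September', 'Wiz'), ('Camera', 'September', 'Ritz'),
        ('Camera', 'September', 'Wiz'), ('OneClick', 'October', 'Wiz');
""")

inner = conn.execute("""
    SELECT Product.name, count(*)
    FROM Product, Purchase
    WHERE Product.name = Purchase.prodName
      AND Purchase.month = 'September'
    GROUP BY Product.name
    ORDER BY Product.name
""").fetchall()

# The month test sits in ON, so unmatched products are kept with NULLs,
# and count(store) ignores those NULLs.
outer = conn.execute("""
    SELECT Product.name, count(store)
    FROM Product LEFT OUTER JOIN Purchase
         ON Product.name = Purchase.prodName
        AND Purchase.month = 'September'
    GROUP BY Product.name
    ORDER BY Product.name
""").fetchall()
```

The inner join drops OneClick entirely, while the outer join reports it with a count of 0. Using count(*) in the outer-join query would report 1 instead, because the NULL-padded row is still a row.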
Outer Joins
Left outer join:
Include the left tuple even if there's no match