Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
9 views

RD SQL Notes

The document provides an introduction to relational databases and database management systems. It defines key terms like data, information and knowledge. It also describes the characteristics, components and architecture of a database system. Some additional topics covered include data models, SQL, relations, entity relationship diagrams and database normalization.

Uploaded by

Shantnu Sharma
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views

RD SQL Notes

The document provides an introduction to relational databases and database management systems. It defines key terms like data, information and knowledge. It also describes the characteristics, components and architecture of a database system. Some additional topics covered include data models, SQL, relations, entity relationship diagrams and database normalization.

Uploaded by

Shantnu Sharma
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 119

Module I

Introduction to Relational
Database
Introduction
 Database is a collection of related data and data is a collection
of facts and figures that can be processed to produce
information.

 Data represents recordable facts.

 The processed data is called Information, which is based on


facts. For example, if we have data about marks obtained by all
students, we can then conclude about toppers and average
marks.
Data, Information and knowledge

Data is unprocessed facts and figures without any added interpretation or


analysis. Example : "The price of crude oil is $80 per barrel."
Information is data that has been interpreted so that it has meaning for the user.
"The price of crude oil has risen from $70 to $80 per barrel" gives meaning to
someone who tracks oil prices.
Knowledge is a combination of information, experience and insight that may
benefit the individual or the organization. "When crude oil prices go up by $10
per barrel, it's likely that petrol prices will rise by 2p per litre" is knowledge. So
the question is should we fill the tank today? (decision?)
Database Management System
 A Database Management System (DBMS) stores data in
such a way that it becomes easier to retrieve, manipulate,
and produce information.

 Example : MySQL, Oracle, Access, Microsoft SQL server,


MongoDB, NoSQL, PostgreSQL etc.,
Three types of Databases
• RDBMS (Relational Database Management System)
• NoSQL (Non Relational Database model)
• OLAP database
Who use database?

Analysts Data Science Specialist


Characteristics Database

 Real-world entity − A DBMS should be more realistic and represent


real-world entities to design its architecture. For example, a school
database may use students as an entity and their age as an attribute.

 Relation-based tables − DBMS allows entities and relations among them


to form tables. A user can understand the architecture of a database just by
looking at the table names.

 Isolation of data and application − A database system is entirely different


than its data.
Database characteristics

 Less redundancy − DBMS follows the rules of normalization, which


splits a relation when any of its attributes is having redundancy in values.
Normalization is a mathematically rich and scientific process that reduces
data redundancy.

 Security − DBMS offers methods to impose constraints while


entering data into the database and retrieving the same at a later stage.
Architectural view- DBMS
Application (Middle)
User (Presentation)
Tier − For a user, this
Tier − At this layer,
application tier presents
multiple views of the
an abstracted view of the
database can be provided
database. End-users are
by the application.
unaware of any existence
of the database beyond
the application. At the
other end, the database
tier is not aware of any
other user beyond the
Database (Data) Tier − application tier.
The database resides along
with its query processing
languages.
Data models
 Data models define how the logical structure of a database is
modeled.

 Data models define how data is connected to each other and


how they are processed and stored inside the system.

 Earlier data models were not so scientific, hence they were prone to
introduce lots of duplication and update anomalies.
What is SQL (pronounced ess-que-ell, or sequel).

What is SQL?
 SQL stands for Structured Query Language
 SQL lets you access and manipulate databases
 SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the
International Organization for Standardization (ISO) in 1987
What Can SQL do?
 SQL can execute queries against a database
 SQL can retrieve data from a database
 SQL can insert records in a database
 SQL can update records in a database
 SQL can delete records from a database
 SQL can create new databases
 SQL can create new tables in a database
 SQL can create stored procedures in a database
 SQL can create views in a database
 SQL can set permissions on tables, procedures, and views
Database – View
Relation

The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.
Example: Relation/Table
Example – Relation/Table

Source: Google images


Characteristics of a Relation
 Data is stored in tables called relations.
 Relations can be normalized.
 In normalized relations, values saved are atomic values.
 Each row in a relation contains a unique value.
 Each column in a relation contains values from a same domain.
Information system components

Fig. Example Manufacturing information system view


Example: Relation Between tables

Source: Google images


Product – Sales relations
Employee-Department relations
Enterprise databases
Example : College Registration Database
DWH star schema – Relational model

The figure on the left side


shows an example Sales
Analysis Star Schema in
Data Warehouse
MySQL - RDBMS
 MySQL is the most popular Open Source Relational SQL
Database Management System.

 MySQL is one of the best RDBMS being used for developing


various web-based software applications and standalone applications.

 MySQL is open source software provided under the GPL License.

 Most of the SQL database programs also have their own proprietary
extensions in addition to the SQL standard!
E-R: Peterchen and Bachman
notations

Entity Relationship (ER) Diagram is diagrammatic


representation of data in databases.
Entity - student Notation
Student-course relationship

A relationship type represents the association between entity types. For


example,‘Enrolled in’ is a relationship type that exists between entity type Student and
Course. In ER diagram, relationship type is represented by a diamond and connecting the
entities with lines.

1…1 indicate cardinality.


Student database view
1..1 cardinality using set notation

* A set represents a set of entities/relations


Cardinalities

When entities in one entity set can take part only once in the relationship set
and entities in other entity set can take part more than once in the relationship
set, cardinality is many to one.

The above fig represents that a student can take only One Course but one course can
be taken by Many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.
Sample tables for E-R diagram

Write sample tables from the following:


Students: S01,S02,S03….
 A student takes only one course.
Courses: C01,C02,C03…
 A student takes more than one course
Student name : alphabets
 Write the student no, name and course taken
Case: Internet Sales Model

Source: Google images


Case: Example

E-R diagrams captures business rules


Complete Participation

In the above diagram we have total participation at E1 end. Only the primary key of E1,
which is in total participation should be allowed as the primary key of the reduced table,
since if the primary key of E2 is used, it might have null values for many of its entries, since
its participation is only partial and may not have corresponding entries for all its values.
E-R diagram [order-products]

*Source: Google images E-R diagram


E-R diagram[Rental database]

*Source: Google images E-R diagram


Key terms
 Database − A database is a collection of tables, with related data.
 Table − A table is a matrix with data. A table in a database looks like a simple
spreadsheet.
 Column − One column (data element) contains data of one and the same kind, for
example the column postcode.
 Row − A row (= tuple, entry or record) is a group of related data, for example the data
of one subscription.
 Redundancy − Storing data twice, redundantly to make the system faster.
 Primary Key − A primary key is unique. A key value can not occur twice in one table.
With a key, you can only find one row.
 Foreign Key − A foreign key is the linking pin between two tables.
 Compound Key − A compound key (composite key) is a key that consists of multiple
columns, because one column is not sufficiently unique.
 Index − An index in a database resembles an index at the back of a book.
 Referential Integrity − Referential Integrity makes sure that a foreign key value
always points to an existing row.
References
 Data warehousing, Fundamentals for IT Professions, Paulraj
Ponniah, Wiley publications. 2 nd edition, ISBN: 978-81-265-
3729-7(Print), 2015 (reprint).
 Data Base –SQL, Complete Reference.
DWHDM

Module – II: Introduction to Data Warehouse

S.Srinivas
BUISNESS CHALLENGE ?

 Converting Data into knowledge and making it


available throughout the organization.

2
INTELLIGENCE-FUNDAMENTAL PRINCIPLES

 Data –Raw facts and numbers.


 Information –
Assigning meaning to data
Simplified: data in context.
 Knowledge
- is the subjective interpretation of Information and
approach to act upon in the mind of perceiver.
 Intelligence
- Intelligence or wisdom embodies awareness, insight,
moral judgments, and principles to construct new
knowledge and improve upon existing ones.
3
WHAT BUSINESS DO?
 Most companies collect a large amount of data from
their business operations. To keep track of that
information, a business need to use a wide range of
software programs , such as Excel, Access and
different database applications for various
departments throughout their organization. Using
multiple software programs makes it difficult to
retrieve information in a timely manner and to
perform analysis of the data.
4
WHAT BI DO?

 Business Intelligence is a tool that places the right


information, into the right hands, at the right time.

5
WHAT BI DO….CONTINUE

 BI is a technology-driven process for analyzing data and presenting


actionable information to help corporate executives, business
managers and other end users make more informed business
decisions.

 BI encompasses a variety of tools, applications and methodologies


that enable organizations to collect data from internal systems and
external sources, prepare it for analysis, develop and run queries
against the data, and create reports, dashboards and data
visualizations to make the analytical results available to corporate
decision makers as well as operational workers.- TechTarget.com

6
BI ADVANTAGES
 Improve Decision making
 Optimizing internal business processes

 Increase operational efficiency

 Driving new revenues

 Competitive advantages over business rivals.

 BI systems can also help companies identify


market trends and spot business problems that
need to be addressed.
 BI data can include historical information

 Enable BI analysis to support both strategic and


tactical decision-making processes. 7
8
WHAT BI DO …
 Traditional approach:
BI tools were primarily used by data analysts
and other IT professionals who ran analyses and
produce reports with query results for business
users.
o Current Trend:
Business executives and workers are using BI
software themselves. i.e. self-service BI

9
DWH IS ENVIRONMENT NOT
PRODUCT
 The end result is the creation of a new computing
environment for the purpose of providing the strategic
information for every business needs

10
DATA WAREHOUSE DEFINITION
 According to Bill Inmon, A data warehouse is a
subject oriented, integrated, nonvolatile, and time
variant collection of data in support of
management decisions.

 According to Sean Kelly, Data warehouse is


separate, available, integrated, Time stamped,
Subject oriented, Non volatile and accessible.

11
SUBJECT ORIENTED
 Data is not stored by operational applications, but
by business subjects.
 Example operational applications: order processing,
customer billing, customer accounts, savings
accounts, consumer loans, Account receivables ,
claim processing etc.,
 DWH Subjects: sales, products, customer, account,
policy etc.,

12
INTEGRATED

Savings
A/C

Subject=Account

Checking
A/C

Note: Integration require standardization:


codes, naming convention, attributes,
Measurements
Loan A/C 13
SUBJECT….
 The subject oriented design enhances user ability
to quickly analyze large multidimensional data
sets.

14
TIME-VARIANT
 Operational systems support day-to-day data.
 DWH data is used analysis and decision making

 Data is stored as snapshots over past and current


trends.
 Data contains time element
 Allows for analysis
 Relates information to the present
 Enable forecast

Example Query: what are the total sales of product A for the
15
last 3 years on New Year day across region “Y”
NONVOLATILE

Loads
OLTP DWH

Read, Add/Change, Delete Read

16
EXERCISES

1) For an airline company, how can strategic


information increases the number of frequent
flyers? Write specific details.

17
WHAT IS BUSINESS INTELLIGENCE?

 Business Intelligence is a technology based on


customer and profit oriented models that reduces
operating costs and provide increased profitability by
improving productivity, sales, service and helps to
make decision making capabilities at no time.

18
BI DEFINITION

 Converting data into knowledge and making it available


throughout the organization are the job of processes and
applications known as “Business Intelligence”.

 BI is a term that encompasses a broad range of analytical


software and solutions for gathering, consolidating, analyzing
and providing access to information in a way that is supposed
to let an enterprise’s users make better business decisions.

19
BI SOFTWARES

 The term BI includes software for ETL, data


warehousing, database query and reporting,
multidimensional/OLAP data analysis, data mining
and visualization.

20
BI-BUSINESS QUESTIONS
 Business Intelligence: Finance
What is the net income, expenses, gross profit,
and net profit for this quarter, year?

Business Intelligence: Accounts

 What is the sales amount this month and what is


the outstanding pending payment?

 Business Intelligence: Purchase:

 Who is the vendor to be contacted to purchase


products?
21
BI-BUSINESS QUESTIONS
 Business Intelligence: Production
How many products are manufactured in each
production unit today, weekly, monthly?
 Business Intelligence: Sales

How many products have been sold in each area today,


weekly, monthly?
 Business Intelligence: Quality
How many products have been defective today, weekly,
monthly, quarterly, yearly?
 Business Intelligence: Service

Are the customers satisfied with the quality?

22
EXAMPLE.

 Data: The numbers 100 or 5.


 Information: Principal amount : $100, Interest rate, 5%

 Knowledge: At the end of Year I get $105 back

 Intelligence: Concept of growth.

23
KPI’S

 Business intelligence often uses KPI”s(key


Performance Indicators) to assess the present state of
business and to prescribe a course of action.

24
KPI’S

 Key Performance Indicators (KPI) are financial and


non-financial metrics used to help an organization
define and measure progress toward organizational
goals.

 Example Marketing KPIS’


 Customer related numbers
 Number of customers acquired

 Customer attrition

 Turnover

 Outstanding balances held by customers etc.,

25
BUSINESS INTELLIGENCE -CORPORATE
 Business Intelligence Tools help to gather, store, access and
analyze corporate data to aid in decision-making.

 Example applications:
 Customer profiling
 Customer support
 Market research
 Market segmentation
 Product profitability
 Statistical analysis
 Inventory and distribution analysis.

26
BUSINESS INTELLIGENCE -CORPORATE
 Business Intelligence Tools help to gather, store, access and
analyze corporate data to aid in decision-making.

 Example applications:
 Customer profiling
 Customer support
 Market research
 Market segmentation
 Product profitability
 Statistical analysis
 Inventory and distribution analysis.

27
CORPORATE BI SYSTEM

28

Source: IBM Redbook, IBM Framework for e-business, SG-24-6248-00, Sept., 2001, ibm.com/redbooks
29
BI SOLUTIONS ARCHITECTURE

Relationship
Solutions Management
Billing/Payment
Enabling OLAP Systems
MOLAP Data Mining
Technologies
XML Advertising/
Data Warehousing Promotions
Core Messaging
Technologies Databases Data Marts
Virtual Data Views Personalization
Meta Data
Data Networks
Expert Systems Security Supply Chain
Management
CORBA/IIOP
COM/DCOM Knowledge
Business Management
Performance
Source: Kalakota, e-business 2.0 Measurement
Roadmap for Success 30
MAJOR BI INFORMATION COMPONENTS

 OLTP – online transaction processing for


gathering operational data
 Data Warehouse – operational data
collected & stored in a form for analyzing
 OLAP – online analytical processing for
interactive insight and decision-making
 Data Mining – Extract previously
unknown, comprehensible information for
decision support
 Metadata – used to describe the kinds of
information stored 31

DATA WAREHOUSE
ARCHITECTURES

32
DWH ARCHITECTURE –1 FOR BI

EIS /DSS

Metadata

Select Query Tools


Extract
Transform Data
Integrate Warehouse OLAP/ROLAP
Maintain

Web Browsers
Operational
Systems/Data Middleware/
API Data Mining
Data
Preparation 33
WAREHOUSE ARCHITECTURE - 2

Metadata

EIS /DSS
Data Mart

Metadata
Select Query Tools
Extract
Transform Data Mart
Integrate OLAP/ROLAP
Maintain Metadata

Web Browsers
Operational Data Mart
Systems/Data Middleware/
Data API Data Mining
Preparation 34
REFERENCE BOOK
 Data warehousing, Fundamentals for IT
Professions, Paulraj Punniah, Wiley publications.
2 nd edition, ISBN: 978-81-265-3729-7(Print),
2015 (reprint).

35
HOW IT DIFFERS FROM
OPERATIONAL DATABASES
Data Warehousing  Operational data usually
Accessing data organized by application
Transforming data
Distributing data
requirements
Storing data  Data needs to be organized
Automating processes
Managing metadata
differently for decision-making
Finding and understanding data purposes
Data Analysis  BI requires consolidation of data
Getting answers
Creating OLAP applications
across multiple systems.
Analyzing the data  Tools are needed to extract,
Other analysis tools
cleanse, and transform data
36
SQL Basics

Module - III
SQL Introduction
Standard language for querying and manipulating data

Structured Query Language

Many standards out there:


• ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3), ….
• Vendors support various subsets: watch for fun discussions in class !
SQL
• Data Definition Language (DDL)
– Create/alter/delete tables and their attributes
– Following lectures...
• Data Manipulation Language (DML)
– Query one or more tables
– Insert/delete/modify tuples in tables
Table name Attribute names

Tables in SQL
Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Tuples or rows
SQL CONSTRAINTS

Following are some of the most commonly used constraints available in


SQL.

•NOT NULL Constraint − Ensures that a column cannot have NULL value.
•DEFAULT Constraint − Provides a default value for a column when none is
specified.
•UNIQUE Constraint − Ensures that all values in a column are different.
•PRIMARY Key − Uniquely identifies each row/record in a database table.
•FOREIGN Key − Uniquely identifies a row/record in any of the given database
table.
•CHECK Constraint − The CHECK constraint ensures that all the values in a
column satisfies certain conditions.
•INDEX − Used to create and retrieve data from the database very quickly.

Why integrity constraints?


Integrity constraints are used to ensure accuracy and
consistency of the data in a relational database. Data integrity
is handled in a relational database through the concept of
Date in MySQL

• DATE - Stores a date value in the form YYYY-MM-DD.


For example 2008-10-23.
• CREATE TABLE `MySampleDB`.`orders` (`order_no` INT NOT NULL
AUTO_INCREMENT, `order_item` varchar(30) not null, `order_date`
DATE NOT NULL, `order_delivery` DATE NOT NULL, PRIMARY
KEY (`order_no`) ) ;
• SELECT MONTHNAME(NOW());
• SELECT NOW();
• Curdate( ) :Returns the current date
• Year () : Returns the year
• Week():Returns the week number
• DATETIME - Stores a date and time value of the form YYYY-MM-DD
HH:MM:SS. For example 2008-10-23 10:37:22.

• 2008-10-23 10:37:22 : store form


Statistics – Aggregations using MySQL

• |1,WildTech 250Gb 1700 | SATA Disk Drive | 120 |


• | 2 | Moto Razr | Mobile Phone | 200 |
• | 3 | Microsoft 10-20 Keyboard | Ergonmoc Keyboard | 49
• | 4 | EasyTech Mouse 7632 | Cordless Mouse | 49 |
• | 5 | Dell XPS 400 | Desktop PC | 999 |
• | 6 | Buffalo AirStation Turbo G | Wireless Ethernet Bridge | 60 |
• | 7 | Apple iPod Touch | Portable Music/Movie Player | 199 |
• | 8 | Apple iPhone 8Gb | Smart Phone | 39| 9 |

• SELECT AVG(prod_price) AS price_ag FROM products;


• SELECT AVG(prod_price) AS price_avg FROM products
WHERE prod_price BETWEEN 10 and 199;
• SELECT COUNT(*) FROM products;
• SELECT COUNT(prod_price) AS low_price_items FROM
products WHERE prod_price < 200;
• ELECT MAX(prod_price) AS max_price FROM products;
• SELECT MIN(prod_price) AS min_price FROM products;
Aggregate funcitons in MySQL

• SELECT MAX(prod_price) AS max_price,


MIN(prod_price) AS max_price FROM
product;
References
• Data warehousing, Fundamentals for IT
Professions, Paulraj Ponniah, Wiley
publications. 2 nd edition, ISBN: 978-81-
265-3729-7(Print), 2015 (reprint).
• Data Base –SQL, Complete Reference.
Tables Explained
• The schema of a table is the table name and its
attributes:
Product(PName, Price, Category, Manfacturer)

• A key is an attribute whose values are unique;


we underline a key

Product(PNumber, Price, Category, Manfacturer)


Data Types in SQL
• Atomic types:
– Characters: CHAR(20), VARCHAR(50)
– Numbers: INT, BIGINT, SMALLINT, FLOAT
– Others: MONEY, DATETIME, …

• Every attribute must have an atomic type


– Hence tables are flat
– Why ?
Tables Explained
• A tuple = a record
– Restriction: all attributes are of atomic type

• A table = a set of tuples


– Like a list…
– …but it is unorderd:
no first(), no next(), no last().
SQL Query
Basic form: (plus many many more bells and whistles)

SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

SELECT *
FROM Product
WHERE category=‘Gadgets’

PName Price Category Manufacturer


Gizmo $19.99 Gadgets GizmoWorks

“selection” Powergizmo $29.99 Gadgets GizmoWorks


Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

SELECT PName, Price, Manufacturer


FROM Product
WHERE Price > 100

PName Price Manufacturer


“selection” and SingleTouch $149.99 Canon
“projection” MultiTouch $203.99 Hitachi
Notation
Input Schema

Product(PName, Price, Category, Manfacturer)

SELECT PName, Price, Manufacturer


FROM Product
WHERE Price > 100

Answer(PName, Price, Manfacturer)

Output Schema
Details
• Case insensitive:
– Same: SELECT Select select
– Same: Product product
– Different: ‘Seattle’ ‘seattle’

• Constants:
– ‘abc’ - yes
– “abc” - no
The LIKE operator
SELECT *
FROM Products
WHERE PName LIKE ‘%gizmo%’

• s LIKE p: pattern matching on strings


• p may contain two special symbols:
– % = any sequence of characters
– _ = any single character
Eliminating Duplicates
Category
SELECT DISTINCT category Gadgets
FROM Product Photography
Household

Compare to:
Category
Gadgets
SELECT category Gadgets
FROM Product Photography
Household
Ordering the Results
SELECT pname, price, manufacturer
FROM Product
WHERE category=‘gizmo’ AND price > 50
ORDER BY price, pname

Ordering is broken by the second attribute on the ORDER BY


list, etc.

Ordering is ascending, unless you specify the DESC keyword.


PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

SELECT DISTINCT category


FROM Product
ORDER BY category ?
SELECT Category
FROM Product
ORDER BY PName
?
SELECT DISTINCT category
FROM Product
ORDER BY PName
?
Keys and Foreign Keys
Company
CName StockPrice Country

GizmoWorks 25 USA
Key
Canon 65 Japan

Hitachi 15 Japan

Product
PName Price Category Manufacturer
Foreign
Gizmo $19.99 Gadgets GizmoWorks
key
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Aggregation
SELECT avg(price) SELECT count(*)
FROM Product FROM Product
WHERE maker=“Toyota” WHERE year > 1995

SQL supports several aggregation operations:

sum, count, min, max, avg

Except count, all aggregations apply to a single attribute


Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:

SELECT Count(category) same as Count(*)


FROM Product
WHERE year > 1995

We probably want:

SELECT Count(DISTINCT category)


FROM Product
WHERE year > 1995
More Examples
Purchase(product, date, price, quantity)

SELECT Sum(price * quantity)


FROM Purchase
What do
they mean ?
SELECT Sum(price * quantity)
FROM Purchase
WHERE product = ‘bagel’
Purchase
Simple Aggregations
Product Date Price Quantity
Bagel 10/21 1 20
Banana 10/3 0.5 10
Banana 10/10 1 10
Bagel 10/25 1.50 20

SELECT Sum(price * quantity)


FROM Purchase 50 (= 20+30)
WHERE product = ‘bagel’
Grouping and Aggregation
Purchase(product, date, price, quantity)

Query: Find total sales after 10/1/2005 per product.

SELECT product, Sum(price*quantity) AS TotalSales


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product

Let’s see what this means…


Grouping and Aggregation

1. Compute the FROM and WHERE clauses.

2. Group by the attributes in the GROUPBY

3. Compute the SELECT clause: grouped attributes and aggregates.


1&2. FROM-WHERE-GROUPBY

Product Date Price Quantity


Bagel 10/21 1 20
Bagel 10/25 1.50 20
Banana 10/3 0.5 10
Banana 10/10 1 10
3. SELECT
Product Date Price Quantity Product TotalSales
Bagel 10/21 1 20
Bagel 10/25 1.50 20 Bagel 50
Banana 10/3 0.5 10
Banana 10/10 1 10
Banana 15

SELECT product, Sum(price*quantity) AS TotalSales


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
GROUP BY v.s. Nested Quereis
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product

SELECT DISTINCT x.product, (SELECT Sum(y.price*y.quantity)


FROM Purchase y
WHERE x.product = y.product
AND y.date > ‘10/1/2005’)
AS TotalSales
FROM Purchase x
WHERE x.date > ‘10/1/2005’
Another Example
What does
it mean ?

SELECT product,
sum(price * quantity) AS SumSales
max(quantity) AS MaxQuantity
FROM Purchase
GROUP BY product
HAVING Clause

SELECT product, Sum(price * quantity)


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Sum(quantity) > 30

*HAVING clause contains conditions on aggregates.


General form of Grouping and Aggregation

SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak Why ?
HAVING C2

S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER


ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions
General form of Grouping and Aggregation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2

Evaluation steps:
1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
Insertions
General form:

INSERT INTO R(A1,…., An) VALUES (v1,…., vn)

Example: Insert a new purchase to the database:


INSERT INTO Purchase(buyer, seller, product, store)
VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’,
‘The Sharper Image’)

Missing attribute  NULL.


May drop attribute names if give them in order.
Insertions
INSERT INTO PRODUCT(name)

SELECT DISTINCT Purchase.product


FROM Purchase
WHERE Purchase.date > “10/26/01”

The query replaces the VALUES keyword.


Here we insert many tuples into PRODUCT
Insertion: an Example
Product(name, listPrice, category)
Purchase(prodName, buyerName, price)
prodName is foreign key in Product.name

Suppose database got corrupted and we need to fix it:


Purchase
Product
prodName buyerName price
name listPrice category
camera John 200

gizmo 100 gadgets gizmo Smith 80

camera Smith 225

Task: insert in Product all prodNames from Purchase


Insertion: an Example
INSERT INTO Product(name)

SELECT DISTINCT prodName


FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)

name listPrice category

gizmo 100 Gadgets

camera - -
Insertion: an Example
INSERT INTO Product(name, listPrice)

SELECT DISTINCT prodName, price


FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)

name listPrice category

gizmo 100 Gadgets

camera 200 -

camera ?? 225 ?? - Depends on the implementation


Deletions
Example:

DELETE FROM PURCHASE

WHERE seller = ‘Joe’ AND


product = ‘Brooklyn Bridge’

Factoid about SQL: there is no way to delete only a single


occurrence of a tuple that appears twice
in a relation.
Updates
Example:

UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
(SELECT product
FROM Purchase
WHERE Date =‘Oct, 25, 1999’);
NULL values

A NULL value is different from a zero value or a field that


contains spaces. A field with a NULL value is one that has been
left blank during record creation
COALESCE(cus
tomer_name,
'John Doe') .
SELECT
order_id,
product_name,
discount,
CASE
WHEN discount IS
NULL THEN 0
ELSE discount
END
FROM orders;

You might also like