Introduction To Data Science and SQL
Introduction
By Sagar Bhilare
M. Sc Artificial Intelligence, BE- Mechanical Engineering
7 years of experience in Data Science mentoring and product
development
What is Data Science
Data science is the field of study that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful insights
from data.
What is the most important
component of Data Science?
Why do you think organizations need
Data Scientist?
Decision making
Some decisions to make
- Should I carry an umbrella?
- Which color of clothes should I wear?
- Which route should I take to the office?
- Which gift should I choose for someone?
Some more decisions to make
How do you:
- Decide whether a patient is ill based on their symptoms
- Decide whether to approve a loan for a person based on their borrowing behavior
- Decide on a strategy to improve customer retention
- Decide whether an email is spam or not
- Predict whether a person may choose to leave a service
- Recommend the best possible product, service or information to a user
Some Examples of Data Driven
Decision making
- OTT Platforms
- E Commerce Websites
- Government Institutions
- Cab and pool services
- Hotel bookings and travel tourism
- Healthcare sector
- Pharmacy sector
- Automotive sector
- Social Media Content
Roles of a Data Science professional
- Database administrator
- Data Scientist
- Machine Learning Engineer
- AI developer
- Tableau Developer
- BI developer
- Deep Learning Engineer
Is there any hype created for Data
Science?
Data Created per day by an
Ecommerce website
Flipkart
- Flipkart sees 1.6 million users per second on Day 1 of festive sale (Economic Times)
- Flipkart gets 10 terabytes of user data each day from browsing, searching, buying
or not buying, as well as behavior and location. This jumps to 50 terabytes on Big
Billion Day sales days.
- What do you think Flipkart does with this data?
- Is it humanly possible to analyze this much data?
What is Data
Data is a collection of facts, such as numbers, words, measurements, observations or
just descriptions of things.
Types of Constraints
- NOT NULL - Ensures that a column cannot have a NULL value
- UNIQUE - Ensures that all values in a column are different
- PRIMARY KEY - A combination of NOT NULL and UNIQUE. Uniquely identifies each row in a table
- FOREIGN KEY - Prevents actions that would destroy links between tables
- CHECK - Ensures that the values in a column satisfy a specific condition
- DEFAULT - Sets a default value for a column if no value is specified
NOT NULL
CREATE TABLE teachers (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL,
Age int
);
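The constraints listed above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server). The Email, Age rule and City columns and all row values are hypothetical additions for illustration.

```python
import sqlite3

def demo_constraints():
    """Create a constrained table and collect which bad inserts are rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE teachers (
            ID        INTEGER PRIMARY KEY,          -- unique row identifier
            LastName  VARCHAR(255) NOT NULL,
            FirstName VARCHAR(255) NOT NULL,
            Email     VARCHAR(255) UNIQUE,          -- hypothetical column
            Age       INT CHECK (Age >= 18),        -- hypothetical rule
            City      VARCHAR(255) DEFAULT 'Pune'   -- hypothetical default
        )
    """)
    # A valid row; City is omitted, so the DEFAULT value is used.
    conn.execute(
        "INSERT INTO teachers (ID, LastName, FirstName, Email, Age) "
        "VALUES (1, 'Rao', 'Asha', 'asha@example.com', 35)"
    )
    # Each statement below violates exactly one constraint.
    bad_rows = {
        "NOT NULL": "INSERT INTO teachers (ID, LastName, FirstName) "
                    "VALUES (2, NULL, 'Vikram')",
        "UNIQUE": "INSERT INTO teachers (ID, LastName, FirstName, Email) "
                  "VALUES (3, 'Shah', 'Neha', 'asha@example.com')",
        "PRIMARY KEY": "INSERT INTO teachers (ID, LastName, FirstName) "
                       "VALUES (1, 'Das', 'Ravi')",
        "CHECK": "INSERT INTO teachers (ID, LastName, FirstName, Age) "
                 "VALUES (4, 'Iyer', 'Kiran', 12)",
    }
    rejected = []
    for name, stmt in bad_rows.items():
        try:
            conn.execute(stmt)
        except sqlite3.IntegrityError:
            rejected.append(name)
    default_city = conn.execute(
        "SELECT City FROM teachers WHERE ID = 1"
    ).fetchone()[0]
    conn.close()
    return rejected, default_city
```

Calling demo_constraints() returns the names of the constraints that blocked an insert, together with the City value filled in by DEFAULT.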
Select * from student where name like 'a%' or name like 'v%'
Select * from student where name not like 'a%' or name like 'v%'
IN
Select * from student where hometown in ('Mumbai', 'Bangalore', 'Pune')
Select * from student where hometown not in ('Mumbai', 'Bangalore', 'Pune')
BETWEEN
Select * from student Where class between 7 and 10
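To see what the LIKE, IN and BETWEEN filters return, here is a minimal sketch on a toy student table. It runs on Python's sqlite3 module rather than MySQL (an assumption made so the example is self-contained), and the four rows are invented for illustration.

```python
import sqlite3

def filter_students():
    """Run LIKE, IN, NOT IN and BETWEEN filters on a toy student table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, hometown TEXT, class INT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [
            ("amit", "Mumbai", 6),
            ("vani", "Pune", 8),
            ("rohan", "Delhi", 10),
            ("asha", "Bangalore", 11),
        ],
    )

    def names(where):
        # Return matching names in alphabetical order.
        sql = f"SELECT name FROM student WHERE {where} ORDER BY name"
        return [row[0] for row in conn.execute(sql)]

    result = {
        "like": names("name LIKE 'a%' OR name LIKE 'v%'"),
        "in": names("hometown IN ('Mumbai', 'Bangalore', 'Pune')"),
        "not_in": names("hometown NOT IN ('Mumbai', 'Bangalore', 'Pune')"),
        # BETWEEN is inclusive on both ends: 7 <= class <= 10
        "between": names("class BETWEEN 7 AND 10"),
    }
    conn.close()
    return result
```

Note that BETWEEN includes both boundary values, which is why the student in class 10 is returned.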
Sorting the data
We can sort the data in MySQL with the ORDER BY clause. Sorting is ascending by
default; the DESC keyword sorts in descending order.
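A short sketch of sorting in both directions, again assuming Python's sqlite3 module in place of MySQL and a made-up student table:

```python
import sqlite3

def sort_students():
    """Return student names sorted by class, ascending and descending."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, class INT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?)",
        [("vani", 8), ("amit", 6), ("rohan", 10)],
    )
    # ORDER BY without a keyword sorts ascending (ASC is the default).
    ascending = [r[0] for r in conn.execute(
        "SELECT name FROM student ORDER BY class")]
    # DESC reverses the order.
    descending = [r[0] for r in conn.execute(
        "SELECT name FROM student ORDER BY class DESC")]
    conn.close()
    return ascending, descending
```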
SELECT * FROM
table1 LEFT JOIN table2 ON table1.id = table2.id
UNION
SELECT * FROM
table1 RIGHT JOIN table2 ON table1.id = table2.id
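The LEFT JOIN ... UNION ... RIGHT JOIN pattern above emulates a full outer join, which MySQL does not provide directly. The sketch below shows the result on two tiny tables. It assumes Python's sqlite3 module instead of MySQL, and it writes the RIGHT JOIN as a LEFT JOIN with the tables swapped so that it also runs on older SQLite versions. The columns a and b and all rows are hypothetical.

```python
import sqlite3

def full_outer_join():
    """Emulate a full outer join with two outer joins combined by UNION."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INT, a TEXT);
        CREATE TABLE table2 (id INT, b TEXT);
        INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
        INSERT INTO table2 VALUES (2, 'p'), (3, 'q');
    """)
    rows = conn.execute("""
        SELECT t1.id, t1.a, t2.id, t2.b
        FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
        UNION          -- UNION removes the duplicated matching rows
        SELECT t1.id, t1.a, t2.id, t2.b
        FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id
    """).fetchall()
    conn.close()
    return set(rows)
```

Rows that exist in only one table appear once, padded with NULL (None in Python) on the other side; the matching row appears once because UNION removes duplicates.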
CROSS JOIN
A CROSS JOIN produces a Cartesian product between the two tables,
returning all possible combinations of all rows.
It has no ON clause because you're just joining everything to everything.
SELECT column_name(s)
FROM table1
CROSS JOIN table2;
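As a quick illustration of the Cartesian product, the sketch below cross joins two hypothetical tables, sizes and colors, using Python's sqlite3 module as an assumed stand-in for MySQL. With 2 rows in each table, the result has 2 x 2 = 4 rows.

```python
import sqlite3

def cross_join_demo():
    """Return every size/color combination produced by a CROSS JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sizes (size TEXT);
        CREATE TABLE colors (color TEXT);
        INSERT INTO sizes VALUES ('S'), ('M');
        INSERT INTO colors VALUES ('red'), ('blue');
    """)
    rows = conn.execute(
        "SELECT size, color FROM sizes CROSS JOIN colors"
    ).fetchall()
    conn.close()
    return rows
```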
Self Join
SELECT s1.col_name, s2.col_name...
FROM table1 s1, table1 s2
WHERE s1.common_col_name = s2.common_col_name;
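One possible use of a self join is to pair up rows of the same table that share a value. The sketch below finds students from the same hometown. It assumes Python's sqlite3 module in place of MySQL, the rows are invented, and the extra condition s1.name < s2.name is an addition that keeps each pair once and avoids pairing a row with itself.

```python
import sqlite3

def same_hometown_pairs():
    """Self join: pair students who share a hometown."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, hometown TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?)",
        [
            ("amit", "Mumbai"),
            ("vani", "Pune"),
            ("neha", "Mumbai"),
            ("ravi", "Pune"),
            ("kiran", "Delhi"),
        ],
    )
    rows = conn.execute("""
        SELECT s1.name, s2.name
        FROM student s1, student s2
        WHERE s1.hometown = s2.hometown
          AND s1.name < s2.name      -- keep each pair once, skip self-pairs
        ORDER BY s1.name
    """).fetchall()
    conn.close()
    return rows
```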
Group By
It is used to summarize the data based on some category.
Summarization is done with aggregate functions such as AVG (mean), COUNT, SUM,
MIN and MAX.
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
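The GROUP BY template above can be seen in action with a toy example. This is a sketch that assumes Python's sqlite3 module instead of MySQL; the emp table and its values are hypothetical.

```python
import sqlite3

def summarize_by_department():
    """Aggregate salaries per department with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (dept TEXT, salary INT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("IT", 100), ("IT", 200), ("HR", 50)],
    )
    rows = conn.execute("""
        SELECT dept, COUNT(*), SUM(salary), AVG(salary),
               MIN(salary), MAX(salary)
        FROM emp
        GROUP BY dept          -- one output row per department
        ORDER BY dept
    """).fetchall()
    conn.close()
    return rows
```

Each department collapses into a single row that carries the count, total, average, minimum and maximum of its salaries.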
TASK 2
Fetch the names and ids of the employees who are working in the IT department.
select employee_id, first_name, department_name from employees join
departments on employees.department_id= departments.department_id
where department_name= 'IT';
Fetch the first name, last name, job id, job title, minimum salary and maximum
salary of all employees
select first_name, last_name, job_id, job_title, min_salary, max_salary from
employees join jobs using(job_id);
Identify the top 10 cities which have the largest number of employees
select city, count(employee_id) from employees join departments
using(department_id) join locations using(location_id) group by city order by
count(employee_id) desc limit 10;
Fetch the employee ids and names of all employees whose last working day in the
organization was 1999-12-31
select employee_id, first_name from employees join job_history
using(employee_id) where end_date= "1999-12-31";
Fetch the employee id, first name, department name and total experience of all
employees who have completed at least 25 years in the organization
select employee_id, first_name, department_name, timestampdiff(year, hire_date,
curdate()) as Total_Experience from employees join departments
using(department_id) where timestampdiff(year, hire_date, curdate())>= 25;
Fetch the details of the top 3 countries where most of the employees are working
select country_name, count(*) from employees join departments
using(department_id) join locations using(location_id) join countries
using(country_id) group by country_name order by count(*) desc limit 3;
Fetch the details of those employees who have completed 10 years in the
organization
select first_name, salary, timestampdiff(year, hire_date, curdate()) as
Total_Experience from employees where timestampdiff(year, hire_date,
curdate())>= 10;
Fetch the department wise cost to the company
select department_name, sum(salary) as TotalCost from employees join
departments using(department_id) group by department_name order by
sum(salary) desc ;
Fetch the details of employees whose salary is greater than average salary
select first_name, last_name , salary from employees where salary> (select
avg(salary) from employees);
Find the names of all employees whose salary is greater than 60% of their
department's total salary bill.
select first_name, last_name, salary from employees e where salary> 0.6* (select
sum(salary) from employees where department_id= e.department_id) order by
department_id;
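The last two tasks compare each salary with a value computed by an inner query: once against a single overall average, and once against a total that depends on the employee's own department. A minimal sketch of both follows. It assumes Python's sqlite3 module in place of MySQL and a simplified, hypothetical employees table with invented salaries.

```python
import sqlite3

def salary_queries():
    """Compare salaries with the overall average and with department totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (first_name TEXT, department_id INT, salary INT)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Asha", 1, 700), ("Binod", 1, 200), ("Chitra", 1, 100),
            ("Dev", 2, 500), ("Esha", 2, 500),
            ("Farah", 3, 300),
        ],
    )
    # Scalar subquery: the inner query runs once and yields one number.
    above_avg = [r[0] for r in conn.execute("""
        SELECT first_name FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY first_name
    """)]
    # Correlated subquery: the inner total is computed for the
    # department of each outer row (e.department_id).
    dominant = [r[0] for r in conn.execute("""
        SELECT first_name FROM employees e
        WHERE salary > 0.6 * (SELECT SUM(salary) FROM employees
                              WHERE department_id = e.department_id)
        ORDER BY first_name
    """)]
    conn.close()
    return above_avg, dominant
```

In this toy data the overall average is about 383, so three employees are above it, while only the employees who take more than 60% of their own department's total appear in the second list.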
Subqueries
A MySQL subquery is a query nested within another query
such as SELECT, INSERT, UPDATE or DELETE.
It is most often nested inside the WHERE or HAVING clause.
The database engine executes the inner query and
uses its result to calculate the result of the outer query.
select country_name from countries where country_id= any( select country_id from
locations where city in ('Tokyo', 'Venice'));
ALL
ALL compares a single column value with every value returned by the subquery.
The condition is true only if the comparison holds for all of those values.
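The behavior of ANY and ALL can be checked with equivalent queries. SQLite, used here through Python's sqlite3 module as an assumed stand-in for MySQL, does not implement the ANY and ALL operators, so this sketch uses rewrites that give the same result for a non-empty subquery without NULLs: = ANY (...) becomes IN (...), and > ALL (...) becomes a comparison with the subquery's MAX. The tables and rows are hypothetical.

```python
import sqlite3

def any_all_equivalents():
    """Show ANY / ALL semantics through equivalent IN and MAX queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE countries (country_id TEXT, country_name TEXT);
        CREATE TABLE locations (city TEXT, country_id TEXT);
        CREATE TABLE employees (first_name TEXT, dept TEXT, salary INT);
        INSERT INTO countries VALUES ('JP', 'Japan'), ('IT', 'Italy'),
                                     ('IN', 'India');
        INSERT INTO locations VALUES ('Tokyo', 'JP'), ('Venice', 'IT'),
                                     ('Mumbai', 'IN');
        INSERT INTO employees VALUES ('Asha', 'IT', 900), ('Ravi', 'HR', 500),
                                     ('Neha', 'HR', 700), ('Kiran', 'IT', 600);
    """)
    # MySQL form: WHERE country_id = ANY (SELECT country_id FROM locations ...)
    any_rows = [r[0] for r in conn.execute("""
        SELECT country_name FROM countries
        WHERE country_id IN (SELECT country_id FROM locations
                             WHERE city IN ('Tokyo', 'Venice'))
        ORDER BY country_name
    """)]
    # MySQL form: WHERE salary > ALL (SELECT salary FROM employees
    #                                 WHERE dept = 'HR')
    all_rows = [r[0] for r in conn.execute("""
        SELECT first_name FROM employees
        WHERE salary > (SELECT MAX(salary) FROM employees WHERE dept = 'HR')
        ORDER BY first_name
    """)]
    conn.close()
    return any_rows, all_rows
```

ANY succeeds when at least one value from the subquery matches, while ALL requires the comparison to hold against every value, which is why only the employee earning more than the highest HR salary is returned.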
OVER() & PARTITION BY clauses
Example 1: OVER()
SELECT <column list>, <AGGR_FUNC> OVER()
FROM <TABLENAME>;
● Indicates that a function will be applied to all the rows
returned by the query
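A small sketch of an empty OVER() clause: the aggregate is computed over all rows of the result, yet every row is kept. It assumes Python's sqlite3 module (SQLite 3.25 or newer, which supports window functions) in place of MySQL, and a toy city table with made-up populations.

```python
import sqlite3

def total_over_all_rows():
    """Attach the grand total to every row using SUM(...) OVER ()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (ID INT, Name TEXT, Population INT)")
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?)",
        [(1, "Mumbai", 120), (2, "Delhi", 110), (3, "Pune", 40),
         (4, "Tokyo", 90)],
    )
    rows = conn.execute("""
        SELECT Name, Population,
               SUM(Population) OVER () AS total_population
        FROM city
        ORDER BY ID
    """).fetchall()
    conn.close()
    return rows
```

Unlike GROUP BY, the window function does not collapse the rows; each city keeps its own line and simply gains the overall total as an extra column.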
For the window function queries, use
the world database and its table
"city"
ROW_NUMBER()
select ID,Name,CountryCode,District,
Population,
row_number() over(partition by
CountryCode) as rownumber
from city;
RANK()
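select ID,Name,CountryCode,District,
Population,
-- window clause missing from the slide; ordering by Population is an assumption
rank() over(partition by
CountryCode order by Population desc) as rank_in_country
from city;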
DENSE_RANK()
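● When the value changes, the next rank is the next
number after the previous rank (no gaps)
select ID,Name,CountryCode,District,
Population,
-- window clause missing from the slide; ordering by Population is an assumption
dense_rank() over(partition by
CountryCode order by Population desc) as dense_rank_in_country
from city;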
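The difference between ROW_NUMBER(), RANK() and DENSE_RANK() shows up only when there are ties. The sketch below puts the three side by side. It assumes Python's sqlite3 module (SQLite 3.25 or newer) instead of MySQL, a toy city table with invented populations, and an ordering by Population that is chosen here for illustration.

```python
import sqlite3

def ranking_functions():
    """Compare ROW_NUMBER, RANK and DENSE_RANK within each country."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE city (ID INT, Name TEXT, CountryCode TEXT, Population INT)"
    )
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?, ?)",
        [
            (1, "Mumbai", "IND", 100),
            (2, "Delhi", "IND", 100),   # tie with Mumbai
            (3, "Pune", "IND", 50),
            (4, "Tokyo", "JPN", 80),
        ],
    )
    rows = conn.execute("""
        SELECT Name,
               ROW_NUMBER() OVER (PARTITION BY CountryCode
                                  ORDER BY Population DESC, ID) AS rn,
               RANK()       OVER (PARTITION BY CountryCode
                                  ORDER BY Population DESC) AS rnk,
               DENSE_RANK() OVER (PARTITION BY CountryCode
                                  ORDER BY Population DESC) AS dense_rnk
        FROM city
        ORDER BY ID
    """).fetchall()
    conn.close()
    return rows
```

For the tied cities, ROW_NUMBER still hands out 1 and 2, RANK gives both 1 and then skips to 3, and DENSE_RANK gives both 1 and continues with 2. All three restart at 1 in the next partition.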
LAG()
select ID,Name,CountryCode,District,
Population,
lag(Population) over(partition by
CountryCode) as lag_Population
from city;
LEAD()
select ID,Name,CountryCode,District,
Population,
lead(Population) over(partition by
CountryCode order by ID ) as
lead_Population
from city;
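LAG() and LEAD() read a value from the previous or the next row of the same partition. The following sketch runs both on a toy city table, assuming Python's sqlite3 module (SQLite 3.25 or newer) in place of MySQL. The populations are invented, and both windows are ordered by ID so that the previous and next rows are well defined.

```python
import sqlite3

def lag_and_lead():
    """Fetch the previous and next Population within each country."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE city (ID INT, Name TEXT, CountryCode TEXT, Population INT)"
    )
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?, ?)",
        [
            (1, "Mumbai", "IND", 120),
            (2, "Delhi", "IND", 110),
            (3, "Pune", "IND", 40),
            (4, "Tokyo", "JPN", 90),
        ],
    )
    rows = conn.execute("""
        SELECT Name,
               LAG(Population)  OVER (PARTITION BY CountryCode
                                      ORDER BY ID) AS lag_Population,
               LEAD(Population) OVER (PARTITION BY CountryCode
                                      ORDER BY ID) AS lead_Population
        FROM city
        ORDER BY ID
    """).fetchall()
    conn.close()
    return rows
```

The first row of a partition has no previous row and the last has no next row, so those positions come back as NULL (None in Python).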
MIN(), MAX(), SUM(), AVG()
● Display the minimum, maximum, total and average
value of a numeric column for every group
select ID, Name, CountryCode,Population,
max(Population) over(partition by
CountryCode) as maxPopulation,
min(Population) over(partition by
CountryCode) as minPopulation,
sum(Population) over(partition by
CountryCode) as sumPopulation,
avg(Population) over(partition by
CountryCode) as avgPopulation
from city;
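The query above can be tried on a few rows to see that every city keeps its own line while carrying the statistics of its country. This sketch assumes Python's sqlite3 module (SQLite 3.25 or newer) instead of MySQL, with a toy city table and invented populations.

```python
import sqlite3

def partition_aggregates():
    """Compute MAX, MIN, SUM and AVG per country without collapsing rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE city (ID INT, Name TEXT, CountryCode TEXT, Population INT)"
    )
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?, ?)",
        [
            (1, "Mumbai", "IND", 120),
            (2, "Delhi", "IND", 110),
            (3, "Pune", "IND", 40),
            (4, "Tokyo", "JPN", 90),
        ],
    )
    rows = conn.execute("""
        SELECT Name,
               MAX(Population) OVER (PARTITION BY CountryCode) AS maxPopulation,
               MIN(Population) OVER (PARTITION BY CountryCode) AS minPopulation,
               SUM(Population) OVER (PARTITION BY CountryCode) AS sumPopulation,
               AVG(Population) OVER (PARTITION BY CountryCode) AS avgPopulation
        FROM city
        ORDER BY ID
    """).fetchall()
    conn.close()
    return rows
```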
Stored Procedure
MySQL stored procedures are pre-compiled SQL statements stored in a database,
so the code can be reused over and over again.
A stored procedure is a set of SQL statements wrapped within a CREATE PROCEDURE
statement
- A set of pre-compiled SQL statements
- Every procedure has a unique name, input and output parameters, and SQL
statements
- Can be called multiple times from different applications (front-end, back-end
and/or from other stored procedures / triggers)
- Has various uses, such as data analysis and imputation, and adding / changing /
deleting / selecting data from table(s)
- Executed manually by calling the procedure by name from an application with the
relevant parameters
- Some databases support recursion
DELIMITER //
CREATE PROCEDURE Procedure_Name()
BEGIN
    -- SQL queries go here
END //
DELIMITER ;
The SQL Queries and code must be written between BEGIN and END keywords.
To create the procedure
STATEMENT LEVEL
o Executed only once for any INSERT / UPDATE / DELETE statement
o Fired even if no rows are affected (a row-level trigger is not fired in that case)
CREATE TRIGGER trigger_name
(AFTER | BEFORE) (INSERT | UPDATE | DELETE)
ON table_name FOR EACH ROW
BEGIN
-- variable declarations
-- trigger code
END;
To create a trigger in MySQL
We will create an AFTER UPDATE trigger that creates a log entry each time a
record in the table 'product' is updated.
For this, we will also create a new log table.
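One way this could look is sketched below. It assumes Python's sqlite3 module as a stand-in for MySQL, whose CREATE TRIGGER syntax is very similar. The column layout of product, the log table product_log and the trigger name trg_product_after_update are all hypothetical.

```python
import sqlite3

def demo_update_trigger():
    """Log every update of the product table through an AFTER UPDATE trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);

        -- Log table filled by the trigger (hypothetical layout)
        CREATE TABLE product_log (
            product_id INTEGER,
            old_price  REAL,
            new_price  REAL,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TRIGGER trg_product_after_update
        AFTER UPDATE ON product
        FOR EACH ROW
        BEGIN
            -- OLD and NEW hold the row before and after the update
            INSERT INTO product_log (product_id, old_price, new_price)
            VALUES (OLD.id, OLD.price, NEW.price);
        END;

        INSERT INTO product VALUES (1, 'Pen', 10.0), (2, 'Book', 50.0);
    """)
    # This update touches one row, so one log entry is written.
    conn.execute("UPDATE product SET price = 12.5 WHERE id = 1")
    # This update matches no row, so the row-level trigger does not fire.
    conn.execute("UPDATE product SET price = 99.0 WHERE id = 99")
    log = conn.execute(
        "SELECT product_id, old_price, new_price FROM product_log"
    ).fetchall()
    conn.close()
    return log
```

Because the trigger is declared FOR EACH ROW, it runs once per updated row, and an UPDATE that matches nothing leaves the log untouched.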