
SQL for Data Analysis.pdf

This document serves as a beginner's guide to SQL for data analytics, explaining its importance in managing relational databases and facilitating data retrieval, transformation, and analysis. It covers fundamental SQL concepts, commands for database manipulation, aggregate functions, and the significance of data types and constraints. The guide emphasizes the necessity of mastering SQL for effective data analytics and encourages continuous practice for proficiency.
Copyright © All Rights Reserved

By Parvej Shaikh

SQL for Data Analytics – A Beginner's Guide

Introduction to SQL:
• What is SQL?
SQL (Structured Query Language) is a programming
language used to manage and manipulate relational databases. It
allows users to store, retrieve, update and delete data efficiently.
SQL is widely used in data analytics, business intelligence, and data
science.
• Why is SQL Crucial for Data Analytics?
SQL is essential for data analytics because:
o It allows efficient data retrieval from large databases.
o It enables data transformation and cleaning for analysis.
o It supports aggregations and calculations for generating
insights.
o It helps in joining multiple datasets to uncover
relationships.
o It integrates with BI tools like Power BI & Tableau for
reporting.
How SQL Helps in Data Analytics
➢ Data Extraction – Query large datasets efficiently.
➢ Data Cleaning – Remove duplicates, filter values, and
standardize data.
➢ Data Aggregation – Summarize data with SUM(),
AVG(), etc.
➢ Data Relationships – Use joins to merge multiple tables.
➢ Trend Analysis – Apply window functions to track
changes over time.
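The trend-analysis point above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database engine; window functions need SQLite 3.25 or later. The monthly_sales table and its figures are hypothetical. SUM() OVER produces a running total, and LAG() compares each month with the previous one.

```python
import sqlite3


def monthly_trend():
    """Return (month, sales, running_total, change_vs_prev) for a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_sales (month TEXT, sales INTEGER)")
    conn.executemany(
        "INSERT INTO monthly_sales VALUES (?, ?)",
        [("2024-01", 100), ("2024-02", 150), ("2024-03", 120)],
    )
    rows = conn.execute(
        """
        SELECT month,
               sales,
               SUM(sales) OVER (ORDER BY month) AS running_total,       -- cumulative sum
               sales - LAG(sales) OVER (ORDER BY month) AS change_vs_prev  -- NULL for the first month
        FROM monthly_sales
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows


for row in monthly_trend():
    print(row)
```

The first month has no predecessor, so its change_vs_prev comes back as None (SQL NULL).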
Database Concepts
• Tables: Collections of related data stored in rows and columns.
• Rows (Records): Individual data entries in a table.
• Columns (Fields): Specific attributes of the data.
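The three terms above map directly onto what a query returns. Below is a small sketch using Python's sqlite3 module; the employees table and its rows are hypothetical. The column names come from the cursor's description, and each fetched tuple is one row (record).

```python
import sqlite3


def table_shape():
    """Create a small hypothetical table and return its column names and rows."""
    conn = sqlite3.connect(":memory:")
    # A table: related data organized into columns (fields) and rows (records).
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "Aisha", "IT"), (2, "Ben", "HR")],
    )
    cur = conn.execute("SELECT * FROM employees ORDER BY id")
    columns = [desc[0] for desc in cur.description]  # one name per field
    rows = cur.fetchall()                             # one tuple per record
    conn.close()
    return columns, rows


columns, rows = table_shape()
print(columns)  # ['id', 'name', 'department']
print(rows)     # [(1, 'Aisha', 'IT'), (2, 'Ben', 'HR')]
```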
Role of SQL in Querying Databases
SQL is used to interact with databases in three main ways:
1 Retrieving Data:
➢ SQL helps fetch relevant data using SELECT queries.
• Basic SQL Syntax
SELECT column_name(s) FROM table_name WHERE condition;
➢ SELECT – Choose columns to retrieve.
➢ FROM – Specify the table.
➢ WHERE – Filter records based on conditions.
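Here is the SELECT / FROM / WHERE pattern above run against a throwaway in-memory SQLite database through Python's sqlite3 module. It is a sketch only: the employees table and salaries are hypothetical. The `?` placeholder is sqlite3's way of binding a value safely instead of pasting it into the SQL text.

```python
import sqlite3

SAMPLE_EMPLOYEES = [  # hypothetical data
    ("Aisha", "IT", 70000),
    ("Ben", "HR", 50000),
    ("Chen", "IT", 65000),
]


def select_by_department(department):
    """SELECT chosen columns FROM a table WHERE a condition holds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", SAMPLE_EMPLOYEES)
    rows = conn.execute(
        # SELECT -> columns, FROM -> table, WHERE -> row filter
        "SELECT name, salary FROM employees WHERE department = ? ORDER BY name",
        (department,),
    ).fetchall()
    conn.close()
    return rows


print(select_by_department("IT"))  # [('Aisha', 70000), ('Chen', 65000)]
```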

1. Create database
Code: create database sample2
2. Use the database
Code: use sample2

3. Create table
Code: create table customer
(
customerid int identity(1,1) primary key,
customernumber int not null unique check (customernumber>0),
lastname varchar(30) not null,
firstname varchar(30) not null,
areacode int default 71000,
address varchar(50),
country varchar(50) default 'Malaysia'
)
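The constraints in the customer table above can be tried out locally with Python's sqlite3 module. The translation to SQLite rests on two assumptions:

- SQL Server's identity(1,1) becomes INTEGER PRIMARY KEY, which SQLite numbers 1, 2, 3, ...
- SQLite does not enforce the VARCHAR length limits.

The try_insert helper is hypothetical.

```python
import sqlite3

# SQLite translation of the customer table (assumption: identity(1,1) -> INTEGER PRIMARY KEY).
CREATE_CUSTOMER = """
CREATE TABLE customer (
    customerid     INTEGER PRIMARY KEY,
    customernumber INTEGER NOT NULL UNIQUE CHECK (customernumber > 0),
    lastname       VARCHAR(30) NOT NULL,
    firstname      VARCHAR(30) NOT NULL,
    areacode       INTEGER DEFAULT 71000,
    address        VARCHAR(50),
    country        VARCHAR(50) DEFAULT 'Malaysia'
)
"""


def try_insert(conn, customernumber):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer (customernumber, lastname, firstname) VALUES (?, ?, ?)",
            (customernumber, "Doe", "Jane"),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(CREATE_CUSTOMER)
print(try_insert(conn, 100))  # True
print(try_insert(conn, 100))  # False: UNIQUE customernumber
print(try_insert(conn, -5))   # False: CHECK (customernumber > 0)
```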

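4. Insert values into table

Code: -- customerid is filled in by identity(1,1), so list the other columns explicitly
insert into customer (customernumber, lastname, firstname, areacode, address, country)
values
(100,'Fang Ying','Sham',418999,'sdadasfdfd',default),
(200,'Mei Mei','Tan',default,'adssdsadsd','Thailand'),
(300,'Albert','John',default,'dfdsfsdf',default)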
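The insert above relies on column defaults. SQLite does not accept the `default` keyword inside a `values` list. This sketch therefore uses Python's sqlite3 module and gets the same effect by leaving those columns out of the column list. The table mirrors the customer table; the function name is hypothetical.

```python
import sqlite3


def insert_with_defaults():
    """Insert two customers, letting omitted columns fall back to their DEFAULTs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customer (
            customerid     INTEGER PRIMARY KEY,
            customernumber INTEGER NOT NULL UNIQUE CHECK (customernumber > 0),
            lastname       VARCHAR(30) NOT NULL,
            firstname      VARCHAR(30) NOT NULL,
            areacode       INTEGER DEFAULT 71000,
            address        VARCHAR(50),
            country        VARCHAR(50) DEFAULT 'Malaysia'
        )
        """
    )
    # country is left out, so it defaults to 'Malaysia'
    conn.execute(
        "INSERT INTO customer (customernumber, lastname, firstname, areacode, address) "
        "VALUES (100, 'Fang Ying', 'Sham', 418999, 'sdadasfdfd')"
    )
    # areacode is left out, so it defaults to 71000
    conn.execute(
        "INSERT INTO customer (customernumber, lastname, firstname, address, country) "
        "VALUES (200, 'Mei Mei', 'Tan', 'adssdsadsd', 'Thailand')"
    )
    rows = conn.execute(
        "SELECT customerid, areacode, country FROM customer ORDER BY customerid"
    ).fetchall()
    conn.close()
    return rows


print(insert_with_defaults())  # [(1, 418999, 'Malaysia'), (2, 71000, 'Thailand')]
```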

5. Display records from table

Code: -- display all records
select * from customer
-- display particular columns
select customerid, customernumber, lastname, firstname
from customer

6. Add a new column to the table

Code: alter table customer
add phonenumber varchar(20)

7. Add values to the newly added column (update table)

Code: update customer set phonenumber='1234545346' where
customerid=1
update customer set phonenumber='45554654' where
customerid=2
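Steps 6 and 7 can be sketched end to end with Python's sqlite3 module. The trimmed-down customer table and the function name are hypothetical. The sketch shows that a newly added column starts out NULL and that `WHERE customerid = ?` limits each UPDATE to a single row.

```python
import sqlite3


def add_and_fill_phone_column():
    """Mirror steps 6 and 7: add a column, then UPDATE specific rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customerid INTEGER PRIMARY KEY, lastname TEXT)")
    conn.executemany(
        "INSERT INTO customer (lastname) VALUES (?)",
        [("Fang Ying",), ("Mei Mei",), ("Albert",)],
    )
    # The new column starts out NULL for every existing row.
    conn.execute("ALTER TABLE customer ADD COLUMN phonenumber VARCHAR(20)")
    # WHERE customerid = ? limits each UPDATE to one row.
    conn.executemany(
        "UPDATE customer SET phonenumber = ? WHERE customerid = ?",
        [("1234545346", 1), ("45554654", 2)],
    )
    rows = conn.execute(
        "SELECT customerid, phonenumber FROM customer ORDER BY customerid"
    ).fetchall()
    conn.close()
    return rows


print(add_and_fill_phone_column())
# [(1, '1234545346'), (2, '45554654'), (3, None)]
```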

8. Delete a column
Code: alter table customer
drop column phonenumber

9. Delete records from table (if the WHERE clause is omitted, all records are deleted)
Code: delete
from customer
where country='Thailand'
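The warning in step 9 is easy to check on throwaway data. This sketch uses Python's sqlite3 module and builds fresh hypothetical sample rows for each DELETE, so the two runs do not affect each other. It then counts how many rows survive.

```python
import sqlite3


def _customers():
    """Fresh in-memory table with three hypothetical customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (lastname TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [("Fang Ying", "Malaysia"), ("Mei Mei", "Thailand"), ("Albert", "Malaysia")],
    )
    return conn


def rows_left_after(delete_sql):
    """Run a DELETE on fresh sample data and return how many rows survive."""
    conn = _customers()
    conn.execute(delete_sql)
    remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.close()
    return remaining


print(rows_left_after("DELETE FROM customer WHERE country = 'Thailand'"))  # 2
print(rows_left_after("DELETE FROM customer"))  # 0: no WHERE, every row is deleted
```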
10. Delete table
Code: drop table customer

11. Change data type

Code: -- run before steps 8-10: the phonenumber column must still exist
alter table customer
alter column phonenumber varchar(10)

SQL Functions and Clauses


➢ Aggregate Functions in SQL
Aggregate functions perform calculations on a set of values and return a
single value. These functions are commonly used in data analysis to
summarize and extract insights from large datasets.
SUM(), AVG(), COUNT(), MIN(), and MAX() are commonly used
for analytics. GROUP BY organizes results by category (e.g.,
department-wise salary).
➢ Grouping Data with Aggregate Functions
To find the total salary per department:
SELECT department, SUM(salary) AS total_salary FROM employees
GROUP BY department;
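The aggregates and the department-wise query above can be run together in Python's sqlite3 module. The employees table and salaries below are hypothetical sample data. Without GROUP BY, each aggregate returns one value for the whole table; with GROUP BY, it returns one value per department.

```python
import sqlite3

EMPLOYEES = [  # hypothetical data
    ("Aisha", "IT", 70000),
    ("Chen", "IT", 65000),
    ("Ben", "HR", 50000),
    ("Dina", "HR", 55000),
    ("Eli", "Sales", 60000),
]


def salary_summary():
    """Return whole-table aggregates and per-department totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
    # Without GROUP BY, each aggregate collapses the whole table to one value.
    overall = conn.execute(
        "SELECT COUNT(*), SUM(salary), AVG(salary), MIN(salary), MAX(salary) FROM employees"
    ).fetchone()
    # With GROUP BY, the aggregate is computed once per department.
    per_department = conn.execute(
        "SELECT department, SUM(salary) AS total_salary FROM employees "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    conn.close()
    return overall, per_department


overall, per_department = salary_summary()
print(overall)         # (5, 300000, 60000.0, 50000, 70000)
print(per_department)  # [('HR', 105000), ('IT', 135000), ('Sales', 60000)]
```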
➢ Filtering and Sorting Data in SQL
When working with large datasets, filtering and sorting help refine and
organize the data for better analysis.
➢ 1 Filtering Data with the WHERE Clause
The WHERE clause is used to filter records based on conditions.
Using Comparison Operators in WHERE
Comparison operators (=, <>, >, <, >=, <=) refine data selection, and the
logical operators AND and OR combine several conditions.
Operator  | Description              | Example
=         | Equals                   | WHERE department = 'IT'
!= or <>  | Not equal                | WHERE department <> 'HR'
>         | Greater than             | WHERE salary > 60000
<         | Less than                | WHERE age < 40
>=        | Greater than or equal to | WHERE salary >= 75000
<=        | Less than or equal to    | WHERE age <= 30
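The sketch below uses Python's sqlite3 module to show comparison operators from the table above combined with AND and OR. The employees data is hypothetical. The helper names_where is also hypothetical, and it pastes the condition text straight into the SQL, so it is only suitable for trusted, hard-coded conditions like these.

```python
import sqlite3

EMPLOYEES = [  # hypothetical data: name, department, salary, age
    ("Aisha", "IT", 70000, 35),
    ("Ben", "HR", 50000, 28),
    ("Chen", "IT", 65000, 42),
    ("Dina", "HR", 55000, 31),
    ("Eli", "Sales", 60000, 45),
]


def names_where(condition):
    """Return employee names matching a WHERE condition (trusted literal SQL only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER, age INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(
        f"SELECT name FROM employees WHERE {condition} ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(names_where("salary >= 60000 AND department <> 'HR'"))  # ['Aisha', 'Chen', 'Eli']
print(names_where("department = 'HR' OR age > 40"))            # ['Ben', 'Chen', 'Dina', 'Eli']
print(names_where("age <= 30"))                                 # ['Ben']
```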

GROUP BY & HAVING in SQL


When working with large datasets, we often need to group data by
categories and apply aggregate functions like SUM(), COUNT(),
AVG(), etc.
The GROUP BY clause helps achieve this by summarizing data for each
unique group. Additionally, HAVING is used to filter grouped results
based on aggregate values, similar to how WHERE filters individual
rows.
1 GROUP BY – Grouping Data by a Specific Field
The GROUP BY clause is used with aggregate functions to group data
based on one or more columns.
2 HAVING – Filtering Grouped Data
The HAVING clause filters groups after aggregation, whereas WHERE
filters rows before grouping.
Find departments where total salary exceeds 130,000:
SELECT department, SUM(salary) AS total_salary FROM employees
GROUP BY department
HAVING SUM(salary) > 130000;
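The difference between HAVING and WHERE shows up clearly when both run on the same data. Below is a sketch with Python's sqlite3 module and hypothetical employees data. HAVING drops whole groups after SUM() has been computed, while WHERE drops individual rows before grouping.

```python
import sqlite3

EMPLOYEES = [  # hypothetical data
    ("Aisha", "IT", 70000),
    ("Chen", "IT", 65000),
    ("Ben", "HR", 50000),
    ("Dina", "HR", 55000),
    ("Eli", "Sales", 60000),
]


def having_vs_where():
    """Return (HAVING result, WHERE-then-GROUP BY result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
    # HAVING filters groups after the aggregate is computed.
    having = conn.execute(
        "SELECT department, SUM(salary) AS total_salary FROM employees "
        "GROUP BY department HAVING SUM(salary) > 130000 ORDER BY department"
    ).fetchall()
    # WHERE filters individual rows before they are grouped.
    where = conn.execute(
        "SELECT department, SUM(salary) AS total_salary FROM employees "
        "WHERE salary > 55000 GROUP BY department ORDER BY department"
    ).fetchall()
    conn.close()
    return having, where


having, where = having_vs_where()
print(having)  # [('IT', 135000)]
print(where)   # [('IT', 135000), ('Sales', 60000)]
```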

Data Manipulation
JOINS: Combining Data from Multiple Tables
JOIN is used to combine rows from two or more tables based on a
related column. This helps in fetching meaningful insights from
multiple datasets.
Types of SQL JOINs
JOIN Type       | Description
INNER JOIN      | Returns only the rows that have a match in both tables.
LEFT JOIN       | Returns all rows from the left table plus matching rows from the right table (NULLs where there is no match).
RIGHT JOIN      | Returns all rows from the right table plus matching rows from the left table (NULLs where there is no match).
FULL OUTER JOIN | Returns all rows from both tables, matching them where possible and filling NULLs where there is no match.
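Here is a sketch of the join types above using Python's sqlite3 module, with hypothetical employees and departments tables. It sticks to INNER and LEFT JOIN, because SQLite only added RIGHT and FULL OUTER JOIN in version 3.39. It also shows the usual workaround: a RIGHT JOIN can be written as a LEFT JOIN with the two tables swapped.

```python
import sqlite3


def join_examples():
    """Compare INNER JOIN and LEFT JOIN on two small hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT)")
    conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
    conn.executemany(
        "INSERT INTO departments VALUES (?, ?)", [(1, "IT"), (2, "HR"), (3, "Sales")]
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)", [("Aisha", 1), ("Ben", 2), ("Chen", None)]
    )
    # INNER JOIN: only employees whose dept_id matches a department.
    inner = conn.execute(
        "SELECT e.name, d.dept_name FROM employees e "
        "INNER JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
    ).fetchall()
    # LEFT JOIN: every employee; dept_name is NULL (None) when there is no match.
    left = conn.execute(
        "SELECT e.name, d.dept_name FROM employees e "
        "LEFT JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
    ).fetchall()
    # Equivalent of "employees RIGHT JOIN departments": swap the tables in a LEFT JOIN.
    right_equiv = conn.execute(
        "SELECT d.dept_name, e.name FROM departments d "
        "LEFT JOIN employees e ON e.dept_id = d.id ORDER BY d.id"
    ).fetchall()
    conn.close()
    return inner, left, right_equiv


inner, left, right_equiv = join_examples()
print(inner)        # [('Aisha', 'IT'), ('Ben', 'HR')]
print(left)         # [('Aisha', 'IT'), ('Ben', 'HR'), ('Chen', None)]
print(right_equiv)  # [('IT', 'Aisha'), ('HR', 'Ben'), ('Sales', None)]
```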

SQL Data Types and Constraints


1 SQL Data Types
When creating tables in SQL, defining the correct data type for each
column is important because:
✔ It ensures data integrity (prevents invalid data).
✔ It optimizes storage and performance.
✔ It avoids unexpected errors in queries.
Common SQL Data Types
Data Type     | Description                                                         | Example
INT           | Stores whole numbers                                                | 100, -50
DECIMAL(p, s) | Stores exact decimal values (p total digits, s after the point)     | DECIMAL(10,2) → max 99999999.99
VARCHAR(n)    | Variable-length text (up to n characters)                           | 'John', 'Hello'
CHAR(n)       | Fixed-length text (always n characters)                             | 'A', 'Yes'
TEXT          | Large text data                                                     | 'This is a long paragraph.'
DATE          | Stores a date (YYYY-MM-DD)                                          | 2024-07-20
DATETIME      | Stores a date and time                                              | 2024-07-20 14:30:00
BOOLEAN       | Stores TRUE or FALSE (1 or 0); SQL Server uses BIT for this         | 1 (TRUE), 0 (FALSE)
Choosing the right data type helps reduce storage size and improve
performance.
2 SQL Constraints
SQL constraints enforce rules on the data in a table to ensure accuracy
and reliability.
Constraint  | Description                                                            | Example Usage
PRIMARY KEY | Ensures a column has unique, non-null values; identifies each row uniquely. | id INT PRIMARY KEY
FOREIGN KEY | Ensures referential integrity by linking two tables.                   | FOREIGN KEY (dept_id) REFERENCES departments(id)
NOT NULL    | Prevents a column from having NULL values.                             | name VARCHAR(50) NOT NULL
UNIQUE      | Ensures all values in a column are different.                          | email VARCHAR(100) UNIQUE
CHECK       | Ensures values meet a specific condition.                              | salary DECIMAL(10,2) CHECK (salary > 0)
DEFAULT     | Assigns a default value if no value is provided.                       | status VARCHAR(10) DEFAULT 'Active'
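The example usages in the constraints table can be combined into one table and tested with Python's sqlite3 module. This sketch assumes the hypothetical employees and departments tables below. SQLite leaves FOREIGN KEY enforcement off unless `PRAGMA foreign_keys = ON` is issued, so the sketch turns it on first. Each rejected insert raises sqlite3.IntegrityError.

```python
import sqlite3

INSERT = "INSERT INTO employees (name, email, salary, dept_id) VALUES (?, ?, ?, ?)"


def constraint_demo():
    """Try several inserts and record which ones a constraint rejects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FOREIGN KEY by default
    conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL)")
    conn.execute(
        """
        CREATE TABLE employees (
            id      INTEGER PRIMARY KEY,
            name    VARCHAR(50) NOT NULL,
            email   VARCHAR(100) UNIQUE,
            salary  DECIMAL(10,2) CHECK (salary > 0),
            status  VARCHAR(10) DEFAULT 'Active',
            dept_id INTEGER,
            FOREIGN KEY (dept_id) REFERENCES departments(id)
        )
        """
    )
    conn.execute("INSERT INTO departments (id, name) VALUES (1, 'IT')")

    attempts = {  # hypothetical rows, tried in this order
        "valid row": ("Aisha", "aisha@example.com", 70000, 1),
        "unknown dept_id": ("Ben", "ben@example.com", 50000, 99),    # FOREIGN KEY
        "duplicate email": ("Chen", "aisha@example.com", 65000, 1),  # UNIQUE
        "negative salary": ("Dina", "dina@example.com", -1, 1),      # CHECK
        "missing name": (None, "eli@example.com", 60000, 1),         # NOT NULL
    }
    accepted = {}
    for label, params in attempts.items():
        try:
            conn.execute(INSERT, params)
            accepted[label] = True
        except sqlite3.IntegrityError:
            accepted[label] = False
    # status was not supplied, so the DEFAULT applies.
    status = conn.execute("SELECT status FROM employees WHERE name = 'Aisha'").fetchone()[0]
    conn.close()
    return accepted, status


accepted, status = constraint_demo()
print(accepted)
print(status)  # Active
```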
Why Are Data Types & Constraints Important?
✔ Prevents invalid data entry (e.g., non-numeric values in INT columns).
✔ Ensures relationships between tables (FOREIGN KEY).
✔ Improves database performance by optimizing storage.
✔ Prevents duplicate or missing values (UNIQUE, NOT NULL).
Conclusion
• SQL is essential for data extraction and transformation.
• Mastering queries, joins, and optimizations can improve data
analytics efficiency.
• Continuous practice and real-world applications enhance SQL
proficiency.
