Basic SQL and ETL
Primary key | Foreign key
A primary key can't accept NULL values. | A foreign key can accept NULL values, including NULLs in multiple rows.
A table can have only one primary key. | A table can have more than one foreign key.
By default (in SQL Server), a primary key creates a clustered index, so the data in the table is physically organized in the order of the clustered index. | A foreign key does not automatically create an index, clustered or non-clustered. You can create an index on a foreign key manually.
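A minimal sketch of the foreign key rows above, using Python's built-in sqlite3 module (SQLite rather than SQL Server, so index behaviour differs). The dept/city/emp tables and the cityid column are hypothetical. It shows a foreign key column holding NULL in several rows, one table with two foreign keys, and SQLite rejecting a duplicate primary key and a value with no parent row. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
con.execute("CREATE TABLE dept (id INTEGER NOT NULL PRIMARY KEY, deptname TEXT)")
con.execute("CREATE TABLE city (id INTEGER NOT NULL PRIMARY KEY, cityname TEXT)")
# Hypothetical child table with two foreign keys, both nullable.
con.execute("""CREATE TABLE emp (
    id INTEGER NOT NULL PRIMARY KEY,
    deptid INTEGER REFERENCES dept(id),
    cityid INTEGER REFERENCES city(id))""")
con.execute("INSERT INTO dept VALUES (1, 'IT')")
con.execute("INSERT INTO city VALUES (1, 'Pune')")
# Several rows may leave a foreign key NULL.
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, 1, 1), (2, None, 1), (3, None, None)])
null_fk_rows = con.execute("SELECT COUNT(*) FROM emp WHERE deptid IS NULL").fetchone()[0]

def rejected(sql):
    """Return True if SQLite refuses the statement because of a constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_pk_rejected = rejected("INSERT INTO emp VALUES (1, 1, 1)")    # id 1 already exists
missing_parent_rejected = rejected("INSERT INTO emp VALUES (4, 99, 1)")  # there is no dept 99
print(null_fk_rows, duplicate_pk_rejected, missing_parent_rejected)  # 2 True True
```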
CHAR | VARCHAR
Stores character strings of fixed length; shorter values are padded with spaces. | Stores character strings of variable length.
In MySQL, the length can be from 0 to 255. | In MySQL, the length can be from 0 to 65,535 (subject to the maximum row size).
Uses static memory allocation. | Uses dynamic memory allocation.
Use CHAR when the length of the values is known and fixed. | Use VARCHAR when the length of the values varies.
In SQL Server, VARCHAR(n) supports up to 8,000 characters in the column definition, while NVARCHAR(n) supports only up to 4,000 characters.
HAVING | WHERE
HAVING can only be used in a SELECT query. Using it in an INSERT, UPDATE or DELETE statement returns an error. | WHERE can be used with SELECT, INSERT, UPDATE and DELETE statements.
Example: "UPDATE Mas_Employee SET Salary = 1500 HAVING Id = 1" fails with the error "Incorrect syntax near the keyword 'HAVING'." | Example: "UPDATE Mas_Employee SET Salary = 1500 WHERE Id = 1" works fine.
HAVING filters groups. | WHERE filters individual rows; it is applied to every row.
HAVING is applied after the GROUP BY clause. | WHERE is applied before the GROUP BY clause.
Aggregate functions can be used in HAVING. | Aggregate functions can't be used directly in WHERE; they can appear only inside a subquery.
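A small sketch of the WHERE/HAVING difference, run through Python's sqlite3 module. The products rows are hypothetical sample data. WHERE removes individual rows before grouping, HAVING removes whole groups after the aggregate is computed, and an aggregate placed directly in WHERE is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (productid INTEGER, categoryid INTEGER, unitsinstock INTEGER)")
con.executemany("INSERT INTO products VALUES (?, ?, ?)",
                [(1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 0), (5, 3, 50)])

# WHERE filters rows before grouping: out-of-stock products are dropped first.
where_rows = con.execute("""
    SELECT categoryid, SUM(unitsinstock) FROM products
    WHERE unitsinstock > 0
    GROUP BY categoryid ORDER BY categoryid""").fetchall()

# HAVING filters groups after grouping, so it can test an aggregate.
having_rows = con.execute("""
    SELECT categoryid, SUM(unitsinstock) FROM products
    GROUP BY categoryid
    HAVING SUM(unitsinstock) > 20 ORDER BY categoryid""").fetchall()

# An aggregate used directly in WHERE is an error.
try:
    con.execute("SELECT categoryid FROM products WHERE SUM(unitsinstock) > 20")
    aggregate_in_where_ok = True
except sqlite3.Error:
    aggregate_in_where_ok = False

print(where_rows)   # [(1, 40), (2, 5), (3, 50)]
print(having_rows)  # [(1, 40), (3, 50)]
```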
Aggregate functions: AVG, MIN, MAX, COUNT, SUM
Examples:
1.
SELECT categoryid, SUM(unitsinstock)
FROM products
GROUP BY categoryid;
2.
SELECT categoryid, AVG(unitsinstock)
FROM products
GROUP BY categoryid;
3.
SELECT COUNT(*)
FROM products;
4.
SELECT MAX(unitsinstock)
FROM products;
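The four example queries can be tried with Python's sqlite3 module. The product rows below are hypothetical, and ORDER BY is added only so the grouped output has a stable order. Note that SQLite's AVG always returns a floating-point value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (productid INTEGER, categoryid INTEGER, unitsinstock INTEGER)")
con.executemany("INSERT INTO products VALUES (?, ?, ?)",
                [(1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 0), (5, 3, 50)])

# 1. and 2.: one aggregated row per category.
sums = con.execute("SELECT categoryid, SUM(unitsinstock) FROM products "
                   "GROUP BY categoryid ORDER BY categoryid").fetchall()
avgs = con.execute("SELECT categoryid, AVG(unitsinstock) FROM products "
                   "GROUP BY categoryid ORDER BY categoryid").fetchall()
# 3. and 4.: without GROUP BY the whole table is one group.
count_all = con.execute("SELECT COUNT(*) FROM products").fetchone()[0]
max_stock = con.execute("SELECT MAX(unitsinstock) FROM products").fetchone()[0]
min_stock = con.execute("SELECT MIN(unitsinstock) FROM products").fetchone()[0]

print(sums)                             # [(1, 40), (2, 5), (3, 50)]
print(avgs)                             # [(1, 20.0), (2, 2.5), (3, 50.0)]
print(count_all, max_stock, min_stock)  # 5 50 0
```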
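Scalar Functions:
RAND(10): returns a pseudo-random float value between 0 and 1, using 10 as the seed.
UPPER('dotnet'): converts the string to upper case, giving 'DOTNET'.
ROUND(17.56719, 3): rounds the number to 3 decimal places, giving 17.567.
LOWER('DOTNET'): converts the string to lower case, giving 'dotnet'.
LTRIM(' dotnet'): removes leading spaces, giving 'dotnet'.
CONVERT(int, 15.56): converts the decimal value to an integer, giving 15 (the fractional part is truncated).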
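A quick check of these scalar functions in SQLite through Python's sqlite3 module. SQLite has no CONVERT, so the standard CAST is used as its counterpart. SQLite also has no seeded RAND (its random() takes no argument), so RAND(10) is not reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
row = con.execute(
    "SELECT upper('dotnet'), lower('DOTNET'), ltrim('  dotnet'), "
    "round(17.56719, 3), CAST(15.56 AS INTEGER)"  # CAST stands in for CONVERT(int, ...)
).fetchone()
upper_s, lower_s, ltrim_s, rounded, as_int = row
print(upper_s, lower_s, ltrim_s, as_int)  # DOTNET dotnet dotnet 15
print(rounded)
```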
Analytic Functions:
Lead
Lag
JOINS:
INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN (Cartesian product), SELF JOIN
EMP
Id name FN LN deptid
1 ram abc a 1
2 sham xyz b 1
3 chai hdy s 2
4 poo sud p 3
5 tanu dhjc d 3
6 vee dbhh y 3
7 bari jcjc l 1
8 sita hdh n NULL
DEPT
Id deptname
1 IT
2 EC
3 mech
4 civil
5 bio
6 chem
INNER JOIN:
An inner join returns only the rows that have a match in both joined tables.
select * from emp a inner join dept b
on a.deptid=b.id
Id name FN LN deptid id deptname
1 ram abc a 1 1 IT
2 sham xyz b 1 1 IT
3 chai hdy s 2 2 EC
4 poo sud p 3 3 mech
5 tanu dhjc d 3 3 mech
6 vee dbhh y 3 3 mech
7 bari jcjc l 1 1 IT
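The inner join above can be reproduced with Python's sqlite3 module, using the EMP and DEPT rows from the tables (FN and LN are omitted for brevity). Sita's deptid is NULL, and NULL never equals anything, so her row has no match and is dropped.

```python
import sqlite3

# EMP and DEPT rows from the tables above (FN/LN columns omitted).
EMP = [(1, "ram", 1), (2, "sham", 1), (3, "chai", 2), (4, "poo", 3),
       (5, "tanu", 3), (6, "vee", 3), (7, "bari", 1), (8, "sita", None)]
DEPT = [(1, "IT"), (2, "EC"), (3, "mech"), (4, "civil"), (5, "bio"), (6, "chem")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, deptid INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER, deptname TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)
con.executemany("INSERT INTO dept VALUES (?, ?)", DEPT)

inner_rows = con.execute(
    "SELECT a.id, a.name, b.deptname FROM emp a INNER JOIN dept b "
    "ON a.deptid = b.id ORDER BY a.id").fetchall()
print(len(inner_rows))  # 7: sita (deptid NULL) has no matching department
```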
LEFT JOIN:
A left join returns all rows from the first (left) table and the matching rows from the second (right) table. Where there is no match, the right table's columns are NULL.
select * from emp a left join dept b
on a.deptid=b.id
Id name FN LN deptid id deptname
1 ram abc a 1 1 IT
2 sham xyz b 1 1 IT
3 chai hdy s 2 2 EC
4 poo sud p 3 3 mech
5 tanu dhjc d 3 3 mech
6 vee dbhh y 3 3 mech
7 bari jcjc l 1 1 IT
8 sita hdh n NULL NULL NULL
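The same data with a LEFT JOIN in Python's sqlite3 module (FN and LN omitted). All eight EMP rows come back, and sita's department columns are filled with NULL (None in Python).

```python
import sqlite3

EMP = [(1, "ram", 1), (2, "sham", 1), (3, "chai", 2), (4, "poo", 3),
       (5, "tanu", 3), (6, "vee", 3), (7, "bari", 1), (8, "sita", None)]
DEPT = [(1, "IT"), (2, "EC"), (3, "mech"), (4, "civil"), (5, "bio"), (6, "chem")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, deptid INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER, deptname TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)
con.executemany("INSERT INTO dept VALUES (?, ?)", DEPT)

# Every emp row is kept; unmatched rows get NULL for the dept columns.
left_rows = con.execute(
    "SELECT a.id, a.name, b.deptname FROM emp a LEFT JOIN dept b "
    "ON a.deptid = b.id ORDER BY a.id").fetchall()
print(len(left_rows), left_rows[-1])  # 8 (8, 'sita', None)
```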
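Right Join:
A right outer join returns all rows from the second (right) table and the matching rows from the first (left) table. Where there is no match, the left table's columns are NULL.
select * from emp a right join dept b
on a.deptid=b.id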
Id name FN LN deptid id deptname
1 ram abc a 1 1 IT
2 sham xyz b 1 1 IT
7 bari jcjc l 1 1 IT
3 chai hdy s 2 2 EC
4 poo sud p 3 3 mech
5 tanu dhjc d 3 3 mech
6 vee dbhh y 3 3 mech
NULL NULL NULL NULL NULL 4 civil
NULL NULL NULL NULL NULL 5 bio
NULL NULL NULL NULL NULL 6 chem
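A right join is a left join with the table order swapped. The sketch below uses that equivalence in Python's sqlite3 module, since SQLite versions before 3.39 do not support RIGHT JOIN. FN and LN are omitted. Sorting by department id reproduces the row order of the result above.

```python
import sqlite3

EMP = [(1, "ram", 1), (2, "sham", 1), (3, "chai", 2), (4, "poo", 3),
       (5, "tanu", 3), (6, "vee", 3), (7, "bari", 1), (8, "sita", None)]
DEPT = [(1, "IT"), (2, "EC"), (3, "mech"), (4, "civil"), (5, "bio"), (6, "chem")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, deptid INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER, deptname TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)
con.executemany("INSERT INTO dept VALUES (?, ?)", DEPT)

# emp RIGHT JOIN dept  ==  dept LEFT JOIN emp (same ON condition).
right_rows = con.execute(
    "SELECT a.id, a.name, b.id, b.deptname FROM dept b LEFT JOIN emp a "
    "ON a.deptid = b.id ORDER BY b.id, a.id").fetchall()
unmatched = [r[3] for r in right_rows if r[0] is None]
print(len(right_rows), unmatched)  # 10 ['civil', 'bio', 'chem']
```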
Full Outer Join:
A full outer join returns all rows from both tables, whether or not they have a match.
select * from emp a FULL OUTER JOIN dept b
on a.deptid=b.id
Id name FN LN deptid id deptname
1 ram abc a 1 1 IT
2 sham xyz b 1 1 IT
3 chai hdy s 2 2 EC
4 poo sud p 3 3 mech
5 tanu dhjc d 3 3 mech
6 vee dbhh y 3 3 mech
7 bari jcjc l 1 1 IT
8 sita hdh n NULL NULL NULL
NULL NULL NULL NULL NULL 4 civil
NULL NULL NULL NULL NULL 5 bio
NULL NULL NULL NULL NULL 6 chem
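Where FULL OUTER JOIN is not available (SQLite before 3.39), a common emulation combines a LEFT JOIN with the right-table rows that have no match. The sketch below does this in Python's sqlite3 module. It returns the 11 rows above: 8 employees, sita with NULL department columns, plus civil, bio and chem with NULL employee columns.

```python
import sqlite3

EMP = [(1, "ram", 1), (2, "sham", 1), (3, "chai", 2), (4, "poo", 3),
       (5, "tanu", 3), (6, "vee", 3), (7, "bari", 1), (8, "sita", None)]
DEPT = [(1, "IT"), (2, "EC"), (3, "mech"), (4, "civil"), (5, "bio"), (6, "chem")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, deptid INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER, deptname TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)
con.executemany("INSERT INTO dept VALUES (?, ?)", DEPT)

full_rows = con.execute("""
    SELECT a.id, a.name, b.id, b.deptname
    FROM emp a LEFT JOIN dept b ON a.deptid = b.id
    UNION ALL
    -- departments that no employee points at
    SELECT NULL, NULL, b.id, b.deptname
    FROM dept b
    WHERE NOT EXISTS (SELECT 1 FROM emp a WHERE a.deptid = b.id)
""").fetchall()
print(len(full_rows))  # 11
```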
Cross Join:
A cross join produces the Cartesian product of the tables involved in the join. The number of rows in a Cartesian product is the number of rows in the first table multiplied by the number of rows in the second table; here 8 EMP rows × 6 DEPT rows = 48 rows.
select * from emp a cross join dept b
Id name FN LN deptid id deptname
1 ram abc a 1 1 IT
2 sham xyz b 1 1 IT
3 chai hdy s 2 1 IT
4 poo sud p 3 1 IT
5 tanu dhjc d 3 1 IT
6 vee dbhh y 3 1 IT
7 bari jcjc l 1 1 IT
8 sita hdh n NULL 1 IT
1 ram abc a 1 2 EC
2 sham xyz b 1 2 EC
3 chai hdy s 2 2 EC
4 poo sud p 3 2 EC
5 tanu dhjc d 3 2 EC
6 vee dbhh y 3 2 EC
7 bari jcjc l 1 2 EC
8 sita hdh n NULL 2 EC
1 ram abc a 1 3 mech
2 sham xyz b 1 3 mech
3 chai hdy s 2 3 mech
4 poo sud p 3 3 mech
5 tanu dhjc d 3 3 mech
6 vee dbhh y 3 3 mech
7 bari jcjc l 1 3 mech
8 sita hdh n NULL 3 mech
1 ram abc a 1 4 civil
2 sham xyz b 1 4 civil
3 chai hdy s 2 4 civil
4 poo sud p 3 4 civil
5 tanu dhjc d 3 4 civil
6 vee dbhh y 3 4 civil
7 bari jcjc l 1 4 civil
8 sita hdh n NULL 4 civil
1 ram abc a 1 5 bio
2 sham xyz b 1 5 bio
3 chai hdy s 2 5 bio
4 poo sud p 3 5 bio
5 tanu dhjc d 3 5 bio
6 vee dbhh y 3 5 bio
7 bari jcjc l 1 5 bio
8 sita hdh n NULL 5 bio
1 ram abc a 1 6 chem
2 sham xyz b 1 6 chem
3 chai hdy s 2 6 chem
4 poo sud p 3 6 chem
5 tanu dhjc d 3 6 chem
6 vee dbhh y 3 6 chem
7 bari jcjc l 1 6 chem
8 sita hdh n NULL 6 chem
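The row count of a cross join can be checked directly in Python's sqlite3 module: every EMP row pairs with every DEPT row, with no ON condition.

```python
import sqlite3

EMP = [(1, "ram", 1), (2, "sham", 1), (3, "chai", 2), (4, "poo", 3),
       (5, "tanu", 3), (6, "vee", 3), (7, "bari", 1), (8, "sita", None)]
DEPT = [(1, "IT"), (2, "EC"), (3, "mech"), (4, "civil"), (5, "bio"), (6, "chem")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, deptid INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER, deptname TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)
con.executemany("INSERT INTO dept VALUES (?, ?)", DEPT)

cross_count = con.execute(
    "SELECT COUNT(*) FROM emp a CROSS JOIN dept b").fetchone()[0]
print(cross_count, len(EMP) * len(DEPT))  # 48 48
```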
Self Join:
Joining a table to itself is called a self join. A self join is used to retrieve records that have some relation or similarity with other records in the same table. Aliases are needed so the same table can be referenced twice, and the condition in the WHERE clause selects the related records.
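The EMP table above has no column that relates rows to each other, so this sketch assumes a hypothetical staff table whose manager_id points at another row of the same table. It uses Python's sqlite3 module, with two aliases (e for employee, m for manager) and the join condition in WHERE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: manager_id refers to the id of another row in the same table.
con.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
con.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                [(1, "ram", None), (2, "sham", 1), (3, "chai", 1), (4, "poo", 2)])

# Same table twice under two aliases; WHERE links each employee to its manager.
pairs = con.execute(
    "SELECT e.name, m.name FROM staff e, staff m "
    "WHERE e.manager_id = m.id ORDER BY e.id").fetchall()
print(pairs)  # [('sham', 'ram'), ('chai', 'ram'), ('poo', 'sham')]
```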
COALESCE:
Returns the first non-NULL value in its list of arguments.
Id Firstname Middlename Lastname
1 sam NULL NULL
2 NULL todd tanzan
3 NULL NULL Sara
4 ben parker NULL
5 james nick nancy
Output
Id Name
1 Sam
2 Todd
3 Sara
4 Ben
5 James
FirstName Surname PetName
Prasad NULL NULL
Raju NULL NULL
NULL Kulkarni NULL
NULL Shinde NULL
NULL NULL Cherry
Name (first non-NULL value of FirstName and Surname)
Prasad
Raju
Kulkarni
Shinde
NULL
Two-argument functions such as ISNULL (SQL Server) and NVL (Oracle) can be used for only 2 columns; if you also pass PetName, an error is thrown.
So to check multiple columns, use the COALESCE function:
SELECT COALESCE(FirstName, Surname, PetName) AS Name FROM Person;
Output:
Name
Prasad
Raju
Kulkarni
Shinde
Cherry
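The Person example can be run in Python's sqlite3 module. SQLite's IFNULL plays the role of the two-argument ISNULL/NVL: it leaves the fifth row NULL and rejects a third argument, while COALESCE accepts all three columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Person (FirstName TEXT, Surname TEXT, PetName TEXT)")
con.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    ("Prasad", None, None), ("Raju", None, None), (None, "Kulkarni", None),
    (None, "Shinde", None), (None, None, "Cherry")])

# COALESCE takes any number of arguments and returns the first non-NULL one.
names = [r[0] for r in con.execute(
    "SELECT COALESCE(FirstName, Surname, PetName) FROM Person ORDER BY rowid")]

# IFNULL is SQLite's two-argument form; the Cherry row stays NULL.
two_col = [r[0] for r in con.execute(
    "SELECT IFNULL(FirstName, Surname) FROM Person ORDER BY rowid")]

# A third argument to the two-argument function is an error.
try:
    con.execute("SELECT IFNULL(FirstName, Surname, PetName) FROM Person")
    three_args_ok = True
except sqlite3.Error:
    three_args_ok = False

print(names)    # ['Prasad', 'Raju', 'Kulkarni', 'Shinde', 'Cherry']
print(two_col)  # ['Prasad', 'Raju', 'Kulkarni', 'Shinde', None]
```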
NVL:
This Oracle function replaces a NULL value with another value.
It is helpful when a column has no data but you want to display something else.
Example 1:
A common design in database tables is to store a start date and an end date for each record, which track the period during which the record is effective.
In some cases, the end date is set to NULL if there is no end date or if it is the active record.
CUSTOMER_ID | CUSTOMER_NAME | STATUS | START_DATE | END_DATE
1 | ABC | Prospect | 1-Jan-17 | 10-Jan-17
4 | WXY | In Progress | 26-Feb-17 | (null)
In the above table we don't want to display a NULL or empty value, so we use the NVL function.
We first need to decide what value to display instead. This could be a static date (e.g., 31-DEC-9999).
Query:
SELECT CUSTOMER_ID,
CUSTOMER_NAME,
STATUS,
START_DATE,
NVL(END_DATE, TO_DATE('31-DEC-9999', 'DD-MON-YYYY')) AS END_DATE
FROM customer_history;
Result:
CUSTOMER_ID | CUSTOMER_NAME | STATUS | START_DATE | END_DATE
1 | ABC | Prospect | 1-Jan-17 | 10-Jan-17
4 | WXY | In Progress | 26-Feb-17 | 31-Dec-99
So the NVL function can be used to translate a NULL date value to something else.
Example 2:
Another use is missing data that must be populated, for example when you load data from one table into another table that cannot accept NULL values.
In this example, the STATUS column of the target table cannot be NULL. The business rule says we need a value for STATUS, but we can't update the source table.
Query:
SELECT CUSTOMER_ID,
CUSTOMER_NAME,
NVL(STATUS, 'Unknown') AS STATUS,
START_DATE,
END_DATE
FROM customer_history;
Result:
CUSTOMER_ID | CUSTOMER_NAME | STATUS | START_DATE | END_DATE
1 | ABC | Prospect | 1-Jan-17 | 10-Jan-17
4 | WXY | In Progress | 26-Feb-17 | (null)
6 | FED | Unknown | 4-Feb-17 | 15-Feb-17
This now satisfies the business rule that says the status column cannot be null in your new table.
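SQLite has no NVL, but its two-argument IFNULL behaves the same way. This sketch runs both examples together in Python's sqlite3 module: a missing status becomes 'Unknown' and a missing end date becomes a far-future date. Storing dates as ISO-8601 text ('9999-12-31') is an assumption of the sketch, because SQLite has no DATE type.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text, an assumption of this sketch.
con.execute("CREATE TABLE customer_history (customer_id INTEGER, customer_name TEXT, "
            "status TEXT, start_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO customer_history VALUES (?, ?, ?, ?, ?)", [
    (1, "ABC", "Prospect", "2017-01-01", "2017-01-10"),
    (4, "WXY", "In Progress", "2017-02-26", None),
    (6, "FED", None, "2017-02-04", "2017-02-15")])

# IFNULL(x, default) returns default only when x is NULL, like NVL.
rows = con.execute(
    "SELECT customer_id, IFNULL(status, 'Unknown'), IFNULL(end_date, '9999-12-31') "
    "FROM customer_history ORDER BY customer_id").fetchall()
for row in rows:
    print(row)
```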
SQL CONSTRAINTS:
Constraints are used to set rules for the data in a table. If an action violates a constraint, the action is aborted.
Constraints can be defined when the table is created, with the CREATE TABLE statement, or after the table has been created, with the ALTER TABLE statement.
NOT NULL: Indicates that the column must have a value and cannot be left NULL.
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL,
Age int
);
UNIQUE: Ensures that all values in a column (or combination of columns) are different, so no value is repeated in another row.
CREATE TABLE Persons (
ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
PRIMARY KEY: Combines the NOT NULL and UNIQUE constraints. It is defined on one column or on a combination of columns, and it identifies each record uniquely.
CREATE TABLE Persons (
ID int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
FOREIGN KEY: Ensures the referential integrity of the data: a value in one table must match a primary key value in another table.
CREATE TABLE Orders (
OrderID int NOT NULL PRIMARY KEY,
OrderNumber int NOT NULL,
PersonID int FOREIGN KEY REFERENCES Persons(ID)
);
CHECK: Ensures that the values in a column satisfy the specified condition.
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int CHECK (Age>=18)
);
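A sketch that triggers each constraint once, using Python's sqlite3 module and the Persons/Orders tables above (with UNIQUE added on OrderNumber). Every violating INSERT raises sqlite3.IntegrityError. The person names are hypothetical. SQLite enforces FOREIGN KEY only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
con.execute("""CREATE TABLE Persons (
    ID INTEGER NOT NULL PRIMARY KEY,
    LastName TEXT NOT NULL,
    FirstName TEXT,
    Age INTEGER CHECK (Age >= 18))""")
con.execute("""CREATE TABLE Orders (
    OrderID INTEGER NOT NULL PRIMARY KEY,
    OrderNumber INTEGER NOT NULL UNIQUE,
    PersonID INTEGER REFERENCES Persons(ID))""")
con.execute("INSERT INTO Persons VALUES (1, 'Rao', 'Ram', 30)")
con.execute("INSERT INTO Orders VALUES (10, 500, 1)")

def rejected(sql):
    """Return True if SQLite refuses the statement because of a constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "NOT NULL": rejected("INSERT INTO Persons VALUES (2, NULL, 'Sham', 25)"),
    "CHECK": rejected("INSERT INTO Persons VALUES (3, 'Iyer', 'Sita', 16)"),
    "PRIMARY KEY": rejected("INSERT INTO Persons VALUES (1, 'Das', 'Bari', 40)"),
    "UNIQUE": rejected("INSERT INTO Orders VALUES (11, 500, 1)"),
    "FOREIGN KEY": rejected("INSERT INTO Orders VALUES (12, 501, 99)"),
}
print(results)  # every value is True
```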
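RANK:
Assigns a rank number to each record in a partition. Tied records get the same rank, and the following rank numbers are skipped.
Syntax:
RANK() OVER (PARTITION BY column ORDER BY column)
Example (customer table):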
select *, rank() over (partition by Occupation order by Income desc) as rank from customer
FN | LN | Education | Occupation | Income | rank
Christy | mehta | partial high school | clerical | 50000 | 1
Rob | yang | partial high school | clerical | 45000 | 2
Rob | johnson | Bachelor | management | 80000 | 1
John | miller | masters degree | management | 80000 | 1
Christy | carlson | graduate degree | management | 70000 | 3
John | yang | Bachelor | Professional | 90000 | 1
Christy | zhu | Bachelor | Professional | 80000 | 2
John | ruiz | Bachelor | Professional | 70000 | 3
Rob | hung | high school | skilled manual | 60000 | 1
Ruben | thores | partial college | skilled manual | 50000 | 2
DENSE_RANK: Assigns a rank number to each record in a partition without skipping rank numbers; tied records get the same rank.
ROW_NUMBER: Assigns a unique sequential number to each record in a partition, even when values are tied.
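The three ranking functions can be compared side by side with Python's sqlite3 module, using the clerical and management rows from the table above. In the management partition, the 80000 tie gives RANK 1, 1, 3; DENSE_RANK 1, 1, 2; and ROW_NUMBER 1, 2, 3. Which of the two tied rows receives ROW_NUMBER 1 is not defined by this ORDER BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (FN TEXT, LN TEXT, Occupation TEXT, Income INTEGER)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("Christy", "mehta", "clerical", 50000), ("Rob", "yang", "clerical", 45000),
    ("Rob", "johnson", "management", 80000), ("John", "miller", "management", 80000),
    ("Christy", "carlson", "management", 70000)])

rows = con.execute("""
    SELECT LN,
           RANK()       OVER (PARTITION BY Occupation ORDER BY Income DESC),
           DENSE_RANK() OVER (PARTITION BY Occupation ORDER BY Income DESC),
           ROW_NUMBER() OVER (PARTITION BY Occupation ORDER BY Income DESC)
    FROM customer""").fetchall()
# Map last name -> (rank, dense_rank, row_number).
ranks = {ln: (rnk, dense, rownum) for ln, rnk, dense, rownum in rows}
print(ranks["carlson"])  # (3, 2, 3)
```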
Using DENSE_RANK (2nd, 3rd, 4th, 5th, 6th highest salary; change rk = 2 to the rank you need):
SELECT * FROM
(SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk FROM emp) AS a
WHERE rk = 2;
Using the TOP keyword in SQL Server (2nd, 3rd, 4th, 5th, 6th highest salary; change TOP 2 to TOP N):
SELECT TOP 1 salary FROM
(SELECT DISTINCT TOP 2 salary FROM emp ORDER BY salary DESC) AS temp
ORDER BY salary;
Using NOT IN (2nd highest salary only):
SELECT MAX(salary) FROM emp
WHERE salary NOT IN (SELECT MAX(salary) FROM emp);
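A sketch of the three approaches in Python's sqlite3 module, on hypothetical salaries with a tie at the top (300, 300, 250, 200, 100). SQLite has no TOP, so LIMIT/OFFSET over the distinct salaries plays that role. The DENSE_RANK query selects DISTINCT salary rather than `*`, so it returns a single value even when several rows share the rank.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("a", 300), ("b", 250), ("c", 300), ("d", 100), ("e", 200)])

def nth_highest(n):
    """N-th highest distinct salary using DENSE_RANK (ties share a rank)."""
    row = con.execute(
        "SELECT DISTINCT salary FROM "
        "(SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk FROM emp) "
        "WHERE rk = ?", (n,)).fetchone()
    return row[0] if row else None

# NOT IN: the largest salary that is not the maximum, i.e. the 2nd highest.
second_not_in = con.execute(
    "SELECT MAX(salary) FROM emp "
    "WHERE salary NOT IN (SELECT MAX(salary) FROM emp)").fetchone()[0]

# SQLite has no TOP; skip one distinct salary and take the next.
second_limit = con.execute(
    "SELECT DISTINCT salary FROM emp ORDER BY salary DESC LIMIT 1 OFFSET 1").fetchone()[0]

print(nth_highest(2), nth_highest(3), second_not_in, second_limit)  # 250 200 250 250
```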
Lead:
LEAD accesses data from a subsequent row of the result set without using a self join.
Syntax:
LEAD(column [, offset [, default]]) OVER ([PARTITION BY column] ORDER BY column)
FN LN Sales
rob ab 24.99
sham xy 59.43
sita fd 29.59
FN LN Sales RESULT
rob ab 24.99 59.43
sham xy 59.43 29.59
sita fd 29.59 NULL
Lag:
LAG accesses data from a previous row of the result set, in the same way as LEAD looks ahead.
FN LN Sales RESULT
rob ab 24.99 NULL
sham xy 59.43 24.99
sita fd 29.59 59.43
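The LEAD and LAG results above can be reproduced with Python's sqlite3 module. The sketch assumes the rows are ordered by FN, which matches the order shown in the tables. The table name sales_data is hypothetical. The row with no following (or preceding) row gets NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_data (FN TEXT, LN TEXT, Sales REAL)")
con.executemany("INSERT INTO sales_data VALUES (?, ?, ?)",
                [("rob", "ab", 24.99), ("sham", "xy", 59.43), ("sita", "fd", 29.59)])

rows = con.execute("""
    SELECT FN, Sales,
           LEAD(Sales) OVER (ORDER BY FN) AS next_sales,
           LAG(Sales)  OVER (ORDER BY FN) AS prev_sales
    FROM sales_data ORDER BY FN""").fetchall()
for row in rows:
    print(row)
# ('rob', 24.99, 59.43, None)
# ('sham', 59.43, 29.59, 24.99)
# ('sita', 29.59, None, 59.43)
```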
CTE (Common Table Expression):
A CTE creates a temporary named result set that exists only for the duration of a query.
You can reference its contents like a table, which simplifies the query structure.
Example:
WITH CTE AS
(SELECT prodid, proddesc, price FROM products WHERE price > 20)
SELECT * FROM CTE;
Output:
prodid proddesc price
2 Butter 30
3 Milk 47
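The CTE example can be run in Python's sqlite3 module. The Bread row is a hypothetical extra product that the `price > 20` filter excludes. A second query shows that the CTE no longer exists once its statement has finished.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (prodid INTEGER, proddesc TEXT, price INTEGER)")
# Bread is a hypothetical extra row that the filter should exclude.
con.executemany("INSERT INTO products VALUES (?, ?, ?)",
                [(1, "Bread", 15), (2, "Butter", 30), (3, "Milk", 47)])

rows = con.execute("""
    WITH CTE AS
    (SELECT prodid, proddesc, price FROM products WHERE price > 20)
    SELECT * FROM CTE ORDER BY prodid""").fetchall()
print(rows)  # [(2, 'Butter', 30), (3, 'Milk', 47)]

# The CTE existed only for that one statement.
try:
    con.execute("SELECT * FROM CTE")
    cte_still_visible = True
except sqlite3.Error:
    cte_still_visible = False
```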
Referential integrity: every foreign key value must have a corresponding primary key value.
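STRING FUNCTIONS:
CHARINDEX:
CHARINDEX(expressiontofind, expressiontosearch, start_location): returns the starting position of expressiontofind within expressiontosearch, or 0 if it is not found.
Output: 5
SUBSTRING:
SUBSTRING(expression, start, length)
SELECT name, SUBSTRING(name, 3, 2) AS sub
SUBSTRING('abcdef', 2, 3)
Output: bcd
REVERSE:
REVERSE(string): returns the string with its characters in reverse order.
REPLACE:
REPLACE('abcdefghij', 'cde', 'xxx')
Output: abxxxfghij
LEN:
LEN(string)
SELECT LEN('Technosoft.com')
Output: 14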
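SQLite equivalents of these string functions, checked with Python's sqlite3 module. Two differences from SQL Server are worth noting. SQLite's instr(haystack, needle) takes its arguments in the opposite order to CHARINDEX(needle, haystack). SQLite's length() counts trailing spaces, whereas SQL Server's LEN ignores them. SQLite has no built-in REVERSE, so it is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
position, sub, replaced, length = con.execute(
    "SELECT instr('Technosoft.com', 'soft'), "   # like CHARINDEX('soft', 'Technosoft.com')
    "substr('abcdef', 2, 3), "                   # like SUBSTRING('abcdef', 2, 3)
    "replace('abcdefghij', 'cde', 'xxx'), "
    "length('Technosoft.com')").fetchone()        # like LEN('Technosoft.com')
print(position, sub, replaced, length)  # 7 bcd abxxxfghij 14
```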
ETL
Derivation: applying rules to your data to derive new calculated values from existing data.
Joining:
Make sure that the ETL application reports invalid data and replaces it with default values.
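A minimal sketch of derivation and default replacement in one load step, written as SQL run through Python's sqlite3 module. The orders_raw/orders_clean tables, their columns, and the rules (invalid qty becomes 0, missing status becomes 'Unknown', total = qty × unit_price) are all hypothetical. The invalid rows are reported before they are defaulted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders_raw (order_id INTEGER, qty INTEGER, unit_price INTEGER, status TEXT)")
con.execute("CREATE TABLE orders_clean (order_id INTEGER, qty INTEGER, unit_price INTEGER, "
            "total INTEGER, status TEXT NOT NULL)")
con.executemany("INSERT INTO orders_raw VALUES (?, ?, ?, ?)",
                [(1, 2, 10, "Shipped"), (2, -1, 5, None), (3, None, 7, "New")])

# Report: rows whose qty is invalid (missing or negative) before defaulting them.
invalid_ids = [r[0] for r in con.execute(
    "SELECT order_id FROM orders_raw WHERE qty IS NULL OR qty < 0 ORDER BY order_id")]

# Load: default invalid qty to 0 and missing status to 'Unknown',
# then derive total = qty * unit_price from the cleaned qty.
con.execute("""
    INSERT INTO orders_clean (order_id, qty, unit_price, total, status)
    SELECT order_id, clean_qty, unit_price, clean_qty * unit_price,
           COALESCE(status, 'Unknown')
    FROM (SELECT order_id, unit_price, status,
                 CASE WHEN qty IS NULL OR qty < 0 THEN 0 ELSE qty END AS clean_qty
          FROM orders_raw)""")

loaded = con.execute("SELECT * FROM orders_clean ORDER BY order_id").fetchall()
print(invalid_ids)  # [2, 3]
print(loaded)       # [(1, 2, 10, 20, 'Shipped'), (2, 0, 5, 0, 'Unknown'), (3, 0, 7, 0, 'New')]
```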
ETL BUGS:
Calculation
User interface
Load condition
Source
Structure validation
Data completeness
Data quality
Null validation
Duplicate
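Three of these bug categories (data completeness, null validation and duplicates) are usually tested with simple SQL queries against the source and target tables. This sketch runs such checks through Python's sqlite3 module. The src_customer/tgt_customer tables and the faulty load, with one name lost and one row loaded twice, are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
con.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT)")
con.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "ABC"), (2, "WXY"), (3, "FED")])
# Hypothetical faulty load: one name lost, one row loaded twice.
con.executemany("INSERT INTO tgt_customer VALUES (?, ?)",
                [(1, "ABC"), (2, None), (3, "FED"), (3, "FED")])

def scalar(sql):
    """Run a query that returns a single value."""
    return con.execute(sql).fetchone()[0]

checks = {
    # Data completeness: source and target row counts should match.
    "completeness_ok": scalar("SELECT COUNT(*) FROM src_customer")
                       == scalar("SELECT COUNT(*) FROM tgt_customer"),
    # Null validation: a mandatory column must not be NULL in the target.
    "null_names": scalar("SELECT COUNT(*) FROM tgt_customer WHERE name IS NULL"),
    # Duplicate check: keys that appear more than once.
    "duplicate_ids": con.execute(
        "SELECT id, COUNT(*) FROM tgt_customer GROUP BY id HAVING COUNT(*) > 1").fetchall(),
}
print(checks)  # {'completeness_ok': False, 'null_names': 1, 'duplicate_ids': [(3, 2)]}
```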
CTE FUNCTION (Difficult)
A Common Table Expression (CTE) is the result set of a query which exists
temporarily and for use only within the context of a larger query. Much like a
derived table, the result of a CTE is not stored and exists only for the duration of the
query.
Example:
WITH Employee_CTE (EmployeeNumber, Title)
AS
(SELECT NationalIDNumber,
JobTitle
FROM HumanResources.Employee)
SELECT EmployeeNumber,
Title
FROM Employee_CTE
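The column list after the CTE name renames the columns of the inner query: NationalIDNumber becomes EmployeeNumber and JobTitle becomes Title. This sketch checks that with Python's sqlite3 module. It uses a hypothetical Employee table with made-up rows, because the SQLite database here has no HumanResources schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for HumanResources.Employee.
con.execute("CREATE TABLE Employee (NationalIDNumber TEXT, JobTitle TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [("1001", "Engineer"), ("1002", "Manager")])

cur = con.execute("""
    WITH Employee_CTE (EmployeeNumber, Title)
    AS
    (SELECT NationalIDNumber, JobTitle FROM Employee)
    SELECT EmployeeNumber, Title FROM Employee_CTE ORDER BY EmployeeNumber""")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]  # names come from the CTE column list
print(columns)  # ['EmployeeNumber', 'Title']
print(rows)     # [('1001', 'Engineer'), ('1002', 'Manager')]
```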
Or
The SQL Server CTE, also called a Common Table Expression, is used to generate a temporary named result set (like a temporary table) that exists for the duration of a query. We can define a CTE within the execution scope of a single SELECT, INSERT, DELETE, or UPDATE statement.
From the figure below, you can observe that the [Employee] table has fourteen records
and the [Department] table has eight records.
SQL CTE Example
In this simple example, we show how to write a simple CTE in SQL Server.
OUTPUT