Data Engineering: Data Cleaning using SQL

Seekho Bigdata Institute


https://www.seekhobigdata.com/
1. Removing Duplicates

Purpose:
To delete duplicate rows from a table and retain unique records only.

Syntax:

DELETE FROM table_name
WHERE id NOT IN (
    SELECT MIN(id)
    FROM table_name
    GROUP BY column1, column2, ...
);

Explanation:

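This query groups rows by the columns that define a duplicate (such as
column1 and column2) and keeps only the row with the minimum id value in
each group. All other rows in the group are deleted. It assumes the table
has a unique id column. Note that MySQL rejects a subquery that reads from
the table being deleted from unless the subquery is wrapped in a derived
table.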
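
As a rough illustration, the pattern above can be run against an in-memory SQLite database from Python. The customers table, its columns, and the sample rows are hypothetical and exist only for this sketch.

```python
import sqlite3


def remove_duplicates():
    """Delete duplicate (name, email) rows, keeping the lowest id per group."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: id is unique, name and email define a duplicate.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Asha", "asha@example.com"),
            (2, "Ravi", "ravi@example.com"),
            (3, "Asha", "asha@example.com"),  # duplicate of id 1
            (4, "Ravi", "ravi@example.com"),  # duplicate of id 2
        ],
    )
    # Same shape as the syntax above: keep MIN(id) per group, delete the rest.
    conn.execute(
        """
        DELETE FROM customers
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM customers
            GROUP BY name, email
        )
        """
    )
    rows = conn.execute(
        "SELECT id, name, email FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(remove_duplicates())
```

Running a SELECT with the same GROUP BY first is a cheap way to preview which rows would survive before deleting anything.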

2. Handling NULL Values

Purpose:
To replace or fill NULL values in columns to ensure consistency in data.

Syntax:

UPDATE table_name
SET column_name = default_value
WHERE column_name IS NULL;

Explanation:

This replaces NULL values in a specific column with a specified
default_value, making the dataset more complete.
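
A minimal sketch of this update, again using Python's sqlite3 module with an in-memory database. The orders table, the city column, and the 'Unknown' default are assumptions chosen for illustration.

```python
import sqlite3


def fill_nulls():
    """Replace NULL city values with a default and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with one nullable text column.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, city) VALUES (?, ?)",
        [(1, "Pune"), (2, None), (3, "Delhi"), (4, None)],
    )
    # Only rows where city IS NULL are touched; existing values are kept.
    conn.execute("UPDATE orders SET city = 'Unknown' WHERE city IS NULL")
    rows = conn.execute("SELECT id, city FROM orders ORDER BY id").fetchall()
    conn.close()
    return rows


print(fill_nulls())
```

The default should be a value that cannot be mistaken for real data, otherwise the filled rows become indistinguishable from genuine ones.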

3. Trimming Whitespaces

Purpose:
To remove extra spaces from text data that can cause inconsistencies
during analysis.

Syntax:

UPDATE table_name
SET column_name = TRIM(column_name);

Explanation:

The TRIM function removes leading and trailing spaces from the text data
in column_name. Spaces inside the text are left unchanged.
This prevents values such as ' Pune' and 'Pune' from being treated as
different entries.
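
The effect can be checked with a small sketch in Python's sqlite3 module. The people table and its sample names are hypothetical.

```python
import sqlite3


def trim_names():
    """Strip leading and trailing spaces from every name and return the names."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table whose name column contains stray spaces.
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO people (id, name) VALUES (?, ?)",
        [(1, "  Asha "), (2, "Ravi"), (3, " Mary Ann  ")],
    )
    # TRIM affects only the two ends; the space inside 'Mary Ann' stays.
    conn.execute("UPDATE people SET name = TRIM(name)")
    names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
    conn.close()
    return names


print(trim_names())
```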

4. Standardizing Data (e.g., Case Consistency)

Purpose:
To maintain uniformity in data entries (e.g., all text in uppercase or
lowercase).

Syntax:

UPDATE table_name
SET column_name = UPPER(column_name);

Explanation:

Using UPPER or LOWER, you can make all text data in a column
consistent in case (e.g., all uppercase).
This avoids mismatches in case-sensitive comparisons and grouping, where
'India' and 'INDIA' would otherwise count as different values.
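
A short sketch of case standardization using Python's sqlite3 module. The addresses table and the country values are made up for the example; note that SQLite's built-in UPPER only converts ASCII letters.

```python
import sqlite3


def standardize_case():
    """Uppercase the country column and return its distinct values."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table where the same country is spelled in mixed case.
    conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT)")
    conn.executemany(
        "INSERT INTO addresses (id, country) VALUES (?, ?)",
        [(1, "india"), (2, "India"), (3, "INDIA"), (4, "nepal")],
    )
    conn.execute("UPDATE addresses SET country = UPPER(country)")
    distinct = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT country FROM addresses ORDER BY country"
        )
    ]
    conn.close()
    return distinct


print(standardize_case())
```

Before the update the column holds four distinct spellings; afterwards it holds two.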

5. Filtering Out Unwanted Data (e.g., Rows with Invalid Data)

Purpose:
To delete rows that don’t meet specific quality criteria.

Syntax:

DELETE FROM table_name
WHERE column_name = 'invalid_value';

Explanation:

This removes rows whose column_name matches a known bad value (here,
'invalid_value'), so that only rows meeting the quality criteria remain.
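
The following sketch applies the same DELETE in Python's sqlite3 module. The readings table and the 'N/A' marker standing in for invalid_value are assumptions for illustration.

```python
import sqlite3


def drop_invalid_rows():
    """Delete rows flagged with a known bad value and return what remains."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table where 'N/A' marks an unusable reading.
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO readings (id, status) VALUES (?, ?)",
        [(1, "ok"), (2, "N/A"), (3, "ok"), (4, "N/A")],
    )
    # cursor.rowcount reports how many rows the DELETE removed.
    cursor = conn.execute("DELETE FROM readings WHERE status = 'N/A'")
    deleted = cursor.rowcount
    rows = conn.execute("SELECT id, status FROM readings ORDER BY id").fetchall()
    conn.close()
    return deleted, rows


print(drop_invalid_rows())
```

Checking the deleted-row count against expectations is a simple guard against a condition that matches more rows than intended.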

6. Converting Data Types

Purpose:
To ensure columns have the correct data type (e.g., converting text
dates to DATE type).

Syntax:

ALTER TABLE table_name
ALTER COLUMN column_name TYPE new_data_type USING expression;

Explanation:

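Storing each column in its correct data type makes it easier to work with
the data consistently, for example when sorting or comparing values.
USING expression can handle any necessary transformations (e.g.,
converting string format dates to DATE).
The ALTER COLUMN ... TYPE ... USING form shown here is PostgreSQL syntax;
other databases use different statements (MySQL, for example, uses
MODIFY COLUMN).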
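
SQLite does not support ALTER COLUMN ... TYPE, so the sketch below swaps in a different technique: it rebuilds the table with the new column type and copies the data across with CAST. This is not the statement shown above, only an emulation of its effect. The sales table and the amount column are hypothetical.

```python
import sqlite3


def convert_amount_to_real():
    """Change a TEXT column to REAL by rebuilding the table (SQLite workaround)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table where numbers were loaded as text.
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount TEXT)")
    conn.executemany(
        "INSERT INTO sales (id, amount) VALUES (?, ?)",
        [(1, "10.50"), (2, "7")],
    )
    # 1. Create a replacement table with the desired column type.
    conn.execute("CREATE TABLE sales_typed (id INTEGER PRIMARY KEY, amount REAL)")
    # 2. Copy rows, converting values (this plays the role of USING expression).
    conn.execute(
        "INSERT INTO sales_typed (id, amount) "
        "SELECT id, CAST(amount AS REAL) FROM sales"
    )
    # 3. Swap the new table into place.
    conn.execute("DROP TABLE sales")
    conn.execute("ALTER TABLE sales_typed RENAME TO sales")
    rows = conn.execute(
        "SELECT id, amount, typeof(amount) FROM sales ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(convert_amount_to_real())
```

In PostgreSQL the same change would be a single statement in the form given under Syntax, with the cast supplied in the USING clause.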

7. Removing Outliers (Advanced)

Purpose:
To remove extreme values that can skew data analysis.

Syntax:

DELETE FROM table_name
WHERE column_name < lower_limit OR column_name > upper_limit;

Explanation:

Specify a lower_limit and upper_limit for acceptable values in a column.
This keeps only the data within this range, removing outliers.
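
A small sketch of range-based outlier removal using Python's sqlite3 module. The measurements table and the limits 0 and 100 are hypothetical; in practice the limits would come from domain knowledge or from statistics computed on the data.

```python
import sqlite3


def remove_outliers(lower_limit=0, upper_limit=100):
    """Delete values outside [lower_limit, upper_limit] and return the rest."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table of numeric measurements.
    conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO measurements (id, value) VALUES (?, ?)",
        [(1, -5), (2, 20), (3, 55), (4, 100), (5, 250)],
    )
    # Limits are passed as parameters; values equal to a limit are kept.
    conn.execute(
        "DELETE FROM measurements WHERE value < ? OR value > ?",
        (lower_limit, upper_limit),
    )
    values = [
        row[0] for row in conn.execute("SELECT value FROM measurements ORDER BY id")
    ]
    conn.close()
    return values


print(remove_outliers())
```

Because the comparison uses strict inequalities, a value exactly equal to lower_limit or upper_limit is treated as acceptable.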

Thank you

Seekho Bigdata Institute


https://www.seekhobigdata.com/
+91 99894 54737
