Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
32 views

EDA Code SQL

This document describes analyzing medical inventory data through exploratory data analysis using SQL. It calculates measures of central tendency, dispersion, skewness, and kurtosis on the original and cleaned data tables to understand the data distribution for business decisions.

Uploaded by

Prathmesh Patil
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
32 views

EDA Code SQL

This document describes analyzing medical inventory data through exploratory data analysis using SQL. It calculates measures of central tendency, dispersion, skewness, and kurtosis on the original and cleaned data tables to understand the data distribution for business decisions.

Uploaded by

Prathmesh Patil
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Medical inventory optimization

Exploratory Data Analysis (SQL) by Prathmesh Patil


Software: MySQL Workbench

Business decisions based on the unclean ‘projectfinaldata’ table

1. Set the current database to "med_inventory".

USE med_inventory;

2. Displaying the table.

SELECT * FROM projectfinaldata LIMIT 20;

Output:

3. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.

Mean:

SELECT

ROUND(AVG(Quantity), 2) AS mean_quantity,

ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,

ROUND(AVG(Final_Cost), 2) AS mean_final_cost,

ROUND(AVG(Final_Sales), 2) AS mean_final_sales,

ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp FROM projectfinaldata;

Output:
Median:

SELECT

ROUND(AVG(Final_Cost), 2) AS median_final_cost,

ROUND(AVG(Final_Sales), 2) AS median_final_sales,

ROUND(AVG(Quantity), 2) AS median_quantity,

ROUND(AVG(ReturnQuantity), 2) AS median_return_quantity,

ROUND(AVG(RtnMRP), 2) AS median_rtnmrp

FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,

ROW_NUMBER() OVER (ORDER BY Final_Cost) AS row_num,

COUNT(*) OVER () AS total_rows FROM projectfinaldata

) AS subquery
WHERE row_num IN (FLOOR((total_rows + 1) / 2), CEILING((total_rows + 1) / 2));

Mode:

SELECT

mode_quantity.mode_value AS mode_quantity,

mode_return_quantity.mode_value AS mode_return_quantity,

mode_final_cost.mode_value AS mode_final_cost,

mode_final_sales.mode_value AS mode_final_sales,

mode_rtnmrp.mode_value AS mode_rtnmrp

FROM (

SELECT Quantity AS mode_value, COUNT(*) AS mode_count

FROM projectfinaldata

GROUP BY Quantity

ORDER BY COUNT(*) DESC

LIMIT 1
) AS mode_quantity,

SELECT ReturnQuantity AS mode_value, COUNT(*) AS mode_count

FROM projectfinaldata

GROUP BY ReturnQuantity

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_return_quantity,

SELECT Final_Cost AS mode_value, COUNT(*) AS mode_count

FROM projectfinaldata

GROUP BY Final_Cost

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_final_cost,

SELECT Final_Sales AS mode_value, COUNT(*) AS mode_count

FROM projectfinaldata

GROUP BY Final_Sales

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_final_sales,

SELECT RtnMRP AS mode_value, COUNT(*) AS mode_count

FROM projectfinaldata

GROUP BY RtnMRP

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_rtnmrp;

Output:
4. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.

Variance:

SELECT

ROUND(VARIANCE(Quantity), 2) AS variance_quantity,

ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,

ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,

ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,

ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp

FROM projectfinaldata;

Output:

Standard Deviation:

SELECT

ROUND(STDDEV(Quantity), 2) AS stddev_quantity,

ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,

ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,

ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,

ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp

FROM projectfinaldata;

Output:
Range:

SELECT

MAX(Quantity) - MIN(Quantity) AS range_quantity,

MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,

MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,

MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,

MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp

FROM projectfinaldata;

Output:

5. Calculating the third moment (skewness) for the dataset.

Skewness:

SELECT 'Quantity' AS column_name,

ROUND((SUM(POW(Quantity - (SELECT AVG(Quantity) FROM projectfinaldata), 3)) / (COUNT(*) *


POW(STDDEV(Quantity), 3))), 2) AS skewness_value

FROM projectfinaldata

UNION ALL

SELECT 'ReturnQuantity' AS column_name,


ROUND((SUM(POW(ReturnQuantity - (SELECT AVG(ReturnQuantity) FROM projectfinaldata), 3)) /
(COUNT(*) * POW(STDDEV(ReturnQuantity), 3))), 2) AS skewness_value

FROM projectfinaldata

UNION ALL

SELECT 'Final_Cost' AS column_name,

ROUND((SUM(POW(Final_Cost - (SELECT AVG(Final_Cost) FROM projectfinaldata), 3)) / (COUNT(*) *


POW(STDDEV(Final_Cost), 3))), 2) AS skewness_value

FROM projectfinaldata

UNION ALL

SELECT 'Final_Sales' AS column_name,


ROUND((SUM(POW(Final_Sales - (SELECT AVG(Final_Sales) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(Final_Sales), 3))), 2) AS skewness_value

FROM projectfinaldata

UNION ALL

SELECT 'RtnMRP' AS column_name,


ROUND((SUM(POW(RtnMRP - (SELECT AVG(RtnMRP) FROM projectfinaldata), 3)) / (COUNT(*) *
POW(STDDEV(RtnMRP), 3))), 2) AS skewness_value

FROM projectfinaldata;

Output:

6. Calculating the fourth moment (kurtosis) for the dataset.

Kurtosis:

SELECT

ROUND((SUM(POWER(Quantity - avg_value, 4)) / (COUNT(Quantity) * POWER(STDDEV(Quantity),


4))), 2) AS kurtosis_quantity,

ROUND((SUM(POWER(ReturnQuantity - avg_value, 4)) / (COUNT(ReturnQuantity) *


POWER(STDDEV(ReturnQuantity), 4))), 2) AS kurtosis_return_quantity,

ROUND((SUM(POWER(Final_Cost - avg_value, 4)) / (COUNT(Final_Cost) *


POWER(STDDEV(Final_Cost), 4))), 2) AS kurtosis_final_cost,

ROUND((SUM(POWER(Final_Sales - avg_value, 4)) / (COUNT(Final_Sales) *


POWER(STDDEV(Final_Sales), 4))), 2) AS kurtosis_final_sales,

ROUND((SUM(POWER(RtnMRP - avg_value, 4)) / (COUNT(RtnMRP) * POWER(STDDEV(RtnMRP), 4))),


2) AS kurtosis_rtnmrp

FROM

(SELECT

AVG(Quantity) AS avg_value,

STDDEV(Quantity) AS stddev_value,

COUNT(Quantity) AS count_value
FROM projectfinaldata) AS subquery, projectfinaldata;

Output:

Business decisions based on ‘cleaned_table’ with pre-processed data

1. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.

Mean:

SELECT

ROUND(AVG(Quantity), 2) AS mean_quantity,

ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,

ROUND(AVG(Final_Cost), 2) AS mean_final_cost,

ROUND(AVG(Final_Sales), 2) AS mean_final_sales,

ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp FROM cleaned_table;


Output:

Median:

SELECT

ROUND(AVG(Final_Cost), 2) AS median_final_cost,

ROUND(AVG(Final_Sales), 2) AS median_final_sales,

ROUND(AVG(Quantity), 2) AS median_quantity,

ROUND(AVG(ReturnQuantity), 2) AS median_return_quantity,

ROUND(AVG(RtnMRP), 2) AS median_rtnmrp

FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,

ROW_NUMBER() OVER (ORDER BY Final_Cost) AS row_num,

COUNT(*) OVER () AS total_rows FROM cleaned_table


) AS subquery

WHERE row_num IN (FLOOR((total_rows + 1) / 2), CEILING((total_rows + 1) / 2));

Mode:

SELECT

mode_quantity.mode_value AS mode_quantity,

mode_return_quantity.mode_value AS mode_return_quantity,

mode_final_cost.mode_value AS mode_final_cost,

mode_final_sales.mode_value AS mode_final_sales,

mode_rtnmrp.mode_value AS mode_rtnmrp

FROM (

SELECT Quantity AS mode_value, COUNT(*) AS mode_count

FROM cleaned_table

GROUP BY Quantity

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_quantity,

SELECT ReturnQuantity AS mode_value, COUNT(*) AS mode_count

FROM cleaned_table

GROUP BY ReturnQuantity

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_return_quantity,

SELECT Final_Cost AS mode_value, COUNT(*) AS mode_count

FROM cleaned_table

GROUP BY Final_Cost
ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_final_cost,

SELECT Final_Sales AS mode_value, COUNT(*) AS mode_count

FROM cleaned_table

GROUP BY Final_Sales

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_final_sales,

SELECT RtnMRP AS mode_value, COUNT(*) AS mode_count

FROM cleaned_table

GROUP BY RtnMRP

ORDER BY COUNT(*) DESC

LIMIT 1

) AS mode_rtnmrp;

Output:

2. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.

Variance:

SELECT

ROUND(VARIANCE(Quantity), 2) AS variance_quantity,

ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,

ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,

ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,

ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp
FROM cleaned_table;

Output:

Standard Deviation:

SELECT

ROUND(STDDEV(Quantity), 2) AS stddev_quantity,

ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,

ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,

ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,

ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp

FROM cleaned_table;

Output:

Range:

SELECT

MAX(Quantity) - MIN(Quantity) AS range_quantity,

MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,

MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,

MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,

MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp

FROM cleaned_table;

Output:
3. Calculating the third moment (skewness) for the dataset.

Skewness:

SELECT 'Quantity' AS column_name,

ROUND((SUM(POW(Quantity - (SELECT AVG(Quantity) FROM cleaned_table), 3)) / (COUNT(*) *


POW(STDDEV(Quantity), 3))), 2) AS skewness_value

FROM cleaned_table

UNION ALL

SELECT 'ReturnQuantity' AS column_name,

ROUND((SUM(POW(ReturnQuantity - (SELECT AVG(ReturnQuantity) FROM cleaned_table), 3)) /


(COUNT(*) * POW(STDDEV(ReturnQuantity), 3))), 2) AS skewness_value

FROM cleaned_table

UNION ALL

SELECT 'Final_Cost' AS column_name,

ROUND((SUM(POW(Final_Cost - (SELECT AVG(Final_Cost) FROM cleaned_table), 3)) / (COUNT(*) *


POW(STDDEV(Final_Cost), 3))), 2) AS skewness_value

FROM cleaned_table

UNION ALL

SELECT 'Final_Sales' AS column_name,

ROUND((SUM(POW(Final_Sales - (SELECT AVG(Final_Sales) FROM cleaned_table), 3)) / (COUNT(*) *


POW(STDDEV(Final_Sales), 3))), 2) AS skewness_value

FROM cleaned_table

UNION ALL

SELECT 'RtnMRP' AS column_name,

ROUND((SUM(POW(RtnMRP - (SELECT AVG(RtnMRP) FROM cleaned_table), 3)) / (COUNT(*) *


POW(STDDEV(RtnMRP), 3))), 2) AS skewness_value

FROM cleaned_table;

Output:
4. Calculating the fourth moment (kurtosis) for the dataset.

Kurtosis:

SELECT

ROUND((SUM(POWER(Quantity - avg_value, 4)) / (COUNT(Quantity) * POWER(STDDEV(Quantity),


4))), 2) AS kurtosis_quantity,

ROUND((SUM(POWER(ReturnQuantity - avg_value, 4)) / (COUNT(ReturnQuantity) *


POWER(STDDEV(ReturnQuantity), 4))), 2) AS kurtosis_return_quantity,

ROUND((SUM(POWER(Final_Cost - avg_value, 4)) / (COUNT(Final_Cost) *


POWER(STDDEV(Final_Cost), 4))), 2) AS kurtosis_final_cost,

ROUND((SUM(POWER(Final_Sales - avg_value, 4)) / (COUNT(Final_Sales) *


POWER(STDDEV(Final_Sales), 4))), 2) AS kurtosis_final_sales,

ROUND((SUM(POWER(RtnMRP - avg_value, 4)) / (COUNT(RtnMRP) * POWER(STDDEV(RtnMRP), 4))),


2) AS kurtosis_rtnmrp

FROM

(SELECT

AVG(Quantity) AS avg_value,

STDDEV(Quantity) AS stddev_value,

COUNT(Quantity) AS count_value

FROM projectfinaldata) AS subquery, cleaned_table;

Output:
Comparison table showing the business decisions results for the unclean and clean data:

Observation:

Overall, the results indicate that the unclean data exhibits higher mean, variance, standard deviation,
range, skewness, and kurtosis values compared to the clean data. This suggests greater
inconsistencies, variability, and potential outliers in the unclean data. Cleaning the data has resulted
in more stable and normalized distributions with reduced variability and potential biases, making it
more reliable for business decision-making.
Bounce Rate Analysis:

1. Finding the percentage of customers who bounced (returned a product with a final sale price of
0) out of the total number of customers.

SELECT ROUND((bounced_customers / total_customers) * 100, 2) AS bounce_rate

FROM

(SELECT COUNT(DISTINCT Patient_ID) AS total_customers

FROM cleaned_table

WHERE Typeofsales IN ('Sale', 'Return')) AS t1,

(SELECT COUNT(DISTINCT Patient_ID) AS bounced_customers

FROM cleaned_table

WHERE Typeofsales = 'Return' AND Final_Sales = 0) AS t2;

Output:

Insight:

We can understand that around 22.85% of customers in the ‘cleaned_table’ faced a situation where
they returned medicines with a Final_Sales value of 0. This means that a significant portion of
customers did not get the medicines they needed, which could lead to dissatisfaction. To improve
business success and increase revenue, it is important to reduce this bounce rate by ensuring
customers receive the medicines they require.

2. Finding the number of drugs in each subcategory that have been returned without making a sale
(Final_Sales = 0).

SELECT SubCat, COUNT(DISTINCT DrugName) AS num_returned_drugs

FROM cleaned_table

WHERE Typeofsales = 'Return' AND Final_Sales = 0

GROUP BY SubCat

ORDER BY num_returned_drugs DESC;


Output:

Insight:

We can observe that the subcategory "INJECTIONS" has the highest count of returned drug names
with 98 occurrences, followed by the subcategory "TABLETS & CAPSULES" with 63 occurrences,
indicating a potential issue with customer satisfaction, product quality, or other factors that lead to
returns for these two subcategories.

3. Finding the formulation with the highest return count within the "INJECTIONS" and "TABLETS &
CAPSULES" subcategories.

SELECT SubCat, Formulation, return_count

FROM (

SELECT SubCat, Formulation, COUNT(*) AS return_count,

ROW_NUMBER() OVER (PARTITION BY SubCat ORDER BY COUNT(*) DESC) AS rn

FROM cleaned_table

WHERE Typeofsales = 'Return' AND Final_Sales = 0 AND SubCat IN ('INJECTIONS', 'TABLETS &
CAPSULES')

GROUP BY SubCat, Formulation

) AS subquery

WHERE rn = 1;
Output:

Insight:

We can observe that within the "INJECTIONS" subcategory, the Formulation "Form1" has the highest
return count with 398 occurrences. Similarly, within the "TABLETS & CAPSULES" subcategory, the
Formulation "Form1" again has the highest return count with 77 occurrences.

4. Finding the count of occurrences of Formulation "Form1" for each Department (Dept) where the
SubCat is either "INJECTIONS" or "TABLETS & CAPSULES".

SELECT Dept, COUNT(*) AS form1_count

FROM cleaned_table

WHERE Formulation = 'Form1' AND SubCat IN ('INJECTIONS', 'TABLETS & CAPSULES')

GROUP BY Dept

ORDER BY form1_count DESC;

Output:

Insight: The Formulation "Form1" appears to be significantly more prevalent in Department1


compared to other departments which likely corresponds to the Pharmacy department, has a high
demand or usage of the "Form1" formulation. This could indicate that "Form1" is a commonly
prescribed or requested medication within this department.

5. Finding the count of occurrences of Typeofsales as 'Return' for each Department (Dept).

SELECT Dept, COUNT(*) AS return_count

FROM cleaned_table

WHERE Typeofsales = 'Return'

GROUP BY Dept;
Output:

Insight: Department1 has a relatively higher count of return occurrences compared to other
departments. This suggests that there may be more instances of customers returning products in
Department1.

6. Finding the count of occurrences of Typeofsales as 'Return' for each Specialisation within
Department1 and Formulation as 'Form1'

SELECT Specialisation, COUNT(*) AS return_count

FROM cleaned_table

WHERE Typeofsales = 'Return' AND Dept = 'Department1' AND Formulation = 'Form1'

GROUP BY Specialisation

ORDER BY return_count DESC;

Output:

Insight: We can observe that Specialisation4 and Specialisation7 are experiencing a relatively higher
number of returns compared to other Specialisations within Department1 and Formulation as
'Form1'.
Conclusion:

Based on the patterns and trends gained from the analysis of the dataset, the following conclusions
can be drawn:

1. Focus on Subcategories: The subcategories "INJECTIONS" and "TABLETS & CAPSULES" require
special attention due to their higher counts of returned drug names. The hospital should conduct a
thorough analysis of these subcategories to identify the underlying causes and take necessary steps
to address customer satisfaction, product quality, or other issues contributing to returns.

2. Evaluation of Formulation: The "Form1" formulation stands out with the highest return counts in
both the "INJECTIONS" and "TABLETS & CAPSULES" subcategories. It is essential to thoroughly
evaluate this formulation, considering factors such as product effectiveness, potential side effects,
and customer preferences. Improvements in the formulation or alternative options should be
explored to reduce returns.

3. Supplier Assessment: Assess the performance of suppliers providing products to Department1,


particularly those supplying the "Form1" formulation. Evaluate factors such as product quality,
reliability, and adherence to customer requirements. Consider alternative suppliers or negotiate with
existing ones to improve the quality and availability of these products.

4. Inventory Management: Ensure efficient inventory management for Department1, particularly for
products with the "Form1" formulation. Optimize stock levels, expiration dates, and replenishment
processes to minimize instances of expired or obsolete products. Proper inventory management can
help reduce returns and maintain a more cost-effective inventory.

5. Efficient Return Management: Department1, representing the Pharmacy department, experiences


a relatively higher count of return occurrences. The hospital should focus on implementing efficient
return management practices, such as streamlining return processes, improving inventory
management, and addressing any issues specific to Department1 that contribute to returns.

6. Specialisation-specific Strategies: Specialisation4 and Specialisation7 within Department1 and


associated with the "Form1" formulation exhibit a higher number of returns. The hospital should
collaborate with healthcare professionals in these specialisations to understand the reasons behind
returns, gather insights, and implement specialisation-specific strategies to reduce returns and
enhance customer satisfaction.

By taking these specific conclusions into consideration, the hospital can make informed business
decisions and implement targeted strategies to reduce bounce rate, improve customer satisfaction,
and achieve the economic success criteria.

You might also like