EDA Code SQL
USE med_inventory;
Output:
3. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.
Mean:
SELECT
ROUND(AVG(Quantity), 2) AS mean_quantity,
ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,
ROUND(AVG(Final_Cost), 2) AS mean_final_cost,
ROUND(AVG(Final_Sales), 2) AS mean_final_sales,
ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp
FROM projectfinaldata;
Output:
Median:
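-- Each column is ranked on its own sort order; assumes the columns contain no NULLs.
SELECT
ROUND(AVG(CASE WHEN rn_cost IN (lo_row, hi_row) THEN Final_Cost END), 2) AS median_final_cost,
ROUND(AVG(CASE WHEN rn_sales IN (lo_row, hi_row) THEN Final_Sales END), 2) AS median_final_sales,
ROUND(AVG(CASE WHEN rn_qty IN (lo_row, hi_row) THEN Quantity END), 2) AS median_quantity,
ROUND(AVG(CASE WHEN rn_rtnqty IN (lo_row, hi_row) THEN ReturnQuantity END), 2) AS median_return_quantity,
ROUND(AVG(CASE WHEN rn_rtnmrp IN (lo_row, hi_row) THEN RtnMRP END), 2) AS median_rtnmrp
FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,
ROW_NUMBER() OVER (ORDER BY Final_Cost) AS rn_cost,
ROW_NUMBER() OVER (ORDER BY Final_Sales) AS rn_sales,
ROW_NUMBER() OVER (ORDER BY Quantity) AS rn_qty,
ROW_NUMBER() OVER (ORDER BY ReturnQuantity) AS rn_rtnqty,
ROW_NUMBER() OVER (ORDER BY RtnMRP) AS rn_rtnmrp,
FLOOR((COUNT(*) OVER () + 1) / 2) AS lo_row, CEILING((COUNT(*) OVER () + 1) / 2) AS hi_row
FROM projectfinaldata
) AS subquery;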
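The median query picks the rows at the two middle positions FLOOR((n + 1) / 2) and CEILING((n + 1) / 2), where n is the row count. For odd n both positions are the single middle row; for even n they are the two middle rows, whose average is the median. A minimal Python sketch of that rule, checked against the standard library (the function name `median_by_rank` and the sample values are illustrative, not part of the project):

```python
import math
import statistics


def median_by_rank(values):
    """Average the values at 1-based ranks floor((n+1)/2) and ceil((n+1)/2)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = math.floor((n + 1) / 2)
    hi = math.ceil((n + 1) / 2)
    return (ordered[lo - 1] + ordered[hi - 1]) / 2


odd_sample = [7, 1, 5]       # made-up values, odd row count
even_sample = [7, 1, 5, 3]   # made-up values, even row count

print(median_by_rank(odd_sample), statistics.median(odd_sample))
print(median_by_rank(even_sample), statistics.median(even_sample))
```

The rule only holds when the ranks come from sorting the same column whose median is being taken.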
Mode:
SELECT
mode_quantity.mode_value AS mode_quantity,
mode_return_quantity.mode_value AS mode_return_quantity,
mode_final_cost.mode_value AS mode_final_cost,
mode_final_sales.mode_value AS mode_final_sales,
mode_rtnmrp.mode_value AS mode_rtnmrp
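FROM (
SELECT Quantity AS mode_value
FROM projectfinaldata
GROUP BY Quantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_quantity,
(
SELECT ReturnQuantity AS mode_value
FROM projectfinaldata
GROUP BY ReturnQuantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_return_quantity,
(
SELECT Final_Cost AS mode_value
FROM projectfinaldata
GROUP BY Final_Cost
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_cost,
(
SELECT Final_Sales AS mode_value
FROM projectfinaldata
GROUP BY Final_Sales
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_sales,
(
SELECT RtnMRP AS mode_value
FROM projectfinaldata
GROUP BY RtnMRP
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_rtnmrp;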
Output:
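Each mode subquery keeps a single row (LIMIT 1), so when several values share the highest frequency only one of them is reported, and which one depends on how the ties happen to be ordered. A small Python sketch with made-up values (not project data) shows the difference between a single reported mode and the full set of tied modes:

```python
from collections import Counter
import statistics

quantities = [1, 2, 2, 3, 3, 5]  # made-up sample; 2 and 3 both occur twice

# Analogue of GROUP BY value ... ORDER BY COUNT(*) DESC LIMIT 1: one winner only.
single_mode, frequency = Counter(quantities).most_common(1)[0]

# Every value that shares the highest frequency.
all_modes = statistics.multimode(quantities)

print(single_mode, frequency, all_modes)
```

If tied modes matter for the analysis, it may be worth checking the frequency of the runner-up values as well.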
4. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.
Variance:
SELECT
ROUND(VARIANCE(Quantity), 2) AS variance_quantity,
ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,
ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,
ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,
ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp
FROM projectfinaldata;
Output:
Standard Deviation:
SELECT
ROUND(STDDEV(Quantity), 2) AS stddev_quantity,
ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,
ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,
ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,
ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp
FROM projectfinaldata;
Output:
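In MySQL, VARIANCE() and STDDEV() are the population forms (synonyms for VAR_POP() and STDDEV_POP()); the sample forms are VAR_SAMP() and STDDEV_SAMP(). The sketch below, using made-up values rather than project data, shows how the two forms differ by the n versus n - 1 divisor:

```python
import statistics

final_cost = [40.0, 52.5, 47.0, 61.5, 49.0]  # made-up values, not project data
n = len(final_cost)

pop_var = statistics.pvariance(final_cost)   # divides by n, like VAR_POP / VARIANCE
samp_var = statistics.variance(final_cost)   # divides by n - 1, like VAR_SAMP
pop_std = statistics.pstdev(final_cost)      # like STDDEV_POP / STDDEV
samp_std = statistics.stdev(final_cost)      # like STDDEV_SAMP

print(round(pop_var, 2), round(samp_var, 2))
print(round(pop_std, 2), round(samp_std, 2))
```

For a large table the two forms are nearly identical; the gap matters mainly for small groups.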
Range:
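SELECT
MAX(Quantity) - MIN(Quantity) AS range_quantity,
MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,
MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,
MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,
MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp
FROM projectfinaldata;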
Output:
Skewness:
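-- Population skewness per column from raw moments: (E[x^3] - 3*mean*E[x^2] + 2*mean^3) / stddev^3
SELECT 'Quantity' AS column_name, ROUND((AVG(POWER(Quantity, 3)) - 3 * AVG(Quantity) * AVG(POWER(Quantity, 2)) + 2 * POWER(AVG(Quantity), 3)) / POWER(STDDEV(Quantity), 3), 2) AS skewness
FROM projectfinaldata
UNION ALL
SELECT 'ReturnQuantity' AS column_name, ROUND((AVG(POWER(ReturnQuantity, 3)) - 3 * AVG(ReturnQuantity) * AVG(POWER(ReturnQuantity, 2)) + 2 * POWER(AVG(ReturnQuantity), 3)) / POWER(STDDEV(ReturnQuantity), 3), 2) AS skewness
FROM projectfinaldata
UNION ALL
SELECT 'Final_Cost' AS column_name, ROUND((AVG(POWER(Final_Cost, 3)) - 3 * AVG(Final_Cost) * AVG(POWER(Final_Cost, 2)) + 2 * POWER(AVG(Final_Cost), 3)) / POWER(STDDEV(Final_Cost), 3), 2) AS skewness
FROM projectfinaldata
UNION ALL
SELECT 'Final_Sales' AS column_name, ROUND((AVG(POWER(Final_Sales, 3)) - 3 * AVG(Final_Sales) * AVG(POWER(Final_Sales, 2)) + 2 * POWER(AVG(Final_Sales), 3)) / POWER(STDDEV(Final_Sales), 3), 2) AS skewness
FROM projectfinaldata
UNION ALL
SELECT 'RtnMRP' AS column_name, ROUND((AVG(POWER(RtnMRP, 3)) - 3 * AVG(RtnMRP) * AVG(POWER(RtnMRP, 2)) + 2 * POWER(AVG(RtnMRP), 3)) / POWER(STDDEV(RtnMRP), 3), 2) AS skewness
FROM projectfinaldata;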
Output:
Kurtosis:
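-- Population kurtosis of Quantity (fourth standardized moment); subtract 3 for excess kurtosis.
SELECT
ROUND(SUM(POWER(Quantity - avg_value, 4) / (count_value * POWER(stddev_value, 4))), 2) AS kurtosis_quantity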
FROM
(SELECT
AVG(Quantity) AS avg_value,
STDDEV(Quantity) AS stddev_value,
COUNT(Quantity) AS count_value
FROM projectfinaldata) AS subquery, projectfinaldata;
Output:
1. Calculating the first moment (measures of central tendency such as mean, median, mode) for
the dataset.
Mean:
SELECT
ROUND(AVG(Quantity), 2) AS mean_quantity,
ROUND(AVG(ReturnQuantity), 2) AS mean_return_quantity,
ROUND(AVG(Final_Cost), 2) AS mean_final_cost,
ROUND(AVG(Final_Sales), 2) AS mean_final_sales,
ROUND(AVG(RtnMRP), 2) AS mean_rtnmrp
FROM cleaned_table;
Median:
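SELECT
ROUND(AVG(CASE WHEN rn_cost IN (lo_row, hi_row) THEN Final_Cost END), 2) AS median_final_cost,
ROUND(AVG(CASE WHEN rn_sales IN (lo_row, hi_row) THEN Final_Sales END), 2) AS median_final_sales,
ROUND(AVG(CASE WHEN rn_qty IN (lo_row, hi_row) THEN Quantity END), 2) AS median_quantity,
ROUND(AVG(CASE WHEN rn_rtnqty IN (lo_row, hi_row) THEN ReturnQuantity END), 2) AS median_return_quantity,
ROUND(AVG(CASE WHEN rn_rtnmrp IN (lo_row, hi_row) THEN RtnMRP END), 2) AS median_rtnmrp
FROM (
SELECT Final_Cost, Final_Sales, Quantity, ReturnQuantity, RtnMRP,
ROW_NUMBER() OVER (ORDER BY Final_Cost) AS rn_cost,
ROW_NUMBER() OVER (ORDER BY Final_Sales) AS rn_sales,
ROW_NUMBER() OVER (ORDER BY Quantity) AS rn_qty,
ROW_NUMBER() OVER (ORDER BY ReturnQuantity) AS rn_rtnqty,
ROW_NUMBER() OVER (ORDER BY RtnMRP) AS rn_rtnmrp,
FLOOR((COUNT(*) OVER () + 1) / 2) AS lo_row, CEILING((COUNT(*) OVER () + 1) / 2) AS hi_row
FROM cleaned_table
) AS subquery;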
Mode:
SELECT
mode_quantity.mode_value AS mode_quantity,
mode_return_quantity.mode_value AS mode_return_quantity,
mode_final_cost.mode_value AS mode_final_cost,
mode_final_sales.mode_value AS mode_final_sales,
mode_rtnmrp.mode_value AS mode_rtnmrp
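FROM (
SELECT Quantity AS mode_value
FROM cleaned_table
GROUP BY Quantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_quantity,
(
SELECT ReturnQuantity AS mode_value
FROM cleaned_table
GROUP BY ReturnQuantity
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_return_quantity,
(
SELECT Final_Cost AS mode_value
FROM cleaned_table
GROUP BY Final_Cost
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_cost,
(
SELECT Final_Sales AS mode_value
FROM cleaned_table
GROUP BY Final_Sales
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_final_sales,
(
SELECT RtnMRP AS mode_value
FROM cleaned_table
GROUP BY RtnMRP
ORDER BY COUNT(*) DESC
LIMIT 1
) AS mode_rtnmrp;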
Output:
2. Calculating the second moment (measures of dispersion such as variance, standard deviation,
range) for the dataset.
Variance:
SELECT
ROUND(VARIANCE(Quantity), 2) AS variance_quantity,
ROUND(VARIANCE(ReturnQuantity), 2) AS variance_return_quantity,
ROUND(VARIANCE(Final_Cost), 2) AS variance_final_cost,
ROUND(VARIANCE(Final_Sales), 2) AS variance_final_sales,
ROUND(VARIANCE(RtnMRP), 2) AS variance_rtnmrp
FROM cleaned_table;
Output:
Standard Deviation:
SELECT
ROUND(STDDEV(Quantity), 2) AS stddev_quantity,
ROUND(STDDEV(ReturnQuantity), 2) AS stddev_return_quantity,
ROUND(STDDEV(Final_Cost), 2) AS stddev_final_cost,
ROUND(STDDEV(Final_Sales), 2) AS stddev_final_sales,
ROUND(STDDEV(RtnMRP), 2) AS stddev_rtnmrp
FROM cleaned_table;
Output:
Range:
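SELECT
MAX(Quantity) - MIN(Quantity) AS range_quantity,
MAX(ReturnQuantity) - MIN(ReturnQuantity) AS range_return_quantity,
MAX(Final_Cost) - MIN(Final_Cost) AS range_final_cost,
MAX(Final_Sales) - MIN(Final_Sales) AS range_final_sales,
MAX(RtnMRP) - MIN(RtnMRP) AS range_rtnmrp
FROM cleaned_table;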
Output:
3. Calculating the third moment (skewness) for the dataset.
Skewness:
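-- Population skewness per column from raw moments: (E[x^3] - 3*mean*E[x^2] + 2*mean^3) / stddev^3
SELECT 'Quantity' AS column_name, ROUND((AVG(POWER(Quantity, 3)) - 3 * AVG(Quantity) * AVG(POWER(Quantity, 2)) + 2 * POWER(AVG(Quantity), 3)) / POWER(STDDEV(Quantity), 3), 2) AS skewness
FROM cleaned_table
UNION ALL
SELECT 'ReturnQuantity' AS column_name, ROUND((AVG(POWER(ReturnQuantity, 3)) - 3 * AVG(ReturnQuantity) * AVG(POWER(ReturnQuantity, 2)) + 2 * POWER(AVG(ReturnQuantity), 3)) / POWER(STDDEV(ReturnQuantity), 3), 2) AS skewness
FROM cleaned_table
UNION ALL
SELECT 'Final_Cost' AS column_name, ROUND((AVG(POWER(Final_Cost, 3)) - 3 * AVG(Final_Cost) * AVG(POWER(Final_Cost, 2)) + 2 * POWER(AVG(Final_Cost), 3)) / POWER(STDDEV(Final_Cost), 3), 2) AS skewness
FROM cleaned_table
UNION ALL
SELECT 'Final_Sales' AS column_name, ROUND((AVG(POWER(Final_Sales, 3)) - 3 * AVG(Final_Sales) * AVG(POWER(Final_Sales, 2)) + 2 * POWER(AVG(Final_Sales), 3)) / POWER(STDDEV(Final_Sales), 3), 2) AS skewness
FROM cleaned_table
UNION ALL
SELECT 'RtnMRP' AS column_name, ROUND((AVG(POWER(RtnMRP, 3)) - 3 * AVG(RtnMRP) * AVG(POWER(RtnMRP, 2)) + 2 * POWER(AVG(RtnMRP), 3)) / POWER(STDDEV(RtnMRP), 3), 2) AS skewness
FROM cleaned_table;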
Output:
4. Calculating the fourth moment (kurtosis) for the dataset.
Kurtosis:
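-- Population kurtosis of Quantity (fourth standardized moment); subtract 3 for excess kurtosis.
SELECT
ROUND(SUM(POWER(Quantity - avg_value, 4) / (count_value * POWER(stddev_value, 4))), 2) AS kurtosis_quantity
FROM
(SELECT
AVG(Quantity) AS avg_value,
STDDEV(Quantity) AS stddev_value,
COUNT(Quantity) AS count_value
FROM cleaned_table) AS subquery, cleaned_table;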
Output:
Comparison table showing the business decisions results for the unclean and clean data:
Observation:
Overall, the unclean data shows higher mean, variance, standard deviation, range, skewness, and
kurtosis values than the clean data. This points to greater inconsistency, more variability, and
likely outliers in the unclean data. Cleaning the data has produced more stable and normalized
distributions with less variability and less potential bias, which makes the data more reliable for
business decision-making.
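To illustrate why outliers push these statistics upward, the sketch below computes the population mean, variance, skewness, and kurtosis for a small made-up sample with and without one extreme value. The function name `moments` and the data are illustrative assumptions; the formulas are the standard moment definitions and not necessarily the exact expressions used in the project queries.

```python
import math


def moments(values):
    """Return population mean, variance, skewness, and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    skewness = sum(((x - mean) / std) ** 3 for x in values) / n
    kurtosis = sum(((x - mean) / std) ** 4 for x in values) / n
    return mean, variance, skewness, kurtosis


clean = [10, 12, 11, 13, 12, 11, 10, 12]  # made-up values
unclean = clean + [95]                    # same values plus one extreme value

for label, data in (("clean", clean), ("unclean", unclean)):
    mean, variance, skewness, kurtosis = moments(data)
    print(f"{label}: mean={mean:.2f} var={variance:.2f} "
          f"skew={skewness:.2f} kurt={kurtosis:.2f}")
```

In this toy sample a single extreme value raises all four statistics at once, which matches the pattern described in the observation.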
Bounce Rate Analysis:
1. Finding the percentage of customers who bounced (returned a product with a final sale price of
0) out of the total number of customers.
FROM
FROM cleaned_table
FROM cleaned_table
Output:
Insight:
About 22.85% of customers in ‘cleaned_table’ returned medicines with a Final_Sales value of 0. In
other words, a significant share of customers did not get the medicines they needed, which could
lead to dissatisfaction. Reducing this bounce rate, by ensuring that customers receive the
medicines they require, is important for improving business success and increasing revenue.
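One way such a customer-level bounce rate could be computed is sketched below with an in-memory SQLite table. The customer identifier column `Patient_ID` is a hypothetical name (the actual column is not shown in this report), and the rows are made up; only `cleaned_table`, `Typeofsales`, and `Final_Sales` come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Patient_ID is an assumed name for the customer identifier column.
conn.execute(
    "CREATE TABLE cleaned_table (Patient_ID INTEGER, Typeofsales TEXT, Final_Sales REAL)"
)
conn.executemany(
    "INSERT INTO cleaned_table VALUES (?, ?, ?)",
    [
        (1, "Sale", 120.0),
        (1, "Return", 0.0),   # customer 1 bounced
        (2, "Sale", 80.0),
        (3, "Return", 0.0),   # customer 3 bounced
        (4, "Sale", 45.5),
        (4, "Return", 45.5),  # a return, but Final_Sales is not 0
    ],
)

# Distinct customers with a zero-value return, as a share of all distinct customers.
bounce_rate = conn.execute(
    """
    SELECT 100.0 * COUNT(DISTINCT CASE
               WHEN Typeofsales = 'Return' AND Final_Sales = 0 THEN Patient_ID
           END) / COUNT(DISTINCT Patient_ID)
    FROM cleaned_table
    """
).fetchone()[0]

print(f"bounce rate: {bounce_rate:.2f}%")
```

Counting distinct customers rather than rows keeps a customer with several zero-value returns from being counted more than once.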
2. Finding the number of drugs in each subcategory that have been returned without making a sale
(Final_Sales = 0).
SELECT SubCat, COUNT(*) AS return_count
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Final_Sales = 0
GROUP BY SubCat;
Insight:
We can observe that the subcategory "INJECTIONS" has the highest count of returned drug names
with 98 occurrences, followed by the subcategory "TABLETS & CAPSULES" with 63 occurrences,
indicating a potential issue with customer satisfaction, product quality, or other factors that lead to
returns for these two subcategories.
3. Finding the formulation with the highest return count within the "INJECTIONS" and "TABLETS &
CAPSULES" subcategories.
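SELECT SubCat, Formulation, return_count
FROM (
SELECT SubCat, Formulation, COUNT(*) AS return_count,
ROW_NUMBER() OVER (PARTITION BY SubCat ORDER BY COUNT(*) DESC) AS rn
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Final_Sales = 0 AND SubCat IN ('INJECTIONS', 'TABLETS & CAPSULES')
GROUP BY SubCat, Formulation
) AS subquery
WHERE rn = 1;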
Output:
Insight:
We can observe that within the "INJECTIONS" subcategory, the Formulation "Form1" has the highest
return count with 398 occurrences. Similarly, within the "TABLETS & CAPSULES" subcategory, the
Formulation "Form1" again has the highest return count with 77 occurrences.
4. Finding the count of occurrences of Formulation "Form1" for each Department (Dept) where the
SubCat is either "INJECTIONS" or "TABLETS & CAPSULES".
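SELECT Dept, COUNT(*) AS form1_count
FROM cleaned_table
WHERE Formulation = 'Form1' AND SubCat IN ('INJECTIONS', 'TABLETS & CAPSULES')
GROUP BY Dept;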
Output:
5. Finding the count of occurrences of Typeofsales as 'Return' for each Department (Dept).
SELECT Dept, COUNT(*) AS return_count
FROM cleaned_table
WHERE Typeofsales = 'Return'
GROUP BY Dept;
Output:
Insight: Department1 has a relatively higher count of returns than the other departments, which
suggests that product returns are concentrated there.
6. Finding the count of occurrences of Typeofsales as 'Return' for each Specialisation within
Department1 where the Formulation is 'Form1'.
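SELECT Specialisation, COUNT(*) AS return_count
FROM cleaned_table
WHERE Typeofsales = 'Return' AND Dept = 'Department1' AND Formulation = 'Form1'
GROUP BY Specialisation;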
Output:
Insight: Within Department1 and the 'Form1' formulation, Specialisation4 and Specialisation7 have a
relatively higher number of returns than the other Specialisations.
Conclusion:
Based on the patterns and trends gained from the analysis of the dataset, the following conclusions
can be drawn:
1. Focus on Subcategories: The subcategories "INJECTIONS" and "TABLETS & CAPSULES" require
special attention due to their higher counts of returned drug names. The hospital should conduct a
thorough analysis of these subcategories to identify the underlying causes and take necessary steps
to address customer satisfaction, product quality, or other issues contributing to returns.
2. Evaluation of Formulation: The "Form1" formulation stands out with the highest return counts in
both the "INJECTIONS" and "TABLETS & CAPSULES" subcategories. It is essential to thoroughly
evaluate this formulation, considering factors such as product effectiveness, potential side effects,
and customer preferences. Improvements in the formulation or alternative options should be
explored to reduce returns.
4. Inventory Management: Ensure efficient inventory management for Department1, particularly for
products with the "Form1" formulation. Optimize stock levels, expiration dates, and replenishment
processes to minimize instances of expired or obsolete products. Proper inventory management can
help reduce returns and maintain a more cost-effective inventory.
By taking these specific conclusions into consideration, the hospital can make informed business
decisions and implement targeted strategies to reduce bounce rate, improve customer satisfaction,
and achieve the economic success criteria.