Aba Project
Sector - Pharmaceutical
Submitted By-
Ranita Ray- 19202170
Rawmik Kumar Mondal- 19202171
Santwana Mishra- 19202176
Suman Patra- 19202190
Sunetra Biswas- 19202191
Sweta Pallavi- 19202192
Tanishque Kumar Patra- 19202193
Dataset: The Dividend dataset consists of 4588 rows and 38 columns containing information on
BSE-listed companies. From this dataset, a total of 189 pharma companies were selected
manually and used for the analysis in R, and a total of 34 variables (columns) were selected
for the analysis, excluding the year column. In the dataset, the column labelled L-3 is taken
as 2017, L-2 as 2018, L-1 as 2019 and L as 2020. All variable names belonging to a
particular year are defined accordingly, for example: Quick_ratio.2017, Quick_ratio.2018,
Quick_ratio.2019, Quick_ratio.2020 and so on.
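The year mapping can be written as a small lookup in R. This is a minimal sketch; the helper name `year_columns()` is hypothetical and not taken from the original analysis, which only states the L-3 to L mapping.

```r
# Mapping from the dataset's relative-year labels to calendar years (as stated above)
year_map <- c("L-3" = 2017, "L-2" = 2018, "L-1" = 2019, "L" = 2020)

# Hypothetical helper: build the per-year column names for one variable
year_columns <- function(variable) {
  paste0(variable, ".", year_map)
}

quick_cols <- year_columns("Quick_ratio")
print(quick_cols)
```

The same helper could be reused for every variable, which keeps the naming convention consistent across all 34 columns.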
R programming: The analysis of the dataset was carried out in the following steps:
Import of dataset: The dataset, a CSV file named "A to Z Pharma companies", was first
imported into R using read.csv() and stored in a variable named Company.
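A self-contained sketch of the import step. Because the original CSV is not available here, the code writes a tiny file with made-up values and a hypothetical header to a temporary path; in the actual analysis the path would point to the "A to Z Pharma companies" file.

```r
# Write a tiny hypothetical CSV (made-up values) so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Company_name,L-3,L",
  "Co_A,0.95,1.10",
  "Co_B,2.40,0.70"
), path)

# Import step: read the CSV into a data frame called Company
Company <- read.csv(path)

# read.csv() defaults to check.names = TRUE, so a header such as "L-3"
# is converted to the syntactically valid name "L.3"
print(names(Company))
str(Company)
```

The automatic renaming by `check.names` is one reason to assign clear, year-based column names right after import.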
Installation of packages: The tidyverse and ggplot2 packages were installed before proceeding
with the analysis, and both were then loaded with library().
Changed the column names: To change the column names, the existing names were first viewed
with colnames(Company). The new column names were then defined as a vector and assigned
to colnames(Company), which replaces the previous column names with the new ones.
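A minimal sketch of the renaming step, assuming a small hypothetical data frame with relative-year column names and made-up values; the real Company data has 34 columns.

```r
# Hypothetical data frame with relative-year column names (made-up values)
Company <- data.frame(
  Company_name = c("Co_A", "Co_B"),
  L.3 = c(0.95, 2.40), L.2 = c(1.00, 2.10),
  L.1 = c(1.05, 1.80), L = c(1.10, 0.70)
)

# Step 1: inspect the current column names
colnames(Company)

# Step 2: define the new names as a vector, in the same order as the columns
new_names <- c("Company_name", "Quick_ratio.2017", "Quick_ratio.2018",
               "Quick_ratio.2019", "Quick_ratio.2020")

# Step 3: assign the vector back; the old names are replaced
colnames(Company) <- new_names
print(colnames(Company))
```

The vector must contain exactly one name per column, in column order; otherwise names end up attached to the wrong variables.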
Read the data: To view the dataset clearly, it was converted to tidy (tibble) format, i.e.
Company %>% as_tibble(), and the head() function was used to view the first 10 rows of the
dataset (head(Company, 10)).
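Note that `head()` returns the first 6 rows unless `n` is given, so `head(Company, 10)` is needed for 10 rows. This sketch uses R's built-in `mtcars` data as a stand-in for Company; the tibble step only runs if the tibble package (part of the tidyverse) is installed.

```r
# Built-in stand-in dataset (32 rows); the real Company data is not used here
df <- mtcars

first_default <- head(df)      # default n = 6
first_ten     <- head(df, 10)  # explicitly ask for 10 rows

nrow(first_default)
nrow(first_ten)

# Tidy view: as_tibble() takes the data frame once, either piped or as an argument
if (requireNamespace("tibble", quietly = TRUE)) {
  print(tibble::as_tibble(df))
}
```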
Data cleaning:
While reading the dataset, it was found that some variables contain negative values, e.g. the
Return on net worth (RoNW). Negative return on net worth values could distort the analysis,
so the negative values were removed from this variable and only the positive values were
retained.
To do this, the return on net worth columns were first selected using the tidy format. The
filter() function was then used to pick out the RoNW values that were negative, and these
values were assigned NA, since negative values are easier to remove once they have been
converted to NA. After this assignment, a select command was used to check whether all the
negative values had been converted to NAs. The same conversion of negative values to NAs
was repeated for each year of the RoNW variable.
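A base-R sketch of the same negative-to-NA conversion, assuming hypothetical column names RoNW.2017 to RoNW.2020 (following the dataset's .YEAR convention) and made-up values. With dplyr, a roughly equivalent step is `mutate(Company, across(starts_with("RoNW"), ~ ifelse(.x < 0, NA, .x)))`.

```r
# Hypothetical RoNW columns (made-up values, in %) following the .YEAR naming
Company <- data.frame(
  Company_name = c("Co_A", "Co_B", "Co_C"),
  RoNW.2017 = c(12.5, -3.2, 8.0),
  RoNW.2018 = c(10.1, 4.4, -1.5),
  RoNW.2019 = c(-0.8, 6.0, 9.3),
  RoNW.2020 = c(14.2, 7.7, 11.0)
)

ronw_cols <- grep("^RoNW\\.", names(Company), value = TRUE)

# Replace every negative RoNW value with NA, one year column at a time
for (col in ronw_cols) {
  neg <- which(Company[[col]] < 0)   # which() skips any existing NAs
  Company[[col]][neg] <- NA
}

# Check: no negative values should remain in the RoNW columns
any(Company[ronw_cols] < 0, na.rm = TRUE)
```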
Thereafter, the complete.cases() function was used to identify the companies with no NA
values, and na.omit() was used to remove the rows containing NAs from the entire dataset;
the result was assigned back to the variable Company. After removing the NAs, 65
observations and 34 variables were retained in the dataset.
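A minimal sketch of the two functions on a small hypothetical data frame (made-up values). Note that `na.omit()` drops a row if any column is NA, not only the RoNW columns, so `Company[complete.cases(Company), ]` gives the same rows.

```r
# Hypothetical data with some NAs (e.g. left by the negative-RoNW step)
Company <- data.frame(
  Company_name = c("Co_A", "Co_B", "Co_C", "Co_D"),
  RoNW.2017 = c(12.5, NA, 8.0, 15.1),
  Quick_ratio.2017 = c(1.2, 0.9, NA, 1.6)
)

# complete.cases() flags rows with no NA in any column
keep <- complete.cases(Company)
keep

# na.omit() drops every row that has at least one NA
Company <- na.omit(Company)
dim(Company)
```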
Descriptive statistics:
To carry out the descriptive statistics, the summary(Company) function was used. This
command reports the minimum, 1st quartile, median, mean, 3rd quartile and maximum (and
hence the range) for each of the 34 variables. For the output of the summary, refer to the
pictures below.
From the summary of the Company dataset, we can identify the mean, the median and the IQR,
i.e. the difference between the 3rd quartile and the 1st quartile, for each variable. Looking at
the range, we can also identify which variables could contain outliers.
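A small sketch of what `summary()` reports and how the IQR relates to it, using made-up stand-in values for one ratio column. `summary()` and `IQR()` both use R's default quantile rule, so their quartiles agree.

```r
# Made-up stand-in values for one ratio column
x <- c(0.4, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.2, 12.0)

s <- summary(x)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
print(s)

# IQR = 3rd quartile - 1st quartile
iqr_manual  <- unname(quantile(x, 0.75) - quantile(x, 0.25))
iqr_builtin <- IQR(x)
all.equal(iqr_manual, iqr_builtin)
```

Here the mean sits well above the median because of the single large value, a first hint of an outlier.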
For example, for the Quick_Ratio.2020 variable, 25% of the data lies between Min and 1st Qu.,
and similarly 25% of the data lies between 3rd Qu. and Max. However, the values between
3rd Qu. and Max span a very wide range (from 1.9 to 42.4), whereas the values between Min
and 1st Qu. span a narrow range (from 0.38 to 0.68). This suggests that the Quick_Ratio.2020
variable may contain outliers at its upper end.
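A quick check of this asymmetry, using only the four summary values quoted above for Quick_Ratio.2020 (no other data are assumed):

```r
# Summary values for Quick_Ratio.2020 as quoted in the text
q <- c(min = 0.38, q1 = 0.68, q3 = 1.9, max = 42.4)

lower_spread <- unname(q["q1"] - q["min"])   # width of the bottom 25% of companies
upper_spread <- unname(q["max"] - q["q3"])   # width of the top 25% of companies

tail_ratio <- upper_spread / lower_spread    # how many times wider the upper tail is
lower_spread
upper_spread
tail_ratio
```

The top quarter of the data spans roughly 135 times the width of the bottom quarter, which is why the upper end is the likely location of outliers.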
To better understand the outliers in the variables, boxplots were used to observe how the data
in the Company dataset are distributed. For example, the picture below is a boxplot showing
how the data for the Debt_Equity.2017 variable are distributed.
From the position of the thick line, which marks the median (50% of the data lie below it and
50% above it), we can observe that the data are compactly placed below the median (0.0-
0.2). Above the median, the data are more widely spread. This means that most of the
data are clustered in the lower region, i.e. between 0.0 and 0.2.
The points plotted above the upper whisker are the outliers. Outliers are points lying more
than 1.5 times the IQR above the 75th percentile or below the 25th percentile.
Similarly, a boxplot was drawn for each of the variables to see how its data are distributed.
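A minimal sketch of the boxplot and the numbers behind it, using made-up stand-in values rather than Debt_Equity.2017. `boxplot.stats()` uses Tukey's hinges (from `fivenum()`), which can differ slightly from `quantile()` quartiles for some sample sizes; for this data they coincide.

```r
# Made-up stand-in values; one company is far above the rest
x <- c(0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 9.0)

# boxplot() draws the box (hinges), the thick median line, whiskers and outlier points
boxplot(x, main = "Illustrative boxplot", ylab = "Ratio")

# boxplot.stats() returns the numbers behind the plot:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = points more than 1.5 * (upper hinge - lower hinge) beyond the hinges
bs <- boxplot.stats(x)
bs$stats
bs$out
```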
Outlier: To remove the outliers from the variables, the code shown below was used. The
Quick_ratio.2017 variable is taken as an example to show how outliers in a variable can be
found.
As we know, outliers are data points that fall far away from the rest of the data, here by more
than 1.5 times the interquartile range (IQR). The IQR is the difference between the 3rd
quartile and the 1st quartile. To detect outliers, we first find the IQR and the quantiles, and
then fix a cutoff to serve as a benchmark: data points beyond the benchmark value are
considered outliers and are excluded from the variable, while the points within the
benchmark make up the retained data points.
Here we take the Quick_ratio.2017 variable, whose outliers lie above the 75th percentile by
more than 1.5 times the IQR. The benchmark is therefore
upper cutoff = Q3 + 1.5 × IQR = 1.94 + 1.5 × IQR(Quick_ratio.2017) ≈ 3.8,
where 1.94 is the 75th percentile (Q3) of Quick_ratio.2017. Any point beyond 3.8 is
considered an outlier for this variable.
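A sketch of the 1.5 × IQR rule described above, applied to made-up stand-in values rather than the real Quick_ratio.2017 column. It computes both benchmarks, although for Quick_ratio.2017 the text only needs the upper one.

```r
# Made-up stand-in values for a ratio variable such as Quick_ratio.2017
x <- c(0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 9.0)

q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Benchmarks: Q3 + 1.5 * IQR (upper) and Q1 - 1.5 * IQR (lower)
upper_cutoff <- q3 + 1.5 * iqr
lower_cutoff <- q1 - 1.5 * iqr

# Points beyond a benchmark are outliers; the rest are retained
outliers <- x[x > upper_cutoff | x < lower_cutoff]
cleaned  <- x[x <= upper_cutoff & x >= lower_cutoff]

upper_cutoff
outliers
```

With the real column, `q3` would be 1.94 and `upper_cutoff` about 3.8, as stated in the text.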
Data Visualisation:
1) To find the companies with a return on net worth > 10% for the year
2017.
This plot shows that around 26 of the 65 pharma companies had a return on net worth of
more than 10% for the year 2017. Return on net worth is the ratio of profit after tax to net
worth. A company with RoNW of 10% or more is considered a strong company, so this
indicates that these 26 companies are running their business smoothly by efficiently using
shareholders' funds.
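A base-R sketch of this filter. The columns PAT.2017 and Net_worth.2017 are hypothetical, and the values are made up; with dplyr, `filter(Company, RoNW.2017 > 10)` selects the same rows.

```r
# Hypothetical data: profit after tax (PAT) and net worth, made-up values
Company <- data.frame(
  Company_name = c("Co_A", "Co_B", "Co_C", "Co_D"),
  PAT.2017 = c(120, 45, 300, 10),
  Net_worth.2017 = c(800, 600, 1500, 250)
)

# RoNW (%) = profit after tax / net worth * 100
Company$RoNW.2017 <- Company$PAT.2017 / Company$Net_worth.2017 * 100

# Keep companies with RoNW above 10% in 2017
strong <- subset(Company, RoNW.2017 > 10)
strong$Company_name
```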
2) To find the companies with a quick ratio of 1 for the year
2020.
The plot shows that only one company had a quick ratio of 1:1 for the year 2020. A quick
ratio of 1:1 is considered satisfactory and indicates good financial condition. Among the 65
companies, only Ipca Laboratories has a quick ratio of 1:1. This indicates that the company's
quick assets equal its current liabilities, so it was able to pay off its obligations easily without
selling its long-term assets.
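A sketch of computing the quick ratio and selecting companies at 1:1, with hypothetical column names and made-up values. Matching against 1 uses a tolerance rather than exact `==`, since computed ratios are rarely exactly 1; the tolerance of 0.005 is an assumed choice (it corresponds to 1.00 at two decimals).

```r
# Hypothetical 2020 figures (made-up): quick assets and current liabilities
Company <- data.frame(
  Company_name = c("Co_A", "Co_B", "Co_C"),
  Quick_assets.2020 = c(300, 450, 120),
  Current_liabilities.2020 = c(300, 500, 60)
)

# Quick ratio = quick assets (current assets excluding inventories) / current liabilities
Company$Quick_ratio.2020 <- Company$Quick_assets.2020 / Company$Current_liabilities.2020

# Compare to 1 within a tolerance instead of exact equality
tol <- 0.005
one_to_one <- subset(Company, abs(Quick_ratio.2020 - 1) < tol)
one_to_one$Company_name
```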
3) To find the companies with a quick ratio of less than 1 for
the year 2020.
The plot above shows the companies that had a quick ratio of less than 1:1. These
companies are likely to find it difficult to pay off their short-term liabilities, because their
assets are not liquid enough to be converted into cash immediately.
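A short sketch that flags companies below 1:1 and counts each group, using made-up quick ratios with hypothetical company names. `cut()` with `right = FALSE` puts a ratio of exactly 1 in the "1 or above" group.

```r
# Hypothetical 2020 quick ratios (made-up values)
qr <- c(Co_A = 1.00, Co_B = 0.90, Co_C = 2.00, Co_D = 0.45)

# Companies whose quick assets do not cover current liabilities
below_one <- names(qr)[qr < 1]
below_one

# Group every company: [-Inf, 1) is "below 1", [1, Inf) is "1 or above"
liquidity <- cut(qr, breaks = c(-Inf, 1, Inf), right = FALSE,
                 labels = c("below 1", "1 or above"))
table(liquidity)
```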
4) To find the companies that were debt-free for the year 2020.
This plot shows that seven of the 65 pharma companies had zero debt for the year 2020.
These debt-free companies rely mainly on equity capital to finance their business rather than
on long-term loans.
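A sketch of one way to select debt-free companies. It assumes, as the original text does not state, that debt-free status was identified from a Debt_Equity.2020 column equal to zero (the column name follows the Debt_Equity.2017 variable mentioned earlier); the data are made up. Exact comparison with 0 is reasonable here because zero debt is stored as 0.

```r
# Hypothetical 2020 debt-equity ratios (made-up values)
Company <- data.frame(
  Company_name = c("Co_A", "Co_B", "Co_C", "Co_D"),
  Debt_Equity.2020 = c(0, 0.35, 0, 1.2)
)

# Debt-free: no borrowings, so the debt-equity ratio is exactly zero
debt_free <- subset(Company, Debt_Equity.2020 == 0)
debt_free$Company_name
nrow(debt_free)
```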