CSIT Module 1 Notes.
In summary, the role of Business Analytics is to provide organizations with the tools
and insights they need to make informed decisions, improve operations, and drive
growth.
❖ Life cycle of BA
Life Cycle of Business Analytics:
I. Business Problem Identification: The first stage of the Business Analytics life
cycle involves identifying the business problem or opportunity that requires
analysis.
II. Data Collection: The next stage involves collecting relevant data from various
sources, such as internal databases, external sources, and third-party data
providers.
III. Data Preparation: The collected data is then cleaned, transformed, and prepared
for analysis.
IV. Data Analysis: The prepared data is then analyzed using various statistical and
quantitative methods to identify patterns, trends, and insights.
V. Data Visualization: The insights gained from data analysis are then visualized
using graphs, charts, and other visual aids to help decision-makers understand
and interpret the results.
VI. Decision Making: Based on the insights gained from data analysis and
visualization, decision-makers can make informed decisions and take action to
address the business problem or opportunity.
VII. Monitoring and Evaluation: The final stage involves monitoring and evaluating
the outcomes of the decisions made and their impact on the business.
III. Financial Analytics: Business Analytics can be used to analyze financial data
and identify areas for cost savings, improve revenue generation, and mitigate
risks.
IV. HR Analytics: Business Analytics can be used to analyze employee data, such
as performance, satisfaction, and turnover, to help businesses optimize their
human resources.
VI. Project Management Skills: A Business Analyst should have good project
management skills to manage timelines, priorities, and resources effectively.
VIII. Attention to Detail: A Business Analyst should have strong attention to detail
to ensure accuracy and completeness of data analysis and reporting.
In summary, a Business Analyst should have a combination of technical and soft skills
to analyze data, communicate effectively with stakeholders, and support business
decision-making.
❖ Summarizing Data
Summarizing data is the process of presenting the key features or characteristics of a dataset in
a concise and meaningful way. It involves analyzing and reducing large amounts of data into a
more manageable and understandable format.
The most common ways of summarizing data are through descriptive statistics, which provide
information on the central tendency, variability, and distribution of the data. Examples of
descriptive statistics include mean, median, mode, range, standard deviation, and frequency
distribution.
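As a rough illustration of the descriptive statistics named above, the following Python sketch uses only the standard library. The `orders` list is a made-up sample, not data from these notes, and the standard deviation shown is the sample version (dividing by n - 1), which is one of two common conventions.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily order counts for a small shop.
orders = [12, 15, 15, 18, 20, 22, 22, 22, 30]

summary = {
    "mean": statistics.mean(orders),        # central tendency
    "median": statistics.median(orders),    # middle value of the sorted data
    "mode": statistics.mode(orders),        # most frequent value
    "range": max(orders) - min(orders),     # spread between the extremes
    "stdev": statistics.stdev(orders),      # sample standard deviation (n - 1)
}

# Frequency distribution: how often each distinct value occurs.
frequency = Counter(orders)

for name, value in summary.items():
    print(f"{name}: {value}")
print("frequency distribution:", sorted(frequency.items()))
```

In Excel, the same quantities are available through worksheet functions such as AVERAGE, MEDIAN, and STDEV.S.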
Other ways of summarizing data include visualization techniques, such as graphs and charts,
which can help to identify patterns and relationships in the data. Common types of visualization
techniques include bar charts, line graphs, scatter plots, and histograms.
❖ Normality Test
A normality test is a statistical method used to determine whether a given dataset is normally
distributed. The normal distribution is a symmetric, bell-shaped distribution in which values
cluster around the mean. Many statistical tests and models assume that the data are normally
distributed, so it is important to check whether this assumption holds before conducting
further analysis.
There are several methods of testing for normality, including graphical methods, such as
histograms and normal probability plots, and statistical tests, such as the Shapiro-Wilk test,
Anderson-Darling test, and Kolmogorov-Smirnov test.
Histograms and normal probability plots are graphical methods that visually represent the
distribution of data. A histogram is a graph that displays the frequency distribution of the data,
while a normal probability plot is a graph that plots the observed data against the expected
values of a normal distribution. If the data follows a normal distribution, the plotted points
should form a straight line.
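The straight-line idea behind the normal probability plot can be sketched numerically. The Python example below is only an illustration: the function names, the two sample lists, and the plotting positions (i - 0.5) / n are assumptions chosen for this sketch, not something specified in these notes. It pairs each sorted observation with the value expected under a fitted normal distribution and then measures how close the pairs are to a straight line using the correlation coefficient.

```python
from statistics import NormalDist, mean, stdev

def normal_plot_points(data):
    """Return (expected, observed) pairs for a normal probability plot."""
    observed = sorted(data)
    n = len(observed)
    fitted = NormalDist(mean(observed), stdev(observed))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means near-straight."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [48, 50, 51, 52, 53, 54, 55, 56, 58, 60]
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 20, 60]

r_sym = straightness(normal_plot_points(symmetric))
r_skew = straightness(normal_plot_points(skewed))
print(f"symmetric sample: r = {r_sym:.3f}")
print(f"skewed sample:    r = {r_skew:.3f}")
```

The symmetric sample gives a correlation much closer to 1 than the skewed one. This is a visual aid expressed as a number, not a formal test, and it does not produce a p-value.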
Statistical tests for normality compare the observed data to what would be expected under a
normal distribution. The Shapiro-Wilk test, for example, measures how closely the ordered
sample values match the values expected from a normal distribution. The Anderson-Darling
test and the Kolmogorov-Smirnov test instead measure the distance between the cumulative
distribution of the sample and that of a normal distribution. Each test produces a p-value: a
small p-value (commonly below 0.05) is evidence that the data are not normally distributed.
If the dataset fails the normality test, it may not be appropriate to use methods that assume
normal distribution, such as parametric statistical tests or models. In such cases, non-parametric
tests or models may be more appropriate.
❖ Homogeneity Test
A homogeneity test is a statistical method used to determine whether the variances of two or
more groups of data are equal. Homogeneity of variance is an assumption of many
statistical tests, such as the independent-samples t-test and analysis of variance (ANOVA).
Violation of this assumption can lead to biased results and incorrect conclusions.
There are several methods of testing for homogeneity of variance, including graphical methods,
such as boxplots and scatter plots, and statistical tests, such as Levene's test and Bartlett's test.
Boxplots and scatter plots are graphical methods that can be used to visually inspect whether
the variances of different groups are similar or not. If the boxplots or scatter plots show similar
variability among the groups, then the assumption of homogeneity of variance may be valid.
Levene's test is a statistical test that compares the variances of two or more groups of data by
calculating the absolute deviation of each observation from its group mean, and then
comparing the mean absolute deviations across groups using an analysis of variance. The test
produces a p-value, which indicates whether the variances are significantly different. A
significant p-value (commonly below 0.05) indicates that the variances are not equal, while a
non-significant p-value means there is no evidence that the variances differ.
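The calculation behind Levene's test can be sketched in a few lines of Python. This is a minimal illustration using the textbook form of the W statistic with deviations taken from each group mean; the function name and the two sample groups are hypothetical. The standard library has no F distribution, so the sketch stops at the statistic and does not compute the p-value.

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W statistic, using deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA on the absolute deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical groups with the same mean but very different spread.
group_a = [10, 11, 12, 13, 14]
group_b = [2, 7, 12, 17, 22]

w = levene_statistic(group_a, group_b)
print(f"Levene W = {w:.3f}")
```

A larger W indicates a bigger difference in spread between the groups. To obtain a p-value, W is compared with an F distribution with k - 1 and N - k degrees of freedom; statistical packages (for example `scipy.stats.levene` in Python) do this step automatically.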
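Bartlett's test is another statistical test that compares the variances of two or more groups of
data. It calculates the sample variance of each group from the squared differences between
each observation and the group mean, and then compares these group variances with the
pooled variance across all groups. The test also produces a p-value, with a significant p-value
indicating that the variances are not equal. Bartlett's test is sensitive to departures from
normality, so Levene's test is often preferred when the data may not be normally distributed.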
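For comparison, the statistic used by Bartlett's test can be sketched as follows. The formula is the standard textbook form, which these notes do not spell out, so treat this as an illustration under that assumption; the function name and sample groups are hypothetical. As with Levene's test, the p-value step is omitted because it needs a chi-square distribution, which the Python standard library does not provide.

```python
import math
from statistics import variance

def bartlett_statistic(*groups):
    """Bartlett's test statistic for equality of variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction

# Hypothetical groups with the same mean but very different spread.
group_a = [10, 11, 12, 13, 14]
group_b = [2, 7, 12, 17, 22]

t = bartlett_statistic(group_a, group_b)
print(f"Bartlett statistic = {t:.3f}")
```

The statistic is zero when all group variances are identical and grows as they diverge. Under the hypothesis of equal variances it is compared with a chi-square distribution with k - 1 degrees of freedom.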
In summary, homogeneity tests are important to ensure that the assumption of equal variances
is met before conducting statistical tests that rely on this assumption. If the assumption is
violated, alternative tests or models may be needed to obtain accurate and valid results.
❖ Graphical Charts
Graphical charts are visual representations of data that help to convey information in a clear
and concise way. Charts are used to display numerical or qualitative data in a way that is easy
to understand, and they are a common tool in business, finance, science, and other fields.
Graphical charts in Excel are an effective way to visualize data and present it in a clear and
concise manner. Here are some of the benefits of using graphical charts in Excel:
2. Communication: Graphical charts can be easily shared with others and help to
communicate complex information in a visually appealing way.
Overall, the use of graphical charts in Excel can help users to better understand their data,
communicate insights to others, and make informed decisions.
➢ Histogram
(a) A histogram is a graphical chart that displays the distribution of numerical
data. The data is divided into intervals, or bins, and the height of each bar
represents the frequency or count of values within each bin. The horizontal
axis of a histogram represents the range of values in the data, and the vertical
axis represents the frequency or count.
(b) Histograms are commonly used to show the distribution of data, including
the shape of the distribution, the center of the data, and the spread of the
data. They are especially useful when working with large data sets, as they
allow users to quickly identify patterns and trends in the data.
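The binning step that a histogram performs can be sketched in Python. This is a minimal illustration assuming equal-width bins and data that contain at least two distinct values; the function name and the `ages` sample are hypothetical.

```python
def histogram_counts(values, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count  # assumes high > low
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

# Hypothetical sample: customer ages.
ages = [21, 22, 25, 27, 29, 31, 34, 35, 36, 38, 41, 45, 47, 52, 61]
edges, counts = histogram_counts(ages, 4)

# Text version of the chart: one row per bin, one '#' per observation.
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * count}")
```

The number of bins is a choice made by the analyst; too few bins hide the shape of the distribution, while too many make it look noisy.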
➢ Box Plots
A box plot, also known as a box and whisker plot, is a graphical chart that displays the
distribution of numerical data and identifies any outliers. It shows the median, quartiles, and
range of the data in a compact and efficient manner.
1. Median: The middle value in the data set, which separates the lower and upper
halves of the data.
2. Quartiles: The data is divided into four equal parts, with the first quartile (Q1)
representing the 25th percentile, the second quartile representing the median,
and the third quartile (Q3) representing the 75th percentile.
3. Interquartile Range (IQR): The distance between the first and third quartiles,
representing the middle 50% of the data.
4. Whiskers: Lines extending from the box that show the range of the data.
Typically, each whisker extends to the most extreme data point that lies within
1.5 times the IQR of the nearest quartile (below Q1 or above Q3). Any data
points outside the whiskers are considered outliers.
5. Outliers: Data points that fall outside the whiskers and are significantly different
from the rest of the data.
Box plots are commonly used to compare the distribution of data between different
groups or variables. They are especially useful when working with large data sets, as
they provide a quick and clear overview of the distribution of the data and highlight
any potential outliers.
In Excel, creating a box plot is relatively straightforward. Simply select the data and
choose "Box and Whisker" from the chart types under the Insert tab. Excel will
automatically generate a box plot based on the selected data.
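Outside Excel, the components of a box plot can be computed directly. The Python sketch below is an illustration only: the function name and sample data are hypothetical, and it uses the "inclusive" quartile method from the standard library. Quartile conventions differ between tools, so the values may not match a given Excel chart exactly.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Compute the quantities a box plot displays."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences mark the furthest a whisker may reach (1.5 x IQR rule).
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest point within the fences
        "whisker_high": inside[-1],  # largest point within the fences
        "outliers": outliers,
    }

# Hypothetical sample: delivery times in days, with one unusually slow order.
delivery_days = [7, 8, 9, 10, 11, 12, 13, 14, 40]
summary = box_plot_summary(delivery_days)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the value 40 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 14.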
➢ Bar Plots
A bar plot, also known as a bar chart, is a graphical chart that displays values for
different categories. Each category is represented by a bar, and the length or height of
the bar is proportional to the value it represents, such as a count, total, or average.
Bar plots are commonly used to compare values across categories, such as sales by
region or headcount by department. Unlike a histogram, which groups numerical data
into bins, a bar plot shows one bar per category, and the bars are usually separated by
gaps.
In Excel, creating a bar plot is relatively straightforward. Simply select the data and
choose a column or bar chart from the chart types under the Insert tab.
➢ Scatterplot
A scatter plot is a graphical chart that displays the relationship between two variables.
It uses a set of points to represent the values of the two variables, with one variable
plotted along the horizontal axis (x-axis) and the other variable plotted along the vertical
axis (y-axis).
Each point on the scatter plot represents a single data point that includes values for both
variables. The pattern of points on the scatter plot can reveal the nature of the
relationship between the two variables, such as whether the variables are positively
correlated, negatively correlated, or not correlated at all.
a. Correlation: Scatter plots can show the strength and direction of the
correlation between two variables. Positive correlation indicates that the
variables increase or decrease together, while negative correlation
indicates that they move in opposite directions. No correlation indicates
that there is no linear relationship between the variables, although a
non-linear relationship may still be present.
b. Outliers: Scatter plots can highlight any unusual or extreme data points
that fall outside the general pattern of the data.
c. Clustering: Scatter plots can reveal clusters or groups of data points that
share similar values for both variables.
Scatter plots are commonly used in fields such as statistics, finance, and data analysis
to explore and visualize the relationship between two variables. In Excel, creating a
scatter plot is relatively straightforward. Simply select the two variables that you want
to plot, choose "Scatter" from the chart types under the Insert tab, and Excel will
automatically generate a scatter plot based on the selected data.
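The strength and direction of the relationship shown in a scatter plot is often summarized with the Pearson correlation coefficient. The Python sketch below is a minimal illustration; the function name and the sample figures are hypothetical and are not taken from these notes.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: advertising spend and sales move together,
# while price and units sold move in opposite directions.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 58, 79, 97]
price = [5, 6, 7, 8, 9]
units_sold = [90, 82, 71, 65, 52]

r_positive = pearson_r(ad_spend, sales)
r_negative = pearson_r(price, units_sold)
print(f"ad spend vs sales:   r = {r_positive:.3f}")
print(f"price vs units sold: r = {r_negative:.3f}")
```

Values close to +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship. In Excel, the CORREL function returns the same coefficient.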