
BA File

This document is a project report submitted by Anju S Nair (roll number 00220803920) for their MBA program at Guru Gobind Singh Indraprastha University, Delhi. The report discusses business analytics and Excel functions. It covers topics like types of data classification in statistics, measures of central tendency, data summarization techniques including tables, charts and histograms, as well as concepts like frequency distribution and relative frequency.

Uploaded by

Sweety 12
Copyright
© © All Rights Reserved

A PROJECT REPORT

ON

BUSINESS ANALYTICS & EXCEL FUNCTIONS

Masters in Business Administration (MBA)

Guru Gobind Singh Indraprastha University, Delhi

Student Name: Anju S Nair
Roll no: 00220803920
Subject Teacher: Mr. Gaurav Sharma

Bhagwan Parshuram Institute of Technology
School of Business Administration
New Delhi - 110089
Batch 2020 - 2022
UNIT 1
Business Statistics
The Business Statistics and Analysis Specialization is designed to equip you with a basic
understanding of business data analysis tools and techniques. You’ll master essential
spreadsheet functions, build descriptive business data measures, and develop your
aptitude for data modeling. You’ll also explore basic probability concepts, including
measuring and modeling uncertainty, and you’ll use various data distributions, along with
the Linear Regression Model, to analyze and inform business decisions. The
Specialization culminates with a Capstone Project in which you’ll apply the skills and
knowledge you’ve gained to an actual business problem.

Types of Classification of Data in Statistics


Qualitative or Categorical Data
Qualitative data, also known as the categorical data, describes the data that fits into the
categories. Qualitative data are not numerical. The categorical information involves
categorical variables that describe the features such as a person’s gender, home town etc.
Categorical measures are defined in terms of natural language specifications, but not in
terms of numbers.
Sometimes categorical data can hold numerical values (quantitative value), but those
values do not have mathematical sense. Examples of the categorical data are birthdate,
favourite sport, school postcode. Here, the birthdate and school postcode hold the
quantitative value, but it does not give numerical meaning.

Nominal Data
Nominal data is one of the types of qualitative information which helps to label the
variables without providing the numerical value. Nominal data is also called the nominal
scale. It cannot be ordered and measured. But sometimes, the data can be qualitative and
quantitative. Examples of nominal data are letters, symbols, words, gender etc.
The nominal data are examined using the grouping method. In this method, the data are
grouped into categories, and then the frequency or the percentage of the data can be
calculated. These data are visually represented using the pie charts.

Ordinal Data
Ordinal data/variable is a type of data which follows a natural order. The significant
feature of the ordinal data is that the difference between the data values is not
determined. This variable is mostly found in surveys, finance, economics, questionnaires,
and so on.
The ordinal data is commonly represented using a bar chart. These data are investigated
and interpreted through many visualisation tools. The information may be expressed using
tables in which each row in the table shows the distinct category.

Quantitative or Numerical Data


Quantitative data is also known as numerical data which represents the numerical value
(i.e., how much, how often, how many). Numerical data gives information about the
quantities of a specific thing. Some examples of numerical data are height, length, size,
weight, and so on. The quantitative data can be classified into two different types based
on the data sets. The two different classifications of numerical data are discrete data and
continuous data.

Discrete Data
Discrete data can take only distinct, separate values. Discrete information contains only a
finite or countable number of possible values. Those values cannot be subdivided
meaningfully. Here, things can be counted in whole numbers.
Example: Number of students in the class

Continuous Data
Continuous data is data that can be measured. It has an infinite number of possible
values that can be selected within a given specific range.
Example: Temperature range

Data Summarization

The term Data Summarization refers to presenting the summary of generated data in an
easily comprehensible and informative manner. Presenting the raw data (the entire set of
individual measurements that was generated) is not practical in many cases.

Tabular Presentation
A table helps to represent even a large amount of data in an engaging, easy to read, and
coordinated manner. The data is arranged in rows and columns. This is one of the most
popularly used forms of presentation of data as data tables are simple to prepare and
read.
Objectives Of Tabulation

• To simplify the complex data
• To bring out essential features of the data
• To facilitate comparison
• To facilitate statistical analysis
• Saving of space

Graphic Presentation
Graphic presentation represents a highly developed body of techniques for elucidating,
interpreting, and analyzing numerical facts by means of points, lines, areas, and other
geometric forms and symbols. Graphic techniques are especially valuable in presenting
quantitative data in a simple, clear, and effective manner, as well as facilitating
comparisons of values, trends, and relationships. They have the additional advantages of
succinctness and popular appeal; the comprehensive pictures they provide can bring out
hidden facts and relationships and contribute to a more balanced understanding of a
problem.
Charts

Charts are a great way to visually represent all kinds of information, from the simple to the
very complex.
A variety of chart types can be used in presentations. Some of these chart
types include:

• Time Series
• Bar Charts
• Combo Charts
• Pie Charts
• Tables
• Geo Map
• Scorecard
• Scatter Charts
• Bullet Charts
• Area Chart
• Text & Images
Histogram

A histogram is used to summarize discrete or continuous data. In other words, it provides
a visual interpretation of numerical data by showing the number of data points that fall
within a specified range of values (called “bins”). It is similar to a vertical bar graph.
However, a histogram, unlike a vertical bar graph, shows no gaps between the bars.

Frequency Distribution 

A frequency distribution is a representation, either in a graphical or tabular format, that
displays the number of observations within a given interval. The interval size depends on
the data being analysed and the goals of the analyst. The intervals must be mutually
exclusive and exhaustive.
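
As a rough illustration of the idea above, the following Python sketch groups a small set of observations into equal-width intervals and counts how many fall into each. The weights and the interval width of 10 are assumed values chosen only for the example; they do not come from this report.

```python
from collections import Counter

# Illustrative weights in kg (assumed data, not taken from the report).
weights = [52, 57, 61, 64, 66, 68, 71, 73, 75, 79, 83, 88]

BIN_WIDTH = 10  # assumed interval size; in practice the analyst chooses this


def bin_label(value, width=BIN_WIDTH):
    """Return the half-open interval [low, high) that contains value."""
    low = (value // width) * width
    return (low, low + width)


# Each observation lands in exactly one interval (mutually exclusive),
# and every observation is assigned to some interval (exhaustive).
frequency = Counter(bin_label(w) for w in weights)

for (low, high), count in sorted(frequency.items()):
    print(f"{low}-{high}: {count}")
```

Using half-open intervals is one simple way to keep the intervals mutually exclusive: a value on a boundary, such as 60, belongs only to the interval that starts at it.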

RELATIVE FREQUENCY

The number of times an event occurs is called its frequency. Relative frequency is an
experimental (empirical) quantity, not a theoretical one. Since it is experimental, it is
possible to obtain different relative frequencies when we repeat the experiment. To
calculate the relative frequency, we need:
• Frequency count for the total population
• Frequency count for a subgroup of the population

How to Calculate Relative Frequency?

The ratio of the number of times a value of the data occurs in the set of all outcomes to
the number of all outcomes gives the value of relative frequency.

Let’s understand the Relative Frequency formula with the help of an example

Let’s look at the table below to see how the weights of the people are distributed.

To convert the frequencies into relative frequencies, we need to do the following
steps.

Step 1: Find the total N, the sum of all frequencies (40 in this example).

Step 2: Divide each frequency by N. Let’s see how: 1 / 40 = 0.025.

Example: Let us solve one more example to understand the concept better.

This is a frequency table showing how many students got marks within the given
intervals in Maths. The frequencies sum to 40, so each relative frequency is the
frequency divided by 40.

Marks       Frequency   Relative Frequency
45 – 50     3           3 / 40 = 0.075
50 – 55     1           1 / 40 = 0.025
55 – 60     1           1 / 40 = 0.025
60 – 65     6           6 / 40 = 0.15
65 – 70     8           8 / 40 = 0.2
70 – 80     3           3 / 40 = 0.075
80 – 90     11          11 / 40 = 0.275
90 – 100    7           7 / 40 = 0.175
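
The same calculation can be sketched in a few lines of Python. The frequencies are the ones listed in the marks table; the variable names are only illustrative.

```python
# Frequencies from the marks table (class interval -> number of students).
marks_frequency = {
    "45-50": 3, "50-55": 1, "55-60": 1, "60-65": 6,
    "65-70": 8, "70-80": 3, "80-90": 11, "90-100": 7,
}

total = sum(marks_frequency.values())  # N, the sum of all frequencies

# Relative frequency = frequency of the interval / total number of observations.
relative_frequency = {
    interval: count / total for interval, count in marks_frequency.items()
}

for interval, rel in relative_frequency.items():
    print(f"{interval}: {rel:.3f}")
```

A quick check on any relative frequency table is that the relative frequencies add up to 1.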

Measures of Central Tendency & Dispersion


Central tendency is a descriptive summary of a dataset through a single value that reflects
the centre of the data distribution. Along with the variability (dispersion) of a dataset,
central tendency is a branch of descriptive statistics.

The central tendency is one of the most quintessential concepts in statistics. Although it
does not provide information regarding the individual values in the dataset, it delivers a
comprehensive summary of the whole dataset.

Measures of Central Tendency

Generally, the central tendency of a dataset can be described using the following
measures:

Mean (Average): Represents the sum of all values in a dataset divided by the total number
of the values.

Median: The middle value in a dataset that is arranged in ascending order (from the
smallest value to the largest value). If a dataset contains an even number of values, the
median of the dataset is the mean of the two middle values.

Mode: Defines the most frequently occurring value in a dataset. In some cases, a dataset
may contain multiple modes, while some datasets may not have any mode at all.

Standard deviation: A standard deviation is a statistic that measures the dispersion of a
dataset relative to its mean, so it describes spread rather than the centre of the data. The
standard deviation is calculated as the square root of the variance, which is obtained from
each data point's deviation from the mean. If the data points are further from the mean,
there is a higher deviation within the data set; thus, the more spread out the data, the
higher the standard deviation.

Variance: The term variance refers to a statistical measurement of the spread between
numbers in a data set. More specifically, variance measures how far each number in the
set is from the mean and thus from every other number in the set. Variance is often
denoted by the symbol σ². It is used by both analysts and traders to
determine volatility and market security. The square root of the variance is the standard
deviation (σ), which helps determine the consistency of an investment’s returns over a
period of time.
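
The measures described above can be computed with Python's standard `statistics` module, as in the short sketch below. The dataset is an assumed example, and the population versions of variance and standard deviation are used here; `statistics.variance` and `statistics.stdev` would give the sample versions instead.

```python
import statistics

# Illustrative dataset (assumed values, not taken from the report).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)       # mean of the two middle values here
mode = statistics.mode(data)           # most frequently occurring value
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # square root of the population variance

print(mean, median, mode, variance, std_dev)
```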

Even though the mean, median, and mode are the measures most commonly used to define
central tendency, there are some other measures, including, but not limited to, geometric
mean, harmonic mean, midrange, and geometric median.

The selection of a central tendency measure depends on the properties of a dataset. For
instance, the mode is the only central tendency measure for categorical data, while a
median works best with ordinal data.

Although the mean is regarded as the best measure of central tendency for quantitative
data, that is not always the case. For example, the mean may not work well with
quantitative datasets that contain extremely large or extremely small values. The extreme
values may distort the mean. Thus, you may consider other measures.
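
A small sketch of this effect, using assumed salary figures in which one value is far larger than the rest:

```python
import statistics

# Illustrative monthly salaries (assumed values); the last one is extreme.
salaries = [30_000, 32_000, 35_000, 36_000, 40_000, 500_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the extreme value
median_salary = statistics.median(salaries)  # stays close to the typical salary

print(f"mean   = {mean_salary:,.0f}")
print(f"median = {median_salary:,.0f}")
```

In a dataset like this one, the median is usually the more representative summary of a typical value.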

PROBABILITY DISTRIBUTION

The probability distribution is one of the major concepts of statistical analysis. It gives
the probability of each possible outcome of a random event, so the probabilities of all
outcomes can be known through the probability distribution. Recalling a little probability
theory helps in understanding probability distributions thoroughly: probability is a measure
of the certainty or uncertainty of the different outcomes of a given event.

PROBABILITY DISTRIBUTION TYPES:

A) Continuous Probability Distribution (for example, the Normal Distribution)

In a continuous probability distribution, the set of all outcomes that can be achieved
takes values on a continuous range. The set of real numbers is an example: the real numbers
are continuous, and every possible outcome can be a real number. Whole numbers and prime
numbers, by contrast, are countable sets, so they are not examples of continuous outcomes.

We should also know some real-life examples of continuous probability distributions. The
temperature of the day can be considered one of them, and after the outcomes have been
observed, a distribution table can be made. Other quantities that are often modelled with a
normal probability distribution are the range of weight of newborns and the height of a
population. Rolling a die, tossing a coin, and shoe sizes produce countable outcomes and
therefore belong under discrete distributions.

B) Discrete Probability Distribution (for example, the Binomial Distribution)

When the set of outcomes is discrete, the distribution is known as a discrete probability
distribution. Let’s say, for instance, that a die is rolled: each outcome is a separate
value with its own probability, and the function that assigns these probabilities is known
as the probability mass function. Some of the major examples of binomial probability
distributions are finding the number of used materials in a manufacturing field, taking a
survey of negative and positive feedback from people on anything, the number of women and
men in an organization, and calculating how many people watch a channel through a survey.

Example of a Probability Distribution


As a simple example of a probability distribution, let us look at the number observed when
rolling two standard six-sided dice. Each die has a 1/6 probability of rolling any single
number, one through six, but the sum of two dice will form the probability distribution
depicted in the image below. Seven is the most common outcome (1+6, 6+1, 5+2, 2+5,
3+4, 4+3). Two and twelve, on the other hand, are far less likely (1+1 and 6+6).
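
The distribution of the sum of two dice can be reproduced by listing all 36 equally likely outcomes, as in this short Python sketch (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice
# and count how often each total occurs.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each total as an exact fraction.
distribution = {
    total: Fraction(count, 36) for total, count in sorted(sums.items())
}

for total, prob in distribution.items():
    # The row of '#' marks gives a rough text picture of the distribution.
    print(f"{total:2d}: {str(prob):5s} {'#' * sums[total]}")
```

Seven occurs in six of the 36 outcomes (probability 1/6), while two and twelve each occur in only one (probability 1/36).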

CONTINUOUS DISTRIBUTION
A continuous distribution is one in which data can take on any value within a specified
range (which may be infinite). A continuous distribution has an infinite number of possible
values, and the probability associated with any particular value of a continuous
distribution is zero. Therefore, continuous distributions are normally described in terms of
probability density, which can be converted into the probability that a value will fall within a
certain range.

Example of a continuous distribution:


The continuous normal distribution can describe the distribution of weight of adult males.
For example, you can calculate the probability that a man weighs between 160 and 170
pounds.
The shaded region under the curve in this example represents the range from 160 to 170
pounds. The area of this region is 0.136; therefore, the probability that a randomly selected
man weighs between 160 and 170 pounds is 13.6%. The entire area under the curve
equals 1.0.

However, the probability that X is exactly equal to some value is always zero because the
area under the curve at a single point, which has no width, is zero. For example, the
probability that a man weighs exactly 190 pounds to infinite precision is zero. You could
calculate a nonzero probability that a man weighs more than 190 pounds, or less than 190
pounds, or between 189.9 and 190.1 pounds, but the probability that he weighs exactly
190 pounds is zero.
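
The difference between a range and a single point can be sketched with `statistics.NormalDist` from the Python standard library. The mean of 180 pounds and standard deviation of 30 pounds are assumptions made only for this illustration; the report does not state the parameters behind its 0.136 figure, so the numbers printed here are not expected to match it.

```python
from statistics import NormalDist

# Assumed parameters for illustration only (not given in the report).
weight = NormalDist(mu=180, sigma=30)

# P(160 <= X <= 170): area under the density between the two bounds.
p_range = weight.cdf(170) - weight.cdf(160)

# A narrow band around 190 pounds has a small but non-zero probability.
p_band = weight.cdf(190.1) - weight.cdf(189.9)

# A single point has no width, so the area above it is zero.
p_point = weight.cdf(190) - weight.cdf(190)

print(p_range, p_band, p_point)
```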

DISCRETE DISTRIBUTION
A discrete distribution is a probability distribution that depicts the occurrence of discrete
(individually countable) outcomes, such as 1, 2, 3... or zero vs. one. The binomial
distribution, for example, is a discrete distribution that evaluates the probability of a "yes"
or "no" outcome occurring a given number of times over a given number of trials, given the
event's probability in each trial, such as flipping a coin one hundred times and counting how
often the outcome is "heads".
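
As a minimal sketch of the coin example, the binomial probability of exactly k heads in n flips can be computed directly from its formula. The function name `binomial_pmf` is only illustrative, and a fair coin (probability 0.5) is assumed.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_flips = 100
p_heads = 0.5  # assumes a fair coin

# Probability of exactly 50 heads in 100 flips.
p_exactly_50 = binomial_pmf(50, n_flips, p_heads)

# The probabilities over all possible head counts (0 to 100) sum to 1.
total_probability = sum(
    binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)
)

print(p_exactly_50, total_probability)
```

Even the most likely single outcome, 50 heads, has a probability of only about 8%, because the total probability is shared among 101 possible head counts.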

Distribution is a statistical concept used in data research. Those seeking to identify the
outcomes and probabilities of a particular study will chart measurable data points from a
data set, resulting in a probability distribution diagram. There are many types of probability
distribution diagram shapes that can result from a distribution study, such as the normal
distribution ("bell curve").

Statisticians can identify the development of either a discrete or continuous distribution by
the nature of the outcomes to be measured. Unlike the normal distribution, which is
continuous and accounts for any possible outcome along the number line, a discrete
distribution is constructed from data that can only follow a finite or discrete set of
outcomes.

Discrete Distribution Example

Types of discrete probability distributions include:

• Poisson
• Bernoulli
• Binomial
• Multinomial

Example:

Consider an example where we are counting the number of people walking into a store in
any given hour. The values would need to be countable, finite, non-negative integers. It
would not be possible to have 0.5 people walk into a store, and it would not be possible to
have a negative amount of people walk into a store. Therefore, the distribution of the
values, when represented on a distribution plot, would be discrete.

UNIT 2
Simple Linear Regression
Simple linear regression is used to find out the best relationship between a single input
variable (predictor, independent variable, input feature, input parameter) & output variable
(predicted, dependent variable, output feature, output parameter) provided that both
variables are continuous in nature. This relationship represents how an input variable is
related to the output variable and how it is represented by a straight line.
To understand this concept, let us have a look at scatter plots. Scatter diagrams or plots
provide a graphical representation of the relationship between two continuous variables.

Coefficient of Determination
The coefficient of determination (denoted by R²) is a key output of regression analysis. It
is interpreted as the proportion of the variance in the dependent variable that is
predictable from the independent variable.
• The coefficient of determination is the square of the correlation (r) between
predicted y scores and actual y scores; thus, it ranges from 0 to 1.
• With linear regression, the coefficient of determination is also equal to the square
of the correlation between x and y scores.
• An R² of 0 means that the dependent variable cannot be predicted from the
independent variable.
• An R² of 1 means the dependent variable can be predicted without error from the
independent variable.
• An R² between 0 and 1 indicates the extent to which the dependent variable is
predictable. An R² of 0.10 means that 10 percent of the variance in Y is predictable
from X; an R² of 0.20 means that 20 percent is predictable; and so on.
Regression analysis in Excel

Dependent variable (aka criterion variable) is the main factor you are trying to understand
and predict.

Independent variables (aka explanatory variables, or predictors) are the factors that might
influence the dependent variable.

Regression analysis helps you understand how the dependent variable changes when one
of the independent variables varies and allows to mathematically determine which of
those variables really has an impact.

Technically, a regression analysis model is based on the sum of squares, which is a
mathematical way to find the dispersion of data points. The goal of a model is to get the
smallest possible sum of squares and draw a line that comes closest to the data.

Statistics differentiates between simple and multiple linear regression. Simple
linear regression models the relationship between a dependent variable and one
independent variable using a linear function. If you use two or more explanatory variables
to predict the dependent variable, you deal with multiple linear regression. If the dependent
variable is modeled as a non-linear function because the data relationships do not follow a
straight line, use nonlinear regression instead. The focus of this tutorial will be on a simple
linear regression.

As an example, let's take sales numbers for umbrellas for the last 24 months and find out
the average monthly rainfall for the same period. Plot this information on a chart, and the
regression line will demonstrate the relationship between the independent variable
(rainfall) and dependent variable (umbrella sales):

Linear regression equation

Mathematically, a linear regression is defined by this equation:

y = bx + a + ε

Where:

• x is an independent variable.
• y is a dependent variable.
• a is the Y-intercept, which is the expected mean value of y when all x variables are
equal to 0. On a regression graph, it's the point where the line crosses the Y axis.
• b is the slope of a regression line, which is the rate of change for y as x changes.
• ε is the random error term, which is the difference between the actual value of a
dependent variable and its predicted value.

The linear regression equation always has an error term because, in real life, predictors are
never perfectly precise. However, some programs, including Excel, do the error term
calculation behind the scenes. So, in Excel, you do linear regression using the least
squares method and seek coefficients a and b such that:

y = bx + a

For our example, the linear regression equation takes the following shape:

Umbrellas sold = b * rainfall + a

There exist a handful of different ways to find a and b. The three main methods to perform
linear regression analysis in Excel are:

• Regression tool included with Analysis ToolPak
• Scatter chart with a trendline
• Linear regression formula

Residual Analysis
Residual (or error) represents unexplained (or residual) variation after fitting a regression
model. It is the difference (or left over) between the observed value of the variable and
the value suggested by the regression model.

The difference between the observed value of the dependent variable (y) and the
predicted value (ŷ) is called the residual (e). Each data point has one residual.

Residual = Observed value – Predicted value

e = y – ŷ

For a least-squares line that includes an intercept, both the sum and the mean of the
residuals are equal to zero. That is, Σe = 0 and ē = 0.
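
The following Python sketch fits a least-squares line to a small set of assumed data points (not taken from this report), computes one residual per data point, and shows that the residuals sum to zero apart from floating-point rounding.

```python
# Illustrative data (assumed values, not taken from the report).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope (b) and intercept (a) for the line y = bx + a.
b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
a = y_mean - b * x_mean

# Residual = observed value - predicted value, one per data point.
predicted = [b * xi + a for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print(residuals)
print(sum(residuals))  # zero up to floating-point rounding
```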

Residual Plots – A residual plot is a graph that shows the residuals on the vertical axis
and the independent variable on the horizontal axis. If the points in a residual plot are
randomly dispersed around the horizontal axis, a linear regression model is appropriate
for the data; otherwise, a non-linear model is more appropriate.

Tools for analysing residuals – For the basic analysis of residuals you will use the usual
descriptive tools and scatterplots (plotting both fitted values and residuals, as well as the
dependent and independent variables you have included in your model).

CONFIDENCE INTERVALS
Confidence Intervals are estimates that are calculated from sample data to determine
ranges likely to contain the population parameter (mean, standard deviation) of interest.
For example, if our population is (2,6), a confidence interval of the mean suggests that
the population mean is likely between 2 and 6. And how confidently can we say this?
Obviously 100%, right? Because we know all the values and we can calculate it very easily.

But in real-life problems, this is not the case. It is not always feasible or possible to study
the whole population. So what do we do? We take sample data. But can we rely on one
sample? No, because different samples from the same population will produce different means.

So we take numerous random samples (from the same population) and calculate
confidence intervals for each sample and a certain percentage of these ranges will
contain the true population parameter.

This certain percentage is called the confidence level. A 95% confidence level means that
out of 100 random samples taken, we expect about 95 of the confidence intervals to contain
the true population parameter.
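
The meaning of the confidence level can be checked with a small simulation. The sketch below makes several simplifying assumptions that are not from this report: the population is normal with a known mean of 50 and a known standard deviation of 10, each sample has 30 observations, and the interval is the simple "sample mean plus or minus z times the standard error" form.

```python
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the run is repeatable

# Assumed population for illustration: normal with known mean and sigma.
POP_MEAN, POP_SIGMA = 50.0, 10.0
SAMPLE_SIZE, NUM_SAMPLES = 30, 1000

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% confidence level
margin = z * POP_SIGMA / SAMPLE_SIZE ** 0.5

covered = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_mean = mean(sample)
    # Confidence interval for this sample: sample mean +/- margin.
    if sample_mean - margin <= POP_MEAN <= sample_mean + margin:
        covered += 1

coverage = covered / NUM_SAMPLES
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The share of intervals that contain the true mean should come out close to 95%, although it varies a little from run to run when the seed is changed.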

PREDICTION INTERVALS
The range that likely contains the value of the dependent variable for a single new
observation given specific values of the independent variables, is the prediction interval.

The prediction interval predicts in what range a future individual observation will fall,
while a confidence interval shows the likely range of values associated with some
statistical parameter of the data, such as the population mean.

MULTIPLE LINEAR REGRESSION


Multiple linear regression refers to a statistical technique that is used to predict the
outcome of a variable based on the value of two or more variables. It is sometimes
known simply as multiple regression, and it is an extension of linear regression. The
variable that we want to predict is known as the dependent variable, while the variables
we use to predict the value of the dependent variable are known as independent or
explanatory variables.

To find the best-fit line for each independent variable, multiple linear regression
calculates three things:

• The regression coefficients that lead to the smallest overall model error.
• The test statistic of the overall model (reported as an F-statistic in most
regression output).
• The associated p-value (how likely it is that the test statistic would have occurred
by chance if the null hypothesis of no relationship between the independent
and dependent variables was true).

It then calculates the t-statistic and p-value for each regression coefficient in the model.

INTERPRETATION OF REGRESSION ANALYSIS


If you've ever wondered how two or more pieces of data relate to each other (e.g.
how GDP is impacted by changes in unemployment and inflation), or if you've ever had
your boss ask you to create a forecast or analyze predictions based on relationships
between variables, then learning regression analysis would be well worth your time.

In this article, you'll learn the basics of simple linear regression, sometimes called 'ordinary
least squares' or OLS regression—a tool commonly used in forecasting and financial
analysis. We will begin by learning the core principles of regression, first learning about
covariance and correlation, and then moving on to building and interpreting a regression
output. Popular business software such as Microsoft Excel can do all the regression
calculations and outputs for you, but it is still important to learn the underlying mechanics.

Variables
At the heart of a regression model is the relationship between two different variables,
called the dependent and independent variables. For instance, suppose you want to
forecast sales for your company and you've concluded that your company's sales go up
and down depending on changes in GDP.

The sales you are forecasting would be the dependent variable because their value
"depends" on the value of GDP and the GDP would be the independent variable. You would
then need to determine the strength of the relationship between these two variables in
order to forecast sales. If GDP increases/decreases by 1%, how much will your sales
increase or decrease?
Covariance
The formula to calculate the relationship between two variables is called covariance. This
calculation shows you the direction of the relationship. If one variable increases and the
other variable tends to also increase, the covariance would be positive. If one variable goes up
and the other tends to go down, then the covariance would be negative.
The actual number you get from calculating this can be hard to interpret because it isn't
standardized. A covariance of five, for instance, can be interpreted as a positive
relationship, but the strength of the relationship can only be said to be stronger than if the
number was four or weaker than if the number was six.

Correlation Coefficient
We need to standardize the covariance in order to allow us to better interpret and use it in
forecasting, and the result is the correlation calculation. The correlation calculation simply
takes the covariance and divides it by the product of the standard deviations of the two
variables. This will bound the correlation between a value of -1 and +1.
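
As a sketch of both calculations, the Python code below uses the GDP and sales figures from the Excel example later in this report. The sample versions of the formulas (dividing by n - 1) are an assumption made here; the correlation comes out the same whether n or n - 1 is used, as long as it is used consistently.

```python
from math import sqrt

# Change in GDP (%) and sales, taken from the Excel example later in the report.
gdp = [1.0, 1.9, 2.4, 2.6, 2.9]
sales = [100, 250, 275, 200, 300]

n = len(gdp)
gdp_mean = sum(gdp) / n
sales_mean = sum(sales) / n

# Sample covariance: how the two variables move together around their means.
covariance = sum(
    (g - gdp_mean) * (s - sales_mean) for g, s in zip(gdp, sales)
) / (n - 1)

# Sample standard deviation of each variable.
gdp_sd = sqrt(sum((g - gdp_mean) ** 2 for g in gdp) / (n - 1))
sales_sd = sqrt(sum((s - sales_mean) ** 2 for s in sales) / (n - 1))

# Correlation = covariance / (product of the two standard deviations).
correlation = covariance / (gdp_sd * sales_sd)

print(f"covariance  = {covariance:.2f}")
print(f"correlation = {correlation:.4f}")
```

The covariance is positive but hard to judge on its own, while the correlation of roughly 0.83 can be read directly as a fairly strong positive relationship.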

A correlation of +1 can be interpreted to suggest that both variables move perfectly
positively with each other and a -1 implies they are perfectly negatively correlated. In our
previous example, if the correlation is +1, sales rise in exact linear step with every
increase in GDP; if the correlation is -1, sales fall in exact linear step with every increase
in GDP. The correlation gives the direction and strength of the relationship, while the size
of the change in sales for a 1% change in GDP is given by the slope of the regression line.

Regression Equation
The formula for a simple linear regression is y = bx + a. The "y" is the value we are trying
to forecast, the "b" is the slope of the regression line, the "x" is the value of our
independent variable, and the "a" represents the y-intercept. The regression equation simply
describes the relationship between the dependent variable (y) and the independent variable (x).

The intercept, or "a," is the value of y (dependent variable) if the value of x (independent
variable) is zero, and so is sometimes simply referred to as the 'constant.' So if there was
no change in GDP, your company would still make some sales. This value, when the
change in GDP is zero, is the intercept. Take a look at the graph below to see a graphical
depiction of a regression equation. In this graph, there are only five data points
represented by the five dots on the graph. Linear regression attempts to estimate a line
that best fits the data (a line of best fit) and the equation of that line results in the
regression equation.

Regressions in Excel
Now that you understand some of the background that goes into a regression analysis,
let's do a simple example using Excel's regression tools. We'll build on the previous
example of trying to forecast next year's sales based on changes in GDP. The next table
lists some artificial data points, but numbers like these are easily accessible in real life.

Year    GDP      Sales
2015    1.00%    100
2016    1.90%    250
2017    2.40%    275
2018    2.60%    200
2019    2.90%    300

We can see that there is going to be a positive correlation between sales and GDP. Both
tend to go up together. Using Excel, all you have to do is click the Tools drop-down menu,
select Data Analysis, and from there choose Regression (in recent versions of Excel, Data
Analysis appears on the Data tab once the Analysis ToolPak add-in is enabled). The popup
box is easy to fill in from there; your Input Y Range is your "Sales" column and your Input X
Range is the change in GDP column; choose the output range for where you want the data to
show up on your spreadsheet and press OK. You should see something similar to what is given
in the table below:

Regression Statistics                 Coefficients
Multiple R           0.8292243        Intercept    34.58409
R Square             0.687613         GDP          88.15552
Adjusted R Square    0.583484
Standard Error       51.021807
Observations         5
Interpretation

The major outputs you need to be concerned about for simple linear regression are the
R-squared, the intercept (constant) and the GDP's beta (b) coefficient. The R-squared
number in this example is 68.7%. This shows how well our model fits the data: the
explanatory variable in the model explains 68.7% of the variation in the dependent
variable. Next, we have an intercept of 34.58, which tells us that if the change in GDP was
forecast to be zero, our sales would be about 35 units. And finally, the GDP beta, or
regression (slope) coefficient, of 88.16 tells us that if GDP increases by 1%, sales will
likely go up by about 88 units.
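The figures in the regression output can be checked by hand. The sketch below recomputes the intercept, the GDP coefficient and the R-squared from the five data points using the ordinary least-squares formulas; the 3.0% GDP change used for the forecast is an assumed value for illustration, not a figure from the example.

```python
# Data from the table above: yearly change in GDP (%) and sales (units).
gdp = [1.00, 1.90, 2.40, 2.60, 2.90]
sales = [100, 250, 275, 200, 300]

n = len(gdp)
mean_x = sum(gdp) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in gdp)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gdp, sales))

slope = sxy / sxx                      # GDP coefficient
intercept = mean_y - slope * mean_x    # sales when the change in GDP is zero
r_squared = sxy ** 2 / (sxx * syy)     # share of the variation in sales explained

# Assumed input for illustration: next year's GDP change is 3.0%.
forecast = intercept + slope * 3.0

print(round(intercept, 5), round(slope, 5), round(r_squared, 6))
```

The three printed values should agree with the Intercept, GDP and R Square entries of the Excel output.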

HETEROSCEDASTICITY
The word “heteroscedasticity” comes from the Greek, and quite literally means data with a
different (hetero) dispersion (skedasis). In simple terms, heteroscedastic data is any set of
data that isn’t homoscedastic. More technically, it refers to data whose variability
(scatter) is unequal across the values of a second, predictor variable.

Multicollinearity
Multicollinearity is the occurrence of high intercorrelations among two or more
independent variables in a multiple regression model. Multicollinearity can lead to skewed
or misleading results when a researcher or analyst attempts to determine how effectively
each independent variable can be used to predict or understand the dependent
variable in a statistical model.

There are four types of multicollinearity:

• Perfect Multicollinearity – It exists when the independent variables in the equation
have an exact linear relationship with one another.
• High Multicollinearity – It refers to a linear relationship between two or more
independent variables that are strongly, but not perfectly, correlated with each other.
• Structural Multicollinearity – This is caused by the researcher, who creates new
independent variables from existing ones and inserts them in the equation.
• Data-based Multicollinearity – It is caused by experiments that are poorly
designed by the researcher.

Causes of Multicollinearity
The main causes are the choice of independent variables, changes in the parameters of the
variables (so that a small change in a variable has a significant impact on the result), and
data collection, that is, the way the sample is drawn from the selected population.

Examples of Multicollinearity
Let’s assume that ABC Ltd, a KPO, has been hired by a pharmaceutical company to
provide research services and statistical analysis on diseases in India. For this, ABC Ltd
has selected age, weight, profession, height, and health as the prima facie parameters.
In the above example, there is a multicollinearity situation since several of the independent
variables selected for the study, such as height and weight, are correlated with one another.
Hence it would be advisable for the researcher to adjust the variables before starting the
project, since the results will be directly affected by the variables selected here.
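One common way to spot multicollinearity is to measure how strongly two independent variables move together. The sketch below is only an illustration: the height and weight figures are made-up sample values, and the variance inflation factor (VIF) formula shown is the special case for two predictors.

```python
import math

# Hypothetical sample: height (cm) and weight (kg) for eight people.
height = [150, 160, 165, 170, 175, 180, 185, 190]
weight = [50, 58, 63, 66, 72, 77, 83, 88]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(height, weight)

# With only two predictors, the variance inflation factor is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A correlation close to 1 (and, as a widely used rule of thumb, a VIF above roughly 5 to 10) suggests that one of the two variables should be dropped or the two combined before running the regression.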

UNIT 3

The process to formulate a Linear Programming problem


Let us look at the steps of defining a Linear Programming problem generically:

1. Identify the decision variables
2. Write the objective function
3. Mention the constraints
4. Explicitly state the non-negativity restriction

For a problem to be a linear programming problem, the objective function and the
constraints all have to be linear functions of the decision variables.

If these conditions are satisfied, it is called a Linear Programming Problem.

Formulating a problem 

Example: Consider a chocolate manufacturing company that produces only two types of
chocolate – A and B. Both the chocolates require Milk and Choco only. To manufacture
each unit of A and B, the following quantities are required:

Each unit of A requires 1 unit of Milk and 3 units of Choco

Each unit of B requires 1 unit of Milk and 2 units of Choco

The company kitchen has a total of 5 units of Milk and 12 units of Choco. On each sale, the
company makes a profit of

Rs 6 per unit A sold

Rs 5 per unit B sold.

Now, the company wishes to maximize its profit. How many units of A and B should it
produce respectively?

Solution: The first step is to represent the problem in tabular form for better
understanding.

         Milk   Choco   Profit per unit
A        1      3       Rs 6
B        1      2       Rs 5
Total    5      12

Let the total number of units of A produced be X

Let the total number of units of B produced be Y

Now, the total profit is represented by Z

The total profit the company makes is given by the total number of units of A and B
produced multiplied by its per-unit profit of Rs 6 and Rs 5 respectively.

Profit: Max Z = 6X+5Y

which means we have to maximize Z.


The company will try to produce as many units of A and B as possible to maximize the
profit. But the resources Milk and Choco are available in a limited amount.

As per the above table, each unit of A and B requires 1 unit of Milk. The total amount of
Milk available is 5 units. To represent this mathematically,

X+Y ≤ 5

Also, each unit of A and B requires 3 units & 2 units of Choco respectively. The total
amount of Choco available is 12 units. To represent this mathematically,

3X+2Y ≤ 12

Also, the number of units of A and B produced cannot be negative.

So we have two more constraints, X ≥ 0  &  Y ≥ 0

For the company to make maximum profit, the above inequalities have to be satisfied.
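Because the problem has only two decision variables, it can be solved by checking the corner points of the feasible region, where the optimum of a linear programme is known to lie. The sketch below does this in plain Python rather than with a dedicated solver; the helper names (`intersection`, `feasible`, `profit`) are illustrative and not part of the original example.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*X + b*Y <= c.
constraints = [
    (1, 1, 5),     # Milk:  X + Y <= 5
    (3, 2, 12),    # Choco: 3X + 2Y <= 12
    (-1, 0, 0),    # X >= 0, written as -X <= 0
    (0, -1, 0),    # Y >= 0, written as -Y <= 0
]

def profit(x, y):
    """Objective function Z = 6X + 5Y."""
    return 6 * x + 5 * y

def intersection(c1, c2):
    """Point where two constraint boundary lines cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def feasible(x, y, tol=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)

# Corner points are the feasible intersections of pairs of boundary lines.
corners = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(*point):
        corners.append(point)

best = max(corners, key=lambda p: profit(*p))
print(best, profit(*best))
```

Under these constraints the best corner is X = 2, Y = 3, giving a profit of Rs 27. For larger problems a dedicated LP solver would normally be used instead of enumeration.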

APPLICATION OF LINEAR PROGRAMMING

Linear programming is used to obtain optimal solutions for operations research. Using
linear programming allows researchers to find the best, most economical solution to a
problem within all of its limitations, or constraints. Many fields use linear programming
techniques to make their processes more efficient. These include food and agriculture,
engineering, transportation, manufacturing and energy.

Multiple Criteria Decision Making


When taking a decision, there might not always be a finite number of choices or there
might be many alternatives to the original decision. There is also some possibility for not
having a suitable choice for the criterion. Multiple Criteria Decision Making (MCDM) is an
approach designed for the evaluation of problems with a finite or an infinite number of
choices.

The steps of an MCDM process can be briefly described as follows.

• Identifying the objective/goal of the decision-making process: This step is
straightforward and involves the correct identification of the goal or the final output
of the decision-making process.
• Selection of criteria: The criteria should be consistent with the decision and
independent of one another. They should also be represented on the same
measurable scale and be interrelated with the alternatives.
• Selection of alternatives: When selecting alternatives, attributes such as availability
and comparability must be taken into consideration. The alternatives also should be
realistic and practical.
• Selection of the weighting methods: The weighting of criteria specifies their
importance. The weights can be determined using one of two kinds of method,
compensatory or outranking.
• Aggregation: This step separates the best alternative from the available options.
It could be a mathematical function or an average.
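As a rough illustration of the weighting and aggregation steps, the sketch below applies a simple weighted sum, which is one compensatory way of aggregating. The criteria, weights, supplier names and scores are all hypothetical values chosen for the example.

```python
# Hypothetical criteria weights (importance); they sum to 1.
weights = {"cost": 0.5, "quality": 0.3, "delivery": 0.2}

# Hypothetical scores of each alternative on a common 0-1 scale (higher is better).
scores = {
    "Supplier A": {"cost": 0.8, "quality": 0.6, "delivery": 0.5},
    "Supplier B": {"cost": 0.6, "quality": 0.9, "delivery": 0.7},
    "Supplier C": {"cost": 0.9, "quality": 0.4, "delivery": 0.6},
}

def weighted_score(alternative_scores):
    """Aggregate one alternative's scores into a single weighted sum."""
    return sum(weights[c] * alternative_scores[c] for c in weights)

totals = {name: weighted_score(s) for name, s in scores.items()}
best = max(totals, key=totals.get)

for name, total in totals.items():
    print(name, round(total, 2))
print("Best alternative:", best)
```

Because every criterion is on the same scale, a weak score on one criterion can be compensated by a strong score on another, which is what makes the method compensatory.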

GOAL PROGRAMMING
Goal programming is a branch of multi-objective optimization, which in turn is a branch of
multi-criteria decision analysis (MCDA). It can be thought of as an extension or
generalisation of linear programming to handle multiple, normally conflicting objective
measures. Each of these measures is given a goal or target value to be achieved.
Deviations are measured from these goals both above and below the target. Unwanted
deviations from this set of target values are then minimised in an achievement function.
This can be a vector or a weighted sum dependent on the goal programming variant
used. As satisfaction of the target is deemed to satisfy the decision maker(s),
an underlying satisficing philosophy is assumed. Goal programming is used to perform
three types of analysis:

• Determine the required resources to achieve a desired set of objectives.

• Determine the degree of attainment of the goals with the available resources.

• Provide the best satisfying solution under a varying amount of resources and
priorities of the goals.

A major strength of goal programming is its simplicity and ease of use. This accounts for
the large number of goal programming applications in many and diverse fields. Linear
goal programmes can be solved using linear programming software as either a single
linear programme, or in the case of the lexicographic variant, a series of connected linear
programmes.
Goal programming can hence handle relatively large numbers of variables, constraints
and objectives. A debated weakness is the ability of goal programming to produce
solutions that are not Pareto efficient. This violates a fundamental concept of decision
theory, that no rational decision maker will knowingly choose a solution that is not Pareto
efficient. However, techniques are available to detect when this occurs and project the
solution onto the Pareto efficient solution in an appropriate manner.
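To make the idea of an achievement function concrete, the sketch below reuses the chocolate example from Unit 3 and turns it into a small weighted goal programme. The two targets and the penalty weights are assumptions made for illustration, and the search is a plain enumeration of whole-number production plans rather than the linear programming software mentioned above.

```python
from itertools import product

PROFIT_TARGET = 30          # hypothetical goal: a profit of at least Rs 30
CHOCO_TARGET = 10           # hypothetical goal: use at most 10 units of Choco
W_PROFIT, W_CHOCO = 1, 2    # hypothetical penalties per unit of unwanted deviation

def achievement(x, y):
    """Weighted sum of the unwanted deviations from the two goals."""
    profit = 6 * x + 5 * y
    choco = 3 * x + 2 * y
    under_profit = max(0, PROFIT_TARGET - profit)   # falling short of the profit goal
    over_choco = max(0, choco - CHOCO_TARGET)       # exceeding the Choco goal
    return W_PROFIT * under_profit + W_CHOCO * over_choco

# Hard constraint kept from the original example: only 5 units of Milk.
candidates = [(x, y) for x, y in product(range(6), repeat=2) if x + y <= 5]

best = min(candidates, key=lambda p: achievement(*p))
print(best, achievement(*best))
```

With these assumed targets and weights, the plan with the smallest total deviation is X = 0, Y = 5, which misses the profit goal by Rs 5 but stays within the Choco goal. Changing the weights changes which goal is sacrificed.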

Linear Programming vs Goal Programming

Unquestionably, linear programming models are among the most commercially
successful applications of operations research. But one of the limitations of linear
programming is that its objective function is unidimensional, i.e., the decision maker
strives for a single objective, such as profit maximization or cost minimization. By
contrast, in goal programming, the objective function contains primarily the
deviational variables that represent each goal or sub-goal.

Another limitation of LP is that the management must accurately quantify the
relationship of the variables in cardinal values (numbers that express exact values
such as 1, 3, 4.5, etc.). But when the goals are incommensurable, they
cannot be assigned cardinal values. These two shortcomings can be overcome by
using the goal programming technique.

The setting of appropriate weights in the goal programming model is another area
that has caused debate, with some authors suggesting the use of the analytic
hierarchy process or interactive methods for this purpose.

ANALYTIC HIERARCHY PROCESS (AHP)


The Analytic Hierarchy Process (AHP) is a method for organizing and analyzing
complex decisions, using math and psychology. It was developed by Thomas L. Saaty
in the 1970s and has been refined since then. It contains three parts: the ultimate goal
or problem you're trying to solve, all of the possible solutions, called alternatives, and
the criteria you will judge the alternatives on. AHP provides a rational framework for a
needed decision by quantifying its criteria and alternative options, and for relating
those elements to the overall goal.
Stakeholders compare the importance of criteria, two at a time, through pair-wise
comparisons. For example, do you care more about job benefits or about having a short
commute, and by how much? AHP converts these evaluations into numbers, which
can be compared across all of the possible criteria. This quantifying capability
distinguishes the AHP from other decision making techniques.
In the final step of the process, numerical priorities are calculated for each of the
alternative options. These numbers represent the most desired solutions, based on all
users' values.

EXAMPLE OF HOW IT WORKS


We've mentioned how AHP is unique because it can quantify criteria and alternatives,
but what does that really look like? As an end user of Prioritization Helper, you won't see
the calculations going on behind the scenes. Here is a quick look at the calculations
behind a result.
Let's pretend the Smith family wants to decide the best city to live in - City A, B, C, or D.
The goal is to determine which city is best, given the criteria - Culture, Close to Family,
Jobs, Housing, and Transportation. They weigh the criteria, and compare the four city
alternatives against the criteria. The following tables illustrate the derived data based on
their input. In general, all of the decimals will add up to 1, and a higher decimal means a
higher priority.
Table 2 shows how the criteria were rated against each other. Looking at the top row,
Culture scored a "3" above Housing and a "5" above Transportation, while Family scored
a "5" above Culture, and Jobs scored a "2" above Culture. This gives Culture 15.2% of the
criteria priority, with the most important criterion being Family, at 43.3%.

The next table demonstrates the weights of each alternative against the criterion Family.
Here, City C was the closest to family, while City D was the furthest. This would be
repeated for every criterion.

Finally, the weighted importance of each criterion is multiplied by the score of
each alternative to get the weighted score (for City A's weighted Cultural score:
.152 x .163 = .024776). Add all the weighted criteria scores together to get the Overall
Priority score (for City A: .024776 + .09093 + .018864 + .085095 + .009462 = .229).
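The arithmetic for City A can be reproduced in a few lines. The sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Culture's criteria priority and City A's score on Culture, as quoted in the text.
culture_weight = 0.152
city_a_culture_score = 0.163
weighted_culture = culture_weight * city_a_culture_score

# City A's five weighted criteria scores, in the order the text lists them
# (the first one is the Culture score computed above).
city_a_weighted = [weighted_culture, 0.09093, 0.018864, 0.085095, 0.009462]

overall_priority = sum(city_a_weighted)

print(round(weighted_culture, 6), round(overall_priority, 3))
```

Repeating the same sum for Cities B, C and D and picking the largest total would give the preferred city.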

APPLICATIONS OF GOAL PLANNING


• Analytics Producers: those who do the work of analyzing data and developing
decision support systems that leverage analytics techniques for making better
decisions.
• Analytics Consumers: the members of the organization that will be responsible
for executing the results of the analytical work.
• Analytics Champions: the leaders in the organization that act as sponsors of
projects. They typically lead a component of the organization that will benefit
from the application of analytics.
• Analytics Enablers: the supporting parts of an organization, such as information
technology, data stewards, and graphical user interface designers, who provide
supporting functions for successful analytics projects.
• Identify new business processes that can realize value from analytics capabilities.
• Define detailed functional designs for analytic models and supporting data
structures.
• Define detailed technical designs for analytic models and supporting data
sources.
• Assess the volume, velocity, and variety of an analytic subject area.
• Build analytic models and data staging area to support incoming data sources.
• Provide training and education for business analysts throughout the organization
on the types of analytic models and data sets available.
• Test and deploy new analytic models into production.

AHP IN SOLVING PROBLEMS


Although the AHP is one of the most advanced methods available in the field of
management science and operations research, the complexity involved in using this tool
makes it difficult to apply. Thankfully software tools have been built which automate the
mathematics intensive part. The user has to follow a simple methodology of data
collection which is then fed into the tool to get the results.

Here is the procedure for doing the same:

Step 1: Define Alternatives

The AHP process begins by defining the alternatives that need to be evaluated. These
alternatives could be the different criteria that solutions must be evaluated against. They
could also be the different features of a product that need to be weighted to better
understand the customer's perception. At the end of step 1, a comprehensive list of all the
available alternatives must be ready.

Step 2: Define the Problem and Criteria

The next step is to model the problem. According to AHP methodology, a problem is a
related set of sub-problems. The AHP method therefore relies on breaking the problem
into a hierarchy of smaller problems. In the process of breaking down the problem,
criteria to evaluate the solutions emerge. However, like root cause analysis, a person can
go on and on to deeper levels within the problem. When to stop breaking the problem into
smaller sub-problems is a subjective judgement.

Example: A firm needs to decide on the best investment option amongst stocks, bonds,
real estate and gold. If the AHP method is used, the problem of best investment will be
broken down into smaller problems like protection from downfall, maximum chance of
appreciation, liquidity in the market and so on. Each of these sub-problems can then be
broken into smaller problems till the management feels that the appropriate criteria have
been reached.

Step 3: Establish Priority amongst Criteria Using Pairwise Comparison

The AHP method uses pairwise comparison to create a matrix. For example, the firm will
be asked to weigh the relative importance of protection from downfall vs. liquidity. Then
in the next matrix, there will be a pairwise comparison between liquidity and chance of
appreciation, and so on. The managers will be expected to fill in this data as per the
expectations of the end consumer or the people who are going to use the process.

Step 4: Check Consistency

This step is inbuilt in most software tools that help solve AHP problems. For instance if I
say that liquidity is twice as important as protection from downfall and in the next matrix I
say that protection from downfall is half as important as chance of appreciation, then the
following situation emerges:

Liquidity = 2 (Protection from downfall)

Protection from downfall = ½ (Chance of appreciation)

Therefore, Liquidity must equal chance of appreciation.

However, if in the pairwise comparison of liquidity and chance of appreciation I have
given a weight of more or less than 1, then my data is inconsistent. Inconsistent data
gives inconsistent results, hence prevention is better than cure.
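The check described in this step amounts to multiplying two judgements and comparing the product with the third. The sketch below shows that idea only; it assumes an exact match is required, whereas AHP tools usually compute a consistency ratio and tolerate a small amount of inconsistency.

```python
# Judgements from the text, read as "first item is N times as important as second".
liquidity_vs_protection = 2.0
protection_vs_appreciation = 0.5

# Value that the two judgements above imply for liquidity vs. chance of appreciation.
implied = liquidity_vs_protection * protection_vs_appreciation

def is_consistent(stated, implied_value, tol=1e-9):
    """True if the stated judgement matches the one implied by the others."""
    return abs(stated - implied_value) <= tol

print(implied)
print(is_consistent(1.0, implied))   # liquidity rated equal to appreciation
print(is_consistent(3.0, implied))   # liquidity rated 3 times appreciation
```

Only the judgement of 1 agrees with the other two comparisons; any other value would be flagged as inconsistent.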

Step 5: Get the Relative Weights

The software tool will run the mathematical calculation based on the data and assign
relative weights to the criteria. Once the equation is ready with weighted criteria, one can
evaluate the alternatives to get the best solution that matches their needs.
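One way such a tool might turn a pairwise matrix into relative weights is the row geometric mean, a common approximation to the eigenvector calculation usually associated with AHP. The sketch below applies it to the consistent judgements from Step 4; the choice of method is an assumption, since the text does not say which calculation the software runs.

```python
import math

criteria = ["liquidity", "protection from downfall", "chance of appreciation"]

# Pairwise matrix: entry [i][j] says how many times more important
# criterion i is than criterion j (values taken from the Step 4 example).
matrix = [
    [1.0, 2.0, 1.0],
    [0.5, 1.0, 0.5],
    [1.0, 2.0, 1.0],
]

# Geometric mean of each row, then normalise so the weights sum to 1.
row_means = [math.prod(row) ** (1 / len(row)) for row in matrix]
total = sum(row_means)
weights = {name: gm / total for name, gm in zip(criteria, row_means)}

for name, weight in weights.items():
    print(name, round(weight, 3))
```

For this matrix the weights come out as 0.4, 0.2 and 0.4, which matches the stated judgements: liquidity and chance of appreciation are equally important and each counts twice as much as protection from downfall.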

UNIT 4
What Is Stochastic Modeling?
Stochastic modeling is a form of financial model that is used to help make investment
decisions. This type of modeling forecasts the probability of various outcomes under
different conditions, using random variables.
Stochastic modeling presents data and predicts outcomes that account for certain levels
of unpredictability or randomness. Companies in many industries can employ stochastic
modeling to improve their business practices and increase profitability. In the financial
services sector, planners, analysts, and portfolio managers use stochastic modeling
to manage their assets and liabilities and optimize their portfolios.

Who Uses Stochastic Modeling?


Stochastic modeling is used in a variety of industries around the world. The insurance
industry, for example, relies heavily on stochastic modeling to predict how company
balance sheets will look at a given point in the future. Other sectors, industries, and
disciplines that depend on stochastic modeling include stock investing, statistics,
linguistics, biology, and quantum physics.

Markov model
A Markov model is a stochastic method for randomly changing systems where it is
assumed that future states depend only on the current state, not on the states that came
before it. These models show all possible states as well as the transitions, rates of
transition and probabilities between them.

Markov models are often used to model the probabilities of different states and the rates
of transitions among them. The method is generally used to model systems. Markov
models can also be used to recognize patterns, make predictions and to learn the
statistics of sequential data.

There are four types of Markov models that are used situationally:

Markov chain - used by systems that are autonomous and have fully observable states

Hidden Markov model - used by systems that are autonomous where the state is partially
observable.

Markov decision processes - used by controlled systems with a fully observable state.

Partially observable Markov decision processes - used by controlled systems where the
state is partially observable.

Markov models can be expressed in equations or in graphical models. Graphic Markov
models typically use circles (each containing a state) and directional arrows to indicate
possible transitions between them. The directional arrows are labeled with the
transition rate or with a variable standing for the rate. Applications of Markov modeling
include modeling languages, natural language processing (NLP), image processing,
bioinformatics, speech recognition and modeling computer hardware and software systems.
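A Markov chain, the simplest of the four types, can be written as a table of transition probabilities. The sketch below uses a hypothetical two-state chain (a customer who is either active or lapsed) with made-up probabilities to show how the state probabilities evolve one step at a time.

```python
# Hypothetical two-state chain: a customer is either "active" or "lapsed".
states = ["active", "lapsed"]

# transition[i][j] = probability of moving from state i to state j in one step.
transition = [
    [0.9, 0.1],   # from active: stay active 90%, lapse 10%
    [0.5, 0.5],   # from lapsed: return 50%, stay lapsed 50%
]

def step(distribution):
    """Advance a probability distribution over the states by one time step."""
    return [
        sum(distribution[i] * transition[i][j] for i in range(len(states)))
        for j in range(len(states))
    ]

start = [1.0, 0.0]            # the customer is active today
after_one = step(start)       # uses only the current distribution
after_two = step(after_one)

print(after_one, after_two)
```

Each step needs only the current distribution and the transition table, which is exactly the Markov assumption described above.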

MARKOV DECISION PROCESS IN SEQUENTIAL DECISION MAKING

Markov Decision Process (MDP) is a foundational element of reinforcement learning (RL).
MDP allows formalization of sequential decision making where an action taken in a state
influences not just the immediate reward but also the subsequent state. It is a very useful
framework for modeling problems in which the longer-term return is maximized by taking a
sequence of actions. Chapter 3 of the book “Reinforcement Learning: An Introduction” by
Sutton and Barto provides an excellent introduction to MDP.

Expressing a problem as an MDP is the first step towards solving it through techniques
like dynamic programming or other techniques of RL. A robot playing a computer game or
performing a task often maps naturally to an MDP. But many other real-world
problems can be solved through this framework too. Not many real-world examples are
readily available, though. This article provides some real-world examples of finite MDPs. We
also show the corresponding transition graphs, which effectively summarize the MDP
dynamics. Such examples can serve as good motivation to study and develop skills to
formulate problems as MDPs.
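As a minimal sketch of solving an MDP by dynamic programming, the example below runs value iteration on a hypothetical two-state problem. The states, actions, rewards and discount factor are invented for illustration and do not come from the text.

```python
# Hypothetical MDP: transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    "s0": {"stay": [(1.0, "s0", 1.0)], "move": [(1.0, "s1", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "move": [(1.0, "s0", 0.0)]},
}
GAMMA = 0.9  # discount factor: how much future rewards count

def action_value(state, action, values):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * values[nxt]) for p, nxt, r in transitions[state][action])

# Value iteration: repeatedly back up the best action value until it stops changing.
values = {s: 0.0 for s in transitions}
while True:
    new_values = {
        s: max(action_value(s, a, values) for a in transitions[s])
        for s in transitions
    }
    delta = max(abs(new_values[s] - values[s]) for s in transitions)
    values = new_values
    if delta < 1e-9:
        break

# The policy picks, in each state, the action with the highest value.
policy = {
    s: max(transitions[s], key=lambda a: action_value(s, a, values))
    for s in transitions
}
print(values, policy)
```

In this toy problem the best policy in s0 is to move, giving up the immediate reward of 1 in order to reach s1, where staying pays 2 per step. This is the sense in which an action influences the subsequent state as well as the immediate reward.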

FUTURE TRENDS IN BUSINESS ANALYTICS

Artificial Intelligence

AI aims to make machines perform tasks that typically require complex human
intelligence. AI & ML are transforming the way we interact with our analytics & data
management, while the security measures this calls for must also be considered. The fact
is that it will impact our lives whether we like it or not.

Solutions such as AI algorithms based on the most advanced neural networks deliver high
accuracy in anomaly detection, as they learn from past trends & patterns. That way, any
unusual event will be instantly registered & the system will alert the user.

AI offers exclusive insights capability in BI solutions. It thoroughly evaluates your dataset
automatically without requiring any effort on your part. You simply pick the data source
you want to assess & the variable that the algorithm should focus on.

Another rising factor in BI's future is testing AI in a duel. For example, one AI will create a
realistic image, and another one will try to ascertain whether the image is artificial or not.
This concept is called a GAN (generative adversarial network) and can be utilized in
online verification processes.

Data Visualization

Data discovery has increased its impact over the past year. Data visualization was listed in
the top 2 BI trends in the Business Application Research Centre survey.

A crucial element to consider is that data visualization tools depend upon a process, and
only then will the produced findings bring business value. It requires understanding the
relationship between data in the form of visual analysis, data preparation, & guided
advanced analytics.

Data visualizations have transformed into state-of-the-art solutions to present & interact
with several graphics on one screen, whether the focus is on building sales charts or
all-inclusive interactive reports. Since humans process visual data better, data visualization
will be an essential addition to the BI trends of 2021.

Data security

Data & information security has been on everyone's mind in 2020 and will continue to
create a buzz in 2021. The implementation of privacy regulations, like GDPR in the EU & the
CCPA in the USA, has set building blocks for data security & management of users'
details.

Irrespective of the advancements, the global investment in information security products
& services will rise by 2.4% over last year. The pandemic hampered the growth, but it
certainly didn't entirely stop it. Gartner highlights the primary drivers of worldwide
security spending:
- Privacy regulations
- The need to address digital business risks
- A focus on creating detection & response capabilities

SaaS BI

Several businesses have switched to SaaS BI to access any data from the cloud & gain
more flexibility from any device. Such technologies that allow data movement & access
from various places will continue to rise as one of the most important BI trends in 2021.

SaaS is becoming remote-friendly: dispersed teams that need such solutions can enhance
their business processes & ensure that working remotely creates no obstacles.

Predictive & Prescriptive Analytics Tools

Predictive analytics is the practice of extracting information from current data sets to
predict future possibilities. It's an extension of data mining, which refers to historical data.
Predictive analysis involves forecasted future data & thus, by definition, always includes
the likelihood of errors. It shows what might happen in the future with an acceptable
degree of reliability, along with some alternative scenarios & risk assessment.

BUSINESS ANALYTICS
EXCEL FUNCTIONS AND FORMULAS
SUM Function
The SUM function adds values. You can add individual values, cell references or ranges or a mix
of all three.
Formula: =SUM(D5:I5)

1. Select a cell next to the numbers you want to sum: To sum a column, select the cell
immediately below the last value in the column.
2. Click the AutoSum button on either the Home or Formulas tab.
3. Press the Enter key to complete the formula.
AVERAGE Function
Returns the average (arithmetic mean) of the arguments.
Formula: =AVERAGE(D5:I5)

1. Click a cell below, or to the right of, the numbers for which you want to find the average.
2. On the Home tab, in the Editing group, click the arrow next to AutoSum, click Average,
and then press Enter.
Count Function
The COUNT function counts the number of cells that contain numbers, and counts numbers within
the list of arguments. Use the COUNT function to get the number of entries in a number field
that is in a range or array of numbers.
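Formula: =COUNT(D5:D19)

1. Select a blank cell, let's say cell E2.
2. Enter the formula =COUNT(B2:B9) into the Formula Bar. COUNT does not accept a
condition; to count only the cells greater than zero, use =COUNTIF(B2:B9,">0") instead.
3. Press Enter.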
Min & Max Function

For MIN
The Excel MIN function returns the smallest numeric value in a range of values. The MIN
function ignores empty cells, the logical values TRUE and FALSE, and text values.
Formula: =MIN(D5:D19)

Purpose: Get the smallest value.
Return value: The smallest value in the array.
Syntax: =MIN(number1, [number2], ...)
Arguments: number1 - Number, reference to numeric value, or range that contains numeric
values.

For Max
MAX will return the largest value in a given list of arguments. From a given set of numeric
values, it will return the highest value.
Formula: =MAX(D5:D19)

1. Select any cell where you want to see the result.
2. Select the MAX function.
3. A new dialog box named “Function Arguments” opens.
4. Enter the numbers or ranges in the Number1 and Number2 boxes.
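For readers who want to check a result outside Excel, the five functions above have close counterparts in Python. The sketch below uses a made-up list standing in for a cell range; like the Excel functions applied to a range, it skips text and empty cells.

```python
# Hypothetical cell range: numbers, one text entry and one empty cell (None).
cells = [12, 7.5, "n/a", None, 20, 3]

# Keep only the numeric cells, as the Excel functions do for a range.
numbers = [c for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool)]

total = sum(numbers)        # like =SUM(range)
count = len(numbers)        # like =COUNT(range)
average = total / count     # like =AVERAGE(range)
smallest = min(numbers)     # like =MIN(range)
largest = max(numbers)      # like =MAX(range)

print(total, count, average, smallest, largest)
```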
DATE Function

Date Formula
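The Excel DATE function creates a valid date from individual year, month, and day components.
The DATE function is useful for assembling dates that need to change dynamically based on other
values in a worksheet. The related DAY function does the reverse for one component: it returns
the day of the month from a date.
Formula: =DATE(year, month, day) to build a date; =DAY(C5) to extract the day from the date in C5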

Now Formula
The NOW function in Excel is a formula that displays the current date and time. It is
automatically refreshed anytime the workbook is opened or a change is made.
Formula: =NOW()

Today Formula
The TODAY function returns the current date, and updates each time the
worksheet is recalculated. Use F9 to force the worksheet to recalculate and update the value.
The value returned by the TODAY function is a standard Excel date.
Formula: =TODAY()

Month Formula
The Excel MONTH function extracts the month from a given date as a number between 1 and 12.
You can use the MONTH function to extract a month number from a date into a cell, or to feed a
month number into another function like the DATE function.
Formula: =MONTH(C6)

Year Formula
The Excel YEAR function returns the year component of a date as a 4-digit number. You can
use the YEAR function to extract a year number from a date into a cell or to extract and feed a
year value into another formula, like the DATE function.
Formula: =YEAR(C7)
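The same date operations can be sketched with Python's standard datetime module, which may help when checking a worksheet's results. The date used is an arbitrary example.

```python
from datetime import date, datetime

d = date(2021, 3, 15)       # like =DATE(2021, 3, 15)
day_part = d.day            # like =DAY(cell)
month_part = d.month        # like =MONTH(cell)
year_part = d.year          # like =YEAR(cell)

today = date.today()        # like =TODAY(): the current date
now = datetime.now()        # like =NOW(): the current date and time

print(day_part, month_part, year_part)
print(today, now)
```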

NETWORKDAYS Function
The NETWORKDAYS function calculates the number of workdays between two dates in
Excel. When using the function, weekends are automatically excluded. It also
allows you to skip specified holidays and only count business days. It is categorized in Excel as a
Date/Time Function.
Formula: =NETWORKDAYS(B11,C11,D11:D14)

NETWORKDAYS.INTL Function

The Excel NETWORKDAYS.INTL function calculates the number of working days between
two dates. NETWORKDAYS.INTL can optionally exclude a list of holidays and provides a way to
specify which days of the week are considered weekends.
Formula: =NETWORKDAYS.INTL(B11,C11,1,D11:D14)
1. Select the cell where you want the number of working days to be displayed.
2. Write the formula =NETWORKDAYS. ...
3. For the weekend argument you will get a list of options, from which you choose the days
that count as the weekend.
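The counting logic behind both functions can be sketched in a few lines of Python. This is an illustration of the idea, not Excel's implementation: the function name, the way weekends are passed in and the sample dates are all assumptions, and unlike Excel it does not handle an end date that falls before the start date.

```python
from datetime import date, timedelta

def networkdays(start, end, holidays=(), weekend=(5, 6)):
    """Count working days from start to end, both inclusive.

    weekend holds the weekday numbers to skip (Monday is 0, so 5 and 6 are
    Saturday and Sunday); holidays is an iterable of dates to skip as well.
    """
    holidays = set(holidays)
    count = 0
    day = start
    while day <= end:
        if day.weekday() not in weekend and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count

# Hypothetical dates: 1 to 14 March 2021, with Monday 8 March as a holiday.
start, end = date(2021, 3, 1), date(2021, 3, 14)
print(networkdays(start, end, [date(2021, 3, 8)]))
# Sunday-only weekend, in the spirit of the weekend argument of NETWORKDAYS.INTL.
print(networkdays(start, end, weekend=(6,)))
```

Passing a different `weekend` tuple plays the role of the weekend argument in NETWORKDAYS.INTL.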

Conditional Formatting
Conditional formatting is a feature in many spreadsheet applications that allows you to apply
specific formatting to cells that meet certain criteria. It is most often used as color-based formatting
to highlight, emphasize, or differentiate among data and information stored in a spreadsheet.

1. Select the range A1:A10.
2. On the Home tab, in the Styles group, click Conditional Formatting.
3. Click Highlight Cells Rules, Greater Than.
4. Enter the value 80 and select a formatting style.
5. Click OK. Excel highlights the cells that are greater than 80.
6. Change the value of cell A1 to 81. Excel updates the formatting of cell A1 automatically.
Pivot Table
A Pivot Table is used to summarise, sort, reorganise, group, count, total or average data
stored in a table. It allows us to transform columns into rows and rows into columns. It allows
grouping by any field (column), and using advanced calculations on them.

1. Click a cell in the source data or table range.
2. Go to Insert > PivotTable. ...
3. Excel will display the Create PivotTable dialog with your range or table name selected.
4. In the Choose where you want the PivotTable report to be placed section, select New
Worksheet, or Existing Worksheet.
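What a Pivot Table computes can be sketched in Python as a grouped sum. The sales rows below are made-up data, with region playing the part of the row field, product the column field and amount the value being summed.

```python
from collections import defaultdict

# Hypothetical source table: one row per sale.
rows = [
    {"region": "North", "product": "A", "amount": 100},
    {"region": "North", "product": "B", "amount": 150},
    {"region": "South", "product": "A", "amount": 200},
    {"region": "South", "product": "A", "amount": 50},
]

# Pivot rows = region, pivot columns = product, values = sum of amount.
pivot = defaultdict(lambda: defaultdict(int))
for row in rows:
    pivot[row["region"]][row["product"]] += row["amount"]

for region, by_product in pivot.items():
    print(region, dict(by_product))
```

Swapping the two keys in the loop transposes the table, which corresponds to dragging a field from Rows to Columns in Excel.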
Pivot Chart
A Pivot chart is an interactive way to quickly summarize large amounts of data. You
can use a Pivot chart to analyze numerical data in detail, and answer unanticipated questions
about your data. 

1. Select a cell in your table.
2. Select PivotTable Tools > Analyze > PivotChart.
3. Select a chart.
4. Select OK.

DATA VALIDATION
Data validation is an Excel feature that lets you control what users enter into a cell. It lets
you restrict entries to a given data type, such as a date, whole number, decimal, or
text.
Basically, there are seven (7) types of data you can set:
o Whole Number - to restrict the cell to accept only whole numbers.
o Decimal - to restrict the cell to accept only decimal numbers.
o List - to pick data from the drop-down list.
o Date - to restrict the cell to accept only date.
o Time - to restrict the cell to accept only time.
o Text Length - to restrict the length of the text.
o Custom - for a custom formula.

• Select the cell(s) you want to create a rule for.
• Select Data > Data Validation.
• On the Settings tab, under Allow, select an option.
• Under Data, select a condition.
• Set the other required values based on what you chose for Allow and Data.
• Select the Input Message tab and customize a message users will see when
entering data.
• Select the Show input message when cell is selected checkbox to display the
message when the user selects or hovers over the selected cell(s).
• Select the Error Alert tab to customize the error message and to choose a Style.
• Select OK.
Sort 
When you use the sort and filter options on an Excel spreadsheet, you can reorder a large
spreadsheet and narrow it down to show just the information you want to see.

1. Select any cell in the data range.
2. On the Data tab, in the Sort & Filter group, click Sort.
3. In the Sort dialog box, under Column, in the Sort by box, select the first column that you
want to sort.
4. Under Sort On, select the type of sort.
5. Under Order, select how you want to sort.
Filter
In addition to sorting, you may find that adding a filter allows you to better analyze your data.
When data is filtered, only rows that meet the filter criteria will display and other rows will be
hidden. With filtered data, you can then copy, format, print, etc., your data, without having to sort
or move it first.

1. Select any cell within the range.


2. Select Data > Filter.
3. Select the column header arrow.
4. Select Text Filters or Number Filters, and then select a comparison, like Between.
5. Enter the filter criteria and select OK.
Find and Replace
Use the Find and Replace features in Excel to search for something in your workbook, such as a
particular number or text string. You can either locate the search item for reference, or you can
replace it with something else. You can include wildcard characters such as question marks, tildes,
and asterisks, or numbers in your search terms. You can search by rows and columns, search within
comments or values, and search within worksheets or entire workbooks.

For Find:
1. Go to Home > Find & Select > Find (keyboard shortcut: Ctrl + F).
2. In the Find what: box, type the text or numbers you want to find.
3. Click Find Next to run your search.
4. You can further define your search if needed. Within: to search for data in a
worksheet or in an entire workbook, select Sheet or Workbook.

For Replace:
1. Select the cells that have the formula in which you want to replace the reference.
If you want to replace in the entire worksheet, select the entire worksheet.
2. Go to Home > Find & Select > Replace (keyboard shortcut: Ctrl + H).
3. In the Find what: box, type the text to find, and in the Replace with: box, type the
replacement.
4. Click Replace All.

Hyperlink
The HYPERLINK function creates a shortcut that jumps to another location in the current
workbook, or opens a document stored on a network server, an intranet, or the Internet. When
you click a cell that contains a HYPERLINK function, Excel jumps to the location listed, or opens
the document you specified. A link can also be created from the ribbon, without typing the
function:
1. Select cell A2.
2. On the Insert tab, in the Links group, click Link. The 'Insert Hyperlink' dialog box appears.
3. Click 'Place in This Document' under Link to.
4. Type the Text to display, the cell reference, and click OK.

IF Function
Use the IF function, one of the logical functions, to return one value if a condition is true and
another value if it's false.

1. Select the cell where you want to insert the IF formula.


2. Type =IF(
3. Insert the condition that you want to check, followed by a comma (,).
4. Insert the value to display when the condition is TRUE, followed by a comma (,).
5. Insert the value to display when the condition is FALSE, type the closing parenthesis, and
press Enter.
AND Function
The Excel AND function is a logical function used to test whether several conditions are all
true at the same time. AND returns TRUE only if every condition is true, and FALSE otherwise.
To test if a number in A1 is greater than zero and less than 10, use =AND(A1>0,A1<10).
To check how Excel evaluates such a formula step by step:
1. Select the cell that you want to evaluate. ...
2. On the Formulas tab, in the Formula Auditing group, click Evaluate Formula.
3. Click Evaluate to examine the value of the underlined reference. ...
4. Continue until each part of the formula has been evaluated.
5. To see the evaluation again, click Restart.
OR Function
The OR function is a logical function to test multiple conditions at the same time. OR returns
TRUE if at least one condition is true, and FALSE only if all of them are false. For example, to
test A1 for either "x" or "y", use =OR(A1="x",A1="y").

1. Select a cell.
2. Click the Insert Function button. The 'Insert Function' dialog box appears.
3. Search for a function or select a function from a category. ...
4. Click OK. ...
5. Click in the Logical1 box and type the first condition, for example A1="x".
6. Click in the Logical2 box and type the second condition, for example A1="y".
7. Click OK.
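
The logic of IF, AND and OR described above can be sketched outside Excel. The following is
only an illustrative Python model, not Excel code; the helper names excel_if, excel_and and
excel_or and the sample value for cell A1 are assumptions made for this example.

```python
def excel_and(*conditions):
    """Model of AND: True only when every condition is true."""
    return all(conditions)


def excel_or(*conditions):
    """Model of OR: True when at least one condition is true."""
    return any(conditions)


def excel_if(condition, value_if_true, value_if_false):
    """Model of IF: return one of two values depending on a condition."""
    return value_if_true if condition else value_if_false


a1 = 7  # hypothetical content of cell A1

# Mirrors =AND(A1>0,A1<10)
in_range = excel_and(a1 > 0, a1 < 10)

# Mirrors =IF(AND(A1>0,A1<10),"valid","invalid")
label = excel_if(in_range, "valid", "invalid")
```

Combining the functions this way is the usual pattern: AND or OR builds the condition, and IF
turns the TRUE/FALSE result into a readable value.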
SUBSTITUTE Function
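The Excel SUBSTITUTE function replaces text in a given string by matching: every occurrence of
the old text (or only one chosen occurrence) is replaced with the new text.
Syntax: =SUBSTITUTE(text, old_text, new_text, [instance_num])

The same kind of replacement can also be made directly in the cells with the Find and Replace
dialog:
1. Select the range of cells where you want to replace text or numbers.
2. Press the Ctrl + H shortcut to open the Replace tab of the Excel Find and Replace dialog.
3. In the Find what box type the value to search for, and in the Replace with box type the
value to replace with.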
REPLACE Function
The REPLACE function replaces part of a text string, based on the starting position and the
number of characters you specify, with a different text string. In financial analysis, the
REPLACE function can be useful if we wish to remove text from a cell when the text is in a
variable position.
Syntax: =REPLACE(old_text, start_num, num_chars, new_text)
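
The difference between SUBSTITUTE (replace by matching text) and REPLACE (replace by position)
is easier to see side by side. The sketch below is a simplified Python model, not Excel itself;
the function names and the sample strings are assumptions chosen for illustration, and error
handling for invalid arguments is left out.

```python
def substitute(text, old_text, new_text, instance_num=None):
    """Model of SUBSTITUTE: replace by matching old_text (case-sensitive)."""
    if old_text == "":
        return text
    if instance_num is None:
        return text.replace(old_text, new_text)  # every occurrence
    # Locate the n-th occurrence (1-based), scanning left to right.
    search_from = 0
    found_at = -1
    for _ in range(instance_num):
        found_at = text.find(old_text, search_from)
        if found_at == -1:
            return text  # fewer than n occurrences: text is unchanged
        search_from = found_at + len(old_text)
    return text[:found_at] + new_text + text[found_at + len(old_text):]


def replace(old_text, start_num, num_chars, new_text):
    """Model of REPLACE: swap num_chars characters starting at 1-based start_num."""
    start = start_num - 1
    return old_text[:start] + new_text + old_text[start + num_chars:]
```

For example, substitute("2021-01-15", "-", "/") changes both dashes because it matches text,
while replace("XYZ123", 1, 3, "ABC") changes the first three characters whatever they contain.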
LOWER Function
LOWER function converts all letters in the specified string to lowercase. If there are characters in
the string that are not letters, they are unaffected by this function.

1. Convert text to lower case.


2. =LOWER (text)
3. text - The text that should be converted to lower case.
4. The LOWER function converts a text string to all lowercase letters.

UPPER Function
Use the UPPER function to replace every lowercase letter in a character string with
an uppercase letter.

1. Convert text to upper case.
2. =UPPER (text)
3. text - The text that should be converted to upper case.
4. The UPPER function converts a text string to all uppercase letters.
Flash Fill 
Flash Fill automatically fills your data when it senses a pattern. For example, you can use
Flash Fill to separate first and last names from a single column, or combine first and last names
from two different columns.

1. Enter the full name in cell C2, and press ENTER.


2. Start typing the next full name in cell C3.
3. To accept the preview, press ENTER.

Text to Columns 
Text to Columns is an amazing feature in Excel that deserves a lot more credit than it usually
gets. As its name suggests, it is used to split the text into multiple columns. For example, if you
have a first name and last name in the same cell, you can use this to quickly split these into two
different cells.

1. Add entries to the first column and select them all.


2. Choose the Data tab atop the ribbon.
3. Select Text to Columns.
4. Ensure Delimited is selected and click Next.
5. Clear each box in the Delimiters section and instead choose Comma and Space.
6. Click Finish.
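
What the steps above do to a single cell can be sketched as follows. This is only a Python
illustration of delimiter-based splitting, assuming that consecutive delimiters are treated as
one and that empty pieces are dropped; the function name text_to_columns is hypothetical.

```python
import re


def text_to_columns(cell_text, delimiters=", "):
    """Split one cell's text on any of the delimiter characters.

    Runs of delimiters (such as a comma followed by a space) count as a
    single separator, and empty pieces are dropped. This is a simplification.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part for part in re.split(pattern, cell_text) if part]
```

With Comma and Space as delimiters, a cell containing "Smith, John" would be split into two
columns, "Smith" and "John".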

Goal Seek
Goal Seek is a what-if analysis tool that works backward: given the result you want from a
formula, it finds the input value that produces that result.

1. On the Data tab, in the Data Tools group, click What-If Analysis, and then click Goal Seek.
2. In the Set cell box, enter the reference for the cell that contains the formula that you want to
resolve. ...
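3. In the To value box, type the formula result that you want.
4. In the By changing cell box, enter the reference for the cell that contains the value that you
want to adjust, and then click OK.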
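
The idea behind Goal Seek can be sketched as a simple numeric search. The Python example below
uses bisection, which is an assumption made for illustration and not necessarily the method
Excel uses internally; the function name goal_seek, its parameters and the profit formula are
all hypothetical.

```python
def goal_seek(formula, target, low, high, tol=1e-9, max_iter=200):
    """Find an input between low and high for which formula(input) is close to target.

    Uses bisection, so the target must lie between formula(low) and formula(high).
    """
    f_low = formula(low) - target
    f_high = formula(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by low and high")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = formula(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid              # the solution lies in the lower half
        else:
            low, f_low = mid, f_mid  # the solution lies in the upper half
    return (low + high) / 2


# Hypothetical "Set cell" formula: profit = units * 25 - 1000
def profit(units):
    return units * 25 - 1000


# "To value" = 500, "By changing cell" = units
units_needed = goal_seek(profit, target=500, low=0, high=1000)
```

Here the formula plays the role of the Set cell, target is the To value, and the returned
number is the value Goal Seek would place in the By changing cell.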
LOOKUP Function 
The Excel LOOKUP function performs an approximate match lookup in a one-column or one-row
range, and returns the corresponding value from another one-column or one-row range.
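LOOKUP's default behavior (it always matches approximately and assumes the lookup range is
sorted in ascending order) makes it useful for solving certain problems in Excel.
Syntax: =LOOKUP(lookup_value, lookup_vector, [result_vector])
Example: =LOOKUP($B$16, $B$2:$B$10, A$2:A$10)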

1. Organize the data.


2. Tell the function what to lookup.
3. Tell the function where to look.
4. Tell Excel which range to return the result from.
5. Keep the lookup range sorted in ascending order: LOOKUP always performs an approximate
match and has no exact-match option.

HLOOKUP
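HLOOKUP stands for 'Horizontal Lookup'. It searches for a value in the top row of a table and
returns a value from a specified row in the same column. If you wish to get an array, you need
to select a number of cells equal to the number of rows that you want HLOOKUP to return.
Syntax: =HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])
Example: =HLOOKUP(C3, $B$12:$G$13, 2, TRUE)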

1. Open the HLOOKUP formula in the salary column and select the lookup value as Emp ID.
2. Next thing is we need to select the table array, i.e., the main table. ...
3. Now, we need to mention the row number, i.e., from which row of the main table we are
looking for the data. ...
4. The final part is a range lookup.
VLOOKUP
VLOOKUP stands for 'Vertical Lookup'. It is a function that makes Excel search for a certain
value in the first column of a range (the so-called 'table array'), in order to return a value
from a different column in the same row.
Syntax: =VLOOKUP(value, table, col_index, [range_lookup])
Example: =VLOOKUP(C3, $G$2:$H$7, 2, TRUE)

1. value - The value to look for in the first column of a table.


2. table - The table from which to retrieve a value.
3. col_index - The column in the table from which to retrieve a value.
4. range_lookup - [optional] TRUE = approximate match (default). FALSE = exact match.
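
The difference between the two range_lookup settings can be sketched as below. This is a
simplified Python model of the lookup logic, not Excel's implementation; the function name
vlookup, the "#N/A" return string and the grade table are assumptions for illustration. As in
Excel, the approximate mode expects the first column to be sorted in ascending order.

```python
from bisect import bisect_right


def vlookup(value, table, col_index, range_lookup=True):
    """Look up value in the first column of table and return col_index (1-based)."""
    first_column = [row[0] for row in table]
    if range_lookup:
        # Approximate match: the last row whose key is <= value.
        position = bisect_right(first_column, value) - 1
        if position < 0:
            return "#N/A"  # value is smaller than every key
        return table[position][col_index - 1]
    # Exact match: the first row whose key equals value.
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"


# Hypothetical table: minimum score in column 1, grade in column 2.
grades = [(0, "F"), (50, "D"), (60, "C"), (70, "B"), (85, "A")]
```

With this table, vlookup(72, grades, 2) falls back to the row for 70, while the same call with
range_lookup=False finds no row because 72 is not listed.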
VLOOKUP MATCH
VLOOKUP MATCH allows you to perform a matrix lookup – instead of just looking up a vertical
value, the MATCH portion of the formula turns your column reference into a dynamic horizontal
lookup as well.
Formula: =VLOOKUP(lookup_value, table_array, MATCH(lookup_value, lookup_array, [match_type]), [range_lookup])

1. In the parentheses, enter your lookup value, followed by a comma. ...


2. Enter your table array or lookup table, the range of data you want to search, and a comma:
(H2,B3:F25,
3. Enter column index number. ...
4. Enter the range lookup value, either TRUE or FALSE.
IFERROR function
The Excel IFERROR function returns a custom result when a formula generates an error, and
a standard result when no error is detected. IFERROR is an elegant way to trap and manage
errors without using more complicated nested IF statements. When an error occurs, it returns
the value you specify for error conditions.
SYNTAX: =IFERROR(value,value_if_error).

1. Click on the cell F5.
2. Write down the formula: =IFERROR(C5/D5,"-").
3. Press Enter.
4. Now drag and copy the same formula to all the cells.
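
The behavior of =IFERROR(C5/D5,"-") can be sketched in Python as a try/except wrapper. This is
only an analogy under stated assumptions: the helper name iferror is hypothetical, and the list
of caught exceptions is a rough stand-in for Excel's error values.

```python
def iferror(compute, value_if_error):
    """Return compute() normally, or value_if_error when the calculation fails."""
    try:
        return compute()
    except (ZeroDivisionError, ValueError, TypeError, KeyError, IndexError):
        return value_if_error


# Hypothetical cell values: C5 holds 10; D5 holds 2 in one row and 0 in another.
normal_result = iferror(lambda: 10 / 2, "-")
error_result = iferror(lambda: 10 / 0, "-")
```

The first call returns the ordinary result of the division, while the second returns "-"
instead of an error, which is what the formula does for rows where D5 is zero.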
Nested IF functions
Nested IF functions, meaning one IF function inside of another, allow you to test multiple
criteria and increase the number of possible outcomes. For example, nested IFs can determine a
student's grade based on their score.

1. Click the cell in which you want to enter the formula.


2. To start the formula with the function, click Insert Function on the formula bar.
3. In the Or select a category box, select All.
4. To enter another function as an argument, enter the function in the argument box that you
want.
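
The grade example can be sketched as follows. The thresholds (90, 80, 70) and the grade letters
are assumptions chosen for illustration; in Excel the equivalent formula would look like
=IF(B2>=90,"A",IF(B2>=80,"B",IF(B2>=70,"C","F"))). The Python version nests conditional
expressions the same way the IF functions are nested.

```python
def grade(score):
    """Return a letter grade; each 'else' branch holds the next nested test."""
    return (
        "A" if score >= 90 else (
            "B" if score >= 80 else (
                "C" if score >= 70 else "F"
            )
        )
    )
```

The conditions are checked from the highest threshold down, so a score only reaches the inner
tests after failing the outer ones. Changing that order would change the result.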
COUNTIF 
COUNTIF is an Excel function to count cells in a range that meet a single condition. COUNTIF can
be used to count cells that contain dates, numbers, and text. The criteria used in COUNTIF
support logical operators (>,<,<>,=) and wildcards (*,?) for partial matching. The function
returns a number representing the cells counted.
Syntax
=COUNTIF (range, criteria)

1. Select a blank cell, let's say cell E2 (in our case).
2. Enter the formula =COUNTIF(B2:B9,">0") into the Formula Bar.
3. Press Enter.
4. The selected blank cell will show the number of cells greater than 0. In our case, we
get the value 8.
SUMIF function
The SUMIF function is a worksheet function that adds all numbers in a range of cells based on
a single criterion.
Syntax
=SUMIF (range, criteria, [sum range])

1. Select an empty cell.


2. Determine the initial cell range.
3. Determine the SUMIF criteria.
4. Determine your sum_range, i.e., the cells to add when it differs from the initial range.
AVERAGEIF function 
The AVERAGEIF function returns the average (arithmetic mean) of all numbers in a range of
cells, based on a given criterion. The AVERAGEIF function is a built-in function in Excel that is
categorized as a Statistical Function.
Syntax 
=AVERAGEIF (range, criteria, [average range])

1. Open the AVERAGEIF function in one of the cells.


2. Select the range as product list, i.e., from A2 to A10.
3. Enter the criteria, i.e., the product from the selected range for which we need to find the
average.
4. Select the average range, i.e., the numbers for which we need to find the average.
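
COUNTIF, SUMIF and AVERAGEIF share the same idea: test each cell in a range against one
criterion, then count, add or average the matching entries. The sketch below is a simplified
Python model and not Excel's implementation. It handles numeric comparisons such as ">0" and
exact matches only (no wildcards, and text matching is case-sensitive here); all function names
and sample data are assumptions for illustration.

```python
import operator

# Longest symbols first so that ">=" is not mistaken for ">".
_OPERATORS = [(">=", operator.ge), ("<=", operator.le), ("<>", operator.ne),
              (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def make_test(criteria):
    """Turn a criterion such as ">0" or "Pen" into a function that tests one cell."""
    if isinstance(criteria, str):
        for symbol, compare in _OPERATORS:
            if criteria.startswith(symbol):
                try:
                    operand = float(criteria[len(symbol):])
                except ValueError:
                    break  # not a numeric comparison: use exact matching below
                return lambda cell: _is_number(cell) and compare(cell, operand)
    return lambda cell: cell == criteria


def countif(cells, criteria):
    test = make_test(criteria)
    return sum(1 for cell in cells if test(cell))


def sumif(cells, criteria, sum_range=None):
    test = make_test(criteria)
    values = cells if sum_range is None else sum_range
    return sum(value for cell, value in zip(cells, values) if test(cell))


def averageif(cells, criteria, average_range=None):
    test = make_test(criteria)
    values = cells if average_range is None else average_range
    picked = [value for cell, value in zip(cells, values) if test(cell)]
    return sum(picked) / len(picked) if picked else "#DIV/0!"


# Hypothetical data: product names and their sales amounts.
products = ["Pen", "Book", "Pen"]
sales = [10, 20, 30]
```

For instance, sumif(products, "Pen", sales) adds only the sales in rows where the product is
"Pen", which corresponds to the optional sum range in the SUMIF syntax.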
INDEX function
The INDEX function is one of Excel's most powerful features. The older brother of the much-used
VLOOKUP, INDEX allows you to look up values in a table based on other rows and columns.
And, unlike VLOOKUP, it can be used on rows, columns, or both at the same time.

1. Type “=INDEX(” and select the area of the table, then add a comma.
2. Type the row number for Kevin, which is “4,” and add a comma.
3. Type the column number for Height, which is “2,” and close the bracket.
MATCH Function
The MATCH function searches for a specified item in a range of cells, and then returns the
relative position of that item in the range. ... For example, if the range A1:A3 contains the values 5,
25, and 38, then the formula =MATCH(25,A1:A3,0) returns the number 2, because 25 is the
second item in the range.

1. Type “=MATCH(” and link to the cell containing “Height”… the criteria we want to look up.
2. Select all the cells across the top row of the table.
3. Type zero “0” for an exact match.
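
How INDEX and MATCH work together can be sketched as below. This is a simplified Python model,
not Excel itself: MATCH is modeled for exact matches only (match type 0) and is case-sensitive
here, and the table contents are hypothetical values chosen so that Kevin is in row 4 and Height
in column 2, as in the steps above.

```python
def match(lookup_value, lookup_array):
    """Exact-match MATCH: return the 1-based position of the item, or '#N/A'."""
    for position, item in enumerate(lookup_array, start=1):
        if item == lookup_value:
            return position
    return "#N/A"


def index(table, row_num, col_num):
    """INDEX: return the value at the given 1-based row and column."""
    return table[row_num - 1][col_num - 1]


# Hypothetical table; the first row holds the column headers.
table = [
    ["Name", "Height", "Weight"],
    ["Ann", 160, 55],
    ["Raj", 172, 70],
    ["Kevin", 180, 82],
]

names = [row[0] for row in table]   # first column of the table
headers = table[0]                  # top row of the table

# Mirrors =INDEX(table, MATCH("Kevin", names, 0), MATCH("Height", headers, 0))
kevin_height = index(table, match("Kevin", names), match("Height", headers))
```

Using MATCH to supply the row and column numbers means the formula keeps working when rows or
columns are moved, which typed-in numbers such as 4 and 2 would not.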
