BA File
ON
Nominal Data
Nominal data is a type of qualitative information that labels variables without assigning
them a numerical value. Nominal data is also called the nominal scale. It cannot be ordered
or measured. Sometimes, however, the data can be both qualitative and quantitative.
Examples of nominal data are letters, symbols, words, gender etc.
Nominal data are examined using the grouping method. In this method, the data are
grouped into categories, and then the frequency or the percentage of each category is
calculated. These data are often represented visually using pie charts.
Ordinal Data
Ordinal data/variable is a type of data which follows a natural order. The significant
feature of ordinal data is that the difference between the data values is not
determined. This variable is mostly found in surveys, finance, economics, questionnaires,
and so on.
Ordinal data is commonly represented using a bar chart. These data are investigated
and interpreted through many visualisation tools. The information may be expressed using
tables in which each row shows a distinct category.
Discrete Data
Discrete data can take only distinct, separate values. Discrete information contains only a
finite or countable number of possible values. Those values cannot be subdivided
meaningfully. Here, things can be counted in whole numbers.
Example: Number of students in the class
Continuous Data
Continuous data is data that can be measured. It has an infinite number of possible
values that can be selected within a given specific range.
Example: Temperature range
Data Summarization
The term Data Summarization refers to presenting the summary of generated data in an
easily comprehensible and informative manner. Presenting the raw data (the entire set of
individual measurements that was generated) is not practical in many cases.
Tabular Presentation
A table helps to represent even a large amount of data in an engaging, easy to read, and
coordinated manner. The data is arranged in rows and columns. This is one of the most
popularly used forms of presentation of data as data tables are simple to prepare and
read.
Objectives Of Tabulation
Graphic Presentation
Graphic presentation represents a highly developed body of techniques for elucidating,
interpreting, and analyzing numerical facts by means of points, lines, areas, and other
geometric forms and symbols. Graphic techniques are especially valuable in presenting
quantitative data in a simple, clear, and effective manner, as well as facilitating
comparisons of values, trends, and relationships. They have the additional advantages of
succinctness and popular appeal; the comprehensive pictures they provide can bring out
hidden facts and relationships and contribute to a more balanced understanding of a
problem.
Charts
Charts are a great way to visually represent all kinds of information, from the simple to the
very complex.
A variety of chart types can be used in presentations. Some of these chart types
include:
Time Series
Bar Charts
Combo Charts
Pie Charts
Tables
Geo Map
Scorecard
Scatter Charts
Bullet Charts
Area Chart
Text & Images
Histogram
Frequency Distribution
RELATIVE FREQUENCY
The ratio of the number of times a value of the data occurs in the set of all outcomes to
the number of all outcomes gives the value of relative frequency.
Let’s look at the table below to see how the weights of the people are distributed.
To convert the frequencies into relative frequencies, we need to do the following steps.
Step 1: Find the total N, the sum of all frequencies (40 in the above case).
Step 2: Divide each frequency by N. Let’s see how: 1 / 40 = 0.025.
This is a frequency table to see how many students have got marks between given
intervals in Maths.
Marks      Frequency   Relative Frequency
45 – 50    3           3 / 40 = 0.075
50 – 55    1           1 / 40 = 0.025
55 – 60    1           1 / 40 = 0.025
65 – 70    8           8 / 40 = 0.2
70 – 80    3           3 / 40 = 0.075
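The conversion can be sketched in a few lines of Python. This is only an illustration: it assumes the frequencies shown in the marks table and the stated total of N = 40 (the table lists only some of the class intervals), and the variable names are made up for the example.

```python
# Frequencies as shown in the marks table; N = 40 is the total stated in the text.
frequencies = {
    "45-50": 3,
    "50-55": 1,
    "55-60": 1,
    "65-70": 8,
    "70-80": 3,
}
N = 40

# Relative frequency = frequency of the interval / total number of observations.
relative = {interval: f / N for interval, f in frequencies.items()}

for interval, rf in relative.items():
    # Show each value both as a proportion and as a percentage.
    print(f"{interval}: {rf:.3f} ({rf:.1%})")
```

Multiplying a relative frequency by 100 gives the percentage form, so 0.075 corresponds to 7.5%.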
The central tendency is one of the most quintessential concepts in statistics. Although it
does not provide information regarding the individual values in the dataset, it delivers a
comprehensive summary of the whole dataset.
Generally, the central tendency of a dataset can be described using the following
measures:
Mean (Average): Represents the sum of all values in a dataset divided by the total number
of the values.
Median: The middle value in a dataset that is arranged in ascending order (from the
smallest value to the largest value). If a dataset contains an even number of values, the
median of the dataset is the mean of the two middle values.
Mode: Defines the most frequently occurring value in a dataset. In some cases, a dataset
may contain multiple modes, while some datasets may not have any mode at all.
Variance: The term variance refers to a statistical measurement of the spread between
numbers in a data set. Unlike the three measures above, it describes dispersion rather than
central tendency. More specifically, variance measures how far each number in the set is
from the mean and thus from every other number in the set. Variance is often denoted by
the symbol σ². It is used by both analysts and traders to determine volatility and market
security. The square root of the variance is the standard deviation (σ), which helps
determine the consistency of an investment’s returns over a period of time.
Even though the mean, median, and mode are the most commonly used measures of central
tendency, there are some other measures, including, but not limited to, geometric mean,
harmonic mean, midrange, and geometric median.
The selection of a central tendency measure depends on the properties of a dataset. For
instance, the mode is the only central tendency measure for categorical data, while a
median works best with ordinal data.
Although the mean is regarded as the best measure of central tendency for quantitative
data, that is not always the case. For example, the mean may not work well with
quantitative datasets that contain extremely large or extremely small values. The extreme
values may distort the mean. Thus, you may consider other measures.
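As a small illustration of these measures, the sketch below uses Python's standard `statistics` module on a made-up data set. The numbers are hypothetical and were chosen so that one extreme value (100) pulls the mean away from the median.

```python
import statistics

# Hypothetical data set with one extreme value (100).
values = [2, 3, 3, 5, 7, 100]

mean = statistics.mean(values)            # sum of the values / number of values
median = statistics.median(values)        # even count: mean of the two middle values
mode = statistics.mode(values)            # most frequently occurring value
variance = statistics.pvariance(values)   # population variance (sigma squared)
std_dev = statistics.pstdev(values)       # population standard deviation (sigma)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
```

Here the mean (20) is far larger than the median (4), which is the kind of distortion by extreme values described above.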
PROBABILITY DISTRIBUTION
The probability distribution is one of the major concepts of statistical analysis. It gives the
probability of each possible outcome of a random event, so the probabilities of all
outcomes can be read from the distribution. A brief review of probability theory helps in
understanding probability distributions: probability measures how certain or uncertain
the different outcomes of a given event are.
The temperature of the day is a real-life example of a continuous probability distribution.
Once the outcomes have been recorded, a distribution table can be made. Other everyday
examples of probability distributions include rolling a die, judgments in competitions,
sizes of female shoes, tossing of coins, the range of weights of newborns, the heights of
the world's population, etc. Of these, die rolls and coin tosses are discrete, while weights
and heights are continuous and often approximately normal.
CONTINUOUS DISTRIBUTION
A continuous distribution is one in which data can take on any value within a specified
range (which may be infinite). A continuous distribution has an infinite number of possible
values, and the probability associated with any particular value of a continuous
distribution is zero. Therefore, continuous distributions are normally described in terms of
probability density, which can be converted into the probability that a value will fall within a
certain range.
However, the probability that X is exactly equal to some value is always zero because the
area under the curve at a single point, which has no width, is zero. For example, the
probability that a man weighs exactly 190 pounds to infinite precision is zero. You could
calculate a nonzero probability that a man weighs more than 190 pounds, or less than 190
pounds, or between 189.9 and 190.1 pounds, but the probability that he weighs exactly
190 pounds is zero.
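The weight example can be sketched with `statistics.NormalDist` from the Python standard library. The mean of 180 pounds and standard deviation of 25 pounds are assumptions made purely for illustration; they do not come from the text.

```python
from statistics import NormalDist

# Assumed for illustration only: weights follow Normal(mean=180 lb, sd=25 lb).
weights = NormalDist(mu=180, sigma=25)

# P(a < X < b) is the area under the density curve between a and b,
# obtained as a difference of the cumulative distribution function (CDF).
p_near_190 = weights.cdf(190.1) - weights.cdf(189.9)
p_above_190 = 1 - weights.cdf(190)
p_below_190 = weights.cdf(190)

# A single point is an interval of zero width, so its probability is zero.
p_exactly_190 = weights.cdf(190) - weights.cdf(190)

print("P(189.9 < X < 190.1):", p_near_190)
print("P(X > 190):", p_above_190)
print("P(X < 190):", p_below_190)
print("P(X = 190):", p_exactly_190)
```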
DISCRETE DISTRIBUTIONS
A discrete distribution is a probability distribution that depicts the occurrence of discrete
(individually countable) outcomes, such as 1, 2, 3... or zero vs. one. The binomial
distribution, for example, is a discrete distribution that evaluates the probability of a "yes"
or "no" outcome occurring over a given number of trials, given the event's probability in
each trial, such as flipping a coin one hundred times and counting how often the outcome
is "heads".
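The coin-flipping example can be made concrete with the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes a fair coin (p = 0.5); the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.5  # 100 flips of a fair coin (assumed)

# Probability of getting exactly 50 heads.
p_exactly_50 = binomial_pmf(50, n, p)

# The probabilities of all possible outcomes (0 to 100 heads) add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print("P(exactly 50 heads):", round(p_exactly_50, 4))
print("sum over all outcomes:", round(total, 6))
```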
Distribution is a statistical concept used in data research. Those seeking to identify the
outcomes and probabilities of a particular study chart measurable data points from a data
set, resulting in a probability distribution diagram. Many diagram shapes can result from a
distribution study, such as the normal distribution ("bell curve").
Common discrete probability distributions include:
Poisson
Bernoulli
Binomial
Multinomial
Example:
Consider an example where we are counting the number of people walking into a store in
any given hour. The values would need to be countable, finite, non-negative integers. It
would not be possible to have 0.5 people walk into a store, and it would not be possible to
have a negative amount of people walk into a store. Therefore, the distribution of the
values, when represented on a distribution plot, would be discrete.
UNIT 2
Simple Linear Regression
Simple linear regression is used to find the best relationship between a single input
variable (predictor, independent variable, input feature, input parameter) and an output
variable (predicted, dependent variable, output feature, output parameter), provided that
both variables are continuous in nature. This relationship shows how the input variable is
related to the output variable and is represented by a straight line.
To understand this concept, let us have a look at scatter plots. Scatter diagrams or plots
provide a graphical representation of the relationship between two continuous variables.
Coefficient of Determination
The coefficient of determination (denoted by R²) is a key output of regression analysis. It
is interpreted as the proportion of the variance in the dependent variable that is
predictable from the independent variable.
The coefficient of determination is the square of the correlation (r) between predicted y
scores and actual y scores; thus, it ranges from 0 to 1.
With linear regression, the coefficient of determination is also equal to the square of the
correlation between x and y scores.
An R² of 0 means that the dependent variable cannot be predicted from the independent
variable.
An R² of 1 means the dependent variable can be predicted without error from the
independent variable.
An R² between 0 and 1 indicates the extent to which the dependent variable is predictable.
An R² of 0.10 means that 10 percent of the variance in Y is predictable from X; an R² of
0.20 means that 20 percent is predictable; and so on.
Regression analysis in Excel
Regression analysis helps you understand how the dependent variable changes when one
of the independent variables varies, and allows you to determine mathematically which of
those variables really has an impact.
As an example, let's take sales numbers for umbrellas for the last 24 months and find out
the average monthly rainfall for the same period. Plot this information on a chart, and the
regression line will demonstrate the relationship between the independent variable
(rainfall) and dependent variable (umbrella sales):
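y = bx + a + ε
Where:
x is the independent variable (rainfall).
y is the dependent variable (umbrella sales).
a is the y-intercept, the expected value of y when x is zero.
b is the slope of the regression line, the change in y for a one-unit change in x.
ε is the random error term.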
The linear regression equation always has an error term because, in real life, predictors are
never perfectly precise. However, some programs, including Excel, do the error term
calculation behind the scenes. So, in Excel, you do linear regression using the least
squares method and seek coefficients a and b such that:
y = bx + a
For our example, the linear regression equation takes the following shape:
There exist a handful of different ways to find a and b. The three main methods to perform
linear regression analysis in Excel are:
Residual Analysis
Residual (or error) represents unexplained (or residual) variation after fitting a regression
model. It is the difference (or left over) between the observed value of the variable and
the value suggested by the regression model.
The difference between the observed value of the dependent variable (y) and the
predicted value (ŷ) is called the residual (e). Each data point has one residual.
e=y–ŷ
Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
Residual Plots – A residual plot is a graph that shows the residuals on the vertical axis
and the independent variable on the horizontal axis. If the points in a residual plot are
randomly dispersed around the horizontal axis, a linear regression model is appropriate
for the data; otherwise, a non-linear model is more appropriate.
Tools for analysing residuals – For the basic analysis of residuals you will use the usual
descriptive tools and scatterplots (plotting both fitted values and residuals, as well as the
dependent and independent variables you have included in your model).
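The definition e = y – ŷ and the zero-sum property can be checked with a short Python sketch. The five data points are hypothetical, and the line is fitted with the standard least-squares formulas for slope and intercept.

```python
# Hypothetical data points (x = independent variable, y = dependent variable).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope (b) and intercept (a) for the line y-hat = a + b*x.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

# Predicted values and residuals: e = y - y-hat, one residual per data point.
predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print("slope b:", b, "intercept a:", a)
print("residuals:", [round(e, 2) for e in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```

Plotting `residuals` against `x` would give the residual plot described above.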
CONFIDENCE INTERVALS
Confidence Intervals are estimates that are calculated from sample data to determine
ranges likely to contain the population parameter (mean, standard deviation) of interest.
For example, if our population is (2,6), a confidence interval of the mean suggests that
the population mean is likely between 2 and 6. And how confidently can we say this?
Obviously 100%, right? Because we know all the values and we can calculate it very easily.
But in real-life problems, this is not the case. It is not always feasible or possible to study
the whole population. So what do we do? We take sample data. But can we rely on one
sample? No, because different samples from the same population will produce different means.
So we take numerous random samples (from the same population) and calculate
confidence intervals for each sample and a certain percentage of these ranges will
contain the true population parameter.
This certain percentage is called the confidence level. A 95% confidence level means that
out of 100 random samples taken, I expect about 95 of the confidence intervals to contain
the true population parameter.
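The meaning of the confidence level can be illustrated with a small simulation. Everything in this sketch is an assumption made for the example: the population is taken to be normal with mean 50 and standard deviation 10, each sample has 30 observations, and the interval uses the normal approximation (sample mean ± z × standard error) rather than the t-distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(42)          # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 50, 10      # hypothetical population parameters
SAMPLE_SIZE, NUM_SAMPLES = 30, 1000

# z value for a 95% confidence level (about 1.96).
z = NormalDist().inv_cdf(0.975)

def confidence_interval(sample):
    """Approximate 95% confidence interval for the population mean."""
    m = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return m - margin, m + margin

# Draw many random samples and count how often the interval
# contains the true population mean.
hits = 0
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    lower, upper = confidence_interval(sample)
    if lower <= TRUE_MEAN <= upper:
        hits += 1

coverage = hits / NUM_SAMPLES
print("share of intervals containing the true mean:", coverage)
```

The printed share should be close to 0.95; it will not be exactly 0.95 because the samples are random and the normal approximation is slightly optimistic for small samples.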
PREDICTION INTERVALS
The range that likely contains the value of the dependent variable for a single new
observation given specific values of the independent variables, is the prediction interval.
The prediction interval predicts in what range a future individual observation will fall,
while a confidence interval shows the likely range of values associated with some
statistical parameter of the data, such as the population mean.
To find the best-fit line for each independent variable, multiple linear regression
calculates three things:
The regression coefficients that lead to the smallest overall model error.
The t-statistic of the overall model.
The associated p-value (how likely it is that the t-statistic would have occurred
by chance if the null hypothesis of no relationship between the independent
and dependent variables was true).
It then calculates the t-statistic and p-value for each regression coefficient in the model.
In this article, you'll learn the basics of simple linear regression, sometimes called 'ordinary
least squares' or OLS regression—a tool commonly used in forecasting and financial
analysis. We will begin by learning the core principles of regression, first learning about
covariance and correlation, and then moving on to building and interpreting a regression
output. Popular business software such as Microsoft Excel can do all the regression
calculations and outputs for you, but it is still important to learn the underlying mechanics.
Variables
At the heart of a regression model is the relationship between two different variables,
called the dependent and independent variables. For instance, suppose you want to
forecast sales for your company and you've concluded that your company's sales go up
and down depending on changes in GDP.
The sales you are forecasting would be the dependent variable because their value
"depends" on the value of GDP and the GDP would be the independent variable. You would
then need to determine the strength of the relationship between these two variables in
order to forecast sales. If GDP increases/decreases by 1%, how much will your sales
increase or decrease?
Covariance
The formula to calculate the relationship between two variables is called covariance. This
calculation shows you the direction of the relationship. If one variable increases and the
other variable tends to also increase, the covariance would be positive. If one variable goes up
and the other tends to go down, then the covariance would be negative.
The actual number you get from calculating this can be hard to interpret because it isn't
standardized. A covariance of five, for instance, can be interpreted as a positive
relationship, but the strength of the relationship can only be said to be stronger than if the
number was four or weaker than if the number was six.
Correlation Coefficient
We need to standardize the covariance in order to allow us to better interpret and use it in
forecasting, and the result is the correlation calculation. The correlation calculation simply
takes the covariance and divides it by the product of the standard deviations of the two
variables. This bounds the correlation between -1 and +1.
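A minimal Python sketch of both calculations follows. It uses the sales and GDP figures from the Excel example later in this document and the sample (n - 1) form of the covariance; the function names are illustrative.

```python
from math import sqrt

# Figures from the Excel example in this document.
gdp_growth = [1.0, 1.9, 2.4, 2.6, 2.9]   # change in GDP, in percent
sales = [100, 250, 275, 200, 300]

def mean(values):
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    """Average product of deviations from the means (n - 1 in the denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_std(xs):
    """Standard deviation is the square root of a variable's covariance with itself."""
    return sqrt(sample_covariance(xs, xs))

cov = sample_covariance(gdp_growth, sales)

# Correlation = covariance / (product of the two standard deviations).
corr = cov / (sample_std(gdp_growth) * sample_std(sales))

print("covariance:", round(cov, 2))
print("correlation:", round(corr, 4))
```

The covariance is positive but hard to interpret on its own, while the correlation of about 0.83 is on the standardized -1 to +1 scale.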
Regression Equation
Below is the formula for a simple linear regression:
y = bx + a
The "y" is the value we are trying to forecast, the "b" is the slope of the regression line,
the "x" is the value of our independent variable, and the "a" represents the y-intercept. The
regression equation simply describes the relationship between the dependent variable (y)
and the independent variable (x).
The intercept, or "a," is the value of y (dependent variable) if the value of x (independent
variable) is zero, and so is sometimes simply referred to as the 'constant.' So if there was
no change in GDP, your company would still make some sales. This value, when the
change in GDP is zero, is the intercept. Take a look at the graph below to see a graphical
depiction of a regression equation. In this graph, there are only five data points
represented by the five dots on the graph. Linear regression attempts to estimate a line
that best fits the data (a line of best fit) and the equation of that line results in the
regression equation.
Regressions in Excel
Now that you understand some of the background that goes into a regression analysis,
let's do a simple example using Excel's regression tools. We'll build on the previous
example of trying to forecast next year's sales based on changes in GDP. The next table
lists some artificial data points, but such numbers are easily accessible in real life.
Year    Sales    GDP
2015    100      1.00%
2016    250      1.90%
2017    275      2.40%
2018    200      2.60%
2019    300      2.90%
We can see that there is going to be a positive correlation between sales and GDP. Both
tend to go up together. Using Excel, all you have to do is click the Tools drop-down menu,
select Data Analysis and from there choose Regression. The popup box is easy to fill in
from there; your Input Y Range is your "Sales" column and your Input X Range is the
change in GDP column; choose the output range for where you want the data to show up
on your spreadsheet and press OK. You should see something similar to what is given in
the table below:
Regression Statistics Coefficients
Interpretation
The major outputs you need to be concerned about for simple linear regression are the
R-squared, the intercept (constant) and the GDP's beta (b) coefficient. The R-squared
number in this example is 68.7%. This shows how well our model fits the data, suggesting
that the explanatory variable in the model explained 68.7% of the variation in the
dependent variable. Next, we have an intercept of 34.58, which tells us that if the change in
GDP was forecast to be zero, our sales would be about 35 units. And finally, the GDP beta
(the slope coefficient, not to be confused with the correlation coefficient) of 88.15 tells us
that if GDP increases by 1%, sales will likely go up by about 88 units.
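These figures can be reproduced outside Excel. The sketch below applies the ordinary least-squares formulas to the five data points of the example; it is an illustration of the calculation, not a description of how Excel computes it internally, and the 3% GDP change used for the forecast is a hypothetical input.

```python
# Data points from the example: change in GDP (percent) and sales (units).
gdp_growth = [1.0, 1.9, 2.4, 2.6, 2.9]
sales = [100, 250, 275, 200, 300]

n = len(sales)
mean_x = sum(gdp_growth) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in gdp_growth)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gdp_growth, sales))

b = sxy / sxx                      # slope: the GDP beta
a = mean_y - b * mean_x            # intercept: sales when the GDP change is zero
r_squared = sxy**2 / (sxx * syy)   # share of the variation in sales explained

print(f"intercept a = {a:.2f}")
print(f"slope b     = {b:.2f}")
print(f"R-squared   = {r_squared:.3f}")

# Forecast for a hypothetical GDP change of 3%.
forecast = a + b * 3.0
print(f"forecast sales at 3% GDP change = {forecast:.1f}")
```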
HETEROSCEDASTICITY
The word “heteroscedasticity” comes from the Greek, and quite literally means data with a
different (hetero) dispersion (skedasis). In simple terms, heteroscedasticity describes any
set of data that isn’t homoscedastic. More technically, it refers to data with unequal
variability (scatter) across the range of values of a second, predictor variable.
Multicollinearity
Multicollinearity is the occurrence of high intercorrelations among two or more
independent variables in a multiple regression model. Multicollinearity can lead to skewed
or misleading results when a researcher or analyst attempts to determine how well each
independent variable can be used to predict or understand the dependent variable in a
statistical model.
Examples of Multicollinearity
Let’s assume that ABC Ltd, a KPO, has been hired by a pharmaceutical company to
provide research services and statistical analysis on the diseases in India. For this, ABC Ltd
has selected age, weight, profession, height, and health as the prima facie parameters.
In the above example, there is a multicollinearity situation since several of the independent
variables selected for the study (for example, height and weight) are likely to be correlated
with one another. Hence it would be advisable for the researcher to adjust the variables
before starting the project, since the results will be directly affected by the choice of
variables.
UNIT 3
If all the three conditions are satisfied, it is called a Linear Programming Problem.
Formulating a problem
The company kitchen has a total of 5 units of Milk and 12 units of Choco. On each sale, the
company makes a profit of
Now, the company wishes to maximize its profit. How many units of A and B should it
produce respectively?
Solution: The first step is to represent the problem in a tabular form for better
understanding.
Let X and Y be the number of units of A and B produced. The total profit the company
makes is the number of units of A and B produced multiplied by their per-unit profits of
Rs 6 and Rs 5 respectively, so the objective is to maximize
Z = 6X + 5Y
As per the above table, each unit of A and B requires 1 unit of Milk. The total amount of
Milk available is 5 units. To represent this mathematically,
X+Y ≤ 5
Also, each unit of A and B requires 3 units & 2 units of Choco respectively. The total
amount of Choco available is 12 units. To represent this mathematically,
3X+2Y ≤ 12
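The number of units produced cannot be negative, so X ≥ 0 and Y ≥ 0. The company makes
its maximum profit at the values of X and Y that give the largest total profit while
satisfying all of the above inequalities.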
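For a problem with only two variables, the optimum can be found by checking the corner points of the feasible region, since a linear objective reaches its maximum at a corner. The sketch below does this for the constraints above with the profit 6X + 5Y, and it assumes X ≥ 0 and Y ≥ 0. It is a small illustration of corner-point enumeration, not a general-purpose solver, and the function names are made up for the example.

```python
from itertools import combinations

# Each constraint is written as a*X + b*Y <= c.
constraints = [
    (1, 1, 5),    # Milk:  X + Y <= 5
    (3, 2, 12),   # Choco: 3X + 2Y <= 12
    (-1, 0, 0),   # X >= 0 (assumed non-negativity)
    (0, -1, 0),   # Y >= 0 (assumed non-negativity)
]

def profit(x, y):
    return 6 * x + 5 * y

def intersection(c1, c2):
    """Point where two constraint boundary lines cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def is_feasible(point, eps=1e-9):
    x, y = point
    return all(a * x + b * y <= c + eps for a, b, c in constraints)

# Corner points are the feasible intersections of pairs of boundary lines.
corners = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        corners.append(point)

best = max(corners, key=lambda p: profit(*p))
print("best (X, Y):", best, "profit:", profit(*best))
```

For this problem the best corner is X = 2, Y = 3, which uses all 5 units of Milk and all 12 units of Choco and gives a profit of Rs 27.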
Linear programming is used to obtain optimal solutions for operations research. Using
linear programming allows researchers to find the best, most economical solution to a
problem within all of its limitations, or constraints. Many fields use linear programming
techniques to make their processes more efficient. These include food and agriculture,
engineering, transportation, manufacturing and energy.
GOAL PROGRAMMING
Goal programming is a branch of multi-objective optimization, which in turn is a branch of
multi-criteria decision analysis (MCDA). It can be thought of as an extension or
generalisation of linear programming to handle multiple, normally conflicting objective
measures. Each of these measures is given a goal or target value to be achieved.
Deviations are measured from these goals both above and below the target. Unwanted
deviations from this set of target values are then minimised in an achievement function.
This can be a vector or a weighted sum dependent on the goal programming variant
used. As satisfaction of the target is deemed to satisfy the decision maker(s),
an underlying satisficing philosophy is assumed. Goal programming is used to perform
three types of analysis:
Determine the degree of attainment of the goals with the available resources.
Providing the best satisfying solution under a varying number of resources and
priorities of the goals.
A major strength of goal programming is its simplicity and ease of use. This accounts for
the large number of goal programming applications in many and diverse fields. Linear
goal programmes can be solved using linear programming software as either a single
linear programme, or in the case of the lexicographic variant, a series of connected linear
programmes.
Goal programming can hence handle relatively large numbers of variables, constraints
and objectives. A debated weakness is the ability of goal programming to produce
solutions that are not Pareto efficient. This violates a fundamental concept of decision
theory, that no rational decision maker will knowingly choose a solution that is not Pareto
efficient. However, techniques are available to detect when this occurs and project the
solution onto the Pareto efficient solution in an appropriate manner.
The setting of appropriate weights in the goal programming model is another area
that has caused debate, with some authors suggesting the use of the analytic
hierarchy process or interactive methods for this purpose.
The next table demonstrates the weights of each alternative against the criterion Family.
Here, City C was the closest to family, while City D was the furthest. This would be
repeated for every criterion.
Finally, the weighted importance of each criterion is multiplied by the score of each
alternative to get the weighted score (for City A's weighted Cultural score: 0.152 × 0.163 =
0.024776). Add all the weighted scores together to get the Overall Priority score (for City
A: 0.024776 + 0.09093 + 0.018864 + 0.085095 + 0.009462 = 0.229).
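The arithmetic for City A can be checked in a few lines of Python. Only the numbers quoted in the text are used; the variable names are illustrative.

```python
# Weighted Cultural score for City A: criterion weight x alternative score.
cultural = 0.152 * 0.163

# The other four weighted scores for City A, as quoted in the text.
other_weighted_scores = [0.09093, 0.018864, 0.085095, 0.009462]

# Overall Priority score = sum of all weighted scores for the alternative.
overall_priority = cultural + sum(other_weighted_scores)

print("weighted Cultural score:", round(cultural, 6))
print("Overall Priority score for City A:", round(overall_priority, 3))
```

Repeating the same sum for each city and comparing the Overall Priority scores gives the ranking of the alternatives.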
The AHP process begins by defining the alternatives that need to be evaluated. These
alternatives could be the different criteria that solutions must be evaluated against. They
could also be the different features of a product that need to be weighted to better
understand the customers' perception. At the end of step 1, a comprehensive list of all the
available alternatives must be ready.
The next step is to model the problem. According to AHP methodology, a problem is a
related set of sub problems. The AHP method therefore relies on breaking the problem
into a hierarchy of smaller problems. In the process of breaking down the sub-problem,
criteria to evaluate the solutions emerge. However, like root cause analysis, a person can
go on and on to deeper levels within the problem. When to stop breaking the problem into
smaller sub problems is a subjective judgement.
Example: A firm needs to decide on the best investment option amongst stocks, bonds,
real estate and gold. If the AHP method is used, the problem of best investment will be
broken down into smaller problems like protection from downfall, maximum chance of
appreciation, liquidity in the market and so on. Each of these sub problems can then be
broken into smaller problems till the management feels that the appropriate criteria have
been reached.
The AHP method uses pairwise comparison to create a matrix. For example the firm will
be asked to weigh the relative importance of protection from downfall vs. liquidity. Then
in the next matrix, there will be a pairwise comparison between liquidity and chance of
appreciation and so on. The managers will be expected to fill this data as per the
expectations of the end consumer or the people who are going to use the process.
This step is inbuilt in most software tools that help solve AHP problems. For instance if I
say that liquidity is twice as important as protection from downfall and in the next matrix I
say that protection from downfall is half as important as chance of appreciation, then the
following situation emerges:
The software tool will run the mathematical calculation based on the data and assign
relative weights to the criteria. Once the equation is ready with weighted criteria, one can
evaluate the alternatives to get the best solution that matches their needs.
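One common way to turn a pairwise comparison matrix into relative weights is to normalize each column and average across each row. The sketch below applies this to the three criteria mentioned above. The comparison of liquidity with chance of appreciation is not stated in the text, so a value of 1 (equal importance) is assumed because it keeps the matrix consistent; real AHP tools may use other methods, such as the principal eigenvector.

```python
criteria = ["protection from downfall", "liquidity", "chance of appreciation"]

# pairwise[i][j] = how many times more important criterion i is than criterion j.
# From the text: liquidity is twice as important as protection, and protection
# is half as important as appreciation.
# Assumed: liquidity and appreciation are equally important (value 1).
pairwise = [
    [1.0, 0.5, 0.5],   # protection from downfall
    [2.0, 1.0, 1.0],   # liquidity
    [2.0, 1.0, 1.0],   # chance of appreciation
]

n = len(criteria)
column_sums = [sum(pairwise[i][j] for i in range(n)) for j in range(n)]

# Normalize each column so it sums to 1, then average across each row.
weights = [
    sum(pairwise[i][j] / column_sums[j] for j in range(n)) / n
    for i in range(n)
]

for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.2f}")
```

With these judgements, protection from downfall receives a weight of 0.2 and the other two criteria 0.4 each.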
UNIT 4
What Is Stochastic Modeling?
Stochastic modeling is a form of financial model that is used to help make investment
decisions. This type of modeling forecasts the probability of various outcomes under
different conditions, using random variables.
Stochastic modeling presents data and predicts outcomes that account for certain levels
of unpredictability or randomness. Companies in many industries can employ stochastic
modeling to improve their business practices and increase profitability. In the financial
services sector, planners, analysts, and portfolio managers use stochastic modeling
to manage their assets and liabilities and optimize their portfolios.
Markov model
A Markov model is a stochastic method for randomly changing systems where it is
assumed that future states depend only on the current state and not on the states that
preceded it. These models show all possible states as well as the transitions, rates of
transition and probabilities between them.
Markov models are often used to model the probabilities of different states and the rates
of transitions among them. The method is generally used to model systems. Markov
models can also be used to recognize patterns, make predictions and learn the
statistics of sequential data.
There are four types of Markov models that are used situationally:
Markov chain - used by systems that are autonomous and have fully observable states
Hidden Markov model - used by systems that are autonomous where the state is partially
observable.
Markov decision processes - used by controlled systems with a fully observable state.
Partially observable Markov decision processes - used by controlled systems where the
state is partially observable.
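The simplest of these, the Markov chain, can be sketched in a few lines of Python. The two weather states and the transition probabilities below are hypothetical; the point is only that the next state's distribution depends on the current distribution and the transition matrix, and on nothing earlier.

```python
states = ["sunny", "rainy"]

# transition[i][j] = probability of moving from state i to state j.
# Hypothetical values; each row sums to 1.
transition = [
    [0.9, 0.1],   # from sunny
    [0.5, 0.5],   # from rainy
]

def step(distribution):
    """Advance the state distribution by one time step."""
    n = len(states)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]

# Start in the "sunny" state with certainty and look two steps ahead.
distribution = [1.0, 0.0]
history = [distribution]
for _ in range(2):
    distribution = step(distribution)
    history.append(distribution)

for t, dist in enumerate(history):
    print(f"step {t}:", dict(zip(states, (round(p, 4) for p in dist))))
```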
Expressing a problem as a Markov decision process (MDP) is the first step towards solving
it through techniques like dynamic programming or other reinforcement learning (RL)
techniques. A robot playing a computer game or performing a task often maps naturally to
an MDP. Many other real-world problems can be solved through this framework too,
although not many real-world examples are readily available. This article provides some
real-world examples of finite MDPs. We also show the corresponding transition graphs,
which effectively summarize the MDP dynamics. Such examples can serve as good
motivation to study and develop the skills to formulate problems as MDPs.
Artificial Intelligence
Solutions such as AI algorithms based on advanced neural networks achieve high
accuracy in detecting inconsistencies because they learn from past trends & patterns. That
way, any unusual event is registered immediately & the system alerts the user.
Another rising factor in BI's future is testing AI in a duel. For example, one AI will create a
realistic image, and another one will try to ascertain if the image is artificial or not. This
concept is also called GANs (Generative adversarial networks) and can be utilized in
online verification processes.
Data Visualization
Data discovery has raised its impact in the previous year. Data visualization was listed in
the top 2 BI trends in the Business Application Research Centre survey.
A crucial element to consider is that data visualization tools depend upon a process, and
only later do the produced findings bring business value. This requires understanding the
relationship between data in the form of visual analysis, data preparation, & guided
advanced analytics.
Data visualizations have transformed into state-of-the-art solutions to present & interact
with several graphics on one screen, whether the focus is on building sales charts or
all-inclusive interactive reports. Since humans process visual data better, data visualization
will be an essential BI trend in 2021.
Data security
Data & information security was on everyone's mind in 2020 and will continue to
create a buzz in 2021. The implementation of privacy regulations, like GDPR in the EU & the
CCPA in the USA, has laid the building blocks for data security & the management of users'
details.
SaaS BI
Several businesses have switched to SaaS BI to access any data from the cloud and gain
more flexibility from any device. Technologies that allow data movement and access
from various places will continue to rise as one of the most important BI trends in 2021.
SaaS is becoming remote-friendly, so dispersed teams that need such solutions can improve
their business processes and work remotely without obstacles.
Predictive analytics is the practice of pulling information from current data sets to predict
future possibilities. It is an extension of data mining, which looks at historical data.
Because predictive analysis involves forecast future data, it always includes the likelihood
of errors by definition. Predictive analysis shows what might happen in the
future with an acceptable degree of reliability, together with some alternative scenarios and
risk assessment.
BUSINESS ANALYTICS
EXCEL FUNCTIONS AND FORMULAS
SUM Function
The SUM function adds values. You can add individual values, cell references or ranges or a mix
of all three.
Formula: =SUM(D5:I5)
1. Select a cell next to the numbers you want to sum: To sum a column, select the cell
immediately below the last value in the column.
2. Click the AutoSum button on either the Home or Formulas tab.
3. Press the Enter key to complete the formula.
AVERAGE Function
Returns the average (arithmetic mean) of the arguments.
Formula: =AVERAGE(D5:I5)
1. Click a cell below, or to the right, of the numbers for which you want to find the average.
2. On the Home tab, in the Editing group, click the arrow next to AutoSum, and then click Average.
COUNT Function
The COUNT function counts the number of cells that contain numbers, and counts numbers within
the list of arguments. Use the COUNT function to get the number of entries in a number field
that is in a range or array of numbers.
Formula: =COUNT(D5:D19)
MIN Function
The Excel MIN function returns the smallest numeric value in a range of values. The MIN
function ignores empty cells, the logical values TRUE and FALSE, and text values.
Formula: =MIN(D5:D19)
MAX Function
MAX will return the largest value in a given list of arguments. From a given set of numeric
values, it will return the highest value.
Formula: =MAX(D5:D19)
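As a rough illustration of how SUM, AVERAGE, COUNT, MIN and MAX treat a range, the Python sketch below mimics their behaviour. The helper name `numeric_cells` and the sample row are hypothetical, and `None` stands in for an empty cell; this is a sketch of the semantics, not how Excel itself is implemented.

```python
def numeric_cells(cells):
    """Keep only numeric cells; text, blanks (None) and logical values are skipped."""
    return [
        value
        for value in cells
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]

# Hypothetical contents of D5:I5: four numbers, one text entry, one blank cell
row = [10, 20, "pending", None, 30, 40]
values = numeric_cells(row)

total = sum(values)              # like =SUM(D5:I5)
count = len(values)              # like =COUNT(D5:I5)
average = total / count          # like =AVERAGE(D5:I5)
smallest = min(values)           # like =MIN(D5:I5)
largest = max(values)            # like =MAX(D5:I5)
```

Filtering first and aggregating afterwards keeps the "ignore non-numeric cells" rule in one place.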
Date Formula
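The Excel DATE function creates a valid date from individual year, month, and day components.
The DATE function is useful for assembling dates that need to change dynamically based on other
values in a worksheet.
Syntax: =DATE(year, month, day)
The related DAY function goes the other way and returns the day of the month (1 to 31) from a date.
Formula: =DAY(C5)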
Now Formula
The NOW function in Excel is a formula that displays the current date and time. It is
automatically refreshed anytime the workbook is opened or a change is made.
Formula: =NOW()
Today Formula
The TODAY function returns the current date, and will continually update each time the
worksheet is updated. Use F9 to force the worksheet to recalculate and update the value. The
value returned by the TODAY function is a standard Excel date.
Formula: =TODAY()
Month Formula
The Excel MONTH function extracts the month from a given date as a number from 1 to 12.
You can use the MONTH function to extract a month number from a date into a cell, or to feed a
month number into another function like the DATE function.
Formula: =MONTH(C6)
Year Formula
The Excel YEAR function returns the year component of a date as a 4-digit number. You can
use the YEAR function to extract a year number from a date into a cell or to extract and feed a
year value into another formula, like the DATE function.
Formula: =YEAR(C7)
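The date functions above have close counterparts in Python's standard `datetime` module, which may help show how the pieces fit together. The date used here is an arbitrary example, and the variable names are illustrative only.

```python
from datetime import date, datetime

# Assemble a date from its components, like =DATE(2021, 3, 15)
sample = date(2021, 3, 15)

day = sample.day        # like =DAY(C5)
month = sample.month    # like =MONTH(C6)
year = sample.year      # like =YEAR(C7)

# Feed the extracted components back into a new date:
# the first day of the following month
first_of_next_month = date(year, month + 1, 1)

today = date.today()    # like =TODAY(): current date only
now = datetime.now()    # like =NOW(): current date and time
```

Note that `month + 1` only works here because the sample month is not December; Excel's DATE function rolls an out-of-range month over into the next year, while Python's `date` raises an error.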
NETWORKDAYS Function
The NETWORKDAYS function calculates the number of workdays between two dates in
Excel. When using the function, weekends are automatically excluded. It also
allows you to skip specified holidays and only count business days. It is categorized in Excel as a
Date/Time Function.
Formula: =NETWORKDAYS(B11, C11, D11:D14)
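The counting rule can be sketched in Python as follows. The function name `networkdays` and the sample dates are illustrative; the sketch assumes a Saturday/Sunday weekend and counts both the start and the end date.

```python
from datetime import date, timedelta

def networkdays(start, end, holidays=()):
    """Count Monday-Friday days from start to end inclusive, skipping holidays."""
    if start > end:
        # A reversed date range gives a negative count
        return -networkdays(end, start, holidays)
    skipped = set(holidays)
    count = 0
    current = start
    while current <= end:
        # weekday() is 0 for Monday through 6 for Sunday
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count

start = date(2021, 1, 1)         # a Friday
end = date(2021, 1, 15)          # a Friday
holidays = [date(2021, 1, 1)]    # hypothetical holiday list, like D11:D14

workdays = networkdays(start, end, holidays)
```

With the holiday list the result is 10 working days; without it, 1 January is counted as well and the result is 11.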
Conditional Formatting
Conditional formatting is a feature in many spreadsheet applications that allows you to apply
specific formatting to cells that meet certain criteria. It is most often used as color-based formatting
to highlight, emphasize, or differentiate among data and information stored in a spreadsheet.
DATA VALIDATION
Data validation is an Excel feature that lets you control what users can enter in a cell. It
lets you restrict entries to a particular data type, such as a date, whole number, decimal, or
text.
There are seven (7) types of validation criteria you can set:
o Whole Number - to restrict the cell to accept only whole numbers.
o Decimal - to restrict the cell to accept only decimal numbers.
o List - to pick data from the drop-down list.
o Date - to restrict the cell to accept only date.
o Time - to restrict the cell to accept only time.
o Text Length - to restrict the length of the text.
o Custom - to validate entries with a custom formula.
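To make the seven criteria concrete, here is a minimal Python sketch of a validator with one branch per type. The function `validate`, its rule names and its parameters are hypothetical and only approximate what Excel checks; in Excel each rule can also carry minimum and maximum bounds, which are left out here.

```python
from datetime import date, time

def validate(value, kind, allowed=None, max_length=None, check=None):
    """Return True if value passes a rule of the given kind (illustrative names)."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "whole_number":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "decimal":
        return is_number
    if kind == "list":
        return value in allowed
    if kind == "date":
        return isinstance(value, date)
    if kind == "time":
        return isinstance(value, time)
    if kind == "text_length":
        return isinstance(value, str) and len(value) <= max_length
    if kind == "custom":
        return bool(check(value))
    raise ValueError(f"unknown rule: {kind}")

# Example checks with made-up entries
ok_whole = validate(5, "whole_number")
bad_whole = validate(5.5, "whole_number")
ok_list = validate("East", "list", allowed=["North", "East", "South", "West"])
bad_length = validate("too long", "text_length", max_length=3)
ok_custom = validate(10, "custom", check=lambda v: v % 2 == 0)
```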
For Find
In the Find what: box, type the text or numbers you want to find.
Click Find Next to run your search.
You can further refine your search if needed. Within: to search for data in a
worksheet or in an entire workbook, select Sheet or Workbook.
For Replace
Select the cells that have the formula in which you want to replace the reference.
If you want to replace in the entire worksheet, select the entire worksheet.
Go to Home -> Find & Select -> Replace (keyboard shortcut: Ctrl + H).
Click on Replace All.
Hyperlink
The HYPERLINK function creates a shortcut that jumps to another location in the current
workbook, or opens a document stored on a network server, an intranet, or the Internet. When
you click a cell that contains a HYPERLINK function, Excel jumps to the location listed, or opens
the document you specified.
1. Select cell A2.
2. On the Insert tab, in the Links group, click Link. The 'Insert Hyperlink' dialog box appears.
3. Click 'Place in This Document' under Link to.
4. Type the Text to display, the cell reference, and click OK.
IF Function
Use the IF function, one of the logical functions, to return one value if a condition is true and
another value if it's false.
1. Select a cell.
2. Click the Insert Function button. The 'Insert Function' dialog box appears.
3. Search for a function or select a function from a category. ...
4. Click OK. ...
5. Click in the Logical_test box and type the condition, for example A1>5.
6. Click in the Value_if_true and Value_if_false boxes and type the value to return in each case.
7. Click OK.
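The logic of IF is a two-way choice, which can be sketched in Python as below. The helper `excel_if`, the sample score and the labels are made up for illustration.

```python
def excel_if(logical_test, value_if_true, value_if_false):
    """Return one value when the test is true and another when it is false."""
    return value_if_true if logical_test else value_if_false

score = 7   # hypothetical content of cell A1

# Comparable to =IF(A1>5, "High", "Low")
label = excel_if(score > 5, "High", "Low")
```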
SUBSTITUTE Function
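The Excel SUBSTITUTE function replaces text in a given string by matching: every occurrence
of the old text (or only one chosen occurrence) is replaced with the new text.
Syntax: =SUBSTITUTE(text, old_text, new_text, [instance_num])
To make the same kind of replacement directly in cells without a formula, use Find and Replace: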
1. Select the range of cells where you want to replace text or numbers.
2. Press the Ctrl + H shortcut to open the Replace tab of the Excel Find and Replace dialog.
3. In the Find what box type the value to search for, and in the Replace with box type the
value to replace with.
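The matching behaviour of SUBSTITUTE can be sketched in Python as follows. The function below is an illustrative re-implementation, not Excel's code; it assumes case-sensitive matching and non-overlapping occurrences, and the sample string is made up.

```python
def substitute(text, old_text, new_text, instance_num=None):
    """Replace every match of old_text, or only the nth match if instance_num is given."""
    if not old_text:
        return text
    if instance_num is None:
        return text.replace(old_text, new_text)
    if instance_num < 1:
        raise ValueError("instance_num must be 1 or greater")
    position = 0
    for _ in range(instance_num):
        found = text.find(old_text, position)
        if found == -1:
            # Fewer matches than requested: leave the text unchanged
            return text
        position = found + len(old_text)
    return text[:found] + new_text + text[position:]

code = "2021-Q1-Q1"
all_replaced = substitute(code, "Q1", "Q2")        # every occurrence
second_only = substitute(code, "Q1", "Q2", 2)      # only the second occurrence
```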
REPLACE Function
The REPLACE function replaces part of a text string with a different text string, based on the
starting position and the number of characters you specify. In financial analysis, the REPLACE
function can be useful if we wish to remove text from a cell when the text is in a variable position.
Syntax: =REPLACE(old_text, start_num, num_chars, new_text)
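Unlike SUBSTITUTE, REPLACE works by position rather than by matching. A minimal Python sketch, assuming 1-based character positions as in Excel, with a made-up invoice code as sample data:

```python
def replace(old_text, start_num, num_chars, new_text):
    """Swap num_chars characters, starting at 1-based position start_num, for new_text."""
    begin = start_num - 1
    return old_text[:begin] + new_text + old_text[begin + num_chars:]

# Replace the four characters starting at position 5
updated = replace("INV-2020-001", 5, 4, "2021")

# Passing an empty string removes the characters instead
trimmed = replace("abcdef", 1, 3, "")
```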
LOWER Function
LOWER function converts all letters in the specified string to lowercase. If there are characters in
the string that are not letters, they are unaffected by this function.
UPPER Function
Use the UPPER function to replace every lowercase letter in a character string with
an uppercase letter.
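Python's built-in string methods behave the same way and make the "non-letters are unaffected" rule easy to see. The sample text is arbitrary.

```python
text = "Order #42 Shipped"

lowered = text.lower()   # like =LOWER(A1)
uppered = text.upper()   # like =UPPER(A1)
```

Digits, spaces and the `#` symbol pass through both calls unchanged.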
Text to Columns
Text to Columns is an amazing feature in Excel that deserves a lot more credit than it usually
gets. As its name suggests, it is used to split text into multiple columns. For example, if you
have a first name and last name in the same cell, you can use this to quickly split them into two
different cells.
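The first name / last name example can be sketched in Python as a split on a delimiter. The helper `text_to_columns` and the sample names are hypothetical, and a single space is assumed as the delimiter.

```python
def text_to_columns(cells, delimiter=" "):
    """Split each cell on the delimiter, giving one list of parts per cell."""
    return [cell.split(delimiter) for cell in cells]

full_names = ["Asha Rao", "Kevin Lee"]
columns = text_to_columns(full_names)

first_names = [parts[0] for parts in columns]
last_names = [parts[1] for parts in columns]
```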
Goal Seek
Goal Seek is a what-if analysis tool that works backwards: given the result you want from a
formula, it finds the input value that produces that result.
1. On the Data tab, in the Data Tools group, click What-If Analysis, and then click Goal Seek.
2. In the Set cell box, enter the reference for the cell that contains the formula that you want to
resolve. ...
3. In the To value box, type the formula result that you want.
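The idea behind Goal Seek can be sketched in Python as a numerical search for the input that makes a formula hit a target. The sketch below uses bisection as a stand-in search method; it is not claimed to be the algorithm Excel uses. The names `goal_seek` and `profit`, and the price and cost figures, are hypothetical.

```python
def goal_seek(formula, target, low, high, tolerance=1e-9, max_iterations=200):
    """Find x in [low, high] where formula(x) is close to target, using bisection."""
    f_low = formula(low) - target
    f_high = formula(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by low and high")
    for _ in range(max_iterations):
        mid = (low + high) / 2
        f_mid = formula(mid) - target
        if abs(f_mid) < tolerance:
            return mid
        if f_low * f_mid <= 0:
            high = mid                 # the solution lies in the lower half
        else:
            low, f_low = mid, f_mid    # the solution lies in the upper half
    return (low + high) / 2

def profit(units):
    """The 'Set cell' formula: revenue at 25 per unit minus a fixed cost of 10000."""
    return units * 25 - 10_000

# 'To value' is 0: how many units are needed to break even?
break_even_units = goal_seek(profit, target=0, low=0, high=10_000)
```

Here `profit` plays the role of the Set cell, `target` the To value, and the returned number the value of the cell being changed.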
LOOKUP Function
The Excel LOOKUP function performs an approximate match lookup in a one-column or one-row
range, and returns the corresponding value from another one-column or one-row range.
LOOKUP's default behavior makes it useful for solving certain problems in Excel.
Syntax: =LOOKUP($B$16, $B$2:$B$10, A$2:A$10)
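Approximate matching means "take the last entry that is less than or equal to the lookup value". A minimal Python sketch of that rule, assuming the lookup range is sorted in ascending order; the function name and the grade thresholds are invented for illustration.

```python
from bisect import bisect_right

def lookup(lookup_value, lookup_vector, result_vector):
    """Approximate match: use the last entry <= lookup_value in an ascending vector."""
    position = bisect_right(lookup_vector, lookup_value) - 1
    if position < 0:
        raise LookupError("lookup_value is smaller than every entry (#N/A in Excel)")
    return result_vector[position]

thresholds = [0, 50, 70, 90]     # hypothetical lookup range, sorted ascending
grades = ["F", "C", "B", "A"]    # hypothetical result range of the same size

grade = lookup(82, thresholds, grades)
```

A score of 82 falls between 70 and 90, so the entry for 70 is used and the grade is "B".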
HLOOKUP
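HLOOKUP stands for 'Horizontal Lookup'. It searches for a value in the top row of a table and
returns a value from the same column in a row you specify. If you wish to get an array, you need
to select a number of cells equal to the number of rows that you want HLOOKUP to return.
Syntax: =HLOOKUP(C3, $B$12:$G$13, 2, TRUE)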
1. Open the HLOOKUP formula in the salary column and select the lookup value as Emp ID.
2. Next thing is we need to select the table array, i.e., the main table. ...
3. Now, we need to mention the row number, i.e., from which row of the main table we are
looking for the data. ...
4. The final part is a range lookup.
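The four arguments in the steps above map onto a small Python sketch like this one. The function `hlookup` and the salary table are hypothetical, and approximate matching assumes the top row is sorted in ascending order.

```python
def hlookup(lookup_value, table, row_index, range_lookup=True):
    """Search the first row of table (a list of rows); return from row_index (1-based)."""
    top_row = table[0]
    if range_lookup:
        # Approximate match: last column whose key is <= lookup_value
        matches = [i for i, key in enumerate(top_row) if key <= lookup_value]
        if not matches:
            raise LookupError("no key <= lookup_value (#N/A in Excel)")
        column = matches[-1]
    else:
        # Exact match: raises ValueError when the key is absent
        column = top_row.index(lookup_value)
    return table[row_index - 1][column]

# Hypothetical main table: employee IDs in row 1, salaries in row 2
salary_table = [
    [101, 102, 103, 104],
    [42000, 45500, 39000, 51000],
]

salary = hlookup(103, salary_table, 2)
```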
VLOOKUP
VLOOKUP stands for 'Vertical Lookup'. It is a function that makes Excel search for a certain
value in the first column of a table (the so-called 'table array'), in order to return a value
from a different column in the same row.
Syntax: =VLOOKUP(C3, $G$2:$H$7, 2, TRUE)
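A rough Python equivalent of this lookup is sketched below. The function `vlookup` and the two-column table are made up for illustration; with the last argument TRUE (approximate match) the first column is assumed to be sorted in ascending order.

```python
from bisect import bisect_right

def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Search the first column of table; return the value in col_index (1-based)."""
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: last row whose key is <= lookup_value
        position = bisect_right(keys, lookup_value) - 1
        if position < 0:
            raise LookupError("lookup_value is below the first key (#N/A in Excel)")
    else:
        # Exact match: raises ValueError when the key is absent
        position = keys.index(lookup_value)
    return table[position][col_index - 1]

# Hypothetical table array: spend threshold in column 1, customer tier in column 2
tiers = [
    (0, "Standard"),
    (1000, "Silver"),
    (5000, "Gold"),
]

tier = vlookup(2500, tiers, 2)
```

A spend of 2500 sits between the 1000 and 5000 thresholds, so the row for 1000 is used.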
INDEX Function
The INDEX function returns the value at a given row and column of a table.
1. Type “=INDEX(” and select the area of the table, then add a comma.
2. Type the row number for Kevin, which is “4,” and add a comma.
3. Type the column number for Height, which is “2,” and close the bracket.
MATCH Function
The MATCH function searches for a specified item in a range of cells, and then returns the
relative position of that item in the range. For example, if the range A1:A3 contains the values 5,
25, and 38, then the formula =MATCH(25,A1:A3,0) returns the number 2, because 25 is the
second item in the range.
1. Type “=MATCH(” and link to the cell containing “Height”… the criteria we want to look up.
2. Select all the cells across the top row of the table.
3. Type zero “0” for an exact match.
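MATCH is often paired with INDEX: MATCH finds the row and column positions, and INDEX fetches the value at that intersection. The Python sketch below illustrates this under the assumption of exact matching (match type 0). The table contents are invented, arranged so that Kevin is in row 4 and Height in column 2.

```python
def match(lookup_value, lookup_array):
    """Exact match: return the 1-based position of lookup_value in lookup_array."""
    return lookup_array.index(lookup_value) + 1

def index(table, row_num, column_num):
    """Return the value at the given 1-based row and column of table."""
    return table[row_num - 1][column_num - 1]

# The example from the text: 25 is the second item of 5, 25, 38
position = match(25, [5, 25, 38])

# Hypothetical table with a header row
table = [
    ["Name", "Height", "Weight"],
    ["Sally", 165, 60],
    ["Tom", 180, 82],
    ["Kevin", 175, 74],
]
names = [row[0] for row in table]

row_num = match("Kevin", names)            # row position of Kevin
column_num = match("Height", table[0])     # column position of Height
kevin_height = index(table, row_num, column_num)
```

Using MATCH to compute both positions means the lookup keeps working if rows or columns are reordered.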