Descriptive Analytics
Descriptive analytics involves gathering, organizing, tabulating, and depicting data,
and then describing the characteristics of what is being studied. This type of analytics
was historically called reporting. It can be very useful, but it does not tell you
anything about what might happen in the future.
The vast majority of the analytics we use fall into this category (think basic
arithmetic such as sums, averages, and percent changes). Usually, the underlying
data is a count or aggregate of a filtered column of data to which basic
math is applied. For all practical purposes, there are an infinite number of
these analytics. Descriptive analytics are useful for showing things like total
stock in inventory, average dollars spent per customer, and year-over-year
change in sales. Common examples of descriptive analytics are reports that
provide historical insights regarding the company's production, financials,
operations, sales, inventory, and customers.
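The kind of basic arithmetic described above can be sketched in R as follows. The sales data frame and its column names are hypothetical and exist only for illustration.

```r
# Hypothetical sales records (illustrative values only, not real data)
sales <- data.frame(
  year     = c(2022, 2022, 2023, 2023),
  customer = c("A", "B", "A", "B"),
  amount   = c(400, 600, 500, 700)
)

# Total sales per year: an aggregate of a column, grouped by year
total_by_year <- tapply(sales$amount, sales$year, sum)

# Average dollars spent per customer across both years
avg_per_customer <- tapply(sales$amount, sales$customer, mean)

# Year-over-year percent change in total sales
yoy_change <- (total_by_year["2023"] - total_by_year["2022"]) /
  total_by_year["2022"] * 100

print(total_by_year)
print(avg_per_customer)
print(yoy_change)
```

Each result summarizes what has already happened; none of them says anything about future sales.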
The frequency (f) of a particular observation is the number of times the observation occurs
in the data. The distribution of a variable is the pattern of frequencies of its observations.
Frequency distributions are portrayed as frequency tables, histograms, or polygons.
Frequency distributions can show either the actual number of observations falling in each
range or the percentage of observations. In the latter instance, the distribution is called
a relative frequency distribution.
Frequency distribution tables can be used for both categorical and numeric variables.
Continuous variables should only be used with class intervals, which will be explained
shortly.
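As a minimal sketch of class intervals, the continuous variable mpg from the built-in mtcars data set can be grouped with cut( ) before tabulating. The interval width of 5 is an arbitrary choice made for this illustration.

```r
# Class intervals of width 5 that cover the observed range of mpg
breaks <- seq(10, 35, by = 5)

# Assign each observation to an interval; right = FALSE gives [lower, upper)
mpg_class <- cut(mtcars$mpg, breaks = breaks, right = FALSE)

# Frequency and relative frequency of each class interval
freq <- table(mpg_class)
rel_freq <- prop.table(freq)

grouped <- data.frame(
  class     = names(freq),
  frequency = as.vector(freq),
  relative  = round(as.vector(rel_freq), 3)
)
print(grouped)
```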
R provides many methods for creating frequency and contingency tables. Three are
described below. In the following examples, assume that A, B, and C represent categorical
variables.
Table
You can generate frequency tables using the table( ) function, tables of proportions using
the prop.table( ) function, and marginal frequencies using margin.table( ).
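A short sketch of the three functions, using the built-in mtcars data set (cyl and gear are treated as categorical variables here):

```r
# Two-way frequency table: rows are cyl, columns are gear
cyl_gear <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))
print(cyl_gear)

# Marginal frequencies
cyl_totals  <- margin.table(cyl_gear, 1)  # summed over gear
gear_totals <- margin.table(cyl_gear, 2)  # summed over cyl

# Tables of proportions
cell_props <- prop.table(cyl_gear)     # each cell divided by the grand total
row_props  <- prop.table(cyl_gear, 1)  # each cell divided by its row total
col_props  <- prop.table(cyl_gear, 2)  # each cell divided by its column total

print(cyl_totals)
print(round(row_props, 2))
```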
Task 1:
Construct a frequency table for the variable cyl in mtcars and enhance the view. Write
your observations.
Task 2:
Construct a grouped frequency table for the variable Salary in Crew.data. Write your
observations.
CrossTable
The CrossTable( ) function in the gmodels package produces cross tabulations
modelled after PROC FREQ in SAS or CROSSTABS in SPSS. It has a wealth of
options.
There are options to report percentages (row, column, cell), specify decimal places,
produce Chi-square, Fisher, and McNemar tests of independence, report expected
and residual values (Pearson, standardized, adjusted standardized), include missing
values as valid, annotate with row and column titles, and format
as SAS or SPSS style output. See help(CrossTable) for details.
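A hedged sketch of a CrossTable( ) call on the built-in mtcars data set follows. It assumes the gmodels package is installed and skips the call otherwise; the particular options chosen are only one possible combination.

```r
# Run the cross tabulation only if gmodels is available on this system
ct <- NULL
if (requireNamespace("gmodels", quietly = TRUE)) {
  ct <- gmodels::CrossTable(
    mtcars$cyl, mtcars$gear,
    digits     = 2,       # decimal places for proportions
    prop.r     = TRUE,    # report row proportions
    prop.c     = FALSE,   # omit column proportions
    prop.t     = FALSE,   # omit cell (table) proportions
    prop.chisq = FALSE,   # omit each cell's chi-square contribution
    dnn        = c("cyl", "gear")  # row and column titles
  )
} else {
  message("Package 'gmodels' is not installed; try install.packages('gmodels').")
}
```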
Skewness:
Intuitively, skewness is a measure of the asymmetry of a data distribution. As a rule of thumb,
negative skewness indicates that the mean of the data values is less than the median, and the
data distribution is left-skewed. Positive skewness indicates that the mean of the data values
is larger than the median, and the data distribution is right-skewed.
Kurtosis:
Intuitively, the kurtosis describes the tail shape of the data distribution. Measured as
excess kurtosis, the normal distribution has zero kurtosis and thus the standard tail shape.
It is said to be mesokurtic. Negative kurtosis indicates a thin-tailed data distribution,
which is said to be platykurtic. Positive kurtosis indicates a fat-tailed data distribution,
which is said to be leptokurtic.
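Both measures can be sketched in base R from the central moments of the data. The helper functions below are written for this illustration and use the simple moment-based definitions; packages such as e1071 or moments provide their own versions, which may apply different sample-size corrections.

```r
# Moment-based skewness: third central moment over the 1.5 power of the second
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Excess kurtosis: fourth central moment over the squared second, minus 3,
# so that a normal distribution scores zero
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric <- c(1, 2, 3, 4, 5)       # perfectly symmetric values
right_tail <- c(1, 1, 2, 2, 3, 10)  # one large value stretches the right tail

print(skewness(symmetric))
print(skewness(right_tail))
print(excess_kurtosis(symmetric))
print(skewness(mtcars$mpg))
print(excess_kurtosis(mtcars$mpg))
```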
t.test:
The t.test( ) function produces a variety of t-tests. Unlike most statistical packages,
the default assumes unequal variance and applies the Welch df modification.
# independent 2-group t-test (formula interface)
t.test(y ~ x) # where y is numeric and x is a binary factor
# independent 2-group t-test (two numeric vectors)
t.test(y1, y2) # where y1 and y2 are numeric
# paired t-test
t.test(y1, y2, paired = TRUE) # where y1 and y2 are numeric
# one-sample t-test
t.test(y, mu = 3) # H0: mu = 3
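The templates above use placeholder names. A runnable sketch with the built-in mtcars and sleep data sets might look like this; the choice of variables and of mu = 20 is purely illustrative.

```r
# Independent 2-group t-test: mpg by transmission type (am is 0 or 1).
# The default is the Welch test, which does not assume equal variances.
welch <- t.test(mpg ~ am, data = mtcars)

# Same comparison with a pooled variance (classical Student t-test)
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Paired t-test: the sleep data hold two measurements on the same 10 subjects
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
paired <- t.test(g1, g2, paired = TRUE)

# One-sample t-test of H0: mean mpg = 20
one_sample <- t.test(mtcars$mpg, mu = 20)

print(welch)
print(pooled$parameter)  # degrees of freedom under equal variances
print(paired$p.value)
print(one_sample$conf.int)
```

Comparing the degrees of freedom of the Welch and pooled results shows the effect of the Welch modification.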
Chi-square test:
Correlation:
Correlated variables can move in the same direction or in opposite directions. A change
in one variable does not always imply a cause-and-effect relationship with the other;
the association may be due to chance.
Simple correlation is the correlation between two variables only. Event correlation and
simple event correlation are the types of correlation mainly used in industry.
Types of correlation:
In management research methodology, correlation is broadly classified into six types
as follows:
Correlation using R
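You can use the cor( ) function to produce correlations and the cov( ) function to
produce covariances.
A simplified format is cor(x, use=, method= ) where
x is a matrix or data frame
use specifies the handling of missing data; options include all.obs (assumes no missing
data; missing data produce an error), complete.obs (listwise deletion), and
pairwise.complete.obs (pairwise deletion)
method specifies the type of correlation: pearson, spearman, or kendall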
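A brief sketch of cor( ) and cov( ) on three numeric columns of the built-in mtcars data set (the choice of columns is arbitrary):

```r
# Select a few numeric variables
vars <- mtcars[, c("mpg", "hp", "wt")]

# Correlation matrices; complete.obs drops rows that contain missing values
pearson  <- cor(vars, use = "complete.obs", method = "pearson")
spearman <- cor(vars, use = "complete.obs", method = "spearman")

# Covariance matrix of the same variables
covariances <- cov(vars, use = "complete.obs")

print(round(pearson, 2))
print(round(spearman, 2))
print(round(covariances, 2))
```

A negative entry, such as the one between mpg and wt, means the two variables tend to move in opposite directions.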
Kendall rank correlation: Kendall rank correlation is a non-parametric test that measures
the strength of dependence between two variables. If we consider two samples, a and
b, each of size n, the total number of pairings of observations is n(n-1)/2.
The following formula is used to calculate the value of Kendall rank correlation:
tau = (nc - nd) / (n(n-1)/2)
where nc is the number of concordant pairs and nd is the number of discordant pairs.
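The pair counting behind Kendall rank correlation can be sketched in R. The helper kendall_tau( ) below is written for this illustration and assumes there are no tied values: it counts concordant and discordant pairs and divides their difference by n(n-1)/2. The built-in cor( ) with method = "kendall" is used as a cross-check.

```r
# Kendall rank correlation by direct pair counting (assumes no ties)
kendall_tau <- function(a, b) {
  stopifnot(length(a) == length(b))
  n <- length(a)
  pairs <- combn(n, 2)  # all n(n-1)/2 index pairs, one per column
  # +1 when a and b move in the same direction within a pair, -1 otherwise
  s <- sign(a[pairs[1, ]] - a[pairs[2, ]]) *
       sign(b[pairs[1, ]] - b[pairs[2, ]])
  n_c <- sum(s > 0)  # concordant pairs
  n_d <- sum(s < 0)  # discordant pairs
  (n_c - n_d) / (n * (n - 1) / 2)
}

# Illustrative samples with no tied values
a <- c(1, 2, 3, 4, 5)
b <- c(3, 1, 2, 5, 4)

tau_manual  <- kendall_tau(a, b)
tau_builtin <- cor(a, b, method = "kendall")

print(tau_manual)
print(tau_builtin)
```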