
Data Analytics and Interactive Dashboards using Python

This document outlines a course focused on analyzing business data and creating interactive dashboards using Python. Key topics include data cleaning, manipulation with libraries like Numpy and Pandas, data visualization techniques, and time series analysis. The course emphasizes hands-on coding and practical applications of data analytics to derive insights and create effective dashboards.

Uploaded by

chenkhoonsg
Copyright
© All Rights Reserved

Analyse Business Data and Create Interactive Dashboards using Python
On a scale of 1 – 10, how well do you know Python?
1 – Complete beginner
10 – You are a Python Ninja!
What will you be learning in this course?
• Basics of Python (Recap)
• Understanding Data Structures
• Data Cleaning and simple Data Manipulation using Numpy and Pandas
• Basic Data Visualization using Matplotlib
• Data Analytics and Visualization using Seaborn
• Building simple dashboards in Seaborn
• Time Series Analysis
• Assessment (Overview)
Method of Learning
95% hands-on coding in Python, 5% presentation
How much data do we create every single day?
What is data?
Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things.
Significant growth in Data Science & Analytics
[Figure panels: explosion of data volume (Source: ResearchGate); devices connected to the internet; data storage costs; interest in data science and analytics has surged]
What can we do with the data?
• Exploratory data analysis
• Derive useful insights
• Create dashboards and reports
• Derive new inferences
What is Analytics?
Analytics is the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns towards effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making within an organization.
Source: Wikipedia
User Engagement
Analytics comprises two important parts:
• Visualization
• Storytelling
Data Visualization
Let’s start by testing the human visual system: how many 9s are present?
The human visual system is powerful: how many 9s are present now?
What is Data Visualization?
Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.
[Figure: chart types and the questions they answer (Who & What? Why? How? How many? When? Where?), including multi-variable plots (deduction & prediction), portraits, flowcharts (relationship, hierarchy), distributions, timelines (position in time), comparisons (comparative representation) and maps (position in space)]
Data Storytelling
Every human needs a story to make things memorable.
What does storytelling mean? Storytelling is used in design as a technique to gain insight into users, build empathy and connect with them emotionally.
Why does storytelling matter?
• Stories make data meaningful
• Stories tell to sell
• Stories simplify
• Stories crystallize takeaways
What are insights?
• Uncovering a shared meaning, a shared value, or a shared need that can be translated into action.
• Insight is what is learned and what will improve your business. Your business will know better, so you’ll be able to work better.
Example of Finding, Insight, Recommendation (NETFLIX)

Finding - Customers are not watching each video to its full length; they are watching 90–95% of it.

Insight - The parts they are not watching are the title roll and the end credits.

Recommendation - Introduce ‘Skip Intro’ at the beginning of the title roll and ‘Watch Next’ at the beginning of the end credits. Benchmark 90–95% watched content as completed, and measure whether customers move to the next video in the series.
What is Python?
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
What should you know?
• Data types - basic and advanced
• Libraries (Pandas, Numpy, Matplotlib)
• Functions
• Flow control
• Basic visualizations
Part 1 – Frame and Prepare
1. Data Issues & Cleaning
2. Data Preprocessing
3. Data Manipulation
Data Cleaning
Where can loss of data quality occur? Loss of data quality can occur at many stages:
• At the time of collection
• During digitisation
• During documentation
• During storage and archiving
• During analysis and manipulation
• At the time of presentation
• Through the use to which the data are put
Why do we have to do Data Cleaning?
• Analytics performed on inaccurate data leads to misguided decision making, which can expose the organization to compliance issues. Data cleaning helps prevent this.
• It also streamlines business practices and improves efficiency.
• It can contribute to increased sales and revenue.
Issues in Data Cleaning
• Duplicate data
• Irrelevant values
• Missing values
• Inaccurate data
• Old data
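The data issues listed above can be tackled in pandas roughly as follows. This is a minimal sketch on a small hypothetical customer table (the column names and the 0-120 valid age range are assumptions for illustration): it drops duplicate rows, treats impossible values as missing, then fills the gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical customer records with typical data issues
raw = pd.DataFrame({
    "customer_id": [101, 102, 102, 103, 104],
    "age": [34, 29, 29, np.nan, 220],   # NaN = missing value, 220 = inaccurate value
    "city": ["Singapore", "Penang", "Penang", "Jakarta", None],
})

# 1. Duplicate data: keep only the first occurrence of identical rows
clean = raw.drop_duplicates().copy()

# 2. Inaccurate data: treat ages outside an assumed valid range (0-120) as missing
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan

# 3. Missing values: fill numeric gaps with the median, label missing text explicitly
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("Unknown")

print(clean)
```

Filling with the median rather than the mean keeps a single extreme value from distorting the replacement; whether to fill, drop or flag missing values depends on the analysis.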
Data Preprocessing & Manipulation
• Instance selection
• Normalization
• Transformation
• Feature extraction
• Feature selection
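A minimal sketch of two common normalization methods and a simple transformation, assuming a small hypothetical numeric table: min-max scaling maps each column to [0, 1], while z-score standardization centres each column on 0 with a standard deviation of 1.

```python
import numpy as np
import pandas as pd

# Hypothetical features on very different scales
df = pd.DataFrame({"revenue": [200.0, 400.0, 600.0, 1000.0],
                   "units": [2, 4, 6, 10]})

# Min-max normalization: rescale each column to the range [0, 1]
normalized = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: mean 0 and (sample) standard deviation 1 per column
standardized = (df - df.mean()) / df.std()

# A simple transformation: log-scale a column whose values grow multiplicatively
df["log_revenue"] = np.log10(df["revenue"])

print(normalized)
```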
Example: Indexing & Slicing of Data
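As a sketch of indexing and slicing in pandas, here is a hypothetical sales table. Note that `.loc` slices by label and includes the end label, whereas `.iloc` slices by position and excludes the end position, like ordinary Python lists.

```python
import pandas as pd

# Hypothetical quarterly sales per region
sales = pd.DataFrame(
    {"region": ["North", "South", "East", "West"],
     "q1": [120, 95, 130, 80],
     "q2": [135, 90, 125, 110]},
    index=["r1", "r2", "r3", "r4"],
)

# Label-based selection with .loc: rows r1 to r3 (inclusive), two named columns
north_to_east = sales.loc["r1":"r3", ["region", "q1"]]

# Position-based selection with .iloc: rows 0 and 1 (position 2 is excluded)
first_two_rows = sales.iloc[0:2]

# Boolean indexing: keep only rows where Q2 sales beat Q1
improved = sales[sales["q2"] > sales["q1"]]
print(improved["region"].tolist())  # ['North', 'West']
```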
Reiterating
“Data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it.”
Libraries in Python that can help with Data Cleaning and Manipulation: Numpy and Pandas

Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
• Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

Why is Pandas important? What can it handle?
• Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
• Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
• Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
• Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
• Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
• Intuitive merging and joining of data sets
• Flexible reshaping and pivoting of data sets
• Hierarchical labeling of axes (possible to have multiple labels per tick)
• Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
• Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
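A short sketch of three of these features (automatic data alignment, missing-data handling and split-apply-combine with `groupby`), using small hypothetical Series and a DataFrame:

```python
import pandas as pd

# Automatic data alignment: values are matched by index label, not by position
jan = pd.Series({"apples": 10, "pears": 4})
feb = pd.Series({"pears": 6, "plums": 3})
total = jan + feb            # labels found in only one Series become NaN
print(total)                 # apples and plums are NaN; pears is 10.0

# Missing-data handling: treat an absent label as 0 instead of NaN
total_filled = jan.add(feb, fill_value=0)

# Split-apply-combine: split by channel, sum and average each group, combine
orders = pd.DataFrame({
    "channel": ["web", "store", "web", "store", "web"],
    "amount": [120, 80, 60, 100, 20],
})
summary = orders.groupby("channel")["amount"].agg(["sum", "mean"])
print(summary)
```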
Let’s dive straight into the hands-on session using Jupyter notebooks
Part 2 – Descriptive Statistics and Data Analytics
01 Descriptive Statistics
02 Data Visualization using Matplotlib
03 Data Analytics and Visualization using Seaborn
04 Understanding basic KPIs
Exploratory Data Analysis
• Collect the data and gain the domain knowledge.
• Confirm data types and their probability distributions.
• Measures of central tendency: mean, median, mode.
• Measures of dispersion: variance, standard deviation, range.
• Shape of the distribution: skewness (right or left) and kurtosis (thinner or wider peak).
• Graphical representation: histogram, boxplot, barplot, etc.


Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
• Central Tendency - Mean, Median, Mode
• Dispersion - Range, Variance, Standard Deviation
• Frequency - Count, Percent, Frequency
• Position - Percentile Ranks, Quartile Ranks
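In pandas, most of these summaries come from a handful of calls. A minimal sketch on hypothetical order values: `describe()` covers central tendency, dispersion and quartile positions, `value_counts()` covers frequency, and `rank(pct=True)` gives percentile ranks.

```python
import pandas as pd

# Hypothetical order values for one week
orders = pd.Series([12, 15, 15, 18, 22, 30, 41], name="order_value")

# Central tendency, dispersion and position (quartiles) in one call
stats = orders.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(stats)

# Frequency: count and percentage of each distinct value
counts = orders.value_counts()
percent = orders.value_counts(normalize=True) * 100

# Position: percentile rank of each value (1.0 = the highest value)
pct_rank = orders.rank(pct=True)
```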

Examples of descriptive statistics


Central Tendency - Mean
The mean represents the average value of the dataset. It is calculated as the sum of all the values in the dataset divided by the number of values. In general, this refers to the arithmetic mean.
Central Tendency - Median
• The median is the middle value of the dataset when the values are arranged in ascending or descending order. For an even number of values, it is the average of the two middle values.
Central Tendency - Mode
• The mode is the most frequently occurring value in the dataset. A dataset may have multiple modes, and in some cases it has no mode at all.
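The three measures of central tendency can be checked with Python's standard `statistics` module; the data below is made up for illustration. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which covers both the multi-modal case and the case where no value repeats.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 2]   # made-up values

mean = statistics.mean(data)        # sum / count = 41 / 8
median = statistics.median(data)    # even count: average of the two middle values (4 and 7)
modes = statistics.multimode(data)  # all values tied for the highest frequency

print(mean, median, modes)          # 5.125 5.5 [7]

# Edge cases mentioned above
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]: two modes
print(statistics.multimode([4, 5, 6]))        # [4, 5, 6]: no value repeats, so no distinct mode
```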
What is dispersion?
• Dispersion is the state of being dispersed or spread out. Statistical dispersion is the extent to which numerical data is likely to vary about an average value. In other words, dispersion helps us understand the distribution of the data.
Dispersion - Variance
• In probability and statistics, variance is the expected value of the squared deviation of a random variable from its mean. Informally, variance estimates how far a set of (random) numbers is spread out from its mean value.
• In layman's terms, variance measures how far a set of data points (numbers) is spread out from their mean (average) value.

$\mathrm{Var}(X) = E[(X - \mu)^2]$
Dispersion – Standard Deviation
• Standard Deviation is the positive square root of the variance.
• Standard Deviation is a measure of how spread out the data
is. Its formula is simple; it is the square root of the variance for
that data set. It’s represented by the Greek symbol sigma (σ).
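A worked sketch of the variance formula and the standard deviation on a small made-up dataset. The population variance divides by n; `statistics.variance` instead returns the sample variance, which divides by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values

# Var(X) = E[(X - mu)^2]: average squared deviation from the mean mu
mu = sum(data) / len(data)                               # 5.0
pop_var = sum((x - mu) ** 2 for x in data) / len(data)   # 32 / 8 = 4.0

# Standard deviation: positive square root of the variance
pop_std = math.sqrt(pop_var)                             # 2.0

# The statistics module gives the same population values,
# plus the sample versions that divide by n - 1 instead of n
print(statistics.pvariance(data), statistics.pstdev(data))
sample_var = statistics.variance(data)                   # 32 / 7
```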
Quartiles
• The quartiles are values that divide a list of numbers into
quarters.
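Quartiles can be computed with `numpy.percentile`; the sketch below uses NumPy's default linear interpolation on made-up data. Other quartile conventions (for example, the default method of Python's `statistics.quantiles`) can give slightly different values for the same data.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])   # made-up values

# Q1, Q2 (the median) and Q3 with NumPy's default linear interpolation
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1   # interquartile range: the spread of the middle 50% of the data

print(q1, q2, q3, iqr)   # 2.75 4.5 6.25 3.5
```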
Visual Statistics
Let’s see some of the different types of charts.
Types of Data Visualization
• Line Charts
• Bar Charts
• Pie Charts
• Area Charts
• Scatter Charts
• Scatter Maps (showing geographical data)
• Bubble Charts
• Funnel Charts
• Radar Charts
• Tree Maps
• Polar Charts
• Sunburst Charts
• Numeric / Gauge Indicators
Line Charts
• Line charts are predominantly used to concisely represent trends over a period of time.
• A line chart connects a series of data points with a single, continuous line.
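A minimal Matplotlib sketch of a line chart, assuming hypothetical monthly sales figures. In a Jupyter notebook the figure is displayed automatically; in a plain script, call `plt.show()` or `fig.savefig(...)`.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 175]   # hypothetical monthly sales

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")   # data points joined by one continuous line
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
# plt.show()  # needed in a plain script; Jupyter displays the figure automatically
```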
Bar Charts
• Bar charts are used to represent categorical data using rectangular bars.
• Bar charts can be plotted vertically or horizontally.
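A sketch of the same hypothetical category revenue drawn both ways: Matplotlib's `Axes.bar` draws vertical bars and `Axes.barh` draws horizontal ones.

```python
import matplotlib.pyplot as plt

categories = ["Electronics", "Clothing", "Groceries", "Toys"]
revenue = [420, 310, 560, 150]   # hypothetical revenue per category

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

ax_v.bar(categories, revenue)    # vertical bars: categories on the x-axis
ax_v.set_title("Vertical bar chart")

ax_h.barh(categories, revenue)   # horizontal bars: categories on the y-axis
ax_h.set_title("Horizontal bar chart")

fig.tight_layout()
```

Horizontal bars are often easier to read when category names are long.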
Pie Charts
• Pie charts show the share of each value as part of a whole.
• They use pie slices to represent the relative sizes of data.
• Proportions are clearly demonstrated using pie charts.
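A minimal pie chart sketch with hypothetical channel shares; the `autopct` format string prints each slice's percentage of the whole.

```python
import matplotlib.pyplot as plt

channels = ["Online", "Retail", "Wholesale"]
share = [45, 35, 20]   # hypothetical share of total sales

fig, ax = plt.subplots()
# autopct labels each slice with its percentage of the whole
ax.pie(share, labels=channels, autopct="%1.0f%%", startangle=90)
ax.set_title("Sales by channel")
```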
Area Charts
• Area charts display quantitative data, typically over time, with the area below each line filled in.
• These charts make it easier to understand the overall proportion and volume taken by each category.
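An area chart sketch using Matplotlib's `Axes.stackplot`, which stacks the filled areas so that both each category's volume and the overall total are visible; the revenue figures are hypothetical.

```python
import matplotlib.pyplot as plt

quarters = [1, 2, 3, 4]
online = [30, 45, 50, 70]   # hypothetical revenue by channel
retail = [60, 55, 58, 50]

fig, ax = plt.subplots()
# Stacked areas: the top edge shows the total, each band one category's share
ax.stackplot(quarters, online, retail, labels=["Online", "Retail"], alpha=0.8)
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue")
ax.legend(loc="upper left")
```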
Scatter Charts
• Also known as scatter graphs, scatter plots or correlation charts, scatter charts are used to visualize the distribution of, and relationship between, two variables.
• They use dots to represent values for two different numeric variables.
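A minimal scatter chart sketch on hypothetical ad-spend and revenue figures, with NumPy's `corrcoef` used to put a number on the relationship the dots suggest.

```python
import matplotlib.pyplot as plt
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50, 60])    # hypothetical, in thousands
revenue = np.array([25, 41, 62, 78, 104, 118])   # hypothetical, in thousands

fig, ax = plt.subplots()
ax.scatter(ad_spend, revenue)   # one dot per (ad_spend, revenue) pair
ax.set_xlabel("Ad spend (thousands)")
ax.set_ylabel("Revenue (thousands)")

# Pearson correlation coefficient: close to +1 means a strong positive linear relationship
r = np.corrcoef(ad_spend, revenue)[0, 1]
print(round(r, 3))
```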
Scatter Maps
• When the geographical coordinates -
latitude and longitude - are used as the
variables to plot the points on a map, we
get a scatter map.
Bubble Charts
• A bubble chart is a variation of a scatter chart where, instead of points, there are bubbles with diameters proportional to the data they represent.
• It represents three dimensions of data.
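A bubble chart can be sketched with Matplotlib's `Axes.scatter` by mapping a third variable to marker size; the product data below is hypothetical. One caveat: the `s` parameter sets marker area (in points squared), not diameter, so to make diameters proportional to the data you would pass values proportional to the square of the data.

```python
import matplotlib.pyplot as plt

price = [5, 12, 20, 35]            # x: hypothetical product price
rating = [3.8, 4.1, 4.5, 4.2]      # y: hypothetical average customer rating
units_sold = [900, 400, 250, 80]   # third dimension, shown as bubble size

fig, ax = plt.subplots()
# s sets marker AREA, so bubble area (not diameter) is proportional to units_sold here
ax.scatter(price, rating, s=units_sold, alpha=0.5)
ax.set_xlabel("Price")
ax.set_ylabel("Average rating")
ax.set_title("Bubble size = units sold")
```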


What are KPIs?
A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives. Organizations use KPIs to evaluate their success at reaching targets. (Source: Klipfolio)
METRICS – KPIs – KRIs – ANALYTICS
Examples of KPIs
• Sales KPIs: monthly sales growth; cost per lead by each channel
• Financial KPIs: net profit margin; resource utilization
• Project Management KPIs: project resource utilization; % of overdue project tasks
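Two of the KPIs above, computed in pandas as a sketch on hypothetical monthly figures (the column names are assumptions): monthly sales growth as a percentage change from the previous month, and net profit margin as net profit over revenue.

```python
import pandas as pd

# Hypothetical monthly figures (column names are illustrative)
kpi = pd.DataFrame(
    {"revenue": [50_000, 55_000, 60_500, 57_475],
     "net_profit": [5_000, 6_050, 7_260, 6_897]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# Monthly sales growth (%): change versus the previous month (NaN for the first month)
kpi["sales_growth_pct"] = kpi["revenue"].pct_change() * 100

# Net profit margin (%): net profit as a share of revenue
kpi["net_margin_pct"] = kpi["net_profit"] / kpi["revenue"] * 100

print(kpi.round(1))
```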
Why are KPIs important?
1. Effective company key performance indicators (KPIs) guide a business on the journey towards its strategic goals.
2. A good KPI should act as a compass: a measurement of where your business is, relative to where it has come from and where it is going.
3. KPIs translate your business strategy into manageable, operational actions, based on the data you collect and monitor.
Benefits of Using KPIs
• Increases management awareness
• Focuses attention on improvement opportunities
• Increasing Cash Flow
• Improving Clinical Quality
• Reducing Costs
• Identifying Problem Areas
• Benchmarking
• Illustrating Trends
• Scoring Performance
• Reducing Denials
• Developing Consistent Processes and Outcomes
• Developing “Best Practices”
• Improving / Accelerating Management Reporting
• Monitoring Staffing Levels
Source: bigdata4analytics
Basic Visualization Rules
First, choose the appropriate plot type.
Second, once you have chosen the type of plot, one of the most important things is to label your axes.
Third, add a title to make the plot more informative.
Fourth, add labels for different categories when needed.
Fifth, optionally add text or an arrow at interesting data points.
Sixth, in some cases, use the size and color of the data points to make the plot more informative.
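The six rules applied to one chart, as a sketch with hypothetical sales data for two stores; comments mark which rule each line addresses.

```python
import matplotlib.pyplot as plt

months = list(range(1, 13))
store_a = [20, 22, 25, 24, 30, 35, 40, 38, 33, 29, 26, 45]   # hypothetical
store_b = [18, 19, 21, 23, 24, 26, 27, 29, 30, 31, 33, 34]   # hypothetical

# Rule 1: a trend over time suits a line chart
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, store_a, label="Store A", color="tab:blue")     # Rule 6: colour per category
ax.plot(months, store_b, label="Store B", color="tab:orange")

ax.set_xlabel("Month")                    # Rule 2: label the axes
ax.set_ylabel("Sales (units)")
ax.set_title("Monthly sales by store")    # Rule 3: add a title
ax.legend()                               # Rule 4: label the categories

# Rule 5: annotate an interesting data point with text and an arrow
peak = max(store_a)
ax.annotate("December peak", xy=(12, peak), xytext=(8, peak + 3),
            arrowprops={"arrowstyle": "->"})
```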
Let’s dive straight into the hands-on session using Jupyter notebooks
Data Analytics and Interactive Dashboards using Python (Day 2)
Today’s Schedule
9:00am: Session start
Lesson
10:30am: 15-minute break
Lesson
12:45pm - 2:00pm: Lunch break
Lesson
2:30pm: Start of assessment
5:30pm - 6:00pm: Session end
Too Much Stress??
What did we learn last time around?
• Data Visualization using Seaborn
• Data Visualization using Plotly
• Time Series Analysis
Part 3 – Time Series Analysis
01 Time Series Analysis
Time-Series Analysis
• Time Series: a series of data points indexed (or listed or graphed) in time order.
• Time Series Analysis: comprises methods for analyzing time series data in order to extract meaningful statistics.
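In pandas, a time series is typically a Series with a `DatetimeIndex`. This sketch, on randomly generated hypothetical daily sales, shows date range generation, frequency conversion with `resample`, and lagging with `shift`, three of the time series features pandas provides.

```python
import numpy as np
import pandas as pd

# A time series: values indexed by timestamps in time order (hypothetical daily sales)
days = pd.date_range("2024-01-01", periods=90, freq="D")
rng = np.random.default_rng(seed=0)
daily = pd.Series(100 + rng.normal(0, 10, size=len(days)), index=days)

# Frequency conversion: aggregate daily values into monthly totals
monthly = daily.resample("MS").sum()   # "MS" labels each month by its first day

# Shifting/lagging: difference between each day and the previous day
day_over_day = daily - daily.shift(1)

print(monthly)
```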
Fun Fact Time!
Humans are obsessed with their future, so much so that they worry more about their future than enjoying the present. This is precisely the reason why horoscopists, soothsayers, and fortune tellers are always in high demand.

TREND

SEASONALITY
Time Series Analysis
• Trends - A trend is a consistent directional movement in a time series.
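A sketch of two ways to expose a trend, on synthetic monthly sales with a built-in upward drift of 5 units per month: a centred 12-month moving average, and a least-squares line fitted with `numpy.polyfit` whose slope is the average change per month.

```python
import numpy as np
import pandas as pd

# Synthetic monthly sales: upward drift of 5 units/month plus random noise
months = pd.date_range("2022-01-01", periods=24, freq="MS")
rng = np.random.default_rng(seed=1)
sales = pd.Series(200 + 5 * np.arange(24) + rng.normal(0, 8, 24), index=months)

# Centred 12-month moving average: smooths out short-term noise, leaving the trend
trend = sales.rolling(window=12, center=True).mean()

# Least-squares straight line: the slope estimates the average change per month
slope, intercept = np.polyfit(np.arange(len(sales)), sales.to_numpy(), deg=1)
print(round(slope, 2))
```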
Time Series Analysis
• Seasonal Variation - Many time series contain seasonal variation. This is particularly true in series representing business sales or climate levels.
Let’s say you want to measure the sales effectiveness of flu medicine.
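A sketch of the flu medicine example with synthetic data (the winter peak is built into the generated numbers, purely for illustration). Averaging by calendar month exposes the seasonal pattern, and a year-over-year change compares each month with the same month one year earlier, so a seasonal January spike is not mistaken for improved sales effectiveness.

```python
import numpy as np
import pandas as pd

# Synthetic flu-medicine sales: 3 years of monthly data with a built-in winter peak
months = pd.date_range("2021-01-01", periods=36, freq="MS")
month_num = months.month.to_numpy()                              # 1 = January ... 12 = December
seasonal_shape = 50 * np.cos(2 * np.pi * (month_num - 1) / 12)   # highest in January
rng = np.random.default_rng(seed=2)
sales = pd.Series(300 + seasonal_shape + rng.normal(0, 5, 36), index=months)

# Average each calendar month across years to expose the seasonal pattern
seasonal_profile = sales.groupby(sales.index.month).mean()
print(seasonal_profile.idxmax(), seasonal_profile.idxmin())   # peak and trough months

# Year-over-year change compares each month with the same month one year earlier
yoy_change_pct = sales.pct_change(periods=12) * 100
```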
Sample Question 1
• You have been given a dataset with several features, including the SalesPrice of each house. You do not have business knowledge of the dataset, but would like to find out which features affect the SalesPrice of the house. Which of the following techniques would you use?

• Correlation Analysis
• Plotting Bar charts
• Log Plots
• Data Cleaning

Correlation Analysis.
We would use correlation analysis because it measures the strength of the relationship between each feature and SalesPrice. If there is a strong (linear) relationship, the correlation value will be close to +1 or -1; otherwise it will be close to 0.
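A sketch of this correlation analysis in pandas on a small hypothetical housing table; apart from SalesPrice, the column names are made up. `DataFrame.corr()` computes Pearson coefficients by default, and sorting by absolute value surfaces the strongest relationships, whether positive or negative. A strong correlation alone does not show that a feature causes price changes.

```python
import pandas as pd

# Hypothetical housing data (feature names other than SalesPrice are illustrative)
houses = pd.DataFrame({
    "LivingArea": [1200, 1500, 1700, 2100, 2500, 3000],
    "Bedrooms": [2, 3, 3, 4, 4, 5],
    "DistanceToCity": [30, 25, 22, 15, 12, 8],
    "SalesPrice": [150_000, 185_000, 205_000, 260_000, 300_000, 360_000],
})

# Pearson correlation of every feature with SalesPrice, strongest (by magnitude) first
corr_with_price = (
    houses.corr()["SalesPrice"]
    .drop("SalesPrice")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_price)
```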
Sample Question 2
• Which technique is most suitable to find anomalies?
• Box plots
• Bar plots
• Correlation Analysis
• Pair plots

Box Plots
Box plots, which can also be drawn across 2-3 dimensions of the data (for example, split by category), show where the outliers lie with respect to those dimensions. This makes outliers easy to visualize and isolate.
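A sketch of how a box plot flags anomalies, using hypothetical transaction amounts. Matplotlib's default whiskers extend to the furthest points within 1.5 × IQR of the quartiles, and anything beyond is drawn as an outlier; the same rule can be applied directly in pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical transaction amounts containing two unusual values
amounts = pd.Series([52, 48, 55, 50, 47, 53, 49, 51, 250, 5])

# Box-plot rule: points more than 1.5 * IQR beyond the quartiles are outliers
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = amounts[(amounts < lower) | (amounts > upper)]
print(outliers.tolist())   # [250, 5]

# The same points show up as individual markers beyond the whiskers
fig, ax = plt.subplots()
ax.boxplot(amounts.to_numpy())
ax.set_ylabel("Transaction amount")
```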
Landscape of the industry
Analytics usage in the industry

In 2015, 17 percent of companies had adopted big data analytics; by 2017, 53 percent of companies were adopting big data analytics (Forbes, 2017).

90 percent of enterprise analytics and business professionals currently say data and analytics are key to their organization’s digital transformation initiatives. (MicroStrategy 2018 Global State of Enterprise Analytics Report)
Analytics usage in the industry

Data-driven organizations are 23 times more likely to acquire customers, six times as likely to retain customers, and 19 times as likely to be profitable as a result. (McKinsey Global Institute)

By 2020, there will be 2.7 million job postings for data science and analytics roles. (BHEF and PwC, America’s Data Science and Analytics Talent: The Case for Action Report)
Business Cases in Traditional Analytics
Banks - Credit loan
• Credit Risk Analysis
  • To estimate the costs associated with a loan
  • To see whether the borrower could potentially default on the loan
• Banks would typically hire credit analysts to process the loan applications.
  • Credit analysts typically hold a degree in finance, accounting, business administration or economics (statistics background)
• Banks will lose money on bad loans!
https://analyticstraining.com/understanding-credit-risk-analytics/
Microfinance
Will a customer default?
• Study behavioral patterns to determine whether a customer is likely to default.
• With KYC in place and data collated from other places, analytics can uncover patterns in customer behavior.
• This will reduce delinquencies to a great extent.
Profitability-based Analytics
• Organizations can study the customer's relationship with the institution and derive a CLTV (customer lifetime value), which can help in projecting future cash flows.
Asset Management
Acquisition of Assets
• Behavioural Segmentation of Clients
• Improve Sales Productivity
• Customized Digital Marketing
Investment Management
• Take better investment decisions
• Execute trades more effectively
Asset Administration
• Improved Process Automation
• Better Administration
• Keep track of trades
• Automated data pipelines
Thank you
