Data Analytics and Interactive Dashboards using Python
Create Interactive
Dashboards using Python
On a scale of 1 – 10, how
well do you know Python?
1 – Complete beginner
10 – You are a Python Ninja!
What will you be learning in this course?
Basics of Python (Recap)
Understanding simple Data Structures
Data Cleaning and Manipulation using Numpy and Pandas
Source: ResearchGate
Devices connected to the internet
Data Storage Costs
Interest has picked up!
What can we do with the data?
Visualization Storytelling
Data Visualization
Let’s start by testing the human visual system
How many 9s are present?
The human visual system is powerful
How many 9s are present now?
What is Data Visualization?
Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.
Who & What? Portrait (Distribution Representation)
How Many? Comparison (Comparative Representation)
Where? Map (Position in Space)
When? Timeline (Position in Time)
How? Flowchart (Relationship, Hierarchy)
Why? Multi-Variable Plot (Deduction & Prediction)
Data Storytelling
Every human needs a story to make things memorable
What does storytelling mean?
Storytelling is used in design as a technique to get insight into users, build empathy and reach them emotionally.
Why does storytelling matter?
Stories simplify
Stories crystallize takeaways
What are insights?
• Uncovering a shared meaning, a shared value, or a shared need that can be translated into action.
• Insight is what is learned and what will improve your business. Your business will know better, so you’ll be able to work better.
Example of Finding, Insight, Recommendation (NETFLIX)
Finding - Customers are not watching the entire video to its full length. They are watching 90–95%.
Insight - The parts they are not watching are the title roll and the end credits.
Recommendation - Introduce ‘Skip Intro’ at the beginning of title rolls and ‘Watch Next’ at the beginning of end credits. Benchmark 90–95% watched content as completed and measure if customers move to the next video in the series.
What is Python?
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
What should you know?
1. Data Issues & Cleaning
2. Data Preprocessing
3. Data Manipulation
Data Cleaning
Where can loss of data occur?
Loss of data quality can occur at many stages:
• At the time of collection
• During digitisation
• During documentation
Issues in Data
• Irrelevant values
• Missing values
• Inaccurate data
• Old data
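A minimal sketch of how each data issue listed above might be handled in pandas. The table, column names, and thresholds (`min_year`, `max_age`) are hypothetical and chosen only for illustration; real projects need domain-specific rules.

```python
import numpy as np
import pandas as pd

# Hypothetical survey records showing the four issues listed above.
raw = pd.DataFrame({
    "respondent": ["A", "B", "C", "D", "E"],
    "age": [34, np.nan, 29, 210, 41],                  # missing (B), inaccurate (D)
    "year_collected": [2023, 2023, 2009, 2023, 2022],  # old record (C)
    "favourite_colour": ["red", "blue", "green", "red", "blue"],  # irrelevant here
})

def clean(df, min_year=2020, max_age=120):
    """Apply one simple fix per data issue; thresholds are illustrative."""
    out = df.drop(columns=["favourite_colour"])              # irrelevant values
    out = out[out["year_collected"] >= min_year]             # old data
    out = out[out["age"].isna() | (out["age"] <= max_age)]   # inaccurate data
    out = out.assign(age=out["age"].fillna(out["age"].median()))  # missing values
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Filling with the median is only one option; dropping the row or flagging it for review can be more appropriate depending on the analysis.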
Data Preprocessing & Manipulation
• Cleaning
• Instance selection
• Normalization
• Transformation
• Feature extraction
• Feature selection
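The preprocessing steps named above can be sketched on a toy table as follows. The housing data, column names, and cut-off values are assumptions made for this example, and each step uses one simple technique among many possible ones.

```python
import numpy as np
import pandas as pd

# Hypothetical housing records; column names are assumptions for illustration.
df = pd.DataFrame({
    "area_sqft": [800.0, 1200.0, 1000.0, 2000.0],
    "rooms": [2, 3, 3, 5],
    "price": [100.0, 150.0, 130.0, 260.0],
    "country": ["SG", "SG", "SG", "SG"],   # constant column, carries no signal
})

# Instance selection: keep only the rows relevant to the analysis.
df = df[df["area_sqft"] <= 1500].copy()

# Feature extraction: derive a new feature from existing ones.
df["price_per_sqft"] = df["price"] / df["area_sqft"]

# Transformation: compress a skewed scale with a log transform.
df["log_price"] = np.log(df["price"])

# Normalization: min-max scale a column into the range [0, 1].
area = df["area_sqft"]
df["area_scaled"] = (area - area.min()) / (area.max() - area.min())

# Feature selection: drop columns that have a single unique value.
selected = df.loc[:, df.nunique() > 1]
print(selected)
```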
Example: Indexing & Slicing of Data
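A short sketch of indexing and slicing in both libraries. The array and the small sales table are made-up illustration data.

```python
import numpy as np
import pandas as pd

# NumPy: positional indexing and slicing on a 3x4 array.
arr = np.arange(12).reshape(3, 4)
first_row = arr[0]          # row at position 0
last_column = arr[:, -1]    # every row, last column
block = arr[1:, :2]         # rows 1 onward, first two columns

# pandas: label-based (.loc) and position-based (.iloc) selection.
df = pd.DataFrame(
    {"city": ["Pune", "Delhi", "Mumbai"], "sales": [120, 340, 560]},
    index=["a", "b", "c"],
)
by_label = df.loc["b", "sales"]       # single cell by row and column label
by_position = df.iloc[0:2]            # first two rows by position
high_sales = df[df["sales"] > 200]    # rows selected with a boolean mask
```

Note that `.iloc[0:2]` excludes the end position, while label slices with `.loc` include the end label.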
Reiterating
Numpy Pandas
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
• Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
• Hierarchical labeling of axes (possible to have multiple labels per tick)
• Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
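Several of the pandas features described above can be tried in a few lines. This is a sketch on made-up numbers; the CSV text is built in memory so that the example needs no files.

```python
import io
import pandas as pd

# IO tools: load delimited text (here an in-memory CSV) into a DataFrame.
csv_text = "date,units\n2024-01-01,5\n2024-01-02,7\n"
loaded = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Date range generation: six consecutive days of made-up sales figures.
days = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 14, 16, 18, 20], index=days)

# Frequency conversion: aggregate daily values into 2-day totals.
two_day_totals = sales.resample("2D").sum()

# Moving window statistics: 3-day rolling mean.
rolling_mean = sales.rolling(window=3).mean()

# Date shifting and lagging: the previous day's value aligned with today.
previous_day = sales.shift(1)

# Hierarchical labeling of axes: a two-level (region, quarter) index.
idx = pd.MultiIndex.from_product(
    [["North", "South"], ["Q1", "Q2"]], names=["region", "quarter"]
)
revenue = pd.Series([100, 110, 90, 95], index=idx)
north_total = revenue.loc["North"].sum()
```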
Let’s dive straight into the hands-on session using Jupyter notebooks
Part 2 – Descriptive Statistics and Data Analytics
01 Descriptive Statistics
02 Data Visualization using Matplotlib
03 Data Analytics and Visualization using Seaborn
04 Understanding basic KPIs
Collect the data and gain the domain knowledge.
Exploratory
Measures of central tendency: mean, median, mode.
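The three measures of central tendency can be computed directly on a pandas Series. The order counts below are made up, with one unusually busy day included to show how the mean and median differ.

```python
import pandas as pd

# Made-up daily order counts; the value 40 is an unusually busy day.
orders = pd.Series([3, 5, 5, 6, 7, 8, 40])

mean_orders = orders.mean()            # arithmetic average, pulled up by the 40
median_orders = orders.median()        # middle value, robust to the extreme day
mode_orders = orders.mode().iloc[0]    # most frequent value

print(mean_orders, median_orders, mode_orders)
```

`mode()` returns a Series because a dataset can have several modes; `.iloc[0]` takes the smallest of them.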
KPIs
Project Management KPIs
• Resource utilization
• Project resource utilization
• % of overdue project tasks
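As a rough sketch, two of the KPIs above could be computed as follows. The definitions used here are assumptions, since the slides do not define them: a task is overdue when it is unfinished past its due date, and utilization is booked hours divided by available hours. All names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical task log; column names and figures are assumptions.
tasks = pd.DataFrame({
    "task": ["design", "build", "test", "deploy"],
    "due": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-10", "2024-03-20"]),
    "done": [True, False, False, False],
})
today = pd.Timestamp("2024-03-12")

# % of overdue project tasks: unfinished tasks whose due date has passed.
overdue = (~tasks["done"]) & (tasks["due"] < today)
pct_overdue = 100 * overdue.mean()

# Resource utilization: hours booked on project work / hours available.
hours = pd.DataFrame({
    "person": ["Asha", "Ben"],
    "booked": [30, 36],
    "available": [40, 40],
})
utilization_pct = 100 * hours["booked"].sum() / hours["available"].sum()

print(pct_overdue, utilization_pct)
```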
Why are KPIs important?
1. Effective company key performance indicators (KPIs) guide a business on the journey towards its strategic goals.
2. A good KPI should act as a compass: a measurement of where your business is, relative to where it has come from and where it is going.
3. KPIs translate your business strategy into manageable, operational actions, based on the data you collect and monitor.
Increases management awareness
Visualization Rules
Fourth, add labels for different categories when needed.
01 Time Series Analysis
Time-Series Analysis
Time Series: a series of data points indexed (or listed or graphed) in time order
Time Series Analysis: comprises methods for analyzing time series data in order to extract meaningful statistics
Humans are obsessed with their future, so much so that they worry more about the future than they enjoy the present. This is precisely why horoscopists, soothsayers, and fortune tellers are always in high demand.
SEASONALITY
Time Series Analysis
• Trends - A trend is a consistent directional movement in a time series.
• Seasonal Variation - Many time series contain seasonal variation. This is particularly true in series representing business sales or climate levels.
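Trend and seasonal variation can be separated with a simple moving-average approach. This is a sketch on synthetic monthly sales, not a full decomposition method: the data are generated with a known upward trend and a 12-month cycle so that the result can be checked by eye.

```python
import numpy as np
import pandas as pd

# Synthetic monthly sales: a steady upward trend plus a repeating
# 12-month seasonal pattern. All numbers are made up.
months = pd.date_range("2021-01-01", periods=36, freq="MS")
trend_part = np.arange(36) * 2.0
season_part = 10 * np.sin(2 * np.pi * np.arange(36) / 12)
sales = pd.Series(100 + trend_part + season_part, index=months)

# Trend: a centered 12-month rolling mean averages out the seasonal cycle.
trend = sales.rolling(window=12, center=True).mean()

# Seasonal variation: average the detrended values by calendar month.
detrended = sales - trend
seasonal = detrended.groupby(detrended.index.month).mean()

peak_month = int(seasonal.idxmax())   # calendar month with the highest seasonal effect
print(peak_month)
```

The rolling mean is undefined near the start and end of the series, so those months are ignored when the seasonal averages are taken.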
Let’s say you want to measure the
sales effectiveness of flu medicine.
Sample Question 1
• You have been given a dataset with some features, including the SalesPrice of the house. You do not have business knowledge of the dataset, but would like to find out which features affect the SalesPrice of the house. Which of the following techniques would you use?
• Correlation Analysis
• Plotting Bar charts
• Log Plots
• Data Cleaning
Correlation Analysis.
We would use correlation analysis because it gives the strength of the linear relationship between each feature and the SalesPrice. If the relationship is strong, the correlation value will be close to ±1; otherwise it will be close to 0.
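A minimal sketch of this answer in pandas. Only SalesPrice comes from the question; the other column names and all values are hypothetical.

```python
import pandas as pd

# Small made-up housing table; feature names other than SalesPrice
# are assumptions for illustration.
houses = pd.DataFrame({
    "LivingArea": [50, 70, 90, 110, 130],
    "Age": [30, 25, 20, 12, 5],
    "DoorNumber": [3, 1, 4, 1, 5],
    "SalesPrice": [150, 200, 260, 310, 370],
})

# Pearson correlation of every feature with SalesPrice.
corr = houses.corr()["SalesPrice"].drop("SalesPrice")

# Rank features by the absolute strength of the relationship.
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by absolute value matters because a strongly negative correlation (here, Age) is as informative as a strongly positive one.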
Sample Question 2
• Which technique is most suitable to find anomalies?
• Box plots
• Bar plots
• Correlation Analysis
• Pair plots
Box Plots
Box plots, drawn for 2-3 dimensions of the data, show where the outliers lie with respect to those dimensions. This makes the outliers easier to visualize and isolate.
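The rule a box plot conventionally uses to mark outliers can also be applied without drawing anything. This sketch assumes the common 1.5 × IQR whisker convention and uses made-up transaction amounts.

```python
import pandas as pd

# Made-up transaction amounts with one obviously unusual value.
amounts = pd.Series([52, 48, 50, 51, 49, 47, 53, 50, 250])

# Box-plot whiskers conventionally extend 1.5 * IQR beyond the quartiles;
# points outside that range are drawn individually as outliers.
q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = amounts[(amounts < lower) | (amounts > upper)]
print(outliers.tolist())
```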
Landscape of the industry
Analytics usage in the industry
By 2020, there will be 2.7 million job postings for data science and
analytics roles. —BHEF and PwC America’s Data Science and Analytics
Talent: The Case for Action Report
Business Cases in Traditional Analytics
Banks - Credit loan
Acquisition of Assets
• Behavioural Segmentation of Clients
• Improve Sales Productivity
• Customized Digital Marketing
Investment Management
• Take better investment decisions
• Automated data pipelines
• Execute trade more effectively
Asset Administration
• Improved Process Automation
• Better Administration
• Keep a track on trade
Thank you