Module 4 Data Science Visualization Tools

Data visualization is crucial for discovering trends, providing context, and saving time in data analysis. Various tools like D3.js, Google Charts, and MapReduce are discussed for creating interactive visualizations and processing large datasets. The document also highlights the pros and cons of developing custom reporting applications versus using existing company tools.


MODULE 3

VISUALIZATION
TOOLS
Data visualization is the graphical representation of
information and data.

Why is Data Visualization Important?


1. Data Visualization Discovers the Trends in Data
2. Data Visualization Provides a Perspective on the Data
3. Data Visualization Puts the Data into the Correct Context
4. Data Visualization Saves Time
5. Data Visualization Tells a Data Story
JavaScript Libraries / Dashboard Development Tools

Highcharts
Google Charts
Chartkick
d3.js
Data scientists must deliver their new insights to the end user, and results can be communicated in several ways:

• A one-time presentation
• A new viewport on your data
• A real-time dashboard

A one-time presentation
Research questions are one-shot deals because the business decision derived from them will bind the organization to a certain course for many years to come.

Take, for example, company investment decisions: Do we distribute our goods from two distribution centers or only one? Where do they need to be located for optimal efficiency?

When the decision is made, the exercise may not be repeated until you’ve retired. In this case, the results are communicated in a one-time presentation.
A new viewport on your data

The most obvious example here is customer segmentation. Sure, the segments themselves will be communicated via reports and presentations, but in essence they form tools, not the end result itself.

When a clear and relevant customer segmentation is discovered, it can be fed back to the database as a new dimension on the data from which it was derived.

From then on, people can make their own reports, such as how many products were sold to each segment of customers.
A real-time dashboard

Sometimes your task as a data scientist doesn’t end when you’ve discovered the new information you were looking for.

You can send your information back to the database and be done with it. But when other people start making reports on this newly discovered gold nugget, they might interpret it incorrectly and make reports that don’t make sense.

As the data scientist who discovered this new information, you must set the example: make the first refreshable report so others, mainly reporters and IT, can understand it and follow in your footsteps.

Making the first dashboard is also a way to shorten the delivery time of your insights to the end user who wants to use it on an everyday basis.
Data Visualization Options
(For delivering dashboards to end users)

D3.js, or D3, is a free, open-source JavaScript library that allows users to create interactive data visualizations for web browsers.

D3.js is built on web standards and uses HTML5, Cascading Style Sheets (CSS), and Scalable Vector Graphics (SVG).

D3.js is used to:

• Attach data to Document Object Model (DOM) elements
• Use CSS, HTML, and SVG to showcase data
• Make data interactive with D3.js data-driven transformations and transitions

dc.js (http://dc-js.github.io/)

An open-source JavaScript library for custom dynamic visualizations with unparalleled flexibility and expressiveness.
MapReduce is a programming model and software framework for processing large amounts of data in parallel:

• How it works
MapReduce breaks down large data sets into smaller chunks and processes them in parallel. This makes it faster and easier to process large amounts of data.

• Phases
MapReduce has two phases: Map and Reduce. The Map phase splits and maps data, while the Reduce phase shuffles and reduces the data.

• Fault-tolerant
MapReduce is fault-tolerant, which means it can maintain reliable operations and output even if it’s interrupted during processing.

MapReduce is part of the Apache Hadoop Ecosystem and uses the Hadoop Distributed File System (HDFS) for input and output. Hadoop can run MapReduce programs written in various languages, including Python, Java, and C++.
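The two phases can be sketched in a few lines of plain JavaScript. This is only an illustration of the model, not Hadoop itself: the map phase emits key/value pairs, a shuffle step groups them by key, and the reduce phase collapses each group to one result.

```javascript
// Minimal sketch of the MapReduce model in plain JavaScript (not Hadoop).

function mapPhase(records, mapper) {
  // Apply the mapper to every input record; each call may emit many pairs.
  return records.flatMap(mapper);
}

function shuffle(pairs) {
  // Group emitted [key, value] pairs by key, as happens between the phases.
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

function reducePhase(groups, reducer) {
  // Collapse each key's list of values to a single result.
  const out = {};
  for (const [key, values] of groups) out[key] = reducer(key, values);
  return out;
}

// Word count, the classic MapReduce example.
const lines = ["map reduce map", "reduce map"];
const counts = reducePhase(
  shuffle(mapPhase(lines, line => line.split(" ").map(w => [w, 1]))),
  (word, ones) => ones.length
);
console.log(counts); // { map: 3, reduce: 2 }
```

On a real cluster the mapper and reducer run on many machines at once, which is where the speedup on large data sets comes from.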
Flow diagram for MapReduce
Numerical: MovieLens Data

USER_ID  MOVIE_ID  RATING  TIMESTAMP
196      242       3       881250949
186      302       3       891717742
196      377       1       878887116
244      51        2       880606923
166      346       1       886397596
186      474       4       884182806
186      265       2       881171488

Solution:

Step 1 – First, map the values as user:movie pairs; this happens in the first (Map) phase of the MapReduce model.
196:242 ; 186:302 ; 196:377 ; 244:51 ; 166:346 ; 186:474 ; 186:265

Step 2 – After mapping, shuffle and sort the values, grouping them by key.
166:346 ; 186:302,474,265 ; 196:242,377 ; 244:51

Step 3 – After completion of steps 1 and 2, reduce each key’s values, here by counting how many movies each user rated.
166:1 ; 186:3 ; 196:2 ; 244:1
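The three steps above can be run as a small plain-JavaScript sketch on the user and movie ids from the table (note the table's third entry for user 186 is movie 474), with the reduce step counting movies per user:

```javascript
// The MovieLens numerical worked in plain JavaScript (not an actual Hadoop job).
const rows = [
  [196, 242], [186, 302], [196, 377], [244, 51],
  [166, 346], [186, 474], [186, 265],
];

// Step 1 – map each row to a user:movie pair.
const pairs = rows.map(([user, movie]) => [user, movie]);

// Step 2 – shuffle and sort: group movie ids under each user id.
const grouped = {};
for (const [user, movie] of pairs) (grouped[user] ??= []).push(movie);

// Step 3 – reduce each key's values, here to a count of movies rated.
const counted = Object.fromEntries(
  Object.entries(grouped).map(([user, movies]) => [user, movies.length])
);
console.log(grouped); // { '166': [ 346 ], '186': [ 302, 474, 265 ], ... }
console.log(counted); // { '166': 1, '186': 3, '196': 2, '244': 1 }
```

Any other reduction (a sum or average of ratings per user, say) would only change the function applied in step 3.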

Solution

Do not send enormous loads of data over the internet or even your internal network though, for these reasons:

■ Sending a bulk of data will tax the network to the point where it will bother other users.

■ The browser is on the receiving end, and while loading in the data it will temporarily freeze. For small amounts of data this is unnoticeable, but when you start looking at 100,000 lines, it can become a visible lag. When you go over 1,000,000 lines, depending on the width of your data, your browser could give up on you.
Crossfilter, the JavaScript MapReduce library

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.

Example: Airline on-time performance
https://square.github.io/crossfilter
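The core idea behind Crossfilter can be mimicked in a few lines of plain JavaScript. The toy sketch below is not the real library's API, and the flight records are made up for illustration: a filter is applied on one dimension of the data, and every coordinated view is then recomputed from the surviving records.

```javascript
// Toy illustration of the crossfilter idea (not the library's actual API).
// Hypothetical flight records: departure hour and arrival delay in minutes.
const flights = [
  { hour: 9,  delay: 12 },
  { hour: 9,  delay: -3 },
  { hour: 17, delay: 48 },
  { hour: 17, delay: 5  },
  { hour: 23, delay: 0  },
];

// Filter on the "hour" dimension: keep only afternoon/evening departures.
const afternoon = flights.filter(f => f.hour >= 12);

// A coordinated view then groups the remaining records,
// here counting flights delayed by more than 15 minutes per hour.
const delayedByHour = {};
for (const f of afternoon) {
  delayedByHour[f.hour] = (delayedByHour[f.hour] ?? 0) + (f.delay > 15 ? 1 : 0);
}
console.log(delayedByHour); // { '17': 1, '23': 0 }
```

The real library precomputes indexes per dimension so that adding or removing a filter updates all views incrementally instead of rescanning the whole dataset, which is what keeps interaction under 30ms on a million records.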
It’s time to build the actual application, and the ingredients of our small dc.js application are as follows:

■ jQuery—To handle the interactivity
■ Crossfilter.js—A MapReduce library and prerequisite to dc.js
■ d3.js—A popular data visualization library and prerequisite to dc.js
■ dc.js—The visualization library you will use to create your interactive dashboard
■ Bootstrap—A widely used layout library you’ll use to make it all look better

You’ll write only three files:

■ index.html—The HTML page that contains your application
There are multiple reasons why you’d create your own custom reports instead of opting for the (often more expensive) company tools out there:

• No budget—Startups can’t always afford every tool
• High accessibility—Everyone has a browser
• Available talent—(Comparatively) easy access to JavaScript developers
• Quick release—IT cycles can take a while
• Prototyping—A prototype application can provide a working stopgap and leave time for IT to build the production version.
There are reasons against developing your own application:

• Company policy—Application proliferation isn’t a good thing, and the company might want to prevent this by restricting local development.
• Mature reporting team—If you have a good reporting department, why would you still bother?
• Customization is satisfactory—Not everyone wants the shiny stuff; basic can be enough.
