Lab Manual: 18CS3262S Data Modelling and Visualization Techniques
Team DV
K L UNIVERSITY | PE 1- Data Visualization and
Modelling Techniques -18CS3262S
LIST OF EXPERIMENTS
o Pandas
o Matplotlib
o Seaborn
o ggplot
o Bokeh
o Pygal
o Plotly
o Geoplotlib
o Gleam
o Missingno
o Leather
o Altair
3. Write the syntax and describe the parameters used for the following:
Box Plot
Scatter Plot
Histogram
Pie Chart
Facet Plot
Pair Plot
Area Chart
Violin Plot
Bar Chart
IN-LAB
1. For the given dataset that contains immigration details to Canada from 1980 to 2013:
i. Create an area plot for the top 6 immigrant countries from 1990 to 2013.
ii. Create a year-wise bar chart of immigrants from India to Canada during the
period 1980 to 2013.
iii. Create a box plot for immigrants from India, the Philippines and China.
iv. Show the total no. of immigrants from India and France using an area
chart and a pie chart.
v. Create a scatter histogram for the immigrants from Fiji and Singapore in the
year 2013.
Link for Data Set -
https://drive.google.com/file/d/1kGJsKY7ezB83DMsluHNR4hvaTf_Xj-N0/view?usp=sharing
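As a starting point for task i, here is a minimal sketch of a top-N area plot with pandas and Matplotlib. The small table is a made-up placeholder (the country labels A to D and their counts are hypothetical), and it assumes one row per country and one column per year, which may not match the layout of the linked file.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data (hypothetical): rows are countries, columns are years.
df = pd.DataFrame(
    {1990: [10, 40, 25, 5], 1991: [12, 38, 30, 6], 1992: [15, 35, 33, 4]},
    index=["A", "B", "C", "D"],
)

top_n = 3
years = [1990, 1991, 1992]

# Rank countries by their total over the selected years and keep the top N.
totals = df[years].sum(axis=1)
top = df.loc[totals.nlargest(top_n).index, years]

# DataFrame.plot draws one series per column, so transpose to years x countries.
ax = top.T.plot(kind="area", stacked=True, figsize=(8, 4))
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
ax.set_title(f"Top {top_n} countries (placeholder data)")
```

For the actual exercise, `top_n` would be 6 and `years` would cover 1990 to 2013, once the real column names have been checked.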
2. The given data sets contain data on flights that were on time in January of the
years 2019 and 2020. Using the two data sets, visualize the data with the Matplotlib and Plotly
libraries to depict the following:
Show the difference in statistics for distance for both the years using the
appropriate plotting technique.
Visualize the no. of flights whose destination airport id is 11778 and 11267
using a bar plot or bar chart.
Create a Sunburst Plot for both the years depicting the difference among
them.
Link for Data Set –
https://drive.google.com/file/d/1braKG91kzu_KX0I30A__enUWJRWXOaL0/view?usp=sharing
https://drive.google.com/file/d/15saUTpp5nEnQtFHpCoqvpprxMuKWMsxt/view?usp=sharing
POST-LAB
1. Visualize the given Placement Data Full Class dataset that contains details about Campus
Recruitment using the below techniques for appropriate dimensions and differentiate between
the two techniques:
Histogram and Bar Chart [For histogram let no. of bins = 10]
Facet Plot and Pair Plot
Area Chart and Pie Chart [For yes or no data]
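To illustrate the first comparison, the sketch below draws a histogram and a bar chart side by side. A histogram bins a numeric variable into intervals, while a bar chart counts distinct categories. The values are made-up placeholders, not taken from the placement dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
from collections import Counter

# Placeholder values (hypothetical): a numeric score and a categorical status.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 67, 70, 72, 75, 78, 80, 91]
status = ["Placed", "Placed", "Not Placed", "Placed", "Not Placed", "Placed"]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(9, 3.5))

# Histogram: the numeric range is cut into 10 equal-width bins.
counts, edges, _ = ax_hist.hist(scores, bins=10, edgecolor="black")
ax_hist.set_title("Histogram (10 bins)")

# Bar chart: one bar per distinct category, height equals its frequency.
freq = Counter(status)
ax_bar.bar(list(freq.keys()), list(freq.values()))
ax_bar.set_title("Bar chart (categories)")
```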
Viva Questions:
Give a description about the syntax along with parameters and functionalities for the
following visualization techniques:
a. Network
b. Animation
c. Cheat Sheets
d. Data Art
e. Colour
f. 3D
g. Word Cloud
h. Density
Experiment -2: Oracle database connectivity using Python.
Pre-Lab:
1. Write the steps to connect to an Oracle database using Python.
2. Mention the functionalities and methods in the cx_Oracle library in Python.
3. Describe the SQLAlchemy library in Python.
In-Lab:
1. Find the output of the join operations applied to the company database.
Apply the inner join type to the following queries; in addition, apply the other join types to the first
question.
List the names of all employees who work for the Research department.
For every project located at ‘Stafford’, list the project number, the controlling department
number and the department manager's last name.
Find the names of all employees who work on the projects controlled by Dno=4.
Make a list of project numbers for projects that involve an employee whose last
name is ‘Jennifer’ as a worker or as a manager of the department that controls the project.
List the names of the employees who have no dependents.
List the names of managers who have at least one dependent.
2. Perform the below tasks to understand Oracle database connectivity using Python.
a. Write Python code to connect to an Oracle database.
b. Create an Oracle table using Python connectivity.
c. Insert data into the Oracle table using Python code.
Single row.
Multiple rows using a for loop.
d. Perform a simple SQL query on the newly created table through Python.
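The In-Lab steps (connect, create a table, insert one row, insert several rows in a loop, query with a join) follow the Python DB-API pattern. The sketch below uses the standard-library `sqlite3` module as a stand-in so that it runs without an Oracle server; this substitution is an assumption made for illustration, not the lab's required setup. With cx_Oracle the connection would instead be opened with something like `cx_Oracle.connect(user, password, dsn)`, and positional bind placeholders are written `:1`, `:2` rather than `?`. The table names, column names and rows are hypothetical.

```python
import sqlite3

# Stand-in for an Oracle connection (assumption): an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# b. Create tables (hypothetical, simplified company schema).
cur.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
cur.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, dno INTEGER)")

# c. Insert a single row.
cur.execute("INSERT INTO department VALUES (?, ?)", (5, "Research"))

# c. Insert multiple rows using a for loop.
employees = [("111", "Smith", 5), ("222", "Wong", 5), ("333", "Zelaya", None)]
for row in employees:
    cur.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
conn.commit()

# d. Inner join: only employees that have a matching department row.
inner = cur.execute(
    "SELECT e.lname FROM employee e "
    "JOIN department d ON e.dno = d.dnumber "
    "WHERE d.dname = 'Research' ORDER BY e.lname"
).fetchall()

# Left outer join: every employee, with NULL where no department matches.
left = cur.execute(
    "SELECT e.lname, d.dname FROM employee e "
    "LEFT JOIN department d ON e.dno = d.dnumber ORDER BY e.lname"
).fetchall()

print(inner)
print(left)
conn.close()
```

Comparing `inner` with `left` shows the practical difference between the join types: the employee without a department appears only in the left outer join.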
Post-Lab:
1. Create the given tables and insert the data given in the Customers file into the Oracle database using
SQL*Plus.
a. Perform the below given tasks using Python for the database created in 3a.
i. Retrieve the data from the above database and convert into pandas dataframe.
ii. Plot a scatter plot against customer code and customer city.
iii. Plot a line plot against customer code and payment amount.
iv. Create a pie chart for outstanding amount.
Link for Data -
https://drive.google.com/file/d/1nnkVqzrPniIE5tcTdc-qECvkZlZzZ7r5/view?usp=sharing
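For task i, a query result can be loaded straight into a DataFrame with `pandas.read_sql_query`. The sketch below again uses an in-memory `sqlite3` database as a stand-in for Oracle (an assumption, so that it runs anywhere); the table name, column names and rows are hypothetical and will differ from the Customers file.

```python
import sqlite3
import pandas as pd

# Stand-in database (assumption) holding a small, made-up customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_code TEXT, cust_city TEXT, payment_amt REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C001", "London", 4000.0), ("C002", "Mumbai", 7000.0), ("C003", "London", 2500.0)],
)
conn.commit()

# Run the query and collect the rows into a DataFrame in one call.
df = pd.read_sql_query("SELECT * FROM customer ORDER BY cust_code", conn)
print(df)
conn.close()
```

Once the data is in `df`, the scatter, line and pie plots in tasks ii to iv can be built from its columns with the plotting library of your choice.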
Experiment -3: Visualization of Semi-Structured Data
PRE-LAB
1. Give examples of semi-structured data. How can semi-structured data be read in Python?
2. Give examples of unstructured data. How can unstructured data be read in Python?
3. Describe the functionalities and methods that are available in the json package.
4. Why is the Tweepy library used in Python? Describe the following methods, along with their
parameters, from this library.
a. tweepy.API
b. API.home_timeline
c. API.user_timeline
d. API.get_status
e. API.retweet
f. API.get_user
IN-LAB
1. Create a dictionary for the below data and convert the data into JSON using the dump() method
from the json package.
2. For the below given data set, which contains world population in JSON format:
https://query.data.world/s/uvvfp4usm2q4mlapbqtoi2stgunwda
i. Read the data using pandas in column orient.
ii. Using an appropriate plotting technique, visualize the given data on the basis of the
population feature.
iii. In a histogram, show all the countries whose population is less than that of Italy, given
that the population of Italy is 59797978.
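The JSON steps above can be sketched with the standard-library `json` module alone. The dictionary below is a made-up placeholder (only Italy's figure comes from the exercise; the other two entries are hypothetical), and an in-memory buffer stands in for a file on disk.

```python
import io
import json

# Placeholder dictionary: only Italy's value is taken from the exercise text.
population = {"Italy": 59797978, "Examplia": 1200000, "Samplestan": 75000000}

# json.dump() writes JSON text to a file-like object
# (json.dumps() would return the same text as a string instead).
buffer = io.StringIO()
json.dump(population, buffer, indent=2)
text = buffer.getvalue()

# Round trip: parse the JSON text back into a Python dictionary.
loaded = json.loads(text)

# Keep only the countries whose population is below Italy's.
smaller = [name for name, pop in loaded.items() if pop < loaded["Italy"]]
print(smaller)
```

The same filter condition can be applied to a pandas column once the real file has been read.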
POST-LAB
1. For the given reviews dataset that contains reviews of women's clothing:
i. Clean the data by removing/dropping the rows where the review is missing.
ii. Using TextBlob calculate sentiment polarity.
iii. Create two new features for word count and length of the review.
iv. Using iplot() show the distribution of sentiment polarity score.
v. Visualize the statistics of the review ratings using appropriate plotting
technique.
vi. Plot a pie diagram to show the distribution of the reviewers' ages.
vii. Plot a graph depicting how long the reviews are.
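Steps i and iii can be sketched with pandas string methods. The two-column frame below is a made-up placeholder, and the column name "Review Text" is an assumption that should be checked against the actual dataset.

```python
import pandas as pd

# Hypothetical rows; the column name "Review Text" is an assumption.
df = pd.DataFrame(
    {
        "Review Text": ["Lovely dress", None, "Runs small but the fabric is nice"],
        "Rating": [5, 3, 4],
    }
)

# i. Drop the rows where the review is missing.
df = df.dropna(subset=["Review Text"]).copy()

# iii. New features: character length and whitespace-separated word count.
df["review_len"] = df["Review Text"].str.len()
df["word_count"] = df["Review Text"].str.split().str.len()
print(df)
```

The sentiment polarity from step ii could be added as another column in the same way, for example by applying `TextBlob(text).sentiment.polarity` to each review, assuming TextBlob is installed.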
PRE-LAB:
1. Why is Tableau considered a powerful tool for data visualization? Explain five features of
Tableau.
2. What is a view in Tableau? Write the steps involved in creating view?
3. Describe the data types involved in Tableau.
4. What is “Show me” in Tableau? Explain Show Me with Two fields and Show Me with
Multiple Fields.
5. What is meant by dimensions and measures in Tableau?
6. What do you understand by the term Data Aggregation (Tableau) and what are the
common aggregate functions in Tableau?
IN-LAB
1. Connect the given Bus Safety dataset to Tableau and perform the below tasks on separate
sheets.
b. Go to the metadata of the dataset and change the column name from ‘Date Of Incident’
to ‘Date’ and from ‘Bus Garage’ to ‘Garage’.
c. Visualize the no. of Incidents by different Operators and Explore various possible
charts.
d. Show a pie chart depicting the age categories as Adult, Child, Elderly and Unknown
and no. of incidents in each category.
e. Show the statistics of Route No.’s in purple colour Bar Chart.
f. Create a chart for ‘Borough’ feature depicting the total count of each and then sort
it in ascending order.
g. Depict the no. of incidents under the eight Incident Event Types for each of the
Boroughs in the form of horizontal bar chart.
https://drive.google.com/file/d/1smRDZwBMeJlJWXNAoy8wFqX9iDgIAgJY/view?usp=sharing
2. Using the given movie data, perform the following tasks in Tableau:
i. Create a bar graph which shows average count of different languages from year
2000 to 2016.
ii. Create a tree chart showing budget spent by different countries.
https://drive.google.com/file/d/1DKjup74jwD_EJ23Bibo-E3A4mLuQ6gQM/view?usp=sharing
POST-LAB
1. For the inbuilt sample Superstore data set, visualize the below given attributes for the given
aggregate functions.
a. Rename Sheet-1 as Orders and load orders sheet from the workbook and create a
bar chart for the four regions against discount measure with aggregate function as
Average.
b. Create a horizontal bar chart for depicting total no. of items in various categories.
c. Create a circle view against year (ship date) and sum of profits of that year.
d. Write down the insights and an analysis report for the case study.
https://drive.google.com/file/d/1TLrbQeoA9oxI4UwwO5cAiq8YZG9OMdpq/view?usp=sharing
https://drive.google.com/file/d/1D5Mhf-P5lHBbBJPoN9S6EKn81tW5KbjN/view?usp=sharing
Experiment -5: Visual Encodings in Tableau and basic dashboards
Pre-Lab:
4. There is a huge demand for dashboard in Business Meetings. Why do you think dashboards
play a dominant role in understanding various aspects of the company and how?
IN-LAB
1. For the given dataset FIFA.csv that contains data about various football players, perform the
following tasks on separate sheets:
a. After connecting the data use the data interpreter and clean the data.
b. Create a horizontal bar chart to depict the average International Reputation of the various
nations.
c. Check if there is any relation between wage and position (left/right). If yes, describe the
relation.
d. Plot a bar chart against Avg. Heading Accuracy and Body Type. Find out which body type
has highest and least accuracy.
e. Create a yellow colored Tree Chart to depict the total penalties of each nation and thus
determine the highest and lowest.
f. Using the above sheets create a dashboard and write an analysis report of what insights
can be drawn from this.
https://drive.google.com/file/d/1E2KbOZ1Bgw8jSsxh4UmlMQ_wy4vI4aJE/view?usp=sharing
2. Visualizing data can be very interesting when different features and interesting shapes come
into picture. Observe the below visualization carefully and jot down the insights that you get
from it. Based upon your insights try to recreate the given visualization for the given Pokémon
data set.
https://drive.google.com/file/d/1bRLGr5uZiFt3AxS4CUFRviQvxwOknfYO/view?usp=sharing
Post-Lab:
1. The given COVID-19 dataset contains data regarding the cases in the US. Create a dashboard
which includes the below given visualizations. Write an analysis report on the same.
https://drive.google.com/file/d/1Tt1xSy9tFaZ2Lqll5BWjZa5UFx-6FrDV/view?usp=sharing
Experiment -6: Interactive Plots using Python
PRE-LAB
IN-LAB
1. Using the in-built “Car Crashes” dataset from seaborn library perform the below tasks in order to
depict interactive plots.
iii. Using bar iplot, display the mean of all columns in the original dataset.
iv. Visualize a scatter matrix plot for the dataset. (The scatter matrix plot is basically a set of
all the scatter plots for numeric columns in your dataset)
vii. Visualize a 3D iplot for the data and explain why and when 3D
visualization should be used.
2. Visualize the below cases using interactive plots using Pygal. Later, write down the differences
that you observed in Pygal comparing it to other libraries.
i. You work in a Covid-19 centre and you want to know how many cases are active, recovered and
dead, in percentages, using a solid gauge.
Data total:1275000 death:225000 recovered:1202500 active:5000
ii. Given 4 students marks in the subjects DV, DWM, DC, ADA respectively, plot a dot chart
to understand the differences in statistics.
Data: S1:75,85,65,95
S2:55,45,65,85
S3:86,75,65,95
S4:89,92,85,90
iii. Using the above question’s (2.ii) data, plot a funnel chart.
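Before drawing the solid gauge in task i, the raw counts have to be turned into percentages. The sketch below does only that preparation step in plain Python, using the figures exactly as given in the task; the names `percentages` and `gauge_series` are illustrative.

```python
# Figures copied from task i. Note that death + recovered + active does not
# equal the stated total, so the shares below will not sum to 100%.
total = 1275000
cases = {"death": 225000, "recovered": 1202500, "active": 5000}

# Percentage of the stated total for each category, rounded to 2 decimals.
percentages = {name: round(100 * count / total, 2) for name, count in cases.items()}

# One gauge per category: the value is the percentage, the maximum is 100.
gauge_series = [
    (name, [{"value": pct, "max_value": 100}]) for name, pct in percentages.items()
]
print(percentages)
```

With Pygal installed, each (title, values) pair could then be passed to the `add` method of a `pygal.SolidGauge` chart; check the Pygal documentation for the available chart options.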
POST-LAB
1. Using the in-built “Gapminder” dataset from plotly express, perform the below tasks in order to
create interactive plots.
i. Create a box plot and a violin plot of lifeExp for the different continents.
ii. Visualize an interactive bar plot for df.
iii. Plot a scatter chart for lifeExp and gdpPercap and colour it by continent. As there
is huge variation in the lifeExp and gdpPercap values, normalize the gdpPercap values.
iv. Create a bubble chart on lifeExp and gdpPercap where the density of each bubble is based on
population and the colour on country (Hint: edit the above scatter code to transform it into a
bubble chart).
v. Create a timeline animation for the given dataset on lifeExp and gdpPercap for every
change (Hint: use the above bubble chart code and look for the animation_frame and
animation_group arguments) and write down the insights you found from the timeline
animation.
vi. Create a 3D plot on the attributes lifeExp, pop and gdpPercap.
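The scatter chart task asks for gdpPercap to be normalized. One common choice is min-max scaling, sketched below in plain Python; the task does not say which normalization is intended, so treat this as one possible reading. The sample values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


gdp_per_cap = [500.0, 2500.0, 10500.0, 40500.0]  # made-up sample values
normalized = min_max_normalize(gdp_per_cap)
print(normalized)
```

An alternative that keeps the original units is to plot gdpPercap on a logarithmic axis, for example with the `log_x=True` argument of `plotly.express.scatter`.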
Experiment -7: Hierarchical Data and Topographical Data Visualizations in Tableau.
PRE-LAB
1. Why is there a need for hierarchical and topographical kinds of data in real time?
2. Mention the various types of visualizations that can be done in Python for Hierarchical and
Topographical data.
3. Is there a need for the visualization to interact with the users? If yes, how can they be
implemented?
4. What is meant by actions in Tableau? How can actions be implemented and what are the
advantages of incorporating actions into our visuals?
IN-LAB
1. Vishal is appointed as a food-inspecting officer and is assigned the task of distinguishing the categories
of food visually, wherein each food item falls into a sub-group and this sub-group is a part of the main
category. Help Vishal accomplish his task using data visualization concepts in Tableau with an
appropriate technique and chart.
Link for Data -
https://drive.google.com/file/d/1EK6CdMea1M5sWr0dwFA2SiJRvJr2SXis/view?usp=sharing
2. The given data set contains information about wages from various states and substates in the US. Create
a geolocational map in Tableau and incorporate basic actions such that when any one of
the states is clicked, its substates and the tax returns filed are shown. If no state is clicked,
only the country with its 5 divisions should be shown.
Link for Data Set –
https://drive.google.com/file/d/1XyX77we36q4UHQ3QA4cC7vibJkhi9Zro/view?usp=sharing
POST-LAB
1. A supermarket company named Dango collected data regarding sales of their products and the
same is attached below. It is trying to understand its profits based on category, sub-category and
product segment. Help the company understand it in a better way by implementing Data Visualization
Concepts in Tableau.
Link for Data -
https://drive.google.com/file/d/1TLrbQeoA9oxI4UwwO5cAiq8YZG9OMdpq/view?usp=sharing
Experiment -8: Calendar Heatmaps and Flow data Visualizations in Python
PRE-LAB
1. Are 2-D graphical representations useful in real-time scenarios? How can they be implemented
in Tableau and in Python?
2. What are annotated heat maps?
3. What do you understand by the term “Flow Data”? Explain in brief about the functionalities,
syntax, variants and use cases for the below given types of charts/diagrams.
i. Chord Diagram
ii. Network Chart
iii. Sankey Diagram
4. Describe the functionality, syntax and parameters for the below given methods and specify to
which package/library they belong to:
i. imshow()
ii. tick_params()
iii. tick_top()
iv. set_xticks()
v. set_xticklabels()
vi. spines[]
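The six methods listed above all come together when building an annotated heat map in Matplotlib. The sketch below uses each of them once on a tiny made-up grid; the row and column labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt

data = [[1, 2, 3], [4, 5, 6]]  # placeholder 2 x 3 grid
cols = ["Mon", "Tue", "Wed"]
rows = ["Week 1", "Week 2"]

fig, ax = plt.subplots()
image = ax.imshow(data, cmap="Blues")  # draw the grid as coloured cells

ax.set_xticks(range(len(cols)))  # one tick per column
ax.set_xticklabels(cols)         # replace tick numbers with text labels
ax.set_yticks(range(len(rows)))
ax.set_yticklabels(rows)

ax.xaxis.tick_top()                    # move x ticks and labels to the top edge
ax.tick_params(axis="both", length=0)  # hide the tick marks, keep the labels

# ax.spines holds the four border lines of the plot; hide them all.
for spine in ax.spines.values():
    spine.set_visible(False)

# Annotation: write each cell's value at its centre.
for i, row in enumerate(data):
    for j, value in enumerate(row):
        ax.text(j, i, str(value), ha="center", va="center")
```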
IN-LAB
1. For the given data that contains data about snowfall in a country in the years 2018 and 2019, depict
the days when there was snowfall thereby showing the average snowfall in those years. Create two
different heatmaps for each of the years.
Link for Data -
https://avaa.tdata.fi/smear-services/smeardata.jsp?variables=pwd_smm&table=KUM_META&from=2018-01-01 00:00:00.102&to=2019-12-22 23:59:59.344&quality=ANY&averaging=NONE&type=NONE
2. The given data set contains information about migrants from New Zealand. Using this data, draw a
Sankey chart between New Zealand and Asia, Australia, Africa and the Middle East, Europe,
Americas, Oceania.
Later, draw a Sankey chart between New Zealand and these countries (Austria, Belgium, Bulgaria,
Croatia, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Netherlands, Norway, Poland,
Romania, Russia, Spain, Sweden, Switzerland, Ukraine).
Link for Data Set –
https://drive.google.com/file/d/1AFz5oUzHWqyKgW0Xs2vzmvlUdSTJfziM/view?usp=sharing
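A Sankey chart is described by a list of node labels plus links that refer to those nodes by position. The sketch below builds that structure in plain Python from (source, target, value) triples; the flow values are made up and do not come from the dataset.

```python
# Hypothetical flows as (source, target, count); real values come from the data.
flows = [
    ("New Zealand", "Asia", 120),
    ("New Zealand", "Europe", 80),
    ("New Zealand", "Oceania", 45),
]

# Sankey links refer to nodes by position, so give each label an index.
labels = []
for src, dst, _ in flows:
    for name in (src, dst):
        if name not in labels:
            labels.append(name)
index = {name: i for i, name in enumerate(labels)}

# Parallel lists: the k-th link goes from source[k] to target[k] with value[k].
link = {
    "source": [index[s] for s, _, _ in flows],
    "target": [index[t] for _, t, _ in flows],
    "value": [v for _, _, v in flows],
}
node = {"label": labels}
print(node)
print(link)
```

With Plotly installed, these two dictionaries have the shape expected by the `node` and `link` arguments of `plotly.graph_objects.Sankey`.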
POST-LAB
1. For the given data that contains data about temperatures in a country in the years 2018 and 2019,
depict the days when there was snowfall thereby showing the average temperatures in those years.
Create two different heatmaps for each of the years.
Link for Data -
https://avaa.tdata.fi/smear-services/smeardata.jsp?variables=t&table=KUM_META&from=2018-01-01 00:00:00.112&to=2019-12-31 23:59:59.408&quality=ANY&averaging=NONE&type=NONE
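A calendar heat map needs the daily values arranged as a grid of weekdays by weeks before any plotting call. The sketch below shows one way to build that grid with the standard library; the function name `calendar_grid` and the sample values are illustrative, and it assumes the measurements have already been averaged to one value per day.

```python
from datetime import date, timedelta


def calendar_grid(year, daily_values):
    """Arrange daily values into 7 weekday rows by week columns.

    daily_values maps datetime.date to a number; days without data stay None.
    Row 0 is Monday; column 0 is the week that contains 1 January.
    """
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    offset = first.weekday()  # weekday of 1 January (Monday is 0)
    n_weeks = (offset + (last - first).days) // 7 + 1
    grid = [[None] * n_weeks for _ in range(7)]
    day = first
    while day <= last:
        col = (offset + (day - first).days) // 7
        grid[day.weekday()][col] = daily_values.get(day)
        day += timedelta(days=1)
    return grid


# Made-up sample: values on two days of January 2018.
sample = {date(2018, 1, 1): 3.0, date(2018, 1, 9): 1.5}
grid_2018 = calendar_grid(2018, sample)
```

The resulting grid can be drawn as a heat map, for instance with Matplotlib's `imshow` after replacing the `None` entries with NaN; calling the function once per year gives the two separate heat maps.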
Experiment -9: Time Series Data Visualization.
PRE-LAB
1. Describe time-series data and explain its use cases in real time.
2. Explain in brief about the functionalities, syntax, variants and use cases for the below given
types of charts/diagrams.
i. Line Chart
ii. Bar Chart
iii. Area Chart
iv. Gantt Chart
v. Heat Map
vi. Polar Chart
vii. Stream Graph
IN-LAB
1. For the given dataset that contains temperatures of major cities do the following tasks.
i. Depict the average temperatures of the seven regions with the help of a bar chart.
ii. Create a line chart to depict Year-Wise average mean temperature of the seven regions.
iii. Depict the variation of maximum temperature over the months for 7 regions in a single bar
chart.
iv. Show the variation of mean temperature for top 20 countries in a horizontal bar chart.
v. Show the mean temperature variations of India in the cities Delhi, Chennai, Calcutta and
Mumbai in a line chart.
Link for Data -
https://drive.google.com/file/d/1xdtF8mlaQPZGgT9Iz-htnilgPXJ40Pg6/view?usp=sharing
2. The given nightvisitors dataset contains data about Night Visitors in Australian regions. Create a
stacked area chart to depict the data from 1998 to 2011.
Link for Data -
https://drive.google.com/file/d/1V4AtDHQmTDQWjvcXXlIvFJ-Zokrmiwto/view?usp=sharing
3. For the given Yahoo dataset plot a calendar heatmap depicting the Stock Prices in the year 2014.
[The below image is a hint to give a basic idea. Everyone’s output need not be the same.]
2.
a. You've been given a time-series dataset for different countries and their data. As part of his
homework, Sharma is asked to analyse and understand the relationship between the GDP and the
life expectancy of countries.
Help him visualise this using a heat map. Import the data using the plotly express module (gapminder).
b. A company records the time taken (number of days) and the percentage of completion for the tasks
given to its employees.
Now, the company wants to analyse all the tasks given, showing the time taken and the completion percentage.
Assist the company in visualising this using a Gantt chart.
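One way to approach the Gantt chart is with horizontal bars in Matplotlib, where each bar starts at the task's start day and its length is the duration. The sketch below overlays the completed share on each bar. The task names, start days, durations and percentages are made up, and the exercise's real data may be structured differently.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical tasks: (name, start day, duration in days, percent complete).
tasks = [("Design", 0, 5, 100), ("Build", 4, 10, 60), ("Test", 12, 6, 25)]

names = [t[0] for t in tasks]
starts = [t[1] for t in tasks]
durations = [t[2] for t in tasks]
done = [t[2] * t[3] / 100 for t in tasks]  # completed part, in days

fig, ax = plt.subplots(figsize=(8, 3))
# Full planned duration in a light colour, anchored at each task's start day.
ax.barh(names, durations, left=starts, color="lightgray", label="Planned")
# Overlay the completed share of each bar in a darker colour.
ax.barh(names, done, left=starts, color="tab:blue", label="Completed")
ax.invert_yaxis()  # list the first task at the top
ax.set_xlabel("Day")
ax.legend()
```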