DEV RECORD AIDS

CHRISTIAN COLLEGE OF
ENGINEERING AND TECHNOLOGY

ODDANCHATRAM – 624619, DINDIGUL DT,
TAMILNADU.
RECORD NOTE BOOK
NAME :
REGISTER NO.
DEPARTMENT : ARTIFICIAL INTELLIGENCE & DATA SCIENCE
COURSE :
YEAR/ SEMESTER :
SUBJECT NAME :
SUBJECT CODE :
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

AD3301 – DATA EXPLORATION AND VISUALIZATION
LABORATORY
THIRD SEMESTER
BONAFIDE CERTIFICATE
Certified that this is the bonafide record of the work done by
Mr./Ms. …………………………………….……. in the
AD3301 – Data Exploration & Visualization Laboratory of
this institution, as per the Anna University, Chennai, for the third
semester Artificial Intelligence & Data Science, during the
period of August 2024 to December 2024.
Staff-In-Charge HOD/CSE
Submitted for the University Practical Examination held on ………………...
Register Number:
INTERNAL EXAMINER EXTERNAL EXAMINER

INDEX
EX.NO.   DATE   NAME OF THE EXPERIMENT   MARK   PAGE NO.   SIGN.
1    Installing the Data Analysis and Visualization Tools
2    Exploratory Data Analysis (EDA) On Datasets Like Email Data Set
3    Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
4    Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
5    Performing Time Series Analysis And Apply The Various Visualization Techniques.
6    Performing Data Analysis and representation on a Map using various Map data sets with Mouse Rollover effect, user interaction.
7    Building Cartographic Visualization For Multiple Datasets Involving Various Countries Of The World
8    Performing EDA on Wine Quality Data Set
9    Using A Case Study On A Data Set And Apply The Various EDA And Visualization Techniques And Present An Analysis Report
10
EXP.NO.: 1
DATE:
Installing the Data Analysis and Visualization Tools
AIM:
To install the data Analysis and Visualization tool: R/ Python /Tableau Public/ Power BI.
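For the Python part of the installation, one simple check is to ask the interpreter whether it can locate the required libraries. The sketch below is only an illustration: the helper name check_packages and the list of package names are assumptions made here, not part of the original record.

```python
import importlib.util


def check_packages(names):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Package list assumed from the libraries used in the later experiments.
    wanted = ["numpy", "pandas", "matplotlib", "seaborn"]
    for name, found in check_packages(wanted).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed with pip or conda before the programs are run.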
PROGRAM 1:
# importing the pandas package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# pass those rows to the DataFrame,
# passing column names as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
# displaying the DataFrame
print(data_frame)
OUTPUT
If you run the above program, you will get the following results.
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18
PROGRAM 2:
# importing pandas to read the CSV file
import pandas as pd
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing the data using pd.read_csv() method
data = pd.read_csv('CountryData.IND.csv')
# creating a histogram of Time period
data['Time period'].hist(bins=10)
OUTPUT
If you run the above program, you will get the following results.
<matplotlib.axes._subplots.AxesSubplot at 0x25e363ea8d0>
RESULT:
The installation of the data analysis and visualization tools (R / Python / Tableau
Public / Power BI) was successfully completed.
EXP.NO.: 2
DATE:
Exploratory Data Analysis (EDA) On Datasets Like Email Data Set
AIM:
To perform Exploratory Data Analysis (EDA) on datasets like email data set. Export all
your emails as a dataset, import them inside a pandas data frame, visualize them and get
different insights from the data.
PROGRAM:
Create a CSV file with only the required attributes:
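import csv

# mbox is assumed to be the exported mailbox, already opened
# (for example with the mailbox module), so that it can be iterated
with open('mailbox.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    for message in mbox:
        writer.writerow([message['subject'], message['from'],
                         message['date'],
                         message['to'],
                         message['X-Gmail-Labels'],
                         message['X-GM-THRID']])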
Once mailbox.csv is read into a pandas data frame, its dtypes attribute shows the following column types:
subject     object
from        object
date        object
to          object
label       object
thread     float64
dtype: object
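The date column is read as plain text (object), so it has to be parsed before the plotting function that follows can use numeric timeofday and year values. The sketch below shows one possible way to derive both from a single Date header using only the standard library. The helper name time_features and the sample header strings are hypothetical, and the record itself may have derived these columns differently.

```python
from email.utils import parsedate_to_datetime


def time_features(date_header):
    """Return (year, hour of day as a float) for an RFC 2822 date string."""
    when = parsedate_to_datetime(date_header)
    timeofday = when.hour + when.minute / 60 + when.second / 3600
    return when.year, timeofday


# Hypothetical header value in the format mail clients write to the Date field.
year, tod = time_features("Mon, 15 Aug 2016 13:30:00 +0000")
print(year, tod)  # 2016 13.5
```

Applied to every row of the date column, this would give the two numeric columns that the histogram code expects.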
import datetime
import numpy as np
from scipy import ndimage
from scipy.interpolate import interp1d
from matplotlib.ticker import MaxNLocator

# Plot the average number of emails per hour of the day.
# df must already contain numeric 'timeofday' (hours) and 'year' columns.
def plot_number_perdhour_per_year(df, ax, label=None, dt=1,
                                  smooth=False,
                                  weight_fun=None, **plot_kwargs):
    tod = df[df['timeofday'].notna()]['timeofday'].values
    year = df[df['year'].notna()]['year'].values
    Ty = year.max() - year.min()
    T = tod.max() - tod.min()
    bins = int(T / dt)
    if weight_fun is None:
        weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
    else:
        weights = weight_fun(df)
    if smooth:
        hst, xedges = np.histogram(tod, bins=bins, weights=weights)
        x = np.delete(xedges, -1) + 0.5*(xedges[1] - xedges[0])
        hst = ndimage.gaussian_filter(hst, sigma=0.75)
        f = interp1d(x, hst, kind='cubic')
        x = np.linspace(x.min(), x.max(), 10000)
        hst = f(x)
        ax.plot(x, hst, label=label, **plot_kwargs)
    else:
        ax.hist(tod, bins=bins, weights=weights, label=label,
                **plot_kwargs)
    ax.grid(ls=':', color='k')
    orientation = plot_kwargs.get('orientation')
    if orientation is None or orientation == 'vertical':
        ax.set_xlim(0, 24)
        ax.xaxis.set_major_locator(MaxNLocator(8))
        ax.set_xticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_xticks()])
    elif orientation == 'horizontal':
        ax.set_ylim(0, 24)
        ax.yaxis.set_major_locator(MaxNLocator(8))
        ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_yticks()])
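The default weights in the function divide every email by the number of days covered by the data, so each histogram bar reads as an average count per hour of the day rather than a raw total. The sketch below reproduces only that normalization on a small made-up sample. The arrays are hypothetical, and a fixed 0 to 24 hour range is used for the bins, which is a simplification of the function above.

```python
import numpy as np

# Hypothetical sample: hour of day of six emails spread over two years.
tod = np.array([8.5, 9.0, 9.25, 13.0, 13.5, 22.0])
year = np.array([2017, 2017, 2018, 2018, 2019, 2019])

dt = 1                                  # bin width in hours
Ty = year.max() - year.min()            # span of the data in years (2 here)
weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)

# 24 one-hour bins; each bin holds emails per hour, averaged over all days.
hist, edges = np.histogram(tod, bins=24, range=(0, 24), weights=weights)

print(hist[9])                    # share contributed by the two 9-10 h emails
print(hist.sum() * Ty * 365.25)   # undoing the weights recovers the raw count
```

Because the weights are identical for every email, the shape of the histogram is unchanged; only the vertical scale differs.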
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 3
DATE:
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
AIM:
To Work with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
OUTPUT
The above code should produce the following output −
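The plotted line can also be checked numerically. The short sketch below, added here only as an illustration, prints the value pairs that Program 1 passes to plt.plot and shows that the arithmetic on a NumPy array is applied element by element.

```python
import numpy as np

x = np.arange(1, 11)   # integers 1..10 (the stop value 11 is excluded)
y = 2 * x + 5          # applied element-wise, no explicit loop needed

for xi, yi in zip(x, y):
    print(xi, yi)
```

The points lie on a straight line from (1, 7) to (10, 25).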
PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt
# creating a DataFrame with 2 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)
# plot a line graph

plt.plot(dataFrame["Reg_Price"], dataFrame["Units"])
plt.show()
OUTPUT
This will produce the following output −
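Besides plotting, the same data frame can be filtered and extended with ordinary pandas operations. The following sketch reuses the values from Program 2; the column name Revenue and the threshold of 150 units are hypothetical choices made only for this example.

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
        "Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
        "Units": [100, 120, 150, 170, 180, 200]
    }
)

# Boolean mask: keep only the rows with more than 150 units.
high_volume = dataFrame[dataFrame["Units"] > 150]
print(high_volume["Car"].tolist())   # ['Mustang', 'Bentley', 'Jaguar']

# Derived column: price multiplied by units, computed row by row.
dataFrame["Revenue"] = dataFrame["Reg_Price"] * dataFrame["Units"]
print(dataFrame.loc[dataFrame["Revenue"].idxmax(), "Car"])   # Jaguar
```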
RESULT:
EXP.NO.: 4
DATE:
Explore Various Variable And Row Filters In R For Cleaning Data.
Apply Various Plot Features In R On Sample Data Sets And Visualize.
AIM:
To explore various variable and row filters in R for cleaning data. Apply various plot
features in R on sample data sets and visualize.
PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table
We also create some example data.
dt_all <- data.table(x = rep(month.name[1:3], each = 3),
                     y = rep(c(1, 2, 3), times = 3),
                     z = rep(c(TRUE, FALSE, TRUE), each = 3)) # Create data.table
head(dt_all)
Table 1
x y z
1 January 1 TRUE
2 January 2 TRUE
3 January 3 TRUE
4 February 1 FALSE
5 February 2 FALSE
6 February 3 FALSE
Filter Rows by Column Values

In this example, I’ll demonstrate how to select all those rows of the example data for
which column x is equal to February. With the use of %in%, we can choose a set of values
of x. In this example, the set only contains one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February
Table 2
x y z
1 February 1 FALSE
2 February 2 FALSE
3 February 3 FALSE
Filter Rows by Multiple Column Values

In the previous example, we addressed those rows of the example data for which one
column was equal to some value. In this example, we condition on the values of multiple
columns.
dt_all[x %in% month.name[c(2)] & y == 1, ] # Rows, where x is February and y is 1
Table 3
x y z
1 February 1 FALSE
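For comparison with the Python experiments in this record, the same two row filters can be written with pandas. This is only a rough equivalent of the R code above, not part of the original procedure: isin plays the role of %in%, and & combines the two conditions.

```python
import pandas as pd

months = ["January", "February", "March"]
dt_all = pd.DataFrame({
    "x": [m for m in months for _ in range(3)],                     # like rep(..., each = 3)
    "y": [1, 2, 3] * 3,                                             # like rep(..., times = 3)
    "z": [flag for flag in (True, False, True) for _ in range(3)],  # each = 3
})

# Rows where x is February
feb = dt_all[dt_all["x"].isin(["February"])]
print(feb)

# Rows where x is February and y is 1
feb_y1 = dt_all[dt_all["x"].isin(["February"]) & (dt_all["y"] == 1)]
print(feb_y1)
```

Note that pandas numbers rows from 0 and keeps the original row labels after filtering, whereas data.table renumbers the result from 1.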
RESULT:
EXP.NO.: 5
DATE:
Performing Time Series Analysis And Apply The Various Visualization Techniques.
AIM:
To perform Time Series Analysis and apply the various visualization Techniques.
PROGRAM:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()
        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869
# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()
plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
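Two further views that are often drawn for a monthly series are a 12-month rolling mean, which brings out the trend, and per-year averages. The sketch below computes both on a small synthetic series rather than on a10.csv, so that it runs without network access; the series values and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: 24 months starting July 1991, values 0, 1, ..., 23.
idx = pd.date_range("1991-07-01", periods=24, freq="MS")
values = pd.Series(np.arange(24, dtype=float), index=idx, name="value")

# 12-month rolling mean: undefined (NaN) until 12 observations are available.
rolling = values.rolling(window=12).mean()

# Average value per calendar year.
yearly = values.groupby(values.index.year).mean()

print(rolling.dropna().head())
print(yearly)
```

Either result can be passed to the plot_df function in place of the raw values, for example with x=rolling.index and y=rolling.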
OUTPUT
RESULT:
EXP.NO.: 6
DATE:
Performing Data Analysis and representation on a Map using
various Map data sets with Mouse Rollover effect, user interaction.
AIM:
To perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# lat, lon, population and area are assumed to be arrays of city data
# loaded beforehand; the record does not show how they are read
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
          c=np.log10(population), s=area,
          cmap='Reds', alpha=0.5)
# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)
# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')
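The scatter call colors each city by the base-10 logarithm of its population, and plt.clim(3, 7) sets the color scale to cover populations from 10^3 to 10^7. The sketch below works through that mapping for three made-up cities; the records and the helper name color_values are hypothetical and stand in for the data arrays the program expects.

```python
import math

# Hypothetical city records: (name, population, area in km^2).
cities = [("A", 1_000, 20.0), ("B", 250_000, 180.0), ("C", 4_000_000, 1_300.0)]


def color_values(populations):
    """Return log10 of each population, the value handed to the colormap."""
    return [math.log10(p) for p in populations]


values = color_values([population for _, population, _ in cities])
for (name, population, area), value in zip(cities, values):
    print(name, population, round(value, 2), area)
```

Taking the logarithm keeps small towns and large cities distinguishable on one color scale, since populations differ by several orders of magnitude.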
OUTPUT
RESULT:
EXP.NO.: 7
DATE:
Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World
AIM:
To build cartographic visualization for multiple datasets involving various countries of the
world.
PROGRAM:
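import altair as alt
from vega_datasets import data
# zipcodes is assumed to be the US zip code dataset from vega_datasets;
# the record does not show where it is defined
zipcodes = data.zipcodes.url
alt.Chart(zipcodes).transform_filter(
    '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
    digit='datum.zip_code[0]'
).mark_line(
    strokeWidth=0.5
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='digit:N',
    order='zip_code:O'
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)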
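The chart first filters the rows by longitude and latitude, then derives a new field digit from the first character of each zip code, and finally draws one colored line per digit with the points ordered by zip code. The sketch below mimics those three data steps in plain Python on a few made-up rows; the records are hypothetical and only approximate real locations.

```python
from collections import defaultdict

# Hypothetical records standing in for rows of the zip code dataset.
rows = [
    {"zip_code": "94103", "longitude": -122.41, "latitude": 37.77},
    {"zip_code": "10001", "longitude": -73.99, "latitude": 40.75},
    {"zip_code": "96701", "longitude": -157.93, "latitude": 21.39},  # filtered out
    {"zip_code": "90210", "longitude": -118.41, "latitude": 34.10},
]


def keep(row):
    """Same test as the transform_filter expression."""
    return -150 < row["longitude"] and 22 < row["latitude"] < 55


groups = defaultdict(list)
for row in filter(keep, rows):
    digit = row["zip_code"][0]          # same idea as datum.zip_code[0]
    groups[digit].append(row["zip_code"])

for digit in groups:
    groups[digit].sort()                # mirrors order='zip_code:O'

print(dict(groups))
```

Each key of the resulting dictionary corresponds to one color in the chart, and the sorted list under it gives the order in which the line visits the zip codes.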
OUTPUT
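# usa and airports are assumed to be dataset references defined beforehand
# (for example data.us_10m.url and data.airports.url from vega_datasets)
alt.layer(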
alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
fill='#ddd', stroke='#fff', strokeWidth=1
),
alt.Chart(airports).mark_circle(size=9).encode(
latitude='latitude:Q', longitude='longitude:Q',
tooltip='iata:N'
)
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
OUTPUT
RESULT:
EXP.NO.: 8
DATE:
Performing EDA on Wine Quality Data Set
AIM:
To perform EDA on Wine Quality Data Set.
PROGRAM:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# df is assumed to hold the wine quality data, loaded beforehand
# (for example with pd.read_csv); the record does not show this step
In [4]: #features in data
        df.columns
Out[4]: Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
               'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
               'pH', 'sulphates', 'alcohol', 'quality'],
              dtype='object')
In [5]: #few datapoints
        df.head()
In [13]: sns.catplot(x='quality', data=df, kind='count')
Out[13]: <seaborn.axisgrid.FacetGrid at 0x22b7de0dba8>
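The count plot in cell 13 shows how many wines fall into each quality score. The same numbers can be obtained as a table with value_counts, as in the sketch below. The seven quality values are made up for illustration and are not taken from the wine quality data set.

```python
import pandas as pd

# Hypothetical quality scores standing in for the 'quality' column.
df = pd.DataFrame({"quality": [5, 6, 5, 7, 6, 5, 4]})

# Number of rows per quality score, ordered by score like the plot's x axis.
counts = df["quality"].value_counts().sort_index()
print(counts)
```

Each entry of counts corresponds to the height of one bar in the count plot.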

OUTPUT
RESULT:
EXP.NO.: 9
DATE:
Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report
AIM:
To use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
PROGRAM:
import datetime
import math
import random
import pandas as pd
import radar
from faker import Faker
fake = Faker()
def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-1', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    df = df.groupby(by='Date').mean()
    return df
# the sample size is not given in the record; 50 is an assumed value
df = generateData(50)
# first ten rows of the generated data
df.head(10)
Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
2019-08-13  971.485600
2019-08-14  977.036200
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)
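If the radar and faker packages are not available, a comparable data set can be generated with the standard library alone. The sketch below is an alternative under that assumption, not the method used in the program above: the function name generate_prices, the seed parameter, and the sample size of 50 are hypothetical, and dates are drawn uniformly from 1 to 30 August 2019.

```python
import datetime
import random
from collections import defaultdict


def generate_prices(n, seed=None):
    """Return {date: mean price} for n random (date, price) samples in August 2019."""
    rng = random.Random(seed)
    start = datetime.date(2019, 8, 1)
    span_days = (datetime.date(2019, 8, 30) - start).days
    by_date = defaultdict(list)
    for _ in range(n):
        day = start + datetime.timedelta(days=rng.randint(0, span_days))
        by_date[day].append(round(rng.uniform(900, 1000), 4))
    # Average the prices that landed on the same date, like groupby('Date').mean().
    return {day: sum(prices) / len(prices) for day, prices in sorted(by_date.items())}


daily = generate_prices(50, seed=0)
for day, price in list(daily.items())[:5]:
    print(day.isoformat(), round(price, 4))
```

Passing a fixed seed makes the random data repeatable between runs, which is convenient when the same figures have to appear in a written report.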

OUTPUT
And the plotted graph looks something like this:
RESULT:
