
DEV RECORD AIDS


CHRISTIAN COLLEGE OF

ENGINEERING AND TECHNOLOGY


ODDANCHATRAM – 624619, DINDIGUL DT,
TAMILNADU.

RECORD NOTE BOOK

NAME :

REGISTER NO.

DEPARTMENT : ARTIFICIAL INTELLIGENCE & DATA SCIENCE

COURSE :

YEAR/ SEMESTER :

SUBJECTNAME :

SUBJECTCODE :

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE


AD3301 – DATA EXPLORATION AND VISUALIZATION
LABORATORY
THIRD SEMESTER

BONAFIDE CERTIFICATE
Certified that this is the bonafide record of the work done by
Mr./Ms. …………………………………….……. in the
AD3301 – Data Exploration & Visualization Laboratory of
this institution, as per Anna University, Chennai, for the third
semester of Artificial Intelligence & Data Science, during the
period of August 2024 to December 2024.

Staff-In-Charge HOD/CSE

Submitted for the University Practical Examination held on


………………...

Register
Number:

INTERNAL EXAMINER EXTERNAL EXAMINER


INDEX

EX.NO.  DATE  NAME OF THE EXPERIMENT                                        MARK  PAGE NO.  SIGN.

1             Installing the Data Analysis and Visualization Tools

2             Exploratory Data Analysis (EDA) on Datasets Like Email Data Set

3             Working with Numpy Arrays, Pandas Data Frames, Basic Plots
              Using Matplotlib

4             Explore Various Variable and Row Filters in R for Cleaning
              Data. Apply Various Plot Features in R on Sample Data Sets
              and Visualize

5             Performing Time Series Analysis and Apply the Various
              Visualization Techniques

6             Performing Data Analysis and Representation on a Map Using
              Various Map Data Sets with Mouse Rollover Effect, User
              Interaction

7             Building Cartographic Visualization for Multiple Datasets
              Involving Various Countries of the World

8             Performing EDA on Wine Quality Data Set

9             Using a Case Study on a Data Set and Apply the Various EDA
              and Visualization Techniques and Present an Analysis Report

10
EXP.NO.: 1
DATE: Installing the data Analysis and Visualization Tools

AIM:
To install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.
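One way to confirm that the Python side of the installation worked is to try importing each package and report its version. The sketch below uses only the standard library for the check itself; the helper name check_packages and the list of package names are illustrative assumptions, not part of the prescribed procedure.

```python
import importlib


def check_packages(names):
    """Return {package name: version string, or None if it cannot be imported}."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[name] = None
    return status


# Packages used in the later experiments (assumed list; adjust to your setup).
report = check_packages(["numpy", "pandas", "matplotlib", "seaborn"])
for name, version in report.items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

A package reported as NOT INSTALLED can usually be added with pip or conda before moving on to the programs below.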

PROGRAM 1:
# importing the pandas package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# pass those rows to the DataFrame,
# passing the column names as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
# displaying the DataFrame
print(data_frame)

OUTPUT
If you run the above program, you will get the following results.
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18

PROGRAM 2:
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing pandas to read the data
import pandas as pd
# importing the data using pd.read_csv() method
data = pd.read_csv('CountryData.IND.csv')

# creating a histogram of Time period
data['Time period'].hist(bins=10)
OUTPUT
If you run the above program, you will get the following results.
<matplotlib.axes._subplots.AxesSubplot at 0x25e363ea8d0>

RESULT:
The installation of the data analysis and visualization tools (R / Python / Tableau
Public / Power BI) was successfully completed.
EXP.NO.: 2
DATE: Exploratory Data Analysis (EDA) On Datasets Like Email Data Set

AIM:
To perform Exploratory Data Analysis (EDA) on datasets like email data set. Export all
your emails as a dataset, import them inside a pandas data frame, visualize them and get
different insights from the data.

PROGRAM:
Create a CSV file with only the required attributes:

import csv

# mbox is assumed to be the exported mailbox, already opened
# (for example with the mailbox module from the standard library).
with open('mailbox.csv', 'w') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    for message in mbox:
        writer.writerow([message['subject'], message['from'],
                         message['date'],
                         message['to'],
                         message['X-Gmail-Labels'],
                         message['X-GM-THRID']])

After the CSV file is loaded into a pandas data frame, the column data types are as follows:

subject     object
from        object
date        object
to          object
label       object
thread     float64
dtype: object
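The date column is read as plain text (object), so it has to be parsed before any time-based plot can be drawn. The sketch below shows one possible way to do this, assuming the dates are standard email header strings; the two sample rows are hypothetical stand-ins for mailbox.csv, and the derived columns timeofday and year are named to match what the plotting function reads.

```python
import io
from email.utils import parsedate_to_datetime

import pandas as pd

# Hypothetical two-row sample standing in for mailbox.csv.
sample_csv = io.StringIO(
    'subject,from,date,to,label,thread\n'
    'Hello,a@example.com,"Mon, 05 Aug 2019 10:30:00 +0000",b@example.com,Inbox,1\n'
    'Re: Hello,b@example.com,"Tue, 06 Aug 2019 18:45:00 +0000",a@example.com,Sent,1\n'
)
dfs = pd.read_csv(sample_csv)

# Parse the email-style date strings into timezone-aware datetimes.
dfs['date'] = dfs['date'].apply(parsedate_to_datetime)

# Derive the fractional hour of the day and the calendar year.
dfs['timeofday'] = dfs['date'].apply(lambda d: d.hour + d.minute / 60)
dfs['year'] = dfs['date'].apply(lambda d: d.year)

print(dfs[['date', 'timeofday', 'year']])
```

Real mailboxes often contain malformed or missing dates, so a production version would need to handle rows that fail to parse.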
import datetime
import numpy as np
from scipy import ndimage
from scipy.interpolate import interp1d
from matplotlib.ticker import MaxNLocator

def plot_number_perdhour_per_year(df, ax, label=None, dt=1,
                                  smooth=False,
                                  weight_fun=None, **plot_kwargs):
    # time of day (in hours) and year of every message that has them
    tod = df[df['timeofday'].notna()]['timeofday'].values
    year = df[df['year'].notna()]['year'].values
    Ty = year.max() - year.min()
    T = tod.max() - tod.min()
    bins = int(T / dt)

    # default weights turn counts into an average number per hour per day
    if weight_fun is None:
        weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
    else:
        weights = weight_fun(df)

    if smooth:
        # smooth the histogram and draw it as an interpolated curve
        hst, xedges = np.histogram(tod, bins=bins, weights=weights)
        x = np.delete(xedges, -1) + 0.5*(xedges[1] - xedges[0])
        hst = ndimage.gaussian_filter(hst, sigma=0.75)
        f = interp1d(x, hst, kind='cubic')
        x = np.linspace(x.min(), x.max(), 10000)
        hst = f(x)
        ax.plot(x, hst, label=label, **plot_kwargs)
    else:
        ax.hist(tod, bins=bins, weights=weights, label=label, **plot_kwargs)

    ax.grid(ls=':', color='k')

    # label the time axis with 12-hour clock times
    orientation = plot_kwargs.get('orientation')
    if orientation is None or orientation == 'vertical':
        ax.set_xlim(0, 24)
        ax.xaxis.set_major_locator(MaxNLocator(8))
        ax.set_xticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_xticks()])
    elif orientation == 'horizontal':
        ax.set_ylim(0, 24)
        ax.yaxis.set_major_locator(MaxNLocator(8))
        ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_yticks()])
OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 3
DATE: Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.

AIM:
To work with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.

PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x, y)
plt.show()

OUTPUT
The above code should produce the following output −

PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt
# creating a DataFrame with 2 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)

# plot a line graph


plt.plot(dataFrame["Reg_Price"], dataFrame["Units"])
plt.show()

OUTPUT
This will produce the following output −
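The two programs above focus on plotting. As a complement, the following sketch illustrates how a data frame column can be turned into a NumPy array and used for element-wise arithmetic and boolean filtering. It reuses the car data from PROGRAM 2; the Revenue column and the threshold of 150 units are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
    "Units": [100, 120, 150, 170, 180, 200],
})

# Convert columns to NumPy arrays.
prices = cars["Reg_Price"].to_numpy()
units = cars["Units"].to_numpy()

# Element-wise multiplication: one revenue value per row (illustrative column).
revenue = prices * units
cars["Revenue"] = revenue

# A boolean mask built from an array selects matching rows of the data frame.
high_volume = cars[units >= 150]

print(cars)
print("Mean price:", prices.mean())
print(high_volume["Car"].tolist())
```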

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 4
DATE: Explore Various Variable And Row Filters In R For Cleaning Data. Apply Various
Plot Features In R On Sample Data Sets And Visualize.

AIM:
To explore various variable and row filters in R for cleaning data. Apply various plot
features in R on sample data sets and visualize.

PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table

Next, create some example data.

dt_all <- data.table(x = rep(month.name[1:3], each = 3),
                     y = rep(c(1, 2, 3), times = 3),
                     z = rep(c(TRUE, FALSE, TRUE), each = 3)) # Create data.table
head(dt_all)

Table 1

          x  y      z
1   January  1   TRUE
2   January  2   TRUE
3   January  3   TRUE
4  February  1  FALSE
5  February  2  FALSE
6  February  3  FALSE

Filter Rows by Column Values


This example selects all rows of the example data for which column x is equal to
February. The %in% operator matches x against a set of values; here the set contains
only one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February

Table 2

          x  y      z
1  February  1  FALSE
2  February  2  FALSE
3  February  3  FALSE

Filter Rows by Multiple Column Values

The previous example selected the rows for which one column was equal to some value.
This example conditions on the values of multiple columns, combined with the & operator.
dt_all[x %in% month.name[c(2)] & y == 1, ] # Rows where x is February and y is 1

Table 3

          x  y      z
1  February  1  FALSE
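For comparison with the Python experiments in this record, the same two row filters can be sketched in pandas. This is only a rough analogue of the data.table code, not part of the R procedure: isin plays the role of %in%, and conditions are combined with & inside parentheses.

```python
import pandas as pd

months = ["January", "February", "March"]

# Same example data as dt_all: each month repeated three times.
dt_all = pd.DataFrame({
    "x": [m for m in months for _ in range(3)],
    "y": [1, 2, 3] * 3,
    "z": [flag for flag in (True, False, True) for _ in range(3)],
})

# Rows where x is February (isin accepts a set of allowed values).
feb = dt_all[dt_all["x"].isin(["February"])]

# Rows where x is February and y is 1.
feb_y1 = dt_all[dt_all["x"].isin(["February"]) & (dt_all["y"] == 1)]

print(feb)
print(feb_y1)
```

Note that pandas keeps the original row labels (3, 4, 5) in the filtered result, whereas data.table renumbers the rows from 1.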

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 5
DATE: Performing Time Series Analysis And Apply The Various Visualization
Techniques.

AIM:
To perform Time Series Analysis and apply the various visualization Techniques.

PROGRAM:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()

        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869

# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16, 5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()
plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
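A line plot of the raw values mixes the long-term trend with the yearly pattern. One common companion technique is a rolling mean, sketched below on a small synthetic monthly series so that it runs without downloading the data set. The series, its trend, and its 12-month pattern are hypothetical; only the rolling call is the point of the example.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: a linear trend plus a repeating 12-month pattern.
index = pd.date_range("1991-07-01", periods=36, freq="MS")
trend = np.arange(36, dtype=float)
seasonal = np.tile(np.arange(12, dtype=float), 3)
series = pd.Series(trend + seasonal, index=index, name="value")

# A 12-month rolling mean averages over one full cycle, leaving the trend.
rolling = series.rolling(window=12).mean()

print(rolling.dropna().head())
```

The first eleven entries of the rolling mean are empty because a full 12-month window is not yet available. The result can be passed to a line plot in the same way as the original column.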

OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 6
DATE: Performing Data Analysis and representation on a Map using
various Mapdata sets with Mouse Rollover effect, user interaction.

AIM:
To perform data analysis and representation on a map using various map data sets with
mouse rollover effect and user interaction.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# lon, lat, population and area are assumed to be arrays of city data
# loaded before this point.

# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
          c=np.log10(population), s=area,
          cmap='Reds', alpha=0.5)
# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)
# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')
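The color of each point in the program comes from np.log10(population), and plt.clim(3, 7) limits the color scale to populations between 10^3 and 10^7. The small sketch below, using hypothetical population figures, illustrates what those two steps do to the numbers; np.clip is used here only to mimic how values outside the color limits saturate at the ends of the colormap.

```python
import numpy as np

# Hypothetical city populations (not taken from any data set).
population = np.array([1_000, 50_000, 1_000_000, 20_000_000])

# Log scale: each step of 1 corresponds to a tenfold increase in population.
log_pop = np.log10(population)

# Values beyond the color limits 3..7 are drawn with the end colors.
clipped = np.clip(log_pop, 3, 7)

for p, lp, c in zip(population, log_pop, clipped):
    print(f"population={p:>10,}  log10={lp:.2f}  color value={c:.2f}")
```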
OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 7
DATE: Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World

AIM:
To build cartographic visualization for multiple datasets involving various countries of the
world.

PROGRAM:
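import altair as alt
from vega_datasets import data

# Data sources are assumed to be the vega_datasets sample files.
zipcodes = data.zipcodes.url
usa = data.us_10m.url
airports = data.airports.url

alt.Chart(zipcodes).transform_filter(
    '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
    digit='datum.zip_code[0]'
).mark_line(
    strokeWidth=0.5
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='digit:N',
    order='zip_code:O'
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)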

OUTPUT
alt.layer(
alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
fill='#ddd', stroke='#fff', strokeWidth=1
),
alt.Chart(airports).mark_circle(size=9).encode(
latitude='latitude:Q', longitude='longitude:Q',
tooltip='iata:N'
)
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)

OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 8
DATE: Performing EDA on Wine Quality Data Set

AIM:
To perform EDA on Wine Quality Data Set.

PROGRAM:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# df is assumed to hold the Wine Quality data set, loaded into a
# pandas data frame before the cells below.

In [4]: #features in data
        df.columns
Out[4]: Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
               'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
               'pH', 'sulphates', 'alcohol', 'quality'],
              dtype='object')
In [5]: #few datapoints
        df.head()

In [13]: sns.catplot(x='quality', data=df, kind='count')
Out[13]: <seaborn.axisgrid.FacetGrid at 0x22b7de0dba8>
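The count plot above shows how many wines fall into each quality score. The same numbers can be obtained as a table with value_counts, and a groupby gives a first look at how a feature varies with quality. The sketch below uses a tiny hypothetical data frame with two of the listed columns, so the figures are illustrative and not taken from the Wine Quality data set.

```python
import pandas as pd

# Hypothetical miniature sample with two of the columns listed above.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "quality": [5, 5, 6, 6, 6, 7],
})

# Number of wines per quality score (what a count plot draws as bars).
counts = df["quality"].value_counts().sort_index()

# Mean alcohol content for each quality score.
mean_alcohol = df.groupby("quality")["alcohol"].mean()

print(counts)
print(mean_alcohol)
```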


OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 9
DATE: Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report

AIM:
To use a case study on a data set, apply the various EDA and visualization techniques,
and present an analysis report.

PROGRAM:
import datetime
import math
import pandas as pd
import random
import radar
from faker import Faker
fake = Faker()

def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        # random date in August 2019 and a random price between 900 and 1000
        date = radar.random_datetime(start='2019-08-1',
                                     stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    # average the prices that fall on the same date
    df = df.groupby(by='Date').mean()
    return df

# The sample size is not given in the record; 50 is an assumed value.
df = generateData(50)
df.head(10)

Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
2019-08-13  971.485600
2019-08-14  977.036200

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)
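The program depends on the third-party packages radar and faker. If they are not available, the same kind of data can be sketched with the standard library alone. The function name generate_data, the seed parameter, and the sample size of 50 below are assumptions for illustration; the date range and price range follow the program above.

```python
import datetime
import random

import pandas as pd


def generate_data(n, seed=0):
    """Return the mean price per date for n random (date, price) samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    start = datetime.date(2019, 8, 1)
    rows = []
    for _ in range(n):
        # Random day between 2019-08-01 and 2019-08-30 inclusive.
        day = start + datetime.timedelta(days=rng.randint(0, 29))
        price = round(rng.uniform(900, 1000), 4)
        rows.append([day, price])
    frame = pd.DataFrame(rows, columns=["Date", "Price"])
    frame["Date"] = pd.to_datetime(frame["Date"])
    # Several samples can share a date, so average them.
    return frame.groupby(by="Date").mean()


df_sketch = generate_data(50)
print(df_sketch.head(10))
```

Because several samples can land on the same day, the resulting table usually has fewer rows than the number of samples, which is why some dates are missing from the listing.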


OUTPUT

And the plotted graph looks something like this:

RESULT:
Thus the above program was executed successfully.
