DEV RECORD AIDS
NAME :
REGISTER NO. :
COURSE :
YEAR/ SEMESTER :
SUBJECT NAME :
SUBJECT CODE :
CHRISTIAN COLLEGE OF ENGINEERING AND TECHNOLOGY,
ODDANCHATRAM – 624619,
DINDIGUL DT, TAMILNADU.
BONAFIDE CERTIFICATE
Certified that this is the bonafide Record of the work done by
Mr./Ms. …………………………………….……. in the
AD3301 – Data Exploration & Visualization Laboratory of
this institution, as per the Anna University, Chennai, for the third
Semester of Artificial Intelligence & Data Science, during the
period of August 2024 to December 2024.
Staff-In-Charge HOD/CSE
Register Number:
MARK PAGE
EX.NO.   DATE   NAME OF THE EXPERIMENT   PAGE NO.   SIGN.
EXP.NO.: 1
DATE: Installing the Data Analysis and Visualization Tools
AIM:
To install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.
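A quick way to confirm that the Python side of the installation worked is to check that each package can be found and to report its installed version. The sketch below uses only the standard library; the package list is an assumption based on the libraries used in the later experiments of this record, and R, Tableau Public and Power BI are not covered by this check.

```python
import importlib.util
from importlib import metadata

# Assumed package list: the Python libraries used in the later experiments
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "altair"]


def check_installed(names):
    """Return {package: version string, or None if it cannot be imported}."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable: not installed
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for pkg, version in check_installed(PACKAGES).items():
        print(f"{pkg:12s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with `pip install <name>` before running the programs below.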
PROGRAM 1:
# importing the pandas package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# pass those lists to the DataFrame,
# passing the column names as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
# displaying the DataFrame
print(data_frame)
OUTPUT
If you run the above program, you will get the following result:
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18
PROGRAM 2:
# importing pandas to read the data
import pandas as pd
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing the data using the pd.read_csv() method
data = pd.read_csv('CountryData.IND.csv')
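Once pd.read_csv() has returned a DataFrame, a graph is drawn from its columns with the pyplot module. The columns of CountryData.IND.csv are not listed in this record, so the sketch below uses a small hypothetical DataFrame with made-up `Year` and `Value` columns as a stand-in; the helper `plot_column` is also hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plot
import pandas as pd

# Hypothetical stand-in for pd.read_csv('CountryData.IND.csv'); columns and values are made up
data = pd.DataFrame({"Year": [2018, 2019, 2020, 2021], "Value": [1.2, 1.5, 1.1, 1.8]})


def plot_column(df, x, y):
    """Draw a line graph of column y against column x and return the Axes."""
    fig, ax = plot.subplots()
    ax.plot(df[x], df[y], marker="o")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} by {x}")
    return ax


ax = plot_column(data, "Year", "Value")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plot.show()` to display the graph, or `plot.savefig("graph.png")` to save it.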
RESULT:
The installation of the data analysis and visualization tools (R / Python / Tableau
Public / Power BI) was successfully completed.
EXP.NO.: 2
DATE: Exploratory Data Analysis (EDA) On Datasets Like Email Data Set
AIM:
To perform Exploratory Data Analysis (EDA) on datasets like the email data set: export all
your emails as a dataset, import them into a pandas DataFrame, visualize them and derive
different insights from the data.
PROGRAM:
Create a CSV file with only the required attributes:
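import datetime
import numpy as np
from scipy import ndimage
from scipy.interpolate import interp1d
# tod, Ty, dt, bins, df, ax, weight_fun and smooth are defined by the
# enclosing plotting function, which is not reproduced here
if weight_fun is None:
    weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
else:
    weights = weight_fun(df)
if smooth:
    hst, xedges = np.histogram(tod, bins=bins, weights=weights)
    # centre of each histogram bin
    x = np.delete(xedges, -1) + 0.5 * (xedges[1] - xedges[0])
    hst = ndimage.gaussian_filter(hst, sigma=0.75)
    f = interp1d(x, hst, kind='cubic')
    x = np.linspace(x.min(), x.max(), 10000)
    hst = f(x)
# label the y axis in 12-hour clock format (e.g. "09 AM")
ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))), "%H").strftime("%I %p")
                    for ts in ax.get_yticks()])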
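The fragment above works on a DataFrame of e-mails with time-of-day information. A minimal sketch of how such a frame can be built from exported messages and summarized, using the standard library `email` package and pandas. The three raw messages and the helper `messages_to_frame` are hypothetical; a real export (for example an mbox file read with Python's `mailbox.mbox`) would supply the messages.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

import pandas as pd

# Hypothetical raw messages; a real export would supply many more
RAW_MESSAGES = [
    "From: alice@example.com\nSubject: Lab record\nDate: Mon, 05 Aug 2024 09:15:00 +0530\n\nHi",
    "From: bob@example.com\nSubject: Re: Lab record\nDate: Mon, 05 Aug 2024 21:40:00 +0530\n\nOk",
    "From: alice@example.com\nSubject: Viva\nDate: Tue, 06 Aug 2024 09:05:00 +0530\n\nSee you",
]


def messages_to_frame(raw_messages):
    """Keep only the attributes needed for EDA: sender, subject and timestamp."""
    rows = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        sent = parsedate_to_datetime(msg["Date"])  # timezone-aware datetime
        rows.append({"from": msg["From"], "subject": msg["Subject"], "date": sent})
    df = pd.DataFrame(rows)
    df["hour"] = df["date"].apply(lambda d: d.hour)  # local hour of sending
    df["weekday"] = df["date"].apply(lambda d: d.strftime("%A"))
    return df


df = messages_to_frame(RAW_MESSAGES)
print(df["from"].value_counts())   # who writes most often
print(df.groupby("hour").size())   # e-mails per hour of the day
# df.to_csv("emails.csv", index=False)  # save the reduced attribute set as a CSV
```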
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 3
DATE: Working with NumPy Arrays, Pandas Data Frames and Basic Plots Using Matplotlib.
AIM:
To work with NumPy arrays, Pandas data frames and basic plots using Matplotlib.
PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
OUTPUT
The above code produces the following output:
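The plot above is built from two NumPy arrays: `x` from `np.arange(1, 11)` and `y = 2 * x + 5`, computed element-wise without a loop. A short sketch of other common array operations (reshaping, slicing, boolean masks, broadcasting and aggregation) on the same `x`:

```python
import numpy as np

x = np.arange(1, 11)           # [1 2 ... 10]
y = 2 * x + 5                  # element-wise arithmetic: [7 9 ... 25]

grid = x.reshape(2, 5)         # the same data viewed as 2 rows x 5 columns
first_col = grid[:, 0]         # slicing: first column -> [1 6]
evens = x[x % 2 == 0]          # boolean mask -> [2 4 6 8 10]
scaled = grid * np.array([1, 10, 100, 1000, 10000])  # one row broadcast over both rows

print(y)
print(grid.shape, first_col, evens)
print(scaled)
print(x.sum(), x.mean(), grid.sum(axis=0))  # totals over all values and per column
```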
PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt
# creating a DataFrame with 3 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)
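The program builds `dataFrame` but the plotting call is not shown. A minimal sketch, assuming a bar chart of `Units` per car is wanted (the chart type and column are assumptions); `Reg_Price` could be plotted the same way. `DataFrame.plot` is a pandas wrapper around Matplotlib that returns the Axes it drew on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to show a window
import matplotlib.pyplot as plt
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
        "Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
        "Units": [100, 120, 150, 170, 180, 200],
    }
)

# One bar per car; the Car column supplies the x-axis labels
ax = dataFrame.plot(x="Car", y="Units", kind="bar", legend=False, title="Units per car")
ax.set_ylabel("Units")
plt.tight_layout()
# plt.show()  # display the chart in an interactive session
```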
OUTPUT
This will produce the following output:
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 4
DATE: Explore Various Variable And Row Filters In R For Cleaning Data. Apply Various Plot
Features In R On Sample Data Sets And Visualize.
AIM:
To explore various variable and row filters in R for cleaning data. Apply various plot
features in R on sample data sets and visualize.
PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table
We also create some example data.
dt_all <- data.table(x = rep(month.name[1:3], each = 3),
                     y = rep(c(1, 2, 3), times = 3),
                     z = rep(c(TRUE, FALSE, TRUE), each = 3))  # Create data.table
head(dt_all)
Table 1
x y z
1 January 1 TRUE
2 January 2 TRUE
3 January 3 TRUE
4 February 1 FALSE
5 February 2 FALSE
6 February 3 FALSE
Filter Rows by Column Values
In this example, I’ll demonstrate how to select all rows of the example data in which
column x is equal to February. The %in% operator matches x against a set of values; in this
example, the set contains only one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February
Table 2
x y z
1 February 1 FALSE
2 February 2 FALSE
3 February 3 FALSE
Table 3
x y z
1 February 1 FALSE
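For comparison with the Python experiments, the same filters can be written in pandas, where `Series.isin` plays the role of `%in%`. The sketch rebuilds `dt_all` as a pandas DataFrame, filters the February rows, keeps a subset of columns (a variable filter), and applies a combined condition that leaves a single row. The condition `x == "February"` and `y == 1` is an assumed example that reproduces a one-row result like Table 3; the record does not show the code behind that table.

```python
import pandas as pd

# Rebuild dt_all: x = rep(month.name[1:3], each = 3), y = rep(c(1, 2, 3), times = 3),
# z = rep(c(TRUE, FALSE, TRUE), each = 3)
months = ["January", "February", "March"]
dt_all = pd.DataFrame(
    {
        "x": [m for m in months for _ in range(3)],
        "y": [1, 2, 3] * 3,
        "z": [v for v in (True, False, True) for _ in range(3)],
    }
)

# Row filter, equivalent of dt_all[x %in% month.name[c(2)], ]
february = dt_all[dt_all["x"].isin([months[1]])]

# Combined row filter (assumed example): February rows with y equal to 1
feb_first = dt_all[dt_all["x"].isin(["February"]) & (dt_all["y"] == 1)]

# Variable (column) filter: keep only columns x and y
xy_only = dt_all[["x", "y"]]

print(february)
print(feb_first)
```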
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 5
DATE: Performing Time Series Analysis And Applying The Various Visualization
Techniques.
AIM:
To perform time series analysis and apply various visualization techniques.
PROGRAM:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()
        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869
# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16, 5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()
plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
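Beyond the raw line plot, three common time-series views are a moving average (trend), a seasonal table of month against year, and the year-over-year change. A minimal sketch on a synthetic monthly series (linear trend plus a yearly sine cycle, made-up values) so that it runs without downloading a10.csv; with the real data, `df.value` from the program above can be used in place of `series`.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (month-start dates): linear trend plus a yearly cycle
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
trend = np.linspace(10, 20, len(idx))
season = 2 * np.sin(2 * np.pi * (idx.month.to_numpy() - 1) / 12)
series = pd.Series(trend + season, index=idx, name="value")

# 12-month centred moving average: averages out one full seasonal cycle
rolling = series.rolling(window=12, center=True).mean()

# Seasonal table: one row per year, one column per month
seasonal = series.to_frame().assign(
    year=idx.year.to_numpy(), month=idx.month.to_numpy()
).pivot(index="year", columns="month", values="value")

# Year-over-year change for each month (removes the seasonal component)
yoy = series.diff(12)

print(rolling.dropna().head())
print(seasonal.round(2))
```

Each of these can be drawn with the `plot_df` helper above, for example `plot_df(rolling, x=rolling.index, y=rolling, title='12-month moving average')`.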
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 6
DATE: Performing Data Analysis and Representation on a Map Using
Various Map Data Sets with Mouse Rollover Effect and User Interaction.
AIM:
To perform data analysis and representation on a map using various map data sets with a
mouse rollover effect and user interaction.
PROGRAM:
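One way to get a mouse-rollover effect without any plotting library is an SVG map in which each data point carries a `<title>` element (browsers show it as a tooltip on hover) and a CSS `:hover` rule highlights the point under the cursor. A minimal sketch, assuming a simple equirectangular projection over an approximate latitude/longitude box around India, approximate city coordinates and hypothetical values; `project`, `build_map` and the data are illustrative names, not part of the original record. For zooming, panning and base-map shapes, a library such as Altair (used in Experiment 7) is the usual choice.

```python
import html

# Hypothetical sample data: (city, latitude, longitude, value); coordinates approximate
CITIES = [
    ("Chennai", 13.08, 80.27, 4.6),
    ("Delhi", 28.61, 77.21, 11.0),
    ("Mumbai", 19.08, 72.88, 12.4),
    ("Kolkata", 22.57, 88.36, 4.5),
]

# Assumed bounding box (lon_min, lon_max, lat_min, lat_max) roughly covering India
BBOX = (68.0, 98.0, 6.0, 38.0)


def project(lat, lon, width=400, height=400, bbox=BBOX):
    """Equirectangular projection of (lat, lon) onto SVG pixel coordinates."""
    lon_min, lon_max, lat_min, lat_max = bbox
    x = (lon - lon_min) / (lon_max - lon_min) * width
    y = (lat_max - lat) / (lat_max - lat_min) * height  # SVG y grows downwards
    return round(x, 1), round(y, 1)


def build_map(rows, width=400, height=400):
    """Return an SVG string with one circle per row; hovering shows a tooltip."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        # CSS :hover rule gives the rollover highlight
        "<style>circle{fill:steelblue;opacity:.7}circle:hover{fill:orange;opacity:1}</style>",
        f'<rect width="{width}" height="{height}" fill="#eee"/>',
    ]
    for name, lat, lon, value in rows:
        x, y = project(lat, lon, width, height)
        radius = 3 + 2 * value ** 0.5  # circle size grows with the value
        parts.append(
            f'<circle cx="{x}" cy="{y}" r="{radius:.1f}">'
            f"<title>{html.escape(name)}: {value}</title></circle>"  # tooltip text
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = build_map(CITIES)
```

Saving the string with `open("map.svg", "w").write(svg)` and opening the file in a browser shows the points; hovering over a circle highlights it and displays the city and its value.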
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 7
DATE: Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World
AIM:
To build cartographic visualization for multiple datasets involving various countries of the
world.
PROGRAM:
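import altair as alt
from vega_datasets import data
# data sources used by the charts (from the vega_datasets package)
usa = data.us_10m.url
airports = data.airports.url
zipcodes = data.zipcodes.url
alt.Chart(zipcodes).transform_filter(
    '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
    digit='datum.zip_code[0]'
).mark_line(
    strokeWidth=0.5
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='digit:N',
    order='zip_code:O'
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)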
OUTPUT
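The `transform_filter` and `transform_calculate` steps above are Vega expressions evaluated once per row: the first keeps points inside a rough longitude/latitude box (dropping far-off points such as those in Alaska and Hawaii), the second takes the first character of the ZIP code, which then drives the colour. The same logic in pandas, on a few hypothetical rows with approximate coordinates, is sketched below.

```python
import pandas as pd

# Hypothetical rows with the columns the chart uses
zipcodes = pd.DataFrame(
    {
        "zip_code": ["02134", "60601", "94105", "99501", "96813"],
        "latitude": [42.35, 41.89, 37.79, 61.22, 21.31],
        "longitude": [-71.11, -87.62, -122.39, -149.90, -157.86],
    }
)

# transform_filter: '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
mask = (
    (zipcodes["longitude"] > -150)
    & (zipcodes["latitude"] > 22)
    & (zipcodes["latitude"] < 55)
)
kept = zipcodes[mask].copy()

# transform_calculate: digit = 'datum.zip_code[0]' (first character of the ZIP code)
kept["digit"] = kept["zip_code"].str[0]

# order='zip_code:O' connects the points of the line in ZIP-code order
kept = kept.sort_values("zip_code")
print(kept)
```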
alt.layer(
alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
fill='#ddd', stroke='#fff', strokeWidth=1
),
alt.Chart(airports).mark_circle(size=9).encode(
latitude='latitude:Q', longitude='longitude:Q',
tooltip='iata:N'
)
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 8
DATE: Performing EDA on Wine Quality Data Set
AIM:
To perform EDA on Wine Quality Data Set.
PROGRAM:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#features in data
df.columns
# Output:
# Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
#        'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
#        'pH', 'sulphates', 'alcohol', 'quality'],
#       dtype='object')
#few datapoints
df.head()
# count plot: number of wines at each quality score
sns.catplot(x='quality', data=df, kind='count')
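The cells above assume `df` already holds the data; if the UCI red-wine file is used, note that it is semicolon-separated and is read with `pd.read_csv(path, sep=';')`. A minimal sketch of typical next EDA steps (missing values, summary statistics, class balance behind the count plot, and correlation of each feature with `quality`), run on a few hypothetical rows that use a subset of the same column names:

```python
import pandas as pd

# Hypothetical rows using a subset of the wine-quality column names
df = pd.DataFrame(
    {
        "fixed acidity": [7.4, 7.8, 11.2, 7.4, 7.9, 7.3],
        "volatile acidity": [0.70, 0.88, 0.28, 0.66, 0.60, 0.65],
        "alcohol": [9.4, 9.8, 9.8, 9.4, 9.4, 10.0],
        "quality": [5, 5, 6, 5, 5, 7],
    }
)

print(df.shape)              # (rows, columns)
print(df.isnull().sum())     # missing values per column
print(df.describe().T)       # summary statistics per feature

counts = df["quality"].value_counts().sort_index()
print(counts)                # class balance: the numbers behind the count plot

# Correlation of every feature with the target, strongest (by absolute value) first
corr = df.corr()["quality"].drop("quality").sort_values(key=abs, ascending=False)
print(corr)
```

With the full dataset, `sns.heatmap(df.corr(), annot=True)` is a common way to view all pairwise correlations at once.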
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 9
DATE: Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report
AIM:
To use a case study on a data set, apply the various EDA and visualization techniques
and present an analysis report.
PROGRAM:
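import datetime
import math
import pandas as pd
import random
import radar
from faker import Faker
import matplotlib.pyplot as plt
fake = Faker()
def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        # assumed loop body: a random day between start and end and a price
        # between 900 and 1000 (the range seen in the output below)
        date = (start + datetime.timedelta(days=random.randint(0, delta.days))).strftime('%Y-%m-%d')
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    df = df.groupby(by='Date').mean()
    return df
df = generateData(50)  # number of records chosen for illustration
df.head(10)
OUTPUT
Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
2019-08-13  971.485600
2019-08-14  977.036200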
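The program imports pyplot, but the plotting and report steps are not shown. A minimal sketch of both, assuming a daily mean-price series shaped like the one `generateData` returns. The stand-in data is generated here with a fixed random seed so the result is repeatable, and the price range of 900 to 1000 follows the sample output; the report keys are illustrative.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd

random.seed(0)  # fixed seed so the sketch is repeatable

# Stand-in for generateData(n): random days in August 2019, prices between 900 and 1000
days = pd.date_range("2019-08-01", "2019-08-30", freq="D")
rows = [(random.choice(days), round(random.uniform(900, 1000), 4)) for _ in range(50)]
df = pd.DataFrame(rows, columns=["Date", "Price"]).groupby("Date").mean()

# Short analysis report: coverage, average level and extreme days of the daily mean price
report = {
    "days_with_data": len(df),
    "days_missing": len(days) - len(df),
    "mean_price": round(df["Price"].mean(), 2),
    "max_day": df["Price"].idxmax().date(),
    "min_day": df["Price"].idxmin().date(),
}
for key, value in report.items():
    print(f"{key:15s} {value}")

# Daily mean price with a 3-observation rolling mean as a trend line
ax = df["Price"].plot(label="daily mean", marker="o", figsize=(10, 4))
df["Price"].rolling(3).mean().plot(ax=ax, label="3-point rolling mean")
ax.set_ylabel("Price")
ax.legend()
```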
RESULT:
Thus the above program was executed successfully.