Exploratory Data Analysis

Lab Exercise 1: Summary Statistics and Data Visualization


Problem Statement:
Use the mtcars dataset available in R. Calculate summary statistics (mean, median, standard deviation)
for the mpg (miles per gallon) column. Then, create a histogram and a boxplot for the same column.

Lab Exercise 2: Correlation Analysis


Problem Statement:
Use the iris dataset. Calculate the correlation matrix for the numerical variables in the dataset. Create a
pairs plot to visualize the relationships between these variables.

Lab Exercise 3: Data Cleaning and Handling Missing Values


Problem Statement:
Create a sample dataset with some missing values. Handle the missing values by imputing the mean for
numerical columns and the mode for categorical columns.

Lab Exercise 4: Outlier Detection


Problem Statement:
Using the mtcars dataset, detect outliers in the hp (horsepower) column using the IQR method. Display
the rows that contain outliers.

Lab Exercise 5: Data Transformation and Visualization


Problem Statement:
Use the iris dataset. Normalize the Sepal.Length column and create a density plot for the normalized
values. Also, create a scatter plot between the normalized Sepal.Length and Sepal.Width.

Answers

Lab Exercise 1
# Load the dataset
data(mtcars)

# Calculate summary statistics
mean_mpg <- mean(mtcars$mpg)
median_mpg <- median(mtcars$mpg)
sd_mpg <- sd(mtcars$mpg)

# Display the summary statistics
mean_mpg
median_mpg
sd_mpg

# Create a histogram
hist(mtcars$mpg, main="Histogram of MPG", xlab="Miles Per Gallon", col="blue")

# Create a boxplot
boxplot(mtcars$mpg, main="Boxplot of MPG", ylab="Miles Per Gallon", col="green")
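
As a quick cross-check on the three values above, the following sketch collects them in one named vector and prints them next to the output of base R's summary(). The name mpg_stats is an illustrative choice and not part of the original answer.

```r
# Cross-check of the summary statistics (illustrative sketch; the name
# mpg_stats is an assumption, not part of the original answer)
data(mtcars)

mpg_stats <- c(
  mean   = mean(mtcars$mpg),
  median = median(mtcars$mpg),
  sd     = sd(mtcars$mpg)
)
print(round(mpg_stats, 2))

# summary() reports the mean and median alongside the quartiles,
# but not the standard deviation
print(summary(mtcars$mpg))
```

Because summary() omits the standard deviation, sd() still has to be called separately.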

Lab Exercise 2
# Load the dataset
data(iris)

# Calculate the correlation matrix for the four numerical columns
cor_matrix <- cor(iris[, 1:4])

# Display the correlation matrix
cor_matrix

# Create a pairs plot, coloring points by species
pairs(iris[, 1:4], main="Pairs Plot of Iris Dataset", col=iris$Species)
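
A correlation matrix is easier to read once the strongest relationship has been picked out. The sketch below is one possible way to find the pair of variables with the largest absolute correlation. The names cor_offdiag, strongest and strongest_pair are illustrative assumptions.

```r
# Find the most strongly correlated pair of variables (illustrative sketch;
# the names cor_offdiag, strongest and strongest_pair are assumptions)
data(iris)
cor_matrix <- cor(iris[, 1:4])

# Zero out the diagonal so a variable's correlation with itself is ignored
cor_offdiag <- cor_matrix
diag(cor_offdiag) <- 0

# Locate the entry with the largest absolute correlation
strongest <- which(abs(cor_offdiag) == max(abs(cor_offdiag)), arr.ind = TRUE)[1, ]
strongest_pair <- c(rownames(cor_matrix)[strongest["row"]],
                    colnames(cor_matrix)[strongest["col"]])
print(strongest_pair)
print(cor_matrix[strongest["row"], strongest["col"]])
```

The same pair should stand out in the pairs plot as the panel where the points lie closest to a straight line.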

Lab Exercise 3
# Create a sample dataset with missing values
sample_data <- data.frame(
  Age = c(25, 30, NA, 22, 40, NA, 35),
  Gender = c("Male", "Female", "Female", NA, "Male", "Male", NA)
)

# Define a function to impute the mean for numerical columns
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}

# Define a function to impute the mode for categorical columns
impute_mode <- function(x) {
  x[is.na(x)] <- names(sort(table(x), decreasing = TRUE))[1]
  return(x)
}

# Impute missing values
sample_data$Age <- impute_mean(sample_data$Age)
sample_data$Gender <- impute_mode(sample_data$Gender)

# Display the cleaned dataset
sample_data
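
With more than two columns, calling each imputation function by hand becomes tedious. The following sketch shows one way to apply the same rule (mean for numerical columns, mode for everything else) to every column at once. The helpers impute_column and impute_all are hypothetical names introduced for illustration only.

```r
# Impute every column of a data frame according to its type (illustrative
# sketch; impute_column and impute_all are hypothetical helper names)
impute_column <- function(x) {
  if (is.numeric(x)) {
    # Numerical column: replace NA with the mean of the observed values
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  } else {
    # Categorical column: replace NA with the most frequent observed value
    # (which.max() returns the first maximum, so ties go to the first
    # value in table order)
    x[is.na(x)] <- names(which.max(table(x)))
  }
  x
}

impute_all <- function(df) {
  # df[] keeps the data frame structure while replacing each column
  df[] <- lapply(df, impute_column)
  df
}

sample_data <- data.frame(
  Age = c(25, 30, NA, 22, 40, NA, 35),
  Gender = c("Male", "Female", "Female", NA, "Male", "Male", NA)
)
cleaned <- impute_all(sample_data)
print(cleaned)
```

In this sample the missing ages are filled with the mean of the five observed ages and the missing genders with "Male", the most frequent observed value.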

Lab Exercise 4
# Load the dataset
data(mtcars)

# Calculate the quartiles and the IQR for the hp column
Q1 <- quantile(mtcars$hp, 0.25)
Q3 <- quantile(mtcars$hp, 0.75)
IQR_hp <- IQR(mtcars$hp)

# Define the outlier boundaries
lower_bound <- Q1 - 1.5 * IQR_hp
upper_bound <- Q3 + 1.5 * IQR_hp

# Detect outliers
outliers <- mtcars[mtcars$hp < lower_bound | mtcars$hp > upper_bound, ]

# Display the rows containing outliers
outliers
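
The result can be cross-checked against base R's boxplot.stats(), which applies a similar 1.5 × IQR rule when drawing boxplot whiskers. The sketch below is only a sanity check. Note that boxplot.stats() computes its box from hinges rather than from quantile(), so the two approaches can disagree slightly on other data. The names iqr_outliers and box_outliers are illustrative.

```r
# Cross-check the IQR rule against boxplot.stats() (illustrative sketch;
# the names iqr_outliers and box_outliers are assumptions)
data(mtcars)

Q1 <- quantile(mtcars$hp, 0.25)
Q3 <- quantile(mtcars$hp, 0.75)
iqr_outliers <- mtcars$hp[mtcars$hp < Q1 - 1.5 * IQR(mtcars$hp) |
                          mtcars$hp > Q3 + 1.5 * IQR(mtcars$hp)]

# boxplot.stats() returns the points lying beyond the whiskers in $out
box_outliers <- boxplot.stats(mtcars$hp)$out

print(iqr_outliers)
print(box_outliers)
```

For the hp column both approaches flag the same single value.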

Lab Exercise 5
# Load the dataset
data(iris)

# Normalize the Sepal.Length column to the range [0, 1] (min-max scaling)
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
iris$Sepal.Length.Normalized <- normalize(iris$Sepal.Length)

# Create a density plot for the normalized values
plot(density(iris$Sepal.Length.Normalized),
     main="Density Plot of Normalized Sepal Length",
     xlab="Normalized Sepal Length")

# Create a scatter plot between the normalized Sepal.Length and Sepal.Width
plot(iris$Sepal.Length.Normalized, iris$Sepal.Width,
     main="Scatter Plot of Normalized Sepal Length vs Sepal Width",
     xlab="Normalized Sepal Length", ylab="Sepal Width", col=iris$Species)
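
Two properties of min-max normalization are worth verifying: the rescaled values span exactly 0 to 1, and the relationship with other variables is unchanged because the transformation is linear. The sketch below checks both. It stores the result in a separate vector named normalized (an illustrative choice) instead of adding a column to iris.

```r
# Check two properties of min-max normalization (illustrative sketch;
# the name normalized is an assumption)
data(iris)

normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
normalized <- normalize(iris$Sepal.Length)

# The normalized values span exactly [0, 1]
print(range(normalized))

# Rescaling is a linear transformation with a positive slope, so the
# correlation with Sepal.Width is unchanged
print(cor(iris$Sepal.Length, iris$Sepal.Width))
print(cor(normalized, iris$Sepal.Width))
```

This is why the scatter plot of the normalized values has the same shape as one drawn from the raw Sepal.Length values, with only the x-axis scale changed.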
