Commands for Data Analysis using R

The document provides a comprehensive overview of statistical analysis in R, covering single-value summary statistics, data frames, contingency tables, and various statistical tests such as t-tests, ANOVA, and regression analysis. It includes examples of data manipulation and visualization techniques, as well as commands for importing data from CSV files. Additionally, it outlines methods for conducting normality tests, correlation analysis, and chi-squared tests for association.

Summary Statistics for a Single Set of Data (with Vectors)

Examples for Vectors and Data Frames


# Sample vector of daily sales for a retail store (in dollars)
daily_sales <- c(5000, 6200, 4800, 5500, 7200, 6300, 5100, 4800, 5400, 6200, 5800, 7000,
6800, 5500, 6100, 5300, 4700, 5900, 6200, 6500, 7200, 6800, 5600, 4800, 5200, 6100, 5800,
7200, 6900, 5500, 6100)

# Calculate single-value summary statistics
mean_sales <- mean(daily_sales)
median_sales <- median(daily_sales)
std_deviation <- sd(daily_sales)
min_sales <- min(daily_sales)
max_sales <- max(daily_sales)
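
A quick way to check the results, assuming the daily_sales vector and the objects computed above are in the workspace, is to print them or use summary(), which returns several of these statistics at once:

# Print the computed statistics
mean_sales
median_sales
std_deviation
min_sales
max_sales

# Or get the minimum, quartiles, median, mean, and maximum in one call
summary(daily_sales)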

Summary Statistics with a Data Frame (a data frame is a type of data structure that holds two or more sets of data)

NOTE:
There are two ways of creating a data frame.
1. Write each set of data separately, such as
   Data1 = c(1,2,3,4,5,6,7,8,9,20)
   Data2 = c(2,4,6,8,10,12,14,16,18,20)
   Here there are two sets of data and you want them in table format. Give the data frame any variable name, e.g.:
   df = data.frame(Data1 = Data1, Data2 = Data2)   # press Enter
   df                                              # press Enter
2. The other way is
   df = data.frame("Name" = c("Amiya", "Rosy", "Asish"), "Gender" = c("Male", "Female", "Male"))
   df
   (The difference here is that instead of defining each data set separately, you write the values directly in one command.)
Here are some examples:

data_frame_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Math = c(85, 92, 78),
  Science = c(88, 90, 85),
  History = c(75, 82, 90))
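
Since this section is about summary statistics with a data frame, here is a minimal sketch (assuming the data_frame_data frame just created) of how to summarise its numeric columns:

# Five-number summary plus mean for every column
summary(data_frame_data)

# Column means of the numeric score columns (dropping the Name column)
colMeans(data_frame_data[ , -1])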

#Contingency table

df = data.frame("Name" = c("Amiya", "Rosy", "Asish"), "Gender" = c("Male", "Female", "Male"))

table(df)

Output:

        Gender
Name     Female Male
  Amiya       0    1
  Asish       0    1
  Rosy        1    0

# Sample data
gender <- c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female",
            "Male", "Female")
brand <- c("Apple", "Samsung", "Samsung", "Apple", "Samsung", "Google", "Apple",
           "Google", "Samsung", "Other")
# Note: when using strings, each value inside c() must be enclosed in quotation marks ("...")

# Create a data frame
data_df <- data.frame(Gender = gender, Brand = brand)

# Using the xtabs() function
cross_tab2 <- xtabs(~ Gender + Brand, data = data_df)
print(cross_tab2)
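
As a possible follow-up (assuming the cross_tab2 table built above), base R can also add row/column totals and convert the counts to proportions:

# Add row and column totals to the cross-tabulation
addmargins(cross_tab2)

# Convert the counts to overall proportions
prop.table(cross_tab2)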

Importing Data from Excel or CSV File

File searching
Command in R software:
getwd()          # press Enter; this prints the current working directory, e.g.
"C:/Users/Hp/Documents"

data1 = read.csv(file.choose())   # press Enter; a file dialog opens so you can pick the CSV file
data1                             # press Enter; displays the imported data
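
Alternatively, the file can be read by name without the dialog. This is a minimal sketch; the folder and the file name sales_data.csv are hypothetical placeholders for your own path and file:

# Point R at the folder containing the file (hypothetical path)
setwd("C:/Users/Hp/Documents")

# Read the CSV by name (hypothetical file name)
data1 <- read.csv("sales_data.csv")

# Inspect the first rows and the column types
head(data1)
str(data1)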

COMMANDS FOR DATA ANALYSIS FOR DIFFERENT TESTS

For each test below, "Data" lists the sample data used and "Codes" lists the R commands.


Shapiro-Wilk normality test

Data:
ch1_data <- c(7, 7, 8, 8, 9, 10, 11, 11, 12, 12, 12, 13, 14, 15, 17, 17, 17, 18, 18, 19)
ch2_data <- c(36, 21, 27, 39, 33, 42, 25, 30, 31, 37, 35, 29, 23, 34, 41, 23, 32, 32, 30, 39)

Codes:
# Summary statistics for channel 1
summary(ch1_data)

# Summary statistics for channel 2
summary(ch2_data)

# Step 3: Create a histogram
hist(ch1_data, main = "Delivery Times Ch1", xlab = "Delivery Time (hrs)",
     col = "lightblue", border = "black")
hist(ch2_data, main = "Delivery Times Ch2", xlab = "Delivery Time (hrs)",
     col = "lightblue", border = "black")

# Step 4: Density plot to visualize the data for channels 1 and 2
dens = density(ch1_data)
plot(dens$x, dens$y)

# For channel 2
dens = density(ch2_data)
plot(dens$x, dens$y)

# Step 5: Conduct the Shapiro-Wilk normality test
shapiro_test_result = shapiro.test(ch1_data)
print(shapiro_test_result)

# Kolmogorov-Smirnov test comparing the two channels
ks_test_result = ks.test(ch1_data, ch2_data)
print(ks_test_result)

# Step 7: Create a QQ plot to visually assess the goodness-of-fit
qqnorm(ch1_data)
qqline(ch1_data)
qqnorm(ch2_data)
qqline(ch2_data)

# Step 8: Draw a qqplot to compare the two channels
qp = qqplot(ch1_data, ch2_data)
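
The steps above only run shapiro.test() on channel 1; assuming the ch2_data vector defined above, the same check can be applied to channel 2:

# Shapiro-Wilk normality test for channel 2
shapiro_test_result2 = shapiro.test(ch2_data)
print(shapiro_test_result2)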

Student's t-test

Data:
group_new <- c(80, 85, 88, 92, 78, 90, 84, 88, 85, 89)
group_traditional <- c(75, 82, 79, 88, 70, 81, 75, 80, 78, 83)

Codes:
t_test_result <- t.test(group_new, group_traditional)
print(t_test_result)
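
If only specific pieces of the result are needed, the components of the t.test() output can be pulled out directly (assuming the t_test_result object above):

# Extract the p-value and the confidence interval for the difference in means
t_test_result$p.value
t_test_result$conf.int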

One-sample t-test

Data:
sample_data <- c(72, 74, 78, 70, 76, 73, 77, 75, 79, 71, 74, 76, 80, 72, 74,
                 75, 73, 75, 78, 76, 73, 74, 76, 77, 75, 72, 78, 74, 76, 75)
(Assume the hypothesized population mean test score is 75.)

Codes:
t_test_result <- t.test(sample_data, mu = 75)   # mu is the hypothesized population mean
print(t_test_result)
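
The same command also accepts a one-sided alternative; a minimal sketch, assuming the sample_data vector above, testing whether the mean exceeds 75:

# One-sided (right-tailed) version of the one-sample t-test
t.test(sample_data, mu = 75, alternative = "greater")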

Two-sample t-test with unequal variances (Welch's t-test)

Data:
group1 <- c(22, 24, 25, 28, 26)
group2 <- c(30, 32, 31, 35, 33)

Codes:
# Perform a two-sample t-test with unequal variances (Welch's t-test)
t_test_result <- t.test(group1, group2, var.equal = FALSE)

# Print the results
print(t_test_result)

(Note: t.test() already uses unequal variances, i.e. Welch's test, by default, so writing var.equal = FALSE simply makes this explicit. If you want the classic equal-variance (pooled) t-test, use var.equal = TRUE.)
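
If you want to check whether the equal-variance assumption is plausible before choosing var.equal, base R's F test can compare the two variances (assuming the group1 and group2 vectors above):

# F test comparing the variances of the two groups
var.test(group1, group2)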

One-tailed paired-samples t-test

Data:
before_training <- c(50, 55, 48, 52, 45, 47, 53, 49, 51, 50)
after_training <- c(58, 62, 55, 60, 54, 56, 61, 57, 59, 58)

Codes:
t_test_result <- t.test(after_training, before_training, paired = TRUE, alternative = "greater")
print(t_test_result)
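
Equivalently, the paired test can be run as a one-sample t-test on the differences, which makes the pairing explicit (assuming the two vectors above):

# Paired differences: positive values mean an improvement after training
differences <- after_training - before_training

# One-sample t-test on the differences, equivalent to the paired t-test above
t.test(differences, mu = 0, alternative = "greater")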

Two-sample paired Wilcoxon test (Wilcoxon signed-rank test)

Data:
after = c(4, 3, 4, 2, 3)
before = c(6, 7, 8, 5, 7)

Codes:
result = wilcox.test(after, before, paired = TRUE)
print(result)
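
For contrast, dropping paired = TRUE runs the unpaired version of the test, the Mann-Whitney U test. This sketch reuses the same after/before vectors; with tied values R may warn that it cannot compute an exact p-value:

# Unpaired two-sample Wilcoxon (Mann-Whitney U) test
wilcox.test(after, before)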

Covariance (without xl file)

Data and Codes:
data <- data.frame(
  Student = 1:10,
  Hours_Studied = c(2, 3, 1, 4, 5, 2, 3, 1, 4, 5),
  Exam_Score = c(65, 75, 60, 80, 90, 70, 75, 55, 85, 95)
)

# Calculate the covariance between Hours_Studied and Exam_Score
covariance_matrix <- cov(data$Hours_Studied, data$Exam_Score)

# Print the covariance
covariance_matrix
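
Note that cov() applied to two vectors returns a single number; passing several columns of the data frame instead yields a full covariance matrix (assuming the data frame above):

# Covariance matrix of the two numeric variables
cov(data[ , c("Hours_Studied", "Exam_Score")])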
Correlation (without xl file)

Data:
tv_ad_spend <- c(5000, 5500, 6000, 5500, 5800, 6200, 6500, 7000, 7500, 7200)
sales_revenue <- c(75000, 78000, 82000, 76000, 80000, 84000, 87000, 91000, 95000, 93000)

Codes:
# Calculate the Pearson correlation coefficient
correlation_coefficient <- cor(tv_ad_spend, sales_revenue, method = "pearson")

# Print the correlation coefficient
correlation_coefficient
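
If a significance test of the correlation is also wanted, cor.test() returns the coefficient together with a p-value and confidence interval (assuming the two vectors above):

# Test whether the Pearson correlation differs from zero
cor.test(tv_ad_spend, sales_revenue, method = "pearson")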

Test for Association using chi-squared test (without xl file)

Data and Codes:
# Create a data frame with demographic data
data <- data.frame(
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female",
             "Male", "Female"),
  Education_Level = c("High School", "College", "High School", "College", "Graduate",
                      "High School", "College", "Graduate", "High School", "Graduate")
)

# Create a contingency table (cross-tabulation) of the two variables
contingency_table <- table(data$Gender, data$Education_Level)

# Check how the contingency table looks
contingency_table

# Perform a chi-squared test for independence
chisq.test(contingency_table)
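
To see what the test is comparing against, the expected counts can be extracted from the result. With a table this small R will typically warn that the chi-squared approximation may be inaccurate, and fisher.test() is a common exact alternative (assuming the contingency_table above):

# Expected counts under independence
chisq.test(contingency_table)$expected

# Exact alternative for small tables
fisher.test(contingency_table)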
Test for Association using chi-squared test (with xl file)

Codes:
# Step 1: Load the CSV file
data1 = read.csv(file.choose())
data1

# Step 2: Generate a cross contingency table summing up the total respondents for each
# price-rating combination
contingency_table = xtabs(Number.of.respondents ~ Price + Rating, data = data1)
contingency_table

# Step 4: Run chi-squared test to test the hypothesis
chisq.test(contingency_table)

One Way ANOVA (with xl file)

Codes:
# Step 1: Load the CSV file
students_data = read.csv(file.choose())
students_data

# Step 2: Visualization of the means using a boxplot
boxplot(Test_Score ~ Teaching_Method, data = students_data, col = "lightblue", pch = 18,
        main = "Distribution of Test Scores by Teaching Method",
        xlab = "Teaching Method", ylab = "Test Score")

# Step 3: One way ANOVA
anova_result <- aov(Test_Score ~ Teaching_Method, data = students_data)
summary(anova_result)

# Step 4: Tukey HSD post hoc testing
tukey_results <- TukeyHSD(anova_result)
print(tukey_results)
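
ANOVA assumes roughly normal residuals and equal group variances; a minimal sketch of checking both, assuming the anova_result and students_data objects above:

# Normality of residuals
shapiro.test(residuals(anova_result))

# Homogeneity of variances across teaching methods
bartlett.test(Test_Score ~ Teaching_Method, data = students_data)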

Two Way ANOVA

Codes:
# Step 1: Load the CSV file and organize it into data frames
GTL = read.csv(file.choose())
GTL

# Step 2: Using a box plot, visualize light vs temperature for different glass types
boxplot(Light ~ Temp * Glass, data = GTL, col = c("lightblue", "lightgreen"),
        main = "Boxplot of Light vs Temperature for Different Glass Types",
        xlab = "Temperature", ylab = "Light")

# Step 3: Formulate a hypothesis about the effect of glass type and temperature on light
# output. Run the two-way ANOVA.
anova_result = aov(Light ~ Glass * Temp, data = GTL)
summary(anova_result)

# Step 4: Conduct post-hoc testing
TukeyHSD(anova_result)
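
Because this model includes a Glass x Temp interaction, an interaction plot is a quick way to visualize it; this sketch assumes the GTL data frame loaded above contains Temp, Glass, and Light columns:

# Mean light output by temperature, with one line per glass type
interaction.plot(GTL$Temp, GTL$Glass, GTL$Light,
                 xlab = "Temperature", ylab = "Mean Light", trace.label = "Glass")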

Linear Regression

Data and Codes:
# Step 1: Load the data and organize it into a data frame
height = c(65, 62, 60, 64, 68, 70, 68, 65)
weight = c(75, 70, 65, 72, 75, 80, 72, 64)
student_data = data.frame(Height = height, Weight = weight)
student_data

# Create a simple regression of Weight vs Height
reg = lm(student_data$Weight ~ student_data$Height)
summary(reg)

# Find the correlation coefficient (the intercept and slope are shown in summary(reg) above)
correlation = cor(student_data$Height, student_data$Weight)
correlation
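
A scatter plot with the fitted line is a useful visual check of the regression; a minimal sketch assuming the student_data and reg objects above:

# Scatter plot of the data with the fitted regression line
plot(student_data$Height, student_data$Weight, xlab = "Height", ylab = "Weight",
     main = "Weight vs Height with Fitted Line")
abline(reg, col = "red")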

Multiple Regression

Data:
sales = c(10, 15, 12, 18, 20, 22, 25, 28, 39, 32)
advertising = c(5, 6, 6, 8, 10, 12, 15, 16, 18, 20)
pricing = c(20, 18, 16, 15, 14, 13, 12, 11, 10, 9)
competitor_pricing = c(18, 17, 16, 16, 15, 14, 13, 12, 11, 10)

Codes:
# Step 1: Combine the vectors into a data frame
sales_data = data.frame(Sales = sales, Advertising = advertising, Pricing = pricing,
                        Competitor_Pricing = competitor_pricing)
sales_data

Note: if you enter the data as vectors (that is, without xl files), then for regression or any other analysis that involves several columns you need to combine the vectors into a data frame first.

# Step 2: Create a regression model
reg_model = lm(Sales ~ Advertising + Pricing + Competitor_Pricing, data = sales_data)
reg_model
summary(reg_model)
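
Once the model is fitted it can be used for prediction; the new predictor values below are hypothetical and the sketch assumes the reg_model object above:

# Predict sales for a hypothetical combination of predictor values
new_obs <- data.frame(Advertising = 10, Pricing = 15, Competitor_Pricing = 14)
predict(reg_model, newdata = new_obs)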
