Data Science Using R

The document summarizes the mtcars dataset in R: it displays the first six rows, reports summary statistics for each variable, and explores relationships between variables through plots and linear regression. The dataset has 32 observations of 11 variables; mpg ranges from 10.4 to 33.9 and is negatively correlated with wt.

Uploaded by

PARIDHI DEVAL

> data(mtcars)

> #view first six rows of mtcars dataset
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> #summarize mtcars dataset
> summary(mtcars)
      mpg             cyl             disp             hp             drat
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930
       wt             qsec             vs               am              gear
 Min.   :1.513   Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000
 1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000
 Median :3.325   Median :17.71   Median :0.0000   Median :0.0000   Median :4.000
 Mean   :3.217   Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688
 3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000
 Max.   :5.424   Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000
      carb
 Min.   :1.000
 1st Qu.:2.000
 Median :2.000
 Mean   :2.812
 3rd Qu.:4.000
 Max.   :8.000
> #display rows and columns
> dim(mtcars)
[1] 32 11
> #display column names
> names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
> #create histogram of values for mpg
> hist(mtcars$mpg,
+ col='steelblue',
+ main='Histogram',
+ xlab='mpg',
+ ylab='Frequency')
> #create boxplot of values for mpg
> boxplot(mtcars$mpg,
+ main='Distribution of mpg values',
+ ylab='mpg',
+ col='steelblue',
+ border='black')
> #create scatterplot of mpg vs. wt
> plot(mtcars$mpg, mtcars$wt,
+ col='steelblue',
+ main='Scatterplot',
+ xlab='mpg',
+ ylab='wt',
+ pch=19)
> # Number of rows (observations)
> nrow(mtcars)
[1] 32
> # Number of columns (variables)
> ncol(mtcars)
[1] 11
> plot(mpg ~ wt, data = mtcars, col=2)
> fit <- lm(mpg ~ wt, data = mtcars)
> summary(fit)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,	Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

> abline(fit,col=3,lwd=2)
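> # lmlab is not defined earlier in the session; the label below is an assumed
> # stand-in built from the fitted coefficients.
> lmlab <- paste0("mpg = ", round(coef(fit)[1], 2), " + (", round(coef(fit)[2], 2), ") * wt")
> mtext(lmlab, 3, line=-2)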
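The regression of mpg on wt can be cross-checked against the correlation between the two variables. The following is a minimal sketch, not part of the original session: for a simple linear regression, R-squared equals the squared Pearson correlation, and predict() evaluates the fitted line at a new weight. The names r, r_squared and pred, and the example weight of 3 (wt is recorded in units of 1000 lb), are illustrative assumptions.

```r
# Refit the same model so the block runs on its own.
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Pearson correlation between weight and fuel economy (negative).
r <- cor(mtcars$wt, mtcars$mpg)

# R-squared reported by summary(); for one predictor it equals r^2.
r_squared <- summary(fit)$r.squared
print(all.equal(r^2, r_squared))

# Predicted mpg for a hypothetical car with wt = 3.
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)
```

The prediction is simply the intercept plus the slope times 3, using the coefficients printed by summary(fit).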
> # Create a vector.
> x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
> # Find Mean.
> result.mean <- mean(x)
> print(result.mean)
[1] 8.22
> # Create the function.
> getmode <- function(v) {
+ uniqv <- unique(v)
+ uniqv[which.max(tabulate(match(v, uniqv)))]
+ }
> # Create the vector with numbers.
> v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
> # Calculate the mode using the user function.
> result <- getmode(v)
> print(result)
[1] 2
> # Create the vector with characters.
> charv <- c("o","it","the","it","it")
> # Calculate the mode using the user function.
> result <- getmode(charv)
> print(result)
[1] "it"
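Because which.max() returns only the first maximum, getmode() reports a single value even when several values tie for the highest count. A minimal sketch of a variant that returns every tied value is shown here; the name getmodes is hypothetical and is not part of the original session.

```r
# Return all values that share the highest frequency, in order of first appearance.
getmodes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # frequency of each unique value
  uniqv[counts == max(counts)]
}

# Two values tie for the top count here, so both are returned.
tied <- getmodes(c(1, 1, 2, 2, 3))
print(tied)

# With a single most frequent value the result matches getmode().
single <- getmodes(c("o", "it", "the", "it", "it"))
print(single)
```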
> x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
> y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
> # Apply the lm() function.
> relation <- lm(y~x)
> print(relation)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746
> # Apply the lm() function and print the full model summary.
> relation <- lm(y~x)
> print(summary(relation))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3002 -1.6629  0.0412  1.8944  3.9775

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509    8.04901  -4.778  0.00139 **
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared:  0.9548,	Adjusted R-squared:  0.9491
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.164e-06

> # Create a sequence of numbers from 32 to 44.
> print(seq(32,44))
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
> # Find mean of numbers from 25 to 82.
> print(mean(25:82))
[1] 53.5
> # Find sum of numbers from 41 to 68.
> print(sum(41:68))
[1] 1526
> # Create a function to print squares of numbers in sequence.
> new.function <- function(a) {
+   for(i in 1:a) {
+     b <- i^2
+     print(b)
+   }
+ }
> # Call the function new.function supplying 6 as an argument.
> new.function(6)
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
> # Create a vector.
> apple <- c('red','green',"yellow")
> print(apple)
[1] "red" "green" "yellow"
> # Get the class of the vector.
> print(class(apple))
[1] "character"
> # Create a list.
> list1 <- list(c(2,5,3),21.3,sin)
> # Print the list.
> print(list1)
[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x) .Primitive("sin")

> # Create a matrix.
> M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
> print(M)
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
> # Create an array.
> a <- array(c('green','yellow'),dim = c(3,3,2))
> print(a)
, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"

> # Create a vector.
> apple_colors <- c('green','green','yellow','red','red','red','green')
> # Create a factor object.
> factor_apple <- factor(apple_colors)
> # Print the factor.
> print(factor_apple)
[1] green green yellow red red red green
Levels: green red yellow
> print(nlevels(factor_apple))
[1] 3
> # Create the data frame.
> BMI <- data.frame(
+ gender = c("Male", "Male","Female"),
+ height = c(152, 171.5, 165),
+ weight = c(81,93, 78),
+ Age = c(42,38,26)
+ )
> print(BMI)
gender height weight Age
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
> # Assignment using equal operator.
> var.1 = c(0,1,2,3)
> # Assignment using leftward operator.
> var.2 <- c("learn","R")
> # Assignment using rightward operator.
> c(TRUE,1) -> var.3
> print(var.1)
[1] 0 1 2 3
> cat ("var.1 is ", var.1 ,"\n")
var.1 is 0 1 2 3
> cat ("var.2 is ", var.2 ,"\n")
var.2 is learn R
> cat ("var.3 is ", var.3 ,"\n")
var.3 is 1 1
> a <- "Hello"
> b <- 'How'
> c <- "are you? "
> print(paste(a,b,c))
[1] "Hello How are you? "
> print(paste(a,b,c, sep = "-"))
[1] "Hello-How-are you? "
> print(paste(a,b,c, sep = "", collapse = ""))
[1] "HelloHoware you? "
> # Total number of digits displayed. Last digit rounded off.
> result <- format(23.123456789, digits = 9)
> print(result)
[1] "23.1234568"
> # Display numbers in scientific notation.
> result <- format(c(6, 13.14521), scientific = TRUE)
> print(result)
[1] "6.000000e+00" "1.314521e+01"
> # The minimum number of digits to the right of the decimal point.
> result <- format(23.47, nsmall = 5)
> print(result)
[1] "23.47000"
> # Format treats everything as a string.
> result <- format(6)
> print(result)
[1] "6"
> # Numbers are padded with blanks at the beginning to reach the given width.
> result <- format(13.7, width = 6)
> print(result)
[1] "  13.7"
> # Left justify strings.
> result <- format("Hello", width = 8, justify = "l")
> print(result)
[1] "Hello   "
> # Justify string with center.
> result <- format("Hello", width = 8, justify = "c")
> print(result)
[1] " Hello  "
>
> # Accessing vector elements using position.
> t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
> u <- t[c(2,3,6)]
> print(u)
[1] "Mon" "Tue" "Fri"
> # Accessing vector elements using logical indexing.
> v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
> print(v)
[1] "Sun" "Fri"
> # Accessing vector elements using negative indexing.
> x <- t[c(-2,-5)]
> print(x)
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
> # Accessing vector elements using 0/1 indexing.
> y <- t[c(0,0,0,0,0,0,1)]
> print(y)
[1] "Sun"
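The last result can look surprising: numeric indices are positions, not on/off flags, so each 0 selects nothing and the single 1 selects the first element. A short sketch contrasting this with logical indexing follows; the names days, zero_one, long_names and weekend are illustrative and not from the original session.

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")

# Zeros are dropped from a numeric index; only position 1 is returned.
zero_one <- days[c(0, 0, 0, 0, 0, 0, 1)]
print(zero_one)

# A logical vector built from a condition keeps elements where it is TRUE.
long_names <- days[nchar(days) > 3]
print(long_names)

# Positions can also be computed, for example the first and last elements.
weekend <- days[c(1, length(days))]
print(weekend)
```

To pick the last element with a flag-style vector, the index would have to be logical, for example c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE).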
> # Create a list containing a vector, a matrix and a list.
> list_data <- list(c("Jan","Feb","Mar"),
+ matrix(c(3,9,5,1,-2,8), nrow = 2),
+ list("green",12.3))
> # Give names to the elements in the list.
> names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
> # Access the first element of the list.
> print(list_data[1])
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

> # Access the third element. As it is also a list, all its elements will be printed.
> print(list_data[3])
$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

> # Access the list element using the name of the element.
> print(list_data$A_Matrix)
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
> # Define the column and row names.
> rownames = c("row1", "row2", "row3", "row4")
> colnames = c("col1", "col2", "col3")
> # Create the matrix.
> P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
> # Access the element at 3rd column and 1st row.
> print(P[1,3])
[1] 5
> # Access the element at 2nd column and 4th row.
> print(P[4,2])
[1] 13
> # Access only the 2nd row.
> print(P[2,])
col1 col2 col3
   6    7    8
> # Access only the 3rd column.
> print(P[,3])
row1 row2 row3 row4
   5    8   11   14
> # Create two vectors of different lengths.
> vector1 <- c(5,9,3)
> vector2 <- c(10,11,12,13,14,15)
> # Take these vectors as input to the array.
> array1 <- array(c(vector1,vector2),dim = c(3,3,2))
> # Create two more vectors of different lengths.
> vector3 <- c(9,1,0)
> vector4 <- c(6,0,11,3,14,1,2,6,9)
> # Take these vectors as input to the second array.
> array2 <- array(c(vector3,vector4),dim = c(3,3,2))
> # create matrices from these arrays.
> matrix1 <- array1[,,2]
> matrix2 <- array2[,,2]
> # Add the matrices.
> result <- matrix1+matrix2
> print(result)
     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26
> # Create the vectors for data frame.
> height <- c(132,151,162,139,166,147,122)
> weight <- c(48,49,66,53,67,52,40)
> gender <- c("male","male","female","female","male","female","male")
> # Create the data frame.
> input_data <- data.frame(height,weight,gender)
> print(input_data)
  height weight gender
1    132     48   male
2    151     49   male
3    162     66 female
4    139     53 female
5    166     67   male
6    147     52 female
7    122     40   male
> # Test if the gender column is a factor.
> print(is.factor(input_data$gender))
[1] FALSE
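is.factor() returns FALSE here because, since R 4.0.0, data.frame() keeps character columns as character by default (stringsAsFactors = FALSE). If a factor is wanted, the column can be converted explicitly. The sketch below rebuilds the same data frame so it runs on its own; the variable gender_levels is an illustrative name.

```r
height <- c(132, 151, 162, 139, 166, 147, 122)
weight <- c(48, 49, 66, 53, 67, 52, 40)
gender <- c("male", "male", "female", "female", "male", "female", "male")
input_data <- data.frame(height, weight, gender)

# Convert the character column to a factor explicitly.
input_data$gender <- factor(input_data$gender)
print(is.factor(input_data$gender))

# Levels are sorted alphabetically by default.
gender_levels <- levels(input_data$gender)
print(gender_levels)
```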
> # Print the gender column to see its values.
> print(input_data$gender)
[1] "male"   "male"   "female" "female" "male"   "female" "male"
> # Create the data frame.
> emp.data <- data.frame(
+ emp_id = c (1:5),
+ emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
+ salary = c(623.3,515.2,611.0,729.0,843.25),
+ start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
+ "2015-03-27")),
+ stringsAsFactors = FALSE
+ )
> # Print the summary.
> print(summary(emp.data))
     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27
>
> # Create vector objects.
> city <- c("Tampa","Seattle","Hartford","Denver")
> state <- c("FL","WA","CT","CO")
> zipcode <- c(33602,98104,06161,80294)
> # Combine the three vectors column-wise. cbind() returns a character matrix,
> # not a data frame, and the numeric literal 06161 is stored as 6161.
> addresses <- cbind(city,state,zipcode)
> # Print a header.
> cat("# # # # The First data frame\n")
# # # # The First data frame
> # Print the data frame.
> print(addresses)
     city       state zipcode
[1,] "Tampa"    "FL"  "33602"
[2,] "Seattle"  "WA"  "98104"
[3,] "Hartford" "CT"  "6161"
[4,] "Denver"   "CO"  "80294"
> # Create another data frame with similar columns
> new.address <- data.frame(
+ city = c("Lowry","Charlotte"),
+ state = c("CO","FL"),
+ zipcode = c("80230","33949"),
+ stringsAsFactors = FALSE
+ )
> # Print a header.
> cat("# # # The Second data frame\n")
# # # The Second data frame
> # Print the data frame.
> print(new.address)
       city state zipcode
1     Lowry    CO   80230
2 Charlotte    FL   33949
> # Combine rows from both objects; rbind() returns a data frame here because new.address is one.
> all.addresses <- rbind(addresses,new.address)
> # Print a header.
> cat("# # # The combined data frame\n")
# # # The combined data frame
> # Print the result.
> print(all.addresses)
       city state zipcode
1     Tampa    FL   33602
2   Seattle    WA   98104
3  Hartford    CT    6161
4    Denver    CO   80294
5     Lowry    CO   80230
6 Charlotte    FL   33949
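The Hartford zip code prints as 6161 because it was entered as a number, which cannot carry a leading zero. A minimal sketch of one way to avoid this, assuming zip codes are stored as character strings and the first table is built with data.frame() rather than cbind(), is given here; it reuses the variable names from the session but is not part of the original output.

```r
city <- c("Tampa", "Seattle", "Hartford", "Denver")
state <- c("FL", "WA", "CT", "CO")
# Strings keep the leading zero that a numeric literal would drop.
zipcode <- c("33602", "98104", "06161", "80294")

# data.frame() gives a real data frame; cbind() on vectors gives a matrix.
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)

new.address <- data.frame(
  city = c("Lowry", "Charlotte"),
  state = c("CO", "FL"),
  zipcode = c("80230", "33949"),
  stringsAsFactors = FALSE
)

# Both inputs are data frames with matching column names.
all.addresses <- rbind(addresses, new.address)
print(all.addresses)
```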
