Loading datasets from Excel/CSV

a) Local R database
Dataset name: iris

This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the
variables sepal length and width and petal length and width, respectively, for 50 flowers from
each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

R commands used:
data()
head()
R Script:
> data(iris)
> head(iris)

Output:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
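
As a quick check of the description above (50 flowers from each of 3 species), the built-in data set can be inspected a little further. This is a minimal sketch using only base R; it is an illustration, not part of the original script.

```r
# Load the built-in iris data set and confirm its shape
data(iris)

dim(iris)             # number of rows and columns
names(iris)           # column names
table(iris$Species)   # number of flowers per species
```

`table()` counts how often each species occurs, which makes it an easy way to confirm that the data set is balanced across the three species.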

b) xls, csv files
Dataset: table1
Contains student details.
R commands used:
read.csv()
install.packages("readxl")
library()
read_xls()
R Script:
> data<-read.csv("D:/cloud/table1.csv")
> data

Output:

  Roll.no     Name Department
1 1951202   Athira        MCA
2 1951203 Pradeepa        MCA
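
The path D:/cloud/table1.csv exists only on the author's machine, so the following sketch writes a small stand-in file to a temporary location and reads it back with read.csv(). The file contents and the header "Roll no" (with a space) are assumptions made for illustration; the original file may look different.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical contents)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Roll no,Name,Department",
             "1951202,Athira,MCA",
             "1951203,Pradeepa,MCA"),
           csv_path)

# read.csv() converts headers into syntactic R names by default,
# so "Roll no" becomes "Roll.no"
students <- read.csv(csv_path)
names(students)
students

# check.names = FALSE keeps the header exactly as written in the file
students_raw <- read.csv(csv_path, check.names = FALSE)
names(students_raw)
```

Storing the result under a name such as `students` rather than `data` avoids confusion with the base function data() used earlier.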

> library(readxl)
> student<-read_xls("D:/cloud/R/students.xls")
> student

Output:

# A tibble: 30 x 14
     ID `Last Name` `First Name` City  State Gender `Student Status` Major Country   Age   SAT `Average score ~
  <dbl> <chr>       <chr>        <chr> <chr> <chr>  <chr>            <chr> <chr>   <dbl> <dbl>            <dbl>
1     1 DOE01       JANE01       Los ~ Cali~ Female Graduate         Poli~ US         30  2263             67
2     2 DOE02       JANE02       Sedo~ Ariz~ Female Undergraduate    Math  US         19  2006             63
3     3 DOE01       JOE01        Elmi~ New ~ Male   Graduate         Math  US         26  2221             78.1
4     4 DOE02       JOE02        Lack~ New ~ Male   Graduate         Econ  US         33  1716             77.8
5     5 DOE03       JOE03        Defi~ Ohio  Male   Graduate         Econ  US         37  1701             65
6     6 DOE04       JOE04        Tel ~ Isra~ Male   Graduate         Econ  Israel     25  1786             69
7     7 DOE05       JOE05        Cimax Nort~ Male   Graduate         Poli~ US         39  1577             95.9
8     8 DOE03       JANE03       Libe~ Kans~ Female Undergraduate    Poli~ US         21  1842             87
# ... with 20 more rows, and 2 more variables: `Height (in)` <dbl>, `Newspaper readership (times/wk)` <dbl>
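
Because students.xls is not distributed with these notes, a reproducible way to try read_xls() is to use one of the example workbooks bundled with the readxl package. The sketch below assumes readxl is installed; readxl_example() and excel_sheets() are readxl helpers, and the choice of the "iris" sheet is only for illustration.

```r
# install.packages("readxl")   # one-time install, if the package is missing
library(readxl)

# Path to an example .xls workbook shipped with readxl
xls_path <- readxl_example("datasets.xls")

# List the sheets in the workbook, then read one of them by name
excel_sheets(xls_path)
iris_xls <- read_xls(xls_path, sheet = "iris")

# read_xls() returns a tibble, which prints only the first rows
# and abbreviates columns that do not fit the console width
iris_xls
```

The tibble print method explains the truncated values (for example `Los ~`) and the "more variables" footer seen in the student output above.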

Descriptive statistics (dataset characteristics using R commands)
Dataset: Flights2008
The 2008 US flight data set is a reasonably large data set (about 7 million rows and 29
variables) that suits beginners who want to practice data exploration. It can show, for
example, which months have the most or the fewest flights and what contributes to
delays in flight arrivals and departures.
Packages used:
stringr
devtools
chron

R commands used:
read.csv()
str()
summary()
dim()
names()
is.na()
ncol()
library()
str_pad()
substring()
chron()
head()
tail()
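
The full 2008.csv file is large, so the base R commands listed above can be tried first on a small stand-in. The sketch below builds a hypothetical five-row data frame (named flights_small here for illustration) whose values mirror the first rows of three columns in the flight data; it is not the author's script.

```r
# Five-row stand-in for the flight data (values mirror the first rows of 2008.csv)
flights_small <- data.frame(
  DepTime      = c(2003L, 754L, 628L, 926L, 1829L),
  ArrDelay     = c(-14L, 2L, 14L, -6L, 34L),
  CarrierDelay = c(NA, NA, NA, NA, 2L)
)

dim(flights_small)               # rows and columns
ncol(flights_small)              # number of columns only
names(flights_small)             # column names
summary(flights_small)           # min, quartiles, mean, max, and NA count per column
colSums(is.na(flights_small))    # number of missing values in each column
head(flights_small, 3)           # first three rows
tail(flights_small, 2)           # last two rows
```

`is.na()` returns a logical matrix for a data frame, so wrapping it in `colSums()` gives one missing-value count per column.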

R Script:
> flights<-read.csv("D:/cloud/R/2008.csv")
> str(flights)

Output:

'data.frame': 7009728 obs. of 29 variables:
$ Year : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
$ Month : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayofMonth : int 3 3 3 3 3 3 3 3 3 3 ...
$ DayOfWeek : int 4 4 4 4 4 4 4 4 4 4 ...
$ DepTime : int 2003 754 628 926 1829 1940 1937 1039 617 1620 ...
$ CRSDepTime : int 1955 735 620 930 1755 1915 1830 1040 615 1620 ...
$ ArrTime : int 2211 1002 804 1054 1959 2121 2037 1132 652 1639 ...
$ CRSArrTime : int 2225 1000 750 1100 1925 2110 1940 1150 650 1655 ...
$ UniqueCarrier : Factor w/ 20 levels "9E","AA","AQ",..: 18 18 18 18 18 18 18 18 18 18 ...
$ FlightNum : int 335 3231 448 1746 3920 378 509 535 11 810 ...
$ TailNum : Factor w/ 5374 levels "","80009E","80019E",..: 3769 4129 1961 3059 2142 3852 4062 1961 3616 3324 ...
$ ActualElapsedTime: int 128 128 96 88 90 101 240 233 95 79 ...
$ CRSElapsedTime : int 150 145 90 90 90 115 250 250 95 95 ...
$ AirTime : int 116 113 76 78 77 87 230 219 70 70 ...
$ ArrDelay : int -14 2 14 -6 34 11 57 -18 2 -16 ...
$ DepDelay : int 8 19 8 -4 34 25 67 -1 2 0 ...
$ Origin : Factor w/ 303 levels "ABE","ABI","ABQ",..: 136 136 141 141 141 141 141 141 141 141 ...
$ Dest : Factor w/ 304 levels "ABE","ABI","ABQ",..: 287 287 49 49 49 151 157 157 177 177 ...
$ Distance : int 810 810 515 515 515 688 1591 1591 451 451 ...
$ TaxiIn : int 4 5 3 3 3 4 3 7 6 3 ...
$ TaxiOut : int 8 10 17 7 10 10 7 7 19 6 ...
$ Cancelled : int 0 0 0 0 0 0 0 0 0 0 ...
$ CancellationCode : Factor w/ 5 levels "","A","B","C",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Diverted : int 0 0 0 0 0 0 0 0 0 0 ...
$ CarrierDelay : int NA NA NA NA 2 NA 10 NA NA NA ...
$ WeatherDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ NASDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ SecurityDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ LateAircraftDelay: int NA NA NA NA 32 NA 47 NA NA NA ...
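
In the output above, DepTime is stored as an integer such as 754, which stands for 07:54. One plausible use of the listed str_pad(), substring(), and chron() commands is to turn such values into time objects. The following is a sketch under that assumption, using the first three DepTime values; it requires the stringr and chron packages, and the variable names are hypothetical.

```r
# install.packages(c("stringr", "chron"))   # one-time install, if needed
library(stringr)
library(chron)

dep_time <- c(2003L, 754L, 628L)   # first DepTime values from the str() output

# Pad to four digits so that 754 becomes "0754"
padded <- str_pad(as.character(dep_time), width = 4, side = "left", pad = "0")

# Split into hours and minutes and build "hh:mm:ss" strings
hours   <- substring(padded, 1, 2)
minutes <- substring(padded, 3, 4)
hhmmss  <- paste(hours, minutes, "00", sep = ":")

# Convert the strings to chron time objects
dep_clock <- chron(times. = hhmmss)
dep_clock
```

Rows with a missing DepTime would need to be filtered out or handled separately before this conversion. Note also that in R 4.0.0 and later, read.csv() reads text columns as character by default, so columns shown above as Factor appear as chr unless stringsAsFactors = TRUE is passed.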
