My Learning From Data Science Classes

This document provides an introduction to using R for data analysis and manipulation. It demonstrates how to create objects, view and manipulate data, import and export data, explore datasets, partition data for modeling, and discusses different probability distributions like binomial, negative binomial, and hypergeometric that are useful for data analysis. Key functions and commands covered include c(), length(), is.na(), setwd(), seq(), rep(), class(), str(), summary(), head(), tail(), dim(), nrow(), ncol(), attach(), detach(), read.table(), read.csv(), fread(), loadWorkbook(), ntile(), sample(), createDataPartition(), and probability distribution formulas.

Uploaded by

Sunny Dogra

An Introduction to R

********************
********************
Price<-c(100,200)          # Create an object Price
Price                      # View the object created
length(Price)              # Gives the total number of elements in an object

Price<-c(100,200,NULL)     # NULL added

Price
length(Price)              # Length is still 2: NULL does not add an element

Adding a missing value:

**********************
Price<-c(100,200,NA)
Price
length(Price)              # 3: NA counts as an element

Checking for missing value:

*****************************
is.na(Price)               # Check for NA in Price; result is FALSE FALSE TRUE
#which element
which(is.na(Price))        # Gives the index of the elements in Price which are NA
which(Price == 100)        # Gives the index of the element which is 100
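
The difference between NULL and NA can be checked in one go. A minimal sketch; the object names with_null and with_na are made up for this illustration:

```r
# NULL is dropped when combined into a vector; NA occupies a slot.
with_null <- c(100, 200, NULL)
with_na   <- c(100, 200, NA)

length(with_null)            # 2
length(with_na)              # 3
is.na(with_na)               # FALSE FALSE TRUE
which(is.na(with_na))        # 3
mean(with_na)                # NA, because the missing value propagates
mean(with_na, na.rm = TRUE)  # 150, NA removed before averaging
```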

Character object:
*****************
Names<-c("John","Robert","NA","Catherine")   # Create a character object; "NA" in quotes is an ordinary string, not a missing value
length(Names)              # Gives the total number of elements in Names

To view the type of the object

class(Names)               # Gives the type of the object; this will give "character"
class(Price)               # This will give "numeric"

Setting the working directory:

*****************************
setwd("D:\\Data Manipulation with R\\") << Set the Working directory
getwd() << Get the working directory

Sequence:
*********
Sequence<-seq(1970,2000)              # Assign to Sequence all the integers from 1970 to 2000
Sequence_By<-seq(from=1,to=5,by=0.5)  # Sequence of numbers from 1 to 5 in steps of 0.5
Sequence_By                           # Result: 1.0 1.5 2.0 ... 4.5 5.0

Repeat:
******
rep(1,5)                   # Repeat 1 five times; result is 1 1 1 1 1
rep(1:5,2)                 # Repeat 1 to 5 twice; result is 1 2 3 4 5 1 2 3 4 5
rep(1:5,each=2)            # Repeat each of 1 to 5 twice; result is 1 1 2 2 3 3 4 4 5 5
#Both numeric and character data
mixed<-c(1,2,3,"hi")
mixed                      # Result will be "1" "2" "3" "hi"
class(mixed)               # "character": the whole object is converted to character

#Take values from the user

**************************
x<-scan()

Vectors:
*******
>The simplest data structure in R.
>If data has only one dimension, like a set of digits, then a vector can be used to
represent it.

Matrices:
********
>Used when data is two-dimensional (rows and columns); arrays extend this to more dimensions.
>Contain data of a single class only, e.g. only character or only numeric.

Data Frames:
***********
>It is like a single table with rows and columns of data.
>Its columns can hold different classes of data.

Lists:
*****
>Used when data cannot be represented by data frames.
>It can contain all kinds of other objects, including other lists or data frames.
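
The four structures can be built side by side. A small sketch with made-up values; the names v, m, df and l are only for illustration:

```r
v  <- c(10, 20, 30)                        # vector: one dimension, one class
m  <- matrix(1:6, nrow = 2, ncol = 3)      # matrix: rows x columns, one class
df <- data.frame(name  = c("John", "Tim"), # data frame: columns may differ in class
                 price = c(100, 200))
l  <- list(v = v, m = m, df = df)          # list: can hold any kind of object

class(v)    # class of the vector
dim(m)      # rows and columns of the matrix
str(df)     # one line per column with its class
names(l)    # names of the list elements
```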

#Saving an object
save(Names,file="Names.rda")
#Saving the entire workspace
save.image("all_work.RData")
#How do i load my object back?
load("Names.rda")

names(iris)                # or colnames(iris): get the list of variables in any dataset

dim(iris)                  # get the number of rows and columns

nrow(iris)                 # get the number of rows
ncol(iris)                 # get the number of columns

# To look at top few or bottom few rows of a dataframe

head(iris)
tail(iris)

#Looking at the structure of the datasets

str(iris)

#if we want to see more about the dataset

?iris
#checking the type of an object
class(iris)

#To refer to a column of a dataset we prefix it with the dataset name, e.g.
#iris$Sepal.Length. One way to avoid the prefix is the attach option:
attach(iris)               # attach iris so that its columns can be used without the iris$ prefix
detach(iris)               # undo the attach done by the command above

#importing a CSV file

iris<-read.csv("Iris.csv",header=T,sep=',')

#importing a text file

iris<-read.table("Iris.txt",header=T,sep="\t")

#summary of a dataset
#summary() gives summary statistics for each column of a dataset
summary(iris)

#we can check if a variable is character, numeric or factor

# is.character(), is.numeric() and is.factor()
is.character(iris$Species)
is.numeric(iris$Petal.Length)

# It is good to know the following before starting to work on any dataset:
1. Presence of a header line.
2. Kind of value separator.
3. Representation of missing values.
4. Notation of comment characters or quotes.
5. Existence of any unfilled or blank lines.
6. Classes of the variables.
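
Each of the six points maps to an argument of read.table(). A sketch under assumptions: the file contents, the ";" separator and the "-" missing-value marker below are invented for the illustration and are passed inline through text= so that no file is needed.

```r
# Hypothetical file contents, supplied inline so the sketch runs as-is.
raw <- "# exported for class
id;name;income

1;John;1000
2;Tim;-
3;Alice;2500"

dat <- read.table(text = raw,
                  header = TRUE,            # 1. header line present
                  sep = ";",                # 2. value separator
                  na.strings = "-",         # 3. how missing values are written
                  comment.char = "#",       # 4. comment character
                  quote = "\"",             #    quote character
                  blank.lines.skip = TRUE,  # 5. ignore blank lines
                  colClasses = c("integer", "character", "numeric"))  # 6. classes
str(dat)
```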

# Importing and Exporting of Data in R

Working with plain text files

Data from plain text files:

1. Scan function:
Reads data directly from console.
Reads data from files.
Return list or vector R objects.

2. read.table function:
Reads a file in table format and creates a data frame from it.

Working with large text files:

******************************
fread function >>> its package name is data.table

read.csv.ffdf function, part of the ff package.

read.table.ffdf function
read.csv.sql
write.table
write.table.ffdf
System setup includes Windows 8 64-bit version fitted with 4 GB RAM.
Package      Function               Runtime     Good with big files or not
Base R       read.table             ~6.37 sec   Bad
data.table   fread                  ~1 sec
sqldf        read.csv.sql           ~11 sec     Good
ff           read.table.ffdf        ~8 sec      Good
Base R       optimized read.table   ~4 sec      Good

Working with Excel files:

*************************
library(XLConnect)

#Loading Excel Workbook

wb<-loadWorkbook("customers.xlsx")                       # first load the workbook
newyork<-readWorksheet(wb,sheet="newyork",header=TRUE)   # then read the worksheet

Data Exploration
****************
****************
Read a file and specify the type of NA in that file:
cr<-read.csv("Credit.csv",na.strings=c("","NA"))

To avoid exponential forms in x and y axis:

*******************************************
options(scipen=999)
To get the column names:
***********************
names(cr)

To get the summary of each column in a dataframe:

************************************************
summary(<dataset name>)

To get the index of missing values and to remove it:

***************************************************
index<-which(is.na(cr$Good_Bad))
cr<-cr[-index,]            # only safe when index is non-empty; cr[!is.na(cr$Good_Bad),] works in both cases

Look at individual summary:

**************************
summary(cr$Good_Bad) # to get the summary of particular column

Percentile breakup:
******************
quantile(cr$RevolvingUtilizationOfUnsecuredLines,probs=c(1:100)/100)

#After discussion with the client the value is limited to 2: count the rows within the limit, then keep only those rows

cr%>%filter(RevolvingUtilizationOfUnsecuredLines<=2)%>%nrow()

cr%>%filter(RevolvingUtilizationOfUnsecuredLines<=2)->cr

To replace '0' with the missing value:

*************************************
#We find after discussions that '0' here means a missing value

cr$MonthlyIncome<-ifelse(cr$MonthlyIncome==0,NA,cr$MonthlyIncome)

#We use the mutate function to add a new column to a dataframe

contribution%>%mutate(Contribution=FY04Giving+FY03Giving+FY02Giving+FY01Giving+FY00Giving)->contribution

Group the data

**************
contribution%>%group_by(Gender)%>%
  summarise(Count=n(),Percentage_Count=n()/1230,Total_Contribution=sum(Contributions),
            Percentage_Contribution=Total_Contribution/1205454,Average=mean(Contributions))%>%
  ungroup()%>%arrange(-Total_Contribution)%>%data.frame()%>%head(10)

For a continuous variable we use the ntile function to divide it into deciles,
i.e. to convert the continuous variable into a categorical variable
***********************************************************************************
cr%>%mutate(quantile=ntile(MonthlyIncome,10))%>%group_by(Good_Bad,quantile)%>%
  summarize(N=n())%>%filter(Good_Bad=="Bad")->dat

Divide the data into Test and training samples randomly :

*******************************************************
##Partitioning data##
set.seed(100)              # fixes the random seed so that sample() gives the same result on every run
# Here we consider rows 1 to n and select 70% of them; the replace parameter is
# set to FALSE so that no row is picked twice.
# The sample function returns the indices of the rows that are selected.
indexP<-sample(1:nrow(cr),0.70*nrow(cr),replace = F)
# train_cr dataframe will have all the rows that were selected in the previous step.
train_cr<-cr[indexP,]
#test_cr dataframe will have all the remaining rows.
test_cr<-cr[-indexP,]

# to know the rows and columns of any dataframe we do:

dim(train_cr)              # gives the number of rows and columns present in this dataframe
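
The same split can be tried without the credit file. A sketch that assumes the built-in iris data as a stand-in for cr; the names index, train and test are made up here:

```r
set.seed(100)                                # fix the random draw
n     <- nrow(iris)                          # iris stands in for cr
index <- sample(seq_len(n), size = round(0.70 * n), replace = FALSE)
train <- iris[index, ]                       # the selected 70% of rows
test  <- iris[-index, ]                      # all remaining rows

dim(train)                                   # 105 rows, 5 columns
dim(test)                                    # 45 rows, 5 columns
length(intersect(rownames(train), rownames(test)))  # 0: no row is in both sets
```

Since replace is FALSE, every row lands in exactly one of the two sets.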

The function below can also be used to get a sample/subset of the dataset:
***************************************************************************
library(caret)
indexPC<-createDataPartition(y=cr$Good_Bad,times = 1,p=0.70,list=F)
train_crC<-cr[indexPC,]
test_crC<-cr[-indexPC,]

In Excel there is a function to get Binomial Distribution:

*********************************************************
>BINOM.DIST

Cumulative = TRUE in the BINOM.DIST formula gives the probability of
successes <= x (i.e. the sum of the probabilities from success = 0 to success = x).
Cumulative = FALSE in the BINOM.DIST formula gives the probability of
successes = x only.
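
In R the same two results come from dbinom() and pbinom(). A minimal sketch; the values n = 10, p = 0.5 and x = 3 are illustrative and not from the class:

```r
n <- 10; p <- 0.5; x <- 3      # illustrative values

dbinom(x, size = n, prob = p)  # P(X = x),  like BINOM.DIST(x, n, p, FALSE)
pbinom(x, size = n, prob = p)  # P(X <= x), like BINOM.DIST(x, n, p, TRUE)

# The cumulative value is the sum of the point probabilities from 0 to x.
all.equal(pbinom(x, n, p), sum(dbinom(0:x, n, p)))
```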

Hypergeometric Distribution:
***************************
If items are selected without being replaced back into the population, the
population, and with it the probability of success, changes after every draw, so
the Binomial distribution no longer applies. In this case we use the Hypergeometric
Distribution.
>Excel formula for the Hypergeometric Distribution is as below:
HYPGEOM.DIST()
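
R has dhyper() for the same calculation. A sketch with an assumed example that is not from the class: drawing 5 cards from a deck of 52 without replacement and asking for exactly 2 aces.

```r
aces_in_deck  <- 4    # successes in the population
other_cards   <- 48   # failures in the population
cards_drawn   <- 5    # sample size, drawn without replacement
aces_wanted   <- 2    # successes in the sample

# dhyper(x, m, n, k): x successes in the sample, m successes and n failures
# in the population, k items drawn.
p_r <- dhyper(aces_wanted, aces_in_deck, other_cards, cards_drawn)

# Same value by counting combinations.
p_count <- choose(4, 2) * choose(48, 3) / choose(52, 5)

all.equal(p_r, p_count)
```

The matching Excel call would be =HYPGEOM.DIST(2, 5, 4, 52, FALSE).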

Negative Binomial:
*****************
Used to find the probability that a given number of trials is needed to get X successes.

Excel formula for this is =NEGBINOM.DIST()

Example of Negative Binomial :

What is the probability that the 30th purchase in my store will happen with the
100th customer, when the probability of purchase for any customer is 20%?
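
The example can be set up in R with dnbinom(), which counts failures before the target success rather than total trials. A sketch of that set-up; the variable names are made up here:

```r
p         <- 0.20   # probability that any one customer buys
successes <- 30     # the 30th purchase ...
trials    <- 100    # ... happens with the 100th customer
failures  <- trials - successes   # 70 customers who do not buy

# dnbinom() takes the number of failures before the size-th success.
prob_r <- dnbinom(failures, size = successes, prob = p)

# Same value from the formula C(99, 29) * p^30 * (1 - p)^70.
prob_formula <- choose(trials - 1, successes - 1) * p^successes * (1 - p)^failures

all.equal(prob_r, prob_formula)
```

The matching Excel call would be =NEGBINOM.DIST(70, 30, 0.2, FALSE), since Excel also takes the number of failures as its first argument.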

Geometric Distribution:
**********************
Used when we are interested in the probability that the first success occurs on the rth
trial.

Example: Suppose there is a defect rate of 2% for some mechanical component
being produced. What is the probability that a QC inspector will need to review at
most 20 items before finding a defect?

The same NEGBINOM.DIST formula is used for the Geometric Distribution, with
Number_s = 1; for an "at most" question such as the one above use Cumulative = TRUE
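
In R the inspector example can be worked with pgeom(). This sketch assumes one reading of the question: the first defect turns up among the first 20 items, i.e. at most 19 good items come before it.

```r
p <- 0.02                        # defect rate

pgeom(19, prob = p)              # P(at most 19 good items before the first defect)
1 - (1 - p)^20                   # same value: 1 minus P(20 good items in a row)
pnbinom(19, size = 1, prob = p)  # negative binomial with one success gives the same
```

All three lines give about 0.332.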
Data Manipulation:
*****************
To get the value in the first row and third column we can use:
******************************************************************
oj[1,3]
oj[c(1,2,8,456),c(1,3,6)]  # rows 1,2,8,456 and columns 1,3,6
#Selecting only those rows where brand bought is tropicana:
dat<-oj[oj$brand=='tropicana',]
#We can also use the OR (|) and AND (&) operators while selecting rows:
dat1<-oj[oj$brand=='tropicana'|oj$brand=='dominicks',]
head(dat1)
Difference between logical vectors and the which statement
#consider vector sales with missing values
sales<-c(100,200,NA,300,400,NA,500,600,700,NA,1000,1500,NA,NA)
#subset data using logical operator
sales[sales>600]
[1] NA NA 700 NA 1000 1500 NA NA <<< as you can see, NA is also included in the
result of the logical query above.
#subset data using which
>sales[which(sales>600)]
[1] 700 1000 1500
#Selecting Columns:
dat4<-oj[,c("week","brand")]
head(dat4)
#Adding new columns:
*******************
oj$logInc<-log(oj$INCOME) << this new column will have the value of Log of income

order() returns the element order that results in a sorted vector:

>students<-c("John","Tim","Alice","Zeus")
>students
[1] "John" "Tim" "Alice" "Zeus"
>order(students)
[1] 3 1 2 4
>students[order(students)]
[1] "Alice" "John" "Tim" "Zeus"

Ordering of numbers:
*******************
numbers<-c(10,100,5,8)
order(numbers)             # returns the indices of the numbers in ascending order
order(-numbers)            # returns the indices of the numbers in descending order

GroupWise operations:
********************
aggregate(oj$price,by=list(oj$brand),mean)   # mean of price for each brand

We can use the tapply function to perform the same task:

*******************************************************
tapply(oj$price,oj$brand,mean)

#Cross tabulations
#Number of rows for each brand, split by whether a feature advertisement was run or not
table(oj$brand,oj$feat)
xtabs can be used to total a variable over the same cells:
xtabs(oj$INCOME~oj$brand+oj$feat)   # sum (not mean) of INCOME for each combination of brand and feat
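
What each of these functions returns is easier to see on a tiny table. A sketch with a hypothetical four-row data frame called toy, standing in for a few oj columns; the values are invented:

```r
# Hypothetical stand-in for a few oj columns.
toy <- data.frame(brand = c("tropicana", "tropicana", "dominicks", "dominicks"),
                  feat  = c(0, 1, 0, 0),
                  price = c(3.0, 2.5, 1.5, 2.0))

aggregate(price ~ brand, data = toy, FUN = mean)  # data frame of group means
tapply(toy$price, toy$brand, mean)                # named vector of group means
table(toy$brand, toy$feat)                        # number of rows per cell
xtabs(price ~ brand + feat, data = toy)           # sum of price per cell
```

Here tapply() gives 1.75 for dominicks and 2.75 for tropicana, while the xtabs() cell for dominicks with feat 0 is 3.5, the sum of its two prices.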

dplyr
*****
1. Works only with Data frames.

To get the rows corresponding to the brand name tropicana:

dat8<-filter(oj,brand=="tropicana")
To get the rows corresponding to the brand names tropicana or dominicks:
dat9<-filter(oj,brand=="tropicana"|brand=="dominicks")

#Selecting columns
Suppose we have to select the columns brand, INCOME and feat; we can do that using
the following command:
dat10<-select(oj,brand,INCOME,feat)

#we can drop columns using the - sign before the column names:
dat11<-select(oj,-brand,-INCOME,-feat)

#Creating a new column

dat12<-mutate(oj,logname=log(INCOME))

#Arranging data
dat13<-arrange(oj,INCOME)  # Arrange the oj dataset in ascending order of INCOME

#Descending arrangement of data

dat14<-arrange(oj,desc(INCOME))
or
dat14<-arrange(oj,-INCOME)

#Summarizing data
#group wise summaries
*********************
gr_brand<-group_by(oj,brand)
summarize(gr_brand,mean(INCOME),sd(INCOME))
#Find the mean price for all the people whose income is >=10.5.

#Base R code
mean(oj[oj$INCOME>=10.5,"price"])
#dplyr code
summarize(filter(oj,INCOME>=10.5),mean(price))

Pipe operator:
*************
oj%>%filter(INCOME>=10.5)%>%summarize(mean(price))

Subset the data on price>=2.5, create a column logIncome, then compute the
mean, standard deviation and median of column logIncome
oj%>%filter(price>=2.5)%>%mutate(logIncome=log(INCOME))%>%
  summarize(mean(logIncome),sd(logIncome),median(logIncome))

To Convert character string into date:

*************************************
fd$FlightDate<-as.Date(fd$FlightDate,"%d-%b-%y")   # converts a string in DD-MMM-YY format to Date format
[1] 01-10-2018

25/Aug/04: "%d/%b/%y"
25-August-2004: "%d-%B-%Y"

The months() function returns the month name of a value in Date format:
months(fd$FlightDate)           # gives the month of each date
unique(months(fd$FlightDate))   # gives the unique months present in the date column

#Finding time Interval

fd$FlightDate[60]-fd$FlightDate[900]

#the difftime function gives the time interval in weeks, days or hours
difftime(fd$FlightDate[3000],fd$FlightDate[90])   # both dates are required; units are chosen automatically

difftime(fd$FlightDate[3000],fd$FlightDate[90],units="weeks")
difftime(fd$FlightDate[3000],fd$FlightDate[90],units="days")
difftime(fd$FlightDate[3000],fd$FlightDate[90],units="hours")
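
The units can be checked on two fixed dates. A sketch with illustrative dates that are not taken from the fd data:

```r
d1 <- as.Date("2014-01-01")   # illustrative dates
d2 <- as.Date("2014-03-12")

d2 - d1                                        # Time difference of 70 days
difftime(d2, d1, units = "weeks")              # 10 weeks
difftime(d2, d1, units = "hours")              # 1680 hours
as.numeric(difftime(d2, d1, units = "days"))   # plain number: 70
```

Wrapping the result in as.numeric() is handy when the interval is needed for further arithmetic.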

Sub-setting data: All rows when day is Sunday

*********************************************
The nrow() function gives the number of rows in a dataset.
library(dplyr)
fd_s<-fd%>%filter(weekdays(FlightDate)=="Sunday")   # all rows where the day is Sunday

#Find the number of flights on Sundays for destination Atlanta
fd_s1<-fd%>%filter(weekdays(FlightDate)=="Sunday" & city=="Atlanta")%>%nrow()

#Whenever data has time information along with the date, R uses the POSIXct and POSIXlt
#classes to deal with it
date1<-Sys.time()
date1
[1] "2015-03-02 17:35:47 IST"
class(date1)
[1] "POSIXct" "POSIXt"
weekdays() and months() accept POSIXct/POSIXlt values as well as Date values
weekdays(date1)
[1] "Monday"
months(date1)
[1] "March"

lubridate is a package that provides convenient wrappers for working with dates and the POSIXct class

fd$FlightDate<-ymd(fd$FlightDate)
[1] "2014-01-01","2014-01-01".....

Function     Date
dmy()        26/11/2008
ymd()        2008/11/26
mdy()        11/26/2008
dmy_hm()     26/11/2008 20:15
dmy_hms()    26/11/2008 20:15:30

Joining Dataframes:
******************
Inner join: Joining two tables based on a key column, such that only rows matching in
both tables are selected.

df1                        df2
  CustomerId Product         CustomerId State
1          1 Toaster       1          2 Alabama
2          2 Toaster       2          4 Alabama
3          3 Toaster       3          6 Ohio
4          4 Radio
5          5 Radio
6          6 Radio

>merge(x=df1,y=df2,by="CustomerId") #Inner Join/Intersection of both tables

  CustomerId Product State
1          2 Toaster Alabama
2          4 Radio   Alabama
3          6 Radio   Ohio

#Full outer join: Two tables are joined irrespective of any match between the
rows:

df1                        df2
  CustomerId Product         CustomerId State
1          1 Toaster       1          2 Alabama
2          2 Toaster       2          4 Alabama
3          3 Toaster       3          6 Ohio
4          4 Radio
5          5 Radio
6          6 Radio

>merge(x = df1,y = df2, by = "CustomerId",all = TRUE) #Outer Join

  CustomerId Product State
1          1 Toaster <NA>
2          2 Toaster Alabama
3          3 Toaster <NA>
4          4 Radio   Alabama
5          5 Radio   <NA>
6          6 Radio   Ohio

#Left Outer Join: All the rows of the left table are retained, along with the matching
rows of the right table.

df1                        df2
  CustomerId Product         CustomerId State
1          1 Toaster       1          2 Alabama
2          2 Toaster       2          4 Alabama
3          3 Toaster       3          6 Ohio
4          4 Radio
5          5 Radio
6          6 Radio

>merge(x = df1, y = df2, by = "CustomerId",all.x=TRUE) # Left Join

  CustomerId Product State
1          1 Toaster <NA>
2          2 Toaster Alabama
3          3 Toaster <NA>
4          4 Radio   Alabama
5          5 Radio   <NA>
6          6 Radio   Ohio

Right Outer Join: All the rows of the right table are retained, along with the matching
rows of the left table.
>merge(x = df1, y = df2, by = "CustomerId",all.y=TRUE) # Right Join
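
The four joins can be run end to end by building the two tables first. A sketch that recreates df1 and df2 from the tables shown in this section; the key column is written as CustomerId here, and the names inner, full, left and right are made up:

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c("Alabama", "Alabama", "Ohio"))

inner <- merge(df1, df2, by = "CustomerId")                # only matching keys
full  <- merge(df1, df2, by = "CustomerId", all = TRUE)    # every key from both
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # every key from df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # every key from df2

# Row counts of the four results.
sapply(list(inner = inner, full = full, left = left, right = right), nrow)
```

With this data the full and left joins give the same six rows, because every CustomerId in df2 also appears in df1.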

Finding Missing values:

**********************
is.na() flags the missing values; summing the result counts them.
>a<-c(1,2,NA,9)
>is.na(a)
[1] FALSE FALSE TRUE FALSE
>sum(is.na(a))             # total number of missing values
[1] 1
Another option to get the number of missing values is the
summary() command

#Imputing Missing values

air$Ozone[is.na(air$Ozone)]<-45

#Imputing the mean of the column in place of the missing values:

air$Solar.R[is.na(air$Solar.R)]<-mean(air$Solar.R,na.rm=TRUE)   # na.rm=TRUE drops the NAs before the mean is calculated
summary(air)
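
The effect of mean imputation can be checked before and after. This sketch assumes that air is a copy of the built-in airquality dataset, which the notes do not state; the name m is made up:

```r
air <- airquality                      # assumption: air is the built-in airquality data
colSums(is.na(air))                    # missing count per column, before imputing

m <- mean(air$Solar.R, na.rm = TRUE)   # mean of the observed values only
air$Solar.R[is.na(air$Solar.R)] <- m   # fill the gaps with that mean

sum(is.na(air$Solar.R))                # 0 after imputation
all.equal(mean(air$Solar.R), m)        # the column mean itself does not change
```

Filling with the mean keeps the column mean the same but shrinks its spread, which is worth remembering before modeling.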

RESHAPE function:
****************
It helps in converting data from wide to long format and from long to wide format.

Data in Wide format:

Persons Age Weight
Sankar  26  70
Aiyar   24  60
Singh   25  65

Data in Long Format:

Persons Variable Value
******* ******** *****
Sankar  Age      26
Sankar  Weight   70
Aiyar   Age      24
Aiyar   Weight   60
Singh   Age      25
Singh   Weight   65

We can convert the data from one format to another using the functions melt and
dcast present in the library reshape2

library(reshape2)
person<-c("Sankar","Aiyar","Singh")
age<-c(26,24,25)
weight<-c(70,60,65)
wide<-data.frame(person,age,weight)
The result of the above command will be:
>wide
  person age weight
1 Sankar  26     70
2  Aiyar  24     60
3  Singh  25     65

melted<-melt(wide,id.vars="person",value.name="Demo_value")
  person variable Demo_value
1 Sankar      age         26
2  Aiyar      age         24
3  Singh      age         25
4 Sankar   weight         70
5  Aiyar   weight         60
6  Singh   weight         65
>dcast(melted,person~variable,value.var = "Demo_value")
  person age weight
1  Aiyar  24     60
2 Sankar  26     70
3  Singh  25     65

Working with Strings:

*******************
a<-"Batman"
substr(a,start=2,stop=6)
[1] "atman"

nchar(a)                   # number of characters in a string

tolower(a)                 # convert a string to lowercase
toupper(a)                 # convert a string to uppercase

b<-"Bat-Man"
strsplit(b,split="-")      # returns a list whose first element is "Bat" "Man"

c<-"Bat/Man"
strsplit(c,split="/")      # string split using "/" as the separator

paste(b,c)                 # concatenates the two strings one after the other, separated by a space

#sometimes we want to know which strings contain a pattern, so we use the grep
command:
c(b,c)
grep("-",c(b,c))           # returns the index of each element that contains "-" (here 1), not a count

Sometimes we want to find whether a pattern exists in each string or not:
>c(b,c)
"Bat-Man" "Bat/Man"
>grepl("/",c(b,c))
FALSE TRUE

#sometimes we want to substitute one pattern with another

for eg:
>b
"Bat-Man"
>sub("-","/",b)
"Bat/Man"

#Using SQL queries inside R

***************************
#Selecting columns from a dataframe :
>oj_s<-sqldf("select brand, income, feat from oj ")
#subsetting using where statement:
oj_s<-sqldf("select brand, income, feat from oj where price<3.8 and income<10")

#order by statement
oj_s<-sqldf("select store,brand,week,logmove,feat,price,income from oj order by income asc")

#Base Plotting

#Using plot() to study two continuous variables

ir<-iris

#To understand the relationship between two continuous variables we use a
#bivariate (scatter) plot
plot(x=ir$Petal.Width,y=ir$Petal.Length)   # the variable for the x axis is assigned to x, the variable for the y axis to y
#Adding x label, y label and title

plot(x=ir$Petal.Width,y=ir$Petal.Length,main=c("Petal Width Vs Petal Length"),
     xlab=c("Petal Width"),ylab=c("Petal Length"))

#Adding colors
plot(x=ir$Petal.Width,y=ir$Petal.Length,main=c("Petal Width Vs Petal Length"),
     xlab=c("Petal Width"),ylab=c("Petal Length"),col="red")

#Adding a different plotting symbol

plot(x=ir$Petal.Width,y=ir$Petal.Length,main=c("Petal Width Vs Petal Length"),
     xlab=c("Petal Width"),ylab=c("Petal Length"),col="red",pch=2)

#Seeing the relationship across different species

plot(x=ir$Petal.Width,y=ir$Petal.Length,main=c("Petal Width Vs Petal Length"),
     xlab=c("Petal Width"),ylab=c("Petal Length"),col=ir$Species)

#Adding a legend
plot(x=ir$Petal.Width,y=ir$Petal.Length,main=c("Petal Width Vs Petal Length"),
     xlab=c("Petal Width"),ylab=c("Petal Length"),pch=as.numeric(ir$Species))

legend(0.2,7,c("setosa","versicolor","virginica"),pch=1:3)

Studying Univariate data:

#Box Plot:

boxplot(ir$Petal.Length)

      .        ------------- OUTLIER          More than 1.5 x IQR above the upper quartile
      .

     ___       ------------- MAXIMUM          Greatest value, excluding outliers
      |
      |
  ____|____    ------------- UPPER QUARTILE   25% of data greater than this value
 |         |
 |_________|   ------------- MEDIAN           50% of data is greater than this value; middle of dataset
 |         |
 |_________|   ------------- LOWER QUARTILE   25% of data less than this value
      |
      |
     _|_       ------------- MINIMUM          Least value, excluding outliers

      .        ------------- OUTLIER          More than 1.5 x IQR below the lower quartile
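
The outlier limits can be computed by hand. A sketch using the usual rule of 1.5 times the interquartile range (IQR) beyond the quartiles; note that boxplot() works from hinges, which can differ slightly from quantile() on some data. The names lower_fence and upper_fence are made up:

```r
x   <- iris$Petal.Length
q   <- quantile(x, c(0.25, 0.5, 0.75))  # lower quartile, median, upper quartile
iqr <- IQR(x)                           # upper quartile minus lower quartile

lower_fence <- q[["25%"]] - 1.5 * iqr   # values below this are outliers
upper_fence <- q[["75%"]] + 1.5 * iqr   # values above this are outliers
x[x < lower_fence | x > upper_fence]    # points that would be drawn individually

boxplot.stats(x)$out                    # outliers as computed for boxplot()
```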

Histograms:
**********

hist(ir$Sepal.Width,col="orange")

Adding the parameter labels=TRUE shows the count on top of each bin.

Plotting more than one plot in a single window:

********************************************
par() with its mfrow argument

par(mfrow=c(1,2))          # one row and two columns

plot(x=ir$Species,y=ir$Sepal.Width,xlab="Species",main="Sepal Width across species",col="red")

GGPLOT2:
*******

Based on the grammar of graphics: simple syntax, interfaces with ggmap and other
packages.

Grammar of Graphics:
*******************
A plot is composed of: Aesthetic Mapping, Geoms, Statistical Transformations,
Coordinate Systems and Scales.

Aesthetic Mapping : Which component of the data appears on the X axis and the Y axis, and how
the color, size, fill and position of elements are related to the data.

Geom (Geometrical Objects) : Which geometrical objects appear on the plot: points,
lines, polygons, area, boxplot, rectangle, tile etc.

Statistical Transformations : Compute density, counts (Histogram: need to bin and
count data).

Scales and Coordinate Systems : Discrete or continuous scales, Cartesian or spherical
coordinates.

p<-ggplot(ch,aes(x=temp,y=dewpoint,colour=season))

Downloading google maps:

***********************
map<-get_map("bangalore",maptype="hybrid")

ggmap(map)+geom_point(data=sh,aes(x=long,y=lat),colour="red")   # geom_point is the function to put points on the map

Process of geospatial visualisation:

*******
Downloading maps: ggmap
Getting long-lat data: text file, or geospatial file read with rgdal
Overlaying data on the map: ggplot2

Downloading maps using ggmap

****************************
map<-get_map("bangalore",maptype="hybrid")

ggmap(map)+geom_point(data=sh,aes(x=long,y=lat),colour="red")   # we use geom_point to put our dataset's location points on the map

Extracting long-lat data from shape files using the rgdal package

Many times the data and the locational information are not in the same file.
Most geospatial data is stored in shape files
Shapefile = Data + Location data

How to extract long-lat data from the Spatial points data frame:
***************************************************************
shape2<-readOGR(dsn="Subway",layer="DoITT_SUBWAY_ENTRACE_01_13SEPT2010")   # "Subway" is the folder name and "DoITT_..." is the file name

To convert coordinates into long-lat form:

shape2<-spTransform(shape2,CRS("+init=epsg:4326"))
fortify() is used to extract the location data.

How to use ggplot:

*****************
library(ggplot2)
library(dplyr)
#Technology
data%>%filter(Industry=="Technology")->data1
p<-ggplot(data1,aes(x=Company.Advertising,y=Brand.Revenue,size=Brand.Value,color=Brand))
q<-p+geom_point()
q+xlab("Company Advertising in Billions of $")+ylab("Brand Revenue in Billions of $")+
  scale_size(range = c(2,4),breaks=c(30,60,100),name="Brand Value $ (Billions)")+
  geom_text(aes(label=Brand),hjust=0.5,vjust=1)+guides(color=FALSE)+theme_light()+
  theme(legend.key=element_rect(fill = "light blue", color = "black"))

data%>%filter(Industry=="Luxury")->data2
p<-ggplot(data2,aes(x=Company.Advertising,y=Brand.Revenue,size=Brand.Value,color=Brand))
q<-p+geom_point()
q+xlab("Company Advertising in Billions of $")+ylab("Brand Revenue in Billions of $")+
  scale_size(range = c(2,4),breaks=c(10,28.1),name="Brand Value $ (Billions)")+
  geom_text(aes(label=Brand),hjust=0.7,vjust=1.7)+guides(color=FALSE)+theme_light()+
  theme(legend.key=element_rect(fill = "light blue", color = "black"))+
  scale_x_continuous(breaks=seq(0,6,0.1))
