R Programming Language Notes
R
● Easier for experienced programmers
● Tends to be favored by academics, researchers, hard-core data scientists
● Shorter code for complex analysis, statistics, graphics
● Extremely slow!
Python
● Good for beginners and experienced programmers
● Used by software engineers of all types
● Better integrated for general-purpose coding
● Not especially fast
#This loops over the first 10 rows and, on each pass, prints the i-th row of cities
The {} play the same role as the indentation in a Python for loop
In [ ]:
%%R
for (i in 1:10) { print(cities[i,]) }
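A self-contained sketch of the same loop, run on a tiny stand-in for `cities` (the rows and values are invented for illustration, not the course data). It also shows one way to guard against a data frame with fewer than 10 rows, where `1:10` would index past the end and print rows of NA.

```r
# Toy stand-in for the notes' `cities` data frame (values are invented).
cities <- data.frame(
  city = c("A", "B", "C"),
  temperature = c(10.5, 3.2, 17.8),
  stringsAsFactors = FALSE
)

# Cap the loop at the number of rows actually present.
n <- min(10, nrow(cities))
for (i in seq_len(n)) {
  print(cities[i, ])
}

# Without a loop: head() returns the first rows (at most 10 here).
first_rows <- head(cities, 10)
print(first_rows)
```

In practice `head()` is usually enough; the explicit loop is mainly useful for seeing how `{}` and row indexing work.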
#Select all rows in the dataframe where the longitude is less than 0
In [ ]:
%%R
cities[cities$longitude < 0,]
#Select rows and columns
In [ ]:
%%R
cities[cities$latitude > 50 & cities$temperature > 9,
c('city','latitude','temperature')]
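To see what the part before the comma actually is, here is a small sketch on made-up data (the `cities` rows below are invented). The condition is just a TRUE/FALSE vector with one entry per row. `subset()` is shown as an alternative spelling of the same selection.

```r
# Toy stand-in for `cities` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  latitude = c(55, 45, 60, 52),
  longitude = c(-3, 10, 25, -1),
  temperature = c(9.5, 14, 4, 8),
  stringsAsFactors = FALSE
)

# The condition is a logical vector: one TRUE/FALSE per row.
keep <- cities$latitude > 50 & cities$temperature > 9
print(keep)

# Rows go before the comma, columns after it.
picked <- cities[keep, c("city", "latitude", "temperature")]
print(picked)

# subset() expresses the same selection without repeating `cities$`.
picked2 <- subset(cities, latitude > 50 & temperature > 9,
                  select = c(city, latitude, temperature))
print(picked2)
```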
#Sort the rows based on temperature (note order() goes BEFORE the comma because it is the rows we sort)
decreasing = TRUE, passed inside order(), reverses the order of sorting
In [ ]:
%%R
cities[order(cities$temperature, decreasing = TRUE),]
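A short sketch of how `order()` drives the sort, on invented data. `order()` returns row positions, not sorted values, and those positions are then used as the row index. `decreasing = TRUE` is an argument of `order()`, so it belongs inside its parentheses. For a numeric column, a minus sign gives the same result.

```r
# Toy stand-in for `cities` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C"),
  temperature = c(10.5, 3.2, 17.8),
  stringsAsFactors = FALSE
)

# order() gives the row positions that would sort the column.
idx <- order(cities$temperature)
print(idx)

asc <- cities[idx, ]                                           # coldest first
desc <- cities[order(cities$temperature, decreasing = TRUE), ] # warmest first
desc2 <- cities[order(-cities$temperature), ]                  # same, via minus

print(desc)
```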
# What about descending country with ascending temperature?
# - can be used on string columns with as.numeric()
#Sorting by increasing country, then within each country, increasing temp (like grouping)
%%R
cities[order(cities$country,cities$temperature),]
NOTE: if we add a minus sign - before cities$temperature, it will sort by decreasing temp
within each country, but country will remain increasing
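# If we want to do the same thing but sort decreasing country and increasing temp, we
have to wrap cities$country in as.numeric() because the minus sign expects numbers.
as.numeric() only gives usable codes when country is a factor (on plain strings it
returns NA), so the column is converted with factor() first
%%R
cities[order(-as.numeric(factor(cities$country)),cities$temperature),]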
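A sketch of the descending-string sort on invented data, since the result depends on how the `country` column is stored. Whether the course's `cities$country` is a factor or a character column is not stated in these notes. The example assumes character and shows three options that work either way.

```r
# Toy stand-in for `cities` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  country = c("X", "Y", "X", "Y"),
  temperature = c(12, 5, 8, 9),
  stringsAsFactors = FALSE
)

# On plain strings, as.numeric() yields NA (with a warning), so it cannot sort.
codes_chr <- suppressWarnings(as.numeric(cities$country))
print(codes_chr)

# Option 1: turn the strings into a factor; its integer codes follow
# alphabetical order, so they can be negated.
codes <- as.numeric(factor(cities$country))
sorted <- cities[order(-codes, cities$temperature), ]

# Option 2: xtfrm() produces a numeric sort key for the strings.
sorted2 <- cities[order(-xtfrm(cities$country), cities$temperature), ]

# Option 3: the radix method accepts one decreasing flag per sort key.
sorted3 <- cities[order(cities$country, cities$temperature,
                        decreasing = c(TRUE, FALSE), method = "radix"), ]

print(sorted)
```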
# If we want to do a selection (which goes before the comma) and also sort (which also
goes before the comma), we can combine them in two steps. First we store the selection in
a "temporary" data frame cities2, and then we order the rows of cities2 by decreasing temperature.
In [ ]:
%%R
cities2 <- cities[cities$longitude < 0 & cities$temperature > 12,
c('city','temperature')]
cities2[order(-cities2$temperature),]
Your Turn
Find all countries that are not in the EU and don't have coastline, together with their populations,
sorted by country name in reverse alphabetical order. Note: equality uses '==' and strings can be
single (') or double (") quoted.
In [6]:
%%R
countries2 <- countries[countries$EU =='no' & countries$coastline == 'no',
c('country','population')]
countries2[order(countries2$country, decreasing=TRUE),]
Aggregation
   EU  Coastline   Average
1  no  no          4.35375
2  yes no          6.99000
3  no  yes        19.59571
4  yes yes        21.37818
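The notes do not show the call that produced the table above, or which column was averaged. As a sketch only, a table of that shape (two grouping columns plus an average) can be built with `aggregate()` and two entries in `by`. The `countries` rows and the `value` column below are hypothetical.

```r
# Toy stand-in for `countries`; `value` is a hypothetical numeric column.
countries <- data.frame(
  country = c("P", "Q", "R", "S", "T"),
  EU = c("no", "no", "yes", "yes", "yes"),
  coastline = c("no", "yes", "no", "yes", "yes"),
  value = c(4, 20, 7, 18, 22),
  stringsAsFactors = FALSE
)

# Naming the list entries gives the grouping columns readable headers.
result <- aggregate(countries$value,
                    by = list(EU = countries$EU,
                              Coastline = countries$coastline),
                    FUN = mean)

# The aggregated column is called "x" by default; rename it.
names(result)[3] <- "Average"
print(result)
```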
#Number of cities west of the Prime Meridian (i.e., longitude < 0) - error then fix
In [ ]:
%%R
cities2 <- cities[cities$longitude < 0,]
nrow(cities2)
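A small sketch of the count on invented data. Besides subsetting and calling `nrow()`, the same number can be obtained by summing the logical condition, because TRUE counts as 1 and FALSE as 0.

```r
# Toy stand-in for `cities` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  longitude = c(-3, 10, 25, -1),
  stringsAsFactors = FALSE
)

# Approach from the notes: subset the rows, then count them.
cities2 <- cities[cities$longitude < 0, ]
n_west <- nrow(cities2)

# Alternative: sum the TRUE/FALSE vector directly.
n_west2 <- sum(cities$longitude < 0)

print(n_west)
print(n_west2)
```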
Your Turn
Considering only cities with latitude < 40, find the average temperature for each country. Then
considering only cities with latitude > 60, find the average temperature for each country. Remember
print() is needed to see a result unless it's the last line.
In [12]:
%%R
south <- cities[cities$latitude < 40,]
north <- cities[cities$latitude > 60,]
print(aggregate(south$temperature, by=list(south$country), FUN=mean))
print(aggregate(north$temperature, by=list(north$country), FUN=mean))
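With `by=list(...)` the result columns come out named `Group.1` and `x`. As a sketch on invented data, the formula form of `aggregate()` does the same grouping but keeps the original column names.

```r
# Toy stand-in for `cities` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C", "D", "E"),
  country = c("X", "X", "Y", "Y", "Z"),
  latitude = c(35, 38, 62, 39, 65),
  temperature = c(18, 16, 3, 15, 1),
  stringsAsFactors = FALSE
)

south <- cities[cities$latitude < 40, ]

# Form used in the notes: columns are named Group.1 and x.
by_list <- aggregate(south$temperature, by = list(south$country), FUN = mean)
print(by_list)

# Formula form: "temperature grouped by country", with readable names.
by_formula <- aggregate(temperature ~ country, data = south, FUN = mean)
print(by_formula)
```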
Joining
#Cities not in the EU with latitude > 50; return city, country, latitude, and whether country has
coastline
In [ ]:
%%R
citiesext <- merge(cities,countries)
citiesext[citiesext$EU == 'no' & citiesext$latitude > 50,
c('city','country','latitude','coastline')]
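A sketch of what `merge()` does here, on invented data. With no `by` argument, `merge()` joins on the column names the two data frames share; the example assumes that shared column is `country` and names it explicitly. By default only matching rows are kept, and `all.x = TRUE` keeps unmatched cities as well.

```r
# Toy stand-ins for `cities` and `countries` (values are invented).
cities <- data.frame(
  city = c("A", "B", "C"),
  country = c("X", "Y", "W"),
  latitude = c(55, 48, 60),
  stringsAsFactors = FALSE
)
countries <- data.frame(
  country = c("X", "Y", "Z"),
  EU = c("no", "yes", "no"),
  coastline = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Inner join on country: city C is dropped because W has no match.
citiesext <- merge(cities, countries, by = "country")

# Left join: every city is kept, with NA where no country matches.
left <- merge(cities, countries, by = "country", all.x = TRUE)

result <- citiesext[citiesext$EU == "no" & citiesext$latitude > 50,
                    c("city", "country", "latitude", "coastline")]
print(result)
```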
Miscellaneous features
Plotting
Scatterplots
Pie charts