Introduction To The R Project For Statistical Computing: Stefano CASALEGNO, PH.D
University of Basilicata, Italy – May 2010
Introduction to the
R Project for Statistical Computing
info@spatial.ecology.net www.spatial-ecology.net
Topics for this lecture
1. Introducing the R Project for Statistical
Computing: what and why?
1. GENERAL INTRODUCTION
The R Project for Statistical
Computing:
what and why?
What?
R is a language and environment for statistical
computing and graphics.
It is a GNU project: free, open-source
software developed through mass collaboration.
R is based on, and similar to, the S language and
environment, which was developed at Bell Laboratories (formerly
AT&T) by John Chambers and colleagues (the same laboratories
that developed C and UNIX).
1. R introduction www.spatial-ecology.net
Software or environment?
Many users think of R as a statistics system.
We prefer to think of it as an environment within
which statistical techniques are implemented.
R has its own LaTeX-like documentation
format, which is used to supply comprehensive
documentation, both online in a number of
formats and in hardcopy.
The R environment
The term "environment" is intended to
characterize it as a fully planned and
coherent system, rather than an incremental
accretion of very specific and inflexible tools, as
is frequently the case with other data analysis
software.
R is an integrated suite of software facilities for
data manipulation, calculation and graphical
display.
What does the R environment include?
● an effective data handling and storage facility,
● a suite of operators for calculations on arrays, in
particular matrices,
● a large, coherent, integrated collection of intermediate
tools for data analysis,
● graphical facilities for data analysis and display either
on-screen or on hardcopy, and
● a well-developed, simple and effective programming
language which includes conditionals, loops, user-defined
recursive functions and input and output
facilities.
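A few of the facilities listed above can be tried directly at the R prompt. The lines below are only a minimal sketch; the object names (A, B, labels, fact) are chosen for illustration and do not come from the lecture.

```r
# Operators on arrays and matrices
A <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
B <- A %*% t(A)              # matrix product of A with its transpose
A_scaled <- A * 2            # element-wise arithmetic

# A loop with a conditional
labels <- character(4)
for (i in 1:4) {
  if (i %% 2 == 0) labels[i] <- "even" else labels[i] <- "odd"
}

# A user-defined recursive function
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}

# Output facilities: print the results to the console
print(B)
print(labels)
print(fact(5))
```

Each result is stored in a named object, so it can be reused in later steps instead of being recomputed.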
WHY ?
Peculiarity
In S a statistical analysis is normally done as a
series of steps, with intermediate results being
stored in objects.
Thus whereas SAS and SPSS will give copious
output from a regression or discriminant
analysis,
R will give minimal output and store the results in
a fit object for subsequent interrogation by
further R functions.
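As an illustration of this step-by-step style, the sketch below fits a linear regression and then interrogates the stored fit object. It assumes the `cars` data set that ships with R (stopping distance against speed); the data set and the object names are chosen for the example and are not part of the lecture.

```r
# Fit a linear model and store the result in an object
fit <- lm(dist ~ speed, data = cars)

# Printing the object gives only minimal output (the call and coefficients)
print(fit)

# Further functions interrogate the same fit object on request
coefs <- coef(fit)            # estimated intercept and slope
res   <- residuals(fit)       # one residual per observation
smry  <- summary(fit)         # standard errors, t-tests, R-squared, ...
r2    <- smry$r.squared       # a single value extracted from the summary
```

Nothing beyond the short printout is shown unless it is asked for, which is the contrast with SAS and SPSS drawn above.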
Advantages of R
● Free: there are no restrictions on access or use.
● Scientifically robust: it is the product of
international collaboration between top
computational statisticians and computer language
designers.
● It runs on almost all operating systems.
● It allows statistical analysis and modelling of high
sophistication: you are not limited to one method
of accomplishing a given computation or graphical
presentation.
Advantages of R (2)
● It can work on very large and complex objects
(cluster processing).
● It exchanges data with other software (CSV, GDAL) and
integrates with other working environments (shell, GRASS).
● It is supported by comprehensive online technical
documentation and an active user community.
● Repetitive tasks can be automated with scripts.
● The source code is published and freely available.
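The CSV exchange mentioned above can be sketched as a round trip through a temporary file. The table contents and the column names (site, richness) are hypothetical and serve only as an example.

```r
# A small table to exchange (hypothetical values)
plots <- data.frame(site = c("A", "B", "C"),
                    richness = c(12L, 7L, 15L))

# Write it to a CSV file that other software can read
path <- tempfile(fileext = ".csv")
write.csv(plots, path, row.names = FALSE)

# Read the same file back into a new data frame
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```

Saved in a script file, the same lines can be re-run unchanged whenever the input data are updated.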
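Disadvantages of R
● It is driven from the command line.
● You have to learn the S language.
● You have to approach a new way of thinking about data, as
objects each with its type, which in turn supports a
set of methods.
● R holds the objects it works on in Random Access Memory (RAM),
so the size of the data is bounded by the memory available.
RAM is a type of physical memory that can be read
from and written to.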
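The idea that each object has a type which determines its methods, and that objects occupy memory, can be seen in a short sketch. The values and object names below are invented for illustration.

```r
# Two objects of different types
x <- c(2.5, 3.1, 4.7)                    # a numeric vector
f <- factor(c("oak", "pine", "oak"))     # a factor (categorical data)

# Each object knows its own class
class_x <- class(x)
class_f <- class(f)

# The same generic function picks a different method for each class:
# quantiles for the numeric vector, counts per level for the factor
sum_x <- summary(x)
sum_f <- summary(f)

# Objects live in RAM; object.size() reports how much they occupy
big <- numeric(1e6)                      # one million double-precision values
bytes <- object.size(big)
print(bytes, units = "Mb")
```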
2. Resources for learning R
http://www.r-project.org/
Introductions and tutorials
Textbooks, manuals
Web
R News, Mailing lists, user’s conference
... R help
2. Learning R www.spatial-ecology.net
Introductions and tutorials
Venables, W. N.; Smith, D. M.; R Development Core Team, 2007. An
Introduction to R (Notes on R: A Programming Environment for Data
Analysis and Graphics), Version 2.5.0 (2007-04-23). ISBN 3-900051-12-7
http://www.cran.r-project.org
Hornik, K. 2007. R FAQ: Frequently Asked Questions on R. Version
2.5.2007-04-23. ISBN 3-900051-08-9
Rossiter, D. G., 2007. Introduction to the R Project for Statistical
Computing for use at ITC. Revision 2.95. International Institute for
Geo-information Science & Earth Observation (ITC), Enschede (NL),
129 pp.
http://www.itc.nl/personal/rossiter/teach/R/RIntro_ITC.pdf
Textbooks
Dalgaard, P. 2002. Introductory Statistics with R.
Springer-Verlag.
Venables, W. N. & Ripley, B. D. 2002. Modern Applied
Statistics with S. 4th edition. New York: Springer-Verlag.
Everitt, B. S. & Hothorn, T. 2006. A Handbook of Statistical
Analyses Using R. Chapman & Hall.
Soetaert, K. & Herman, P. M. J. 2008. A Practical Guide to
Ecological Modelling: Using R as a Simulation Platform.
Springer.
Spector, P. 2009. Data Manipulation with R. Springer.
Web
R Wiki http://wiki.r-project.org/rwiki/doku.php
Help at UCLA http://www.ats.ucla.edu/stat/r/
Help on packages http://astrostatistics.psu.edu/datasets/R/html/index.html
Ecological Models and Data in R, Princeton University Press
http://www.zoology.ufl.edu/bolker/emdbook/
RSeek search engine http://www.rseek.org/
Multisite search engine http://www.dangoldstein.com/search_r.html
R News, mailing lists, user's conference
● MAILING LISTS: http://www.r-project.org/mail.html
R-sig-Geo: R Special Interest Group on using Geographical data and
Mapping https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Help in Spanish
https://stat.ethz.ch/mailman/listinfo/r-help-es
● NEWSLETTER
http://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf
● CONFERENCES
http://www2.agrocampus-ouest.fr/math/useR-2009/
3. APPLICATION
Using R for
spatial ecological modelling
R packages
The base R environment includes 8 "standard"
packages
Packages include: functions, data, examples and manuals
Package web site:
http://cran.r-project.org
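The packages available in a given installation can be inspected from within R itself. The lines below are a sketch only; the package installed in the commented-out line (sp) is just one example of a contributed package and needs a network connection.

```r
# Attach a package that ships with R (stats is normally attached already)
library(stats)

# Packages attached in the current session
attached <- search()
print(attached)

# Packages of this installation that are marked as part of base R
ip <- installed.packages()
base_pkgs <- rownames(ip)[ip[, "Priority"] %in% "base"]
print(base_pkgs)

# Contributed packages are downloaded from CRAN on request, e.g.:
# install.packages("sp")
```

The exact list and number of packages printed depend on the R version installed.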
3. R spatial www.spatial-ecology.net
Spatial data and R
R has dedicated data structures and methods
for specific kinds of data (e.g. time series data,
spatial data, ecological modelling).
A large number of packages provide spatial
statistical methods or interfaces to GIS, and
many of them also provide data structures and methods
(e.g. plotting) for spatial data.
Editing scripts
using KATE
KDE Advanced Text Editor