06/10/2014

What is R?

From CRAN (the Comprehensive R Archive Network):

"R is 'GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage for further information."

Why use R?

R is…

  • free
  • open-source
  • flexible
  • expandable with thousands of third-party packages
  • widely used in academia and industry
  • awesome

Outline

  1. R and RStudio: the basics
  2. A typical data analysis
  3. Going further

R and RStudio: the basics

I strongly recommend working with RStudio, but there are alternatives if you prefer (in particular Emacs with ESS). (see demo)

R and RStudio: the basics

R as a calculator (see demo)

sqrt(2)
## [1] 1.414
x <- 3
y <- x^2
x + y
## [1] 12
sin(2*pi)  # not exactly 0, because of floating-point rounding
## [1] -2.449e-16
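The last result is not exactly 0 because pi and sin() are evaluated in floating-point arithmetic. A minimal sketch of how to compare such numbers safely, using base R's all.equal() (default tolerance about 1.5e-8) wrapped in isTRUE():

```r
x <- sin(2*pi)
x == 0                   # FALSE: exact comparison fails
isTRUE(all.equal(x, 0))  # TRUE: equal up to a small tolerance
abs(x) < 1e-10           # TRUE: an explicit tolerance of your choice also works
```

Use == only for values that are exact by construction (integers, counts); for computed decimals prefer a tolerance.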

R and RStudio: the basics

  • Write your R instructions in a script, which is simply a text file such as filename.R.
  • Run the script in the R console line by line or chunk by chunk, by clicking Run or pressing Ctrl-Return (Cmd-Return on a Mac).
  • View your plots in the side pane and export them to various formats.
  • Get help on any function foo by typing ?foo.

(see demo)

R and RStudio: the basics

  • If you are not familiar with R at all, I advise you to go through the steps of this tutorial.

  • If you have a question, google "how do I X with R". There is a huge community, and most questions have already been asked and answered by somebody else!

  • Learning by doing applies particularly to programming: do the exercises, check the solutions, understand every step, try new things, fail, and you'll become stronger. R won't eat you.

Typical data analysis

  1. Design your experiment: there are some packages to help you, but you also need to think.
  2. Import and transform your data into a suitable format: R makes the process a breeze (almost)
  3. Exploratory data analysis: easy to create flexible and informative plots
  4. Fit your model: typically one or two lines of code
  5. Check the model assumptions: many diagnostic plots are available
  6. Report your results: knitr integrates R with LaTeX/HTML/Word/etc.

Import and transform the data

Here is a fake dataset of blood coagulation times for patients following different diets. It is saved in the file 'blood.csv' and looks like this (excerpt):

patient_id  diet  coagulation
         1     A           62
         2     A           60
         9     B           65

Import and transform the data

Importing such a table in R is easy and can be done in different ways depending on the format of the data (csv, txt, xlsx, etc.).

(Alternative: use the Import Dataset tool in RStudio's upper-right panel.)

blood <- read.table(file='blood.csv', sep=',', header=TRUE,
                    stringsAsFactors=TRUE)  # read diet as a factor (default is FALSE since R 4.0.0)
blood
##    patient_id diet coagulation
## 1           1    A          62
## 2           2    A          60
## 3           3    A          63
## 4           4    A          59
## 5           5    A          63
## 6           6    A          67
## 7           7    A          71
## 8           8    A          64
## 9           9    B          65
## 10         10    B          66
## 11         11    B          68
## 12         12    B          66
## 13         13    B          73
## 14         14    B          67
## 15         15    B          68
## 16         16    B          65
## 17         17    C          56
## 18         18    C          62
## 19         19    C          60
## 20         20    C          61
## 21         21    C          63
## 22         22    C          55
## 23         23    C          62
## 24         24    C          58
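If you do not have blood.csv at hand, you can rebuild the data frame from the printout above and write it to disk yourself. A minimal sketch: the file goes to a temporary directory (use 'blood.csv' instead to write to your working directory), and read.csv() is base R's shortcut for read.table() with sep=',' and header=TRUE preset.

```r
# Rebuild the blood data frame from the printout above
blood <- data.frame(
  patient_id  = 1:24,
  diet        = factor(rep(c("A", "B", "C"), each = 8)),
  coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,   # diet A
                  65, 66, 68, 66, 73, 67, 68, 65,   # diet B
                  56, 62, 60, 61, 63, 55, 62, 58)   # diet C
)

# Write it to a CSV file without the row names column
path <- file.path(tempdir(), "blood.csv")
write.csv(blood, path, row.names = FALSE)

# Read it back; stringsAsFactors = TRUE turns diet into a factor again
blood2 <- read.csv(path, stringsAsFactors = TRUE)
str(blood2)
```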

Import and transform the data

The object blood into which we imported the data is a data.frame. It is one of the most important data structures in R, and you should get familiar with it. The basic operations are:

Access some rows of the data:

row1_10 <- blood[1:10, ]

Access a particular column/variable of the data:

patient_id <- blood$patient_id
## or alternatively:
patient_id <- blood[, 'patient_id']
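Rows can also be selected with a logical condition instead of row numbers. A short sketch with base R indexing and subset(); the extra column name high is hypothetical, chosen only for illustration, and the data are recreated from the printout above so the example runs on its own.

```r
# blood data as printed in the import step
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))

# Rows satisfying a condition: all patients on diet A
dietA <- blood[blood$diet == "A", ]

# Several columns at once, selected by name
blood[1:3, c("diet", "coagulation")]

# The same filter with subset(), which lets you use column names directly
dietA2 <- subset(blood, diet == "A", select = c(patient_id, coagulation))

# Add a new column (hypothetical name 'high'): TRUE if coagulation exceeds 65
blood$high <- blood$coagulation > 65
```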

Import and transform the data

Useful functions (type ?foo for help on foo):

  • str()
  • nrow() and ncol()
  • summary()
  • apply()
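A quick tour of these functions on the blood data (recreated from the printout above). tapply(), which applies a function within each group, is not in the list above and is added here as a related base R function.

```r
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))

str(blood)      # type of each column and its first values
nrow(blood)     # 24 rows
ncol(blood)     # 3 columns
summary(blood)  # quartiles and mean for numeric columns, counts for factors

# apply(X, MARGIN, FUN): MARGIN = 1 for rows, 2 for columns
apply(blood[, c("patient_id", "coagulation")], 2, mean)  # 12.5 and 63.5

# Mean coagulation time per diet
tapply(blood$coagulation, blood$diet, mean)  # A 63.625, B 67.25, C 59.625
```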

Exploratory data analysis

R is very good at plotting, which helps you explore and understand your data flexibly.

plot(coagulation~diet, data=blood)

[Figure: boxplots of coagulation time for each diet]

Exploratory data analysis

And fancier, with the ggplot2 package:

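library(ggplot2)  # load the package (install it once with install.packages('ggplot2'))
g <- ggplot(data=blood, aes(x=coagulation, fill=diet))
g + geom_density(alpha=.2)  # one semi-transparent density curve per diet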

[Figure: overlaid density curves of coagulation time, one per diet]

Exploratory data analysis

Useful functions (type ?foo for help on foo):

  • plot()
  • boxplot()
  • pairs()
  • par()
  • and much more in the packages ggplot2 (strongly recommended! here is a tutorial) and lattice
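A small sketch combining some of these base functions: par(mfrow=...) puts two plots side by side, boxplot() summarises each diet, and stripchart() (not in the list above, added here for illustration) shows the individual observations. The data are recreated from the printout in the import step.

```r
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))

op <- par(mfrow = c(1, 2))  # 1 row, 2 columns; par() returns the old settings
boxplot(coagulation ~ diet, data = blood,
        xlab = "diet", ylab = "coagulation time")
stripchart(coagulation ~ diet, data = blood, vertical = TRUE,
           method = "jitter", xlab = "diet", ylab = "coagulation time")
par(op)                     # restore the previous layout

# boxplot() also returns its statistics; plot = FALSE skips the drawing
b <- boxplot(coagulation ~ diet, data = blood, plot = FALSE)
b$stats[3, ]                # medians per diet: 63, 66.5, 60.5
```

Saving and restoring par() this way avoids leaving the 1 x 2 layout active for later plots.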

Fit a model

To fit any linear model in R, use the function lm().

mod <- lm(coagulation~diet, data=blood)

The syntax y ~ x is called a formula. It expresses a model concisely, even when you have nested factors, blocks, random effects, etc. (more on this later this semester).
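What do the estimated coefficients of mod mean? With R's default treatment contrasts for a factor, the intercept estimates the mean of the first level (diet A), and dietB and dietC estimate differences from it. A minimal check on the data printed earlier (recreated here so the block runs on its own):

```r
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))

mod <- lm(coagulation ~ diet, data = blood)
coef(mod)  # (Intercept) 63.625, dietB 3.625, dietC -4

# The same numbers from the group means
m <- tapply(blood$coagulation, blood$diet, mean)
c(m["A"], m["B"] - m["A"], m["C"] - m["A"])
```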

Fit a model

The function Anova() from the package car produces an appropriate ANOVA table (here with Type II tests, its default):

require(car, quietly = TRUE)
Anova(mod)
## Anova Table (Type II tests)
## 
## Response: coagulation
##           Sum Sq Df F value  Pr(>F)    
## diet         233  2    11.5 0.00043 ***
## Residuals    213 21                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
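Since diet is the only term in this model, sequential (Type I) and Type II tests coincide, so base R's anova() should give the same sums of squares and F value without loading car. A sketch on the recreated data:

```r
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))

mod <- lm(coagulation ~ diet, data = blood)
a <- anova(mod)  # sequential (Type I) ANOVA table from base R
a
a[["Sum Sq"]]    # 232.75 for diet, 213.25 for the residuals
```

With several factors, especially in unbalanced designs, the two kinds of tests generally differ, which is where car's Anova() becomes useful.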

Model checking

Producing diagnostic plots for linear models with R is straightforward:

par(mfrow=c(2,2))  # arrange the next plots in a 2 x 2 grid
plot(mod)          # the four default diagnostic plots of a fitted lm

[Figure: the four default diagnostic plots of mod produced by plot(mod)]
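Two of these checks can also be drawn by hand from the fitted model, which helps to see what they show. A sketch using base R's fitted(), resid(), qqnorm() and qqline(), on the data recreated from the printout above:

```r
blood <- data.frame(patient_id = 1:24,
                    diet = factor(rep(c("A", "B", "C"), each = 8)),
                    coagulation = c(62, 60, 63, 59, 63, 67, 71, 64,
                                    65, 66, 68, 66, 73, 67, 68, 65,
                                    56, 62, 60, 61, 63, 55, 62, 58))
mod <- lm(coagulation ~ diet, data = blood)

op <- par(mfrow = c(1, 2))
# Residuals vs fitted values: look for patterns or unequal spread
plot(fitted(mod), resid(mod), xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 2)
# Normal Q-Q plot: points close to the line suggest roughly normal errors
qqnorm(resid(mod))
qqline(resid(mod))
par(op)
```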

Report your results

knitr is an awesome tool that lets you embed your R code and results (including tables and plots) in a nice-looking document. Doing so makes your research literate and reproducible.

If you use rmarkdown, you can output your report in any of these formats:

  • PDF
  • HTML
  • slides (like this presentation)
  • Word document (if your collaborators require it)

Going further: courses

  • Compicampus course on 29+30.10.2014: here (still 5 free spots as of last week)

  • Resources about getting help with R here

  • Online course about R here (in German)

  • Practice, practice, practice!

Any questions?

You can always send me an email at robert@stat.math.ethz.ch

Series 1

For the next hour, you will try R yourself by starting on Series 1.

For those who have a laptop: stay here

For the others: go to room E 19