Regression

Spring semester 2018

General information

Lecturer: Peter Bühlmann
Assistant: Solt Kovács
Lectures: Wed 10-12, HG D 7.2
Fri 13-15 (every second week), HG E 5
Exercises: Fri 13-15 (every second week), HG E 5
Course catalogue data

Course content

In regression analysis, we examine the relationship between a random response variable and several other explanatory variables. In this class, we consider the theory of linear regression with one or more explanatory variables. Moreover, we also study robust methods, generalized linear models, model choice, high-dimensional linear models, nonlinear models and nonparametric methods. Several numerical examples will illustrate the theory. You will learn to perform a regression analysis and interpret the results correctly. We will use the statistical software R to get hands-on experience with this. You will also learn to interpret and critique regression analyses done by others.

Literature

  • Practical regression with R by Julian R. Faraway (2002) with R-code.
  • Peter Bühlmann and Sara van de Geer (2011), "Statistics for High-Dimensional Data - Methods, Theory and Applications", Springer. (For high-dimensional regression; available here for free when logged in via ETH.)
  • John Fox (1997), "Applied Regression Analysis, Linear Models, and Related Methods", Sage Publications. (Intuitive examples, not very mathematical.)
  • Sanford Weisberg (2005), "Applied Linear Regression", 3rd edition, Wiley. (Similar to the one by Fox, but shorter.)
  • Paul D. Allison (1999), "Multiple linear regression, a primer", Thousand Oaks. (Brief, good for interpretations, not very mathematical.)
  • Peter Dalgaard (2002), "Introductory Statistics with R", Springer. (Introduction based on the software R.)
  • T. Hastie, R. Tibshirani, and J. Friedman (2009), "The Elements of Statistical Learning", 2nd edition, Springer.

Additional information

Examples in the lecture, as well as solutions to the exercises, will be based on the statistical software R. This is a freely available open-source program that runs on all major platforms and has become a worldwide standard for data analysis. It can be downloaded from CRAN. An R tutorial can be found here.

Announcements

  • February 10th, 2018:
    The first exercise session is on February 21 and will be an introduction to the statistical programming language R with some exercises. Starting with the second exercise session (March 2), the exercise classes will take place every second Friday. In the exercise sessions, you can work on the R problems and the series, and ask questions. You need to bring your own laptop to solve the R problems. Lectures take place every Wednesday; Fridays alternate between lectures and exercise sessions (exceptions will be announced). Please check this course website regularly for announcements regarding the schedule. The first lecture will be on February 23.
  • June 5th, 2018:
    The course material for the lecture has been updated to include some R-scripts and further notes. All sample solutions for the exercises are now online as well.
  • June 5th, 2018:
    Note the following important dates, as announced on the last exercise sheet.

    Question hour / Ferienpräsenz:
    Monday, August 20th, 2018, 15:00 - 16:00, HG G 19.2
    Thursday, August 23rd, 2018, 15:00 - 16:00, HG G 19.2

    Exam review / Prüfungseinsicht:
    Monday, September 24th, 2018, 12:00 - 13:00, HG G 19.1

Course materials

Text:

  • Lecture notes can be found here (PDF).
  • The book used for high-dimensional regression is available here for free when logged in via ETH. Details: Peter Bühlmann and Sara van de Geer (2011), "Statistics for High-Dimensional Data - Methods, Theory and Applications", Springer.
  • Practical regression with R by Julian R. Faraway (2002) with R-code.

R-Scripts, Outputs, and Slides:

Additional material:

Alternative texts:

  • John Fox (1997), "Applied Regression Analysis, Linear Models, and Related Methods", Sage Publications. (Intuitive examples, not very mathematical.)
  • Sanford Weisberg (2005), "Applied Linear Regression", 3rd edition, Wiley. (Similar to the one by Fox, but shorter.)
  • Paul D. Allison (1999), "Multiple linear regression, a primer", Thousand Oaks. (Brief, good for interpretations, not very mathematical.)
  • Peter Dalgaard (2002), "Introductory Statistics with R", Springer. (Introduction based on the software R.)
  • T. Hastie, R. Tibshirani, and J. Friedman (2009), "The Elements of Statistical Learning", 2nd edition, Springer.

Exercise classes

The first exercise class (Wednesday, February 21) will feature an R tutorial with some exercises. Please install R and RStudio and bring your laptop to the exercise classes. From the second exercise class on (March 2), exercise classes will take place every second Friday.

Series and solutions

Handing in solutions for the exercise series is not mandatory. If you wish to hand in solutions, do so by 13:00 on the designated optional hand-in date by placing them in the REGRESSION box in room HG J 68.

| Exercises | Hand out | Optional hand in | Discussion | Solution | Slides/Notes/Remarks |
|---|---|---|---|---|---|
| Series 6 | May 18, 2018 | May 30, 2018 | May 25, 2018 | Solutions 6 | |
| Series 5 | May 4, 2018 | May 16, 2018 | May 11, 2018 | Solutions 5 | |
| Series 4 | April 20, 2018 | May 2, 2018 | April 27, 2018 | Solutions 4 | |
| Series 3 | April 6, 2018 | April 18, 2018 | April 13, 2018 | Solutions 3 | |
| Series 2 | March 9, 2018 | March 21, 2018 | March 16, 2018 | Solutions 2 | Remarks 2; Examining data notes; Examining data R-code; Transformations notes; Transformations R-code |
| Series 1; wdi dataset | February 23, 2018 | March 7, 2018 | March 2, 2018 | Solutions 1 | Remarks 1 |
| R Series | - | - | February 21, 2018 | R Series Solution | R Intro Slides; R Intro Democode; easy dataset; A short introduction to R |
Introduction