Regression
Spring semester 2018
General information
Course content
In regression analysis, we examine the relationship between a random response variable and several explanatory variables. In this class, we consider the theory of linear regression with one or more explanatory variables. We also study robust methods, generalized linear models, model choice, high-dimensional linear models, nonlinear models and nonparametric methods. Several numerical examples will illustrate the theory. You will learn to perform a regression analysis and interpret the results correctly, using the statistical software R to get hands-on experience. You will also learn to interpret and critique regression analyses done by others.
Literature
- Julian J. Faraway (2002), "Practical Regression and Anova using R". (With R-code.)
- Peter Bühlmann and Sara van de Geer (2011), "Statistics for High-Dimensional Data - Methods, Theory and Applications", Springer. (Available here for free when logged in via ETH; for high-dimensional regression.)
- John Fox (1997), "Applied Regression Analysis, Linear Models, and Related Methods", Sage Publications. (Intuitive examples, not very mathematical.)
- Sanford Weisberg (2005), "Applied Linear Regression", 3rd edition, Wiley. (Similar to the one by Fox, but shorter.)
- Paul D. Allison (1999), "Multiple linear regression, a primer", Thousand Oaks. (Brief, good for interpretations, not very mathematical.)
- Peter Dalgaard (2002), "Introductory Statistics with R", Springer. (Introduction based on the software R.)
- T. Hastie, R. Tibshirani, and J. Friedman (2009), "The Elements of Statistical Learning", 2nd edition, Springer.
Additional information
Examples in the lecture, as well as solutions to the exercises, will be based on the statistical software R. This is a freely available open source program that works on all platforms and has become a worldwide standard for data analysis. It can be downloaded from CRAN. An R Tutorial can be found here.
Announcements
- February 10th, 2018: The first exercise session is on February 21 and will be an introduction to the statistical programming language R with some exercises. Starting with the second exercise session (March 2), the exercise classes will take place every second Friday. In the exercise sessions, you can work on the R problems and the series and ask questions. You need to bring your own laptop to work on the R questions. There will be lectures every Wednesday, and Fridays will alternate between lectures and exercise sessions (exceptions will be announced). Please check this course website regularly for announcements regarding the schedule. The first lecture will be on February 23.
- June 5th, 2018: The course material for the lecture has been updated to include some R-scripts and further notes. Similarly, all sample solutions for the exercises are online.
- June 5th, 2018: Note the following important dates, as announced on the last exercise sheet.
  Question hour / Ferienpräsenz:
  Monday, August 20th, 2018, 15:00 - 16:00, HG G 19.2
  Thursday, August 23rd, 2018, 15:00 - 16:00, HG G 19.2
  Exam review / Prüfungseinsicht:
  Monday, September 24th, 2018, 12:00 - 13:00, HG G 19.1
Course materials
Text:
- Lecture notes can be found here (PDF).
- The book used for high-dimensional regression is available here for free when logged in via ETH. Details: Peter Bühlmann and Sara van de Geer (2011), "Statistics for High-Dimensional Data - Methods, Theory and Applications", Springer.
- Julian J. Faraway (2002), "Practical Regression and Anova using R". (With R-code.)
R-Scripts, Outputs, and Slides:
- High-dimensional inference
- Robust methods (by Jonathan Taylor)
- R script (pp. 23-28)
- boston.R
- brainsize.R
- kernelsmoothing.R
- leukemia_modelselection.R
- poissonregr.R
- riboflavin-highdim.R
Additional material:
Alternative texts:
- John Fox (1997), "Applied Regression Analysis, Linear Models, and Related Methods", Sage Publications. (Intuitive examples, not very mathematical.)
- Sanford Weisberg (2005), "Applied Linear Regression", 3rd edition, Wiley. (Similar to the one by Fox, but shorter.)
- Paul D. Allison (1999), "Multiple linear regression, a primer", Thousand Oaks. (Brief, good for interpretations, not very mathematical.)
- Peter Dalgaard (2002), "Introductory Statistics with R", Springer. (Introduction based on the software R.)
- T. Hastie, R. Tibshirani, and J. Friedman (2009), "The Elements of Statistical Learning", 2nd edition, Springer.
Exercise classes
The first exercise class (Wednesday, February 21) will feature an R tutorial with some exercises. Please install R and RStudio and bring your laptop to the exercise classes. From the second exercise class on (March 2), exercise classes will take place every second Friday.
Series and solutions
Handing in solutions to the exercise series is not mandatory. If you do wish to hand in solutions, submit them by 13:00 on the designated optional hand-in date by placing them in the REGRESSION box in room HG J 68.
Exercises | Hand out | Optional hand in | Discussion | Solution | Slides/Notes/Remarks
---|---|---|---|---|---
Series 6 | May 18, 2018 | May 30, 2018 | May 25, 2018 | Solutions 6 |
Series 5 | May 4, 2018 | May 16, 2018 | May 11, 2018 | Solutions 5 |
Series 4 | April 20, 2018 | May 2, 2018 | April 27, 2018 | Solutions 4 |
Series 3 | April 6, 2018 | April 18, 2018 | April 13, 2018 | Solutions 3 |
Series 2 | March 9, 2018 | March 21, 2018 | March 16, 2018 | Solutions 2 | Remarks 2; Examining data notes; Examining data R-code; Transformations notes; Transformations R-code
Series 1; wdi dataset | February 23, 2018 | March 7, 2018 | March 2, 2018 | Solutions 1 | Remarks 1
R Series | - | - | February 21, 2018 | R Series Solution | R Intro Slides; R Intro Democode; easy dataset; A short introduction to R
Introduction | | | | |