Computational Statistics

Spring semester 2017

General information

Lecturer Martin Mächler
Assistants Niklas Pfister, Dominik Rothenhäusler
Lectures Thu 13-15 HG E 5
Fri 09-10 HG E 1.2
Exercises Fri 10-12 HG E 1.2
Course catalogue data
Literature T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning. Springer
J. E. Gentle. Elements of Computational Statistics. Springer
W. N. Venables, B. D. Ripley. Modern Applied Statistics with S. Springer
Exam and lecture attestation (Testat) To obtain ECTS credit points, you have to pass the written exam during the examination session. The exam will include tasks to be solved on a computer with R. No course attendance confirmation is required to register for the exam, and you do not have to hand in solutions to the exercises during the semester to obtain ECTS credits.

Doctoral students who need (part of) the credit points but do not require a grade should talk to Martin Mächler at the beginning of the semester. In that case, you will need to solve and submit at least 70% of the exercises.

Course content

Abstract

"Computational Statistics" deals with modern methods of data analysis (aka "data science") for prediction and inference. The lectures provide an overview of existing methodology. Through the lectures and the exercises, students learn to choose among possible models, to understand their algorithms, and to validate them using graphical methods and simulation-based approaches.

Objective

Get to know modern methods of data analysis for prediction and inference. Learn to choose among possible models and to understand their algorithms. Validate them using graphical methods and simulation-based approaches.

Course Synopsis

Multiple regression, nonparametric methods for regression and classification (kernel estimates, smoothing splines, regression and classification trees, additive models, projection pursuit, neural nets, ridging and the lasso, boosting). Problems of interpretation, reliable prediction and the curse of dimensionality are dealt with using resampling, the bootstrap and cross-validation.

Notice

Exercises will be based on the open-source statistics software R. Emphasis will be put on applied problems. Active participation in the exercises is strongly recommended.

Announcements

  • First week
    Beginning of lecture: Thursday, 23/02/2017.

    Exercises start on 24/02/2017 at 10:15 in room HG E 1.2, with a special introduction to the "R" software. Please bring a laptop. The tutorial can be downloaded here. Here are some further helpful links related to R:

    R homepage
    RStudio homepage
    Getting help with R
    Online R course (in German)
    Try R

  • No lecture on Friday May 5th
    There will be no lecture on Friday, 05/05/2017; the exercise class will take place as usual.
  • Question hours:
    Monday, 07/08/2017, 2pm - 3pm, HG G19.1
    Thursday, 10/08/2017, 2pm - 3pm, HG G19.1
  • Exam Review:
    Friday, 22/09/2017, 12pm - 1pm, HG G19.1

Course materials

Lecture notes

The lecture notes are available here.

Recorded lectures

The recorded lectures can be found here. You need nethz login credentials to log in.

R Scripts as used in the lecture

A selection is online in this directory.

Course organisation

The course outline can be found here.

Exercise classes

Exercise classes will be held weekly on Fridays from 10:15 to 11:55 in HG E 1.2. For the precise dates, see the table below.

The exercise class on Friday will be divided into two parts. During the first part, the assistant will discuss the new exercise series. On request, part of the material covered during the lecture can also be recapitulated. During the second part, the assistant will answer individual questions (regarding the lecture, the exercise sheet or R) and return the corrected solutions you submitted at the previous exercise class.

Questions and comments

You can always ask the lecturer and tutors questions during the lectures and the exercise sessions. If you have concrete R questions, you can bring your own laptop to the exercise class and ask the assistant directly. If you want to send us an email, please send it to this address: compstat@stat.math.ethz.ch. Please do not send any R code to this address; such questions are usually easier to answer during the exercise class or in an arranged appointment.

Exercise sheets

Each new exercise sheet will be uploaded on the Thursday preceding the corresponding preliminary discussion session.

We will correct your answers provided that you respect our hand-in policy:

  • No R script files and no lengthy compilations of outputs or figures.
    • Only hand in your most important findings and answers. Do not include your code unless the question specifically asks you to fill in a given R code skeleton.
  • Focus on the interpretation of the results.
    • We are usually more interested in what you conclude from the obtained results than in the numbers per se.
  • Hand in by 12pm (noon) on the due date.
    • The solved exercises should be handed in during the exercise class or placed in the corresponding tray in HG J68 on the due date by 12pm at the latest.

An elegant way to hand in your solution is to combine everything in a single file (for example by generating a PDF with LaTeX or Word/LibreOffice). An easy way to do this is to use R Markdown. See the tutorial and the template.

Exercises                                       Discussion          Deadline
R-Tutorial                                      February 24, 2017
Exercise 1                                      March 3, 2017       March 10, 2017
Exercise 2                                      March 10, 2017      March 17, 2017
Exercise 3                                      March 17, 2017      March 24, 2017
Exercise 4                                      March 24, 2017      March 31, 2017
Exercise 5, R-Skeleton_Ex_5.2                   March 31, 2017      April 7, 2017
Exercise 6                                      April 7, 2017       April 28, 2017
Exercise 7, skeleton7.R                         April 28, 2017      May 5, 2017
Exercise 8, skeleton8.R                         May 5, 2017         May 12, 2017
Exercise 9                                      May 12, 2017        May 19, 2017
Exercise 10, skeleton10.R                       May 19, 2017        May 26, 2017
Exercise 11 (minor changes in 1. d); June 1st)  May 26, 2017        June 2, 2017
General discussion                              June 2, 2017

Solutions

The solutions will be sent weekly to the students enrolled in this course.

Help with R

On Friday, February 24th, 2017, there will be an introduction to the statistical software R during the exercise class from 10:15 to 11:55. The tutorial can be downloaded here. Here are some further helpful links related to R:

R homepage
RStudio homepage
Getting help with R
Online R course (in German)
Try R