Lectures

Previous semesters

The websites of courses taught in previous semesters can be found here.

Question hours

In German: Ferienpräsenz

Important: This semester, the question hours are held on Zoom. Please see the corresponding emails you have received or will receive.
Lecture Date Time Room

Exam review

In German: Prüfungseinsicht

Important: Exam reviews are regulated by the official ETH Directive on "Viewing and transfer of performance assessment records", which can be found here. The English translation is provided for information purposes only; the legally binding German version can be found here.

Statistik und Wahrscheinlichkeitsrechnung

Mathematik IV: Statistik

Fundamentals of Mathematical Statistics

Applied ANOVA and Experimental Design

Bachelor, master and semester thesis topics

Below are topics for bachelor, master and semester theses offered by supervisors at the Seminar for Statistics.
Please note: This site is still under construction.

Peter Bühlmann

Contact: E-mail

Conformal prediction for anchor regression

Description: Conformal prediction yields prediction intervals with finite-sample coverage guarantees when the data are i.i.d. The goal is to study these methods and to extend them to heterogeneous settings using anchor regression and related techniques for domain adaptation.
Methods: Linear models, machine learning algorithms, stabilization
Knowledge: Statistical methods and modeling, programming in R or Python
Data: Mostly simulated; if of interest, larger ICU patient data
Literature:
https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1307116
https://projecteuclid.org/journals/annals-of-statistics/volume-49/issue-1/Predictive-inference-with-the-jackknife/10.1214/20-AOS1965.full
https://proceedings.neurips.cc/paper/2019/hash/8fb21ee7a2207526da55a679f0332de2-Abstract.html
https://www.pnas.org/doi/abs/10.1073/pnas.2107794118
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12398
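To illustrate the i.i.d. starting point of this project, here is a minimal sketch of split conformal prediction in Python. It assumes a single predictor, an ordinary least-squares fit and absolute residuals as conformity scores; these choices, the simulated data and the helper names (fit_line, split_conformal, simulate) are hypothetical illustrations, and the sketch does not include the anchor-regression extension the project is about.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.1):
    """Split conformal prediction with absolute residuals as conformity scores.

    Returns a function mapping a new x to a (lower, upper) interval whose
    marginal coverage is at least 1 - alpha if all points are i.i.d.
    """
    intercept, slope = fit_line(x_train, y_train)
    scores = sorted(abs(y - (intercept + slope * x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = math.inf if k > n else scores[k - 1]

    def interval(x_new):
        centre = intercept + slope * x_new
        return centre - q, centre + q

    return interval


def simulate(n, rng):
    """Hypothetical i.i.d. data: y = 2 + 0.5 x + standard normal noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)
x_tr, y_tr = simulate(200, rng)
x_cal, y_cal = simulate(200, rng)
x_te, y_te = simulate(2000, rng)

predict_interval = split_conformal(x_tr, y_tr, x_cal, y_cal, alpha=0.1)
hits = 0
for x, y in zip(x_te, y_te):
    lower, upper = predict_interval(x)
    hits += lower <= y <= upper
coverage = hits / len(y_te)
print(f"empirical coverage: {coverage:.3f}")  # should be close to 0.9
```

The interval half-width is the ⌈(n+1)(1−α)⌉-th smallest calibration residual; if that rank exceeds the number n of calibration points, the interval is unbounded.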

Markus Kalisch

Contact: E-mail

Ordinal Response Models

Description: In many applied settings, the response variable is ordinal, i.e. its values lie on an arbitrary scale on which only the relative ordering of the values is meaningful. In this project, you will read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Extensions to linear regression motivated by, e.g., the social sciences
Knowledge: Linear Regression
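One standard model for ordinal responses is the cumulative-logit (proportional odds) model. The following Python sketch computes its category probabilities for a single predictor. The sign convention P(Y <= j | x) = logistic(theta_j - x * beta), the thresholds and the coefficient are illustrative assumptions, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, beta, thresholds):
    """Category probabilities under a cumulative-logit (proportional odds) model.

    Assumes P(Y <= j | x) = logistic(theta_j - x * beta) with increasing
    thresholds theta_1 < ... < theta_{K-1}. Each category probability is the
    difference of consecutive cumulative probabilities.
    """
    eta = x * beta
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical three-level response (e.g. low / medium / high) with one predictor.
thresholds = [-1.0, 1.0]
for x in (0.0, 2.0):
    print(x, [round(p, 3) for p in category_probs(x, beta=0.8, thresholds=thresholds)])
```

Because a single coefficient shifts all cumulative logits together, a larger x moves probability mass towards the higher categories while the ordering of the categories is respected.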

Nonparametric Regression and Generalized Additive Models

Description: A generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables. In this project, you will read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Extensions to linear regression motivated by many applied fields of research
Knowledge: Linear Regression
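For the special case of an identity link (an additive model), the smooth functions can be estimated by backfitting: each function is repeatedly re-estimated by smoothing the partial residuals left by the other components. The Python sketch below uses a Gaussian-kernel Nadaraya-Watson smoother; the smoother, the bandwidth, the number of iterations, the simulated data and the function names are all illustrative assumptions, and general link functions would need an additional iteratively reweighted fitting loop around it.

```python
import math
import random


def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson smoother (Gaussian kernel) of r on x, evaluated at x."""
    fitted = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(w * ri for w, ri in zip(weights, r)) / sum(weights))
    return fitted


def backfit(columns, y, bandwidth=0.15, n_iter=10):
    """Fit y ~ alpha + f_1(x_1) + ... + f_p(x_p) by backfitting (identity link)."""
    n, p = len(y), len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and all other components.
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            smooth = kernel_smooth(columns[j], partial, bandwidth)
            centre = sum(smooth) / n
            f[j] = [s - centre for s in smooth]  # centre each f_j for identifiability
    return alpha, f


rng = random.Random(1)
n = 200
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
y = [math.sin(math.pi * a) + b ** 2 + rng.gauss(0, 0.2) for a, b in zip(x1, x2)]

alpha, (f1, f2) = backfit([x1, x2], y)
fitted = [alpha + a + b for a, b in zip(f1, f2)]
mean_y = sum(y) / n
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - rss / tss
print(f"in-sample R^2: {r2:.3f}")
```

Centring each estimated function is needed because a constant could otherwise be shifted freely between the intercept and the components.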

Model-Robustness in Linear Regression

Description: Linear regression is a simple but surprisingly powerful tool for practical data analysis. In this thesis (SA/BA or MA), we first take a closer look at the assumptions and optimality guarantees of standard linear regression. We then examine how robust the inference is when these assumptions are violated and study methods that are more robust to such violations.
Methods: Extensions to linear regression motivated by many applied fields of research
Knowledge: Linear Regression
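One example of inference that is more robust to a violated assumption (here, constant error variance) is the heteroscedasticity-consistent "sandwich" standard error. The Python sketch below compares it with the classical OLS standard error for the slope in a simple regression; the HC0 variant, the simulated data and the function name are assumptions chosen for illustration, not a restriction of the methods the thesis would cover.

```python
import math
import random


def slope_standard_errors(xs, ys):
    """OLS slope with classical and HC0 (White/sandwich) standard errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Classical SE assumes one common error variance sigma^2.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # HC0 SE lets each observation have its own error variance.
    se_hc0 = math.sqrt(sum((x - mean_x) ** 2 * e ** 2 for x, e in zip(xs, resid))) / sxx
    return slope, se_classical, se_hc0


rng = random.Random(2)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
# Heteroscedastic errors: the noise standard deviation grows with |x|.
ys = [1.0 + 2.0 * x + rng.gauss(0, abs(x)) for x in xs]
slope, se_classical, se_hc0 = slope_standard_errors(xs, ys)
print(f"slope={slope:.3f}  classical SE={se_classical:.4f}  HC0 SE={se_hc0:.4f}")
```

In this design the classical standard error understates the slope's sampling variability by a factor of roughly sqrt(1.8) ≈ 1.34, whereas the HC0 estimate does not rely on a constant error variance.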

Lukas Meier

Contact: E-mail

Regression with Interval Censoring

Description: Read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Special regression models motivated by survival analysis
Knowledge: Linear regression

Dyadic Regression Models

Description: Dyadic regression is used to model pairwise interaction data (e.g. between people or countries); some of these models are also known as "gravity models". Read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Regression
Knowledge: Linear regression

Nicolai Meinshausen

Contact: E-mail

Fairness in Machine Learning

Description: Read a few key publications on fairness in machine learning and write a concise summary highlighting their main conceptual commonalities and differences.
Methods: Linear regression and classification; tree ensembles; structural causal models
Knowledge: Regression and classification; causality
Data: Some standard benchmark datasets can be used, but the project can also be more theoretical

Invariant Risk Minimization

Description: Implement the invariant risk minimization framework of Arjovsky et al. (2019) and write a discussion
Methods: Linear models; tree ensembles; deep networks; causal inference
Knowledge: Machine Learning; Causality
Data: Datasets in paper or some other simple simulation data; possibly some larger datasets
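As a possible starting point, the following Python sketch computes an IRMv1-style penalty for squared loss: the squared derivative of an environment's risk with respect to a scalar multiplier w of the predictor, evaluated at w = 1. It only evaluates this penalty for two fixed predictors in a hypothetical two-environment simulation (the structural causal model, the coefficient 0.5 and all function names are assumptions); it does not optimise a representation, which the full framework does.

```python
import random


def irm_penalty(predictions, targets):
    """IRMv1-style penalty for squared loss in one environment.

    Squared gradient, at w = 1, of the risk R(w) = mean((w * f(x) - y)^2)
    with respect to the scalar "dummy" multiplier w; dR/dw at w = 1 equals
    mean(2 * (f(x) - y) * f(x)).
    """
    n = len(targets)
    grad = sum(2.0 * (f - y) * f for f, y in zip(predictions, targets)) / n
    return grad ** 2


def environment(n, noise_x2, rng):
    """Hypothetical SCM: X1 -> Y -> X2, where the noise on X2 varies by environment."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + rng.gauss(0, 1) for a in x1]
    x2 = [b + rng.gauss(0, noise_x2) for b in y]
    return x1, x2, y


rng = random.Random(3)
envs = [environment(5000, 0.1, rng), environment(5000, 2.0, rng)]
for x1, x2, y in envs:
    causal = irm_penalty(x1, y)  # predictor f(x) = x1
    spurious = irm_penalty([0.5 * v for v in x2], y)  # predictor f(x) = 0.5 * x2
    print(f"causal penalty={causal:.4f}  spurious penalty={spurious:.4f}")
```

In this simulation the predictor based on X1 has a penalty close to zero in both environments, whereas the predictor based on X2 does not, because the optimal scaling of X2 differs between the environments.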

Out-of-distribution generalization

Description: Read some recent publications on out-of-distribution generalization and write a summary of their differences, advantages and drawbacks.
Methods: Linear models; tree ensembles; structural causal models
Knowledge: Regression and Classification; Causality
Data: Some small simulation studies; if of interest, also larger ICU patient datasets

Quantile Treatment Effects

Description: Read about quantile treatment effects, which characterize possibly heterogeneous causal effects, and write a summary of current approaches
Methods: Linear models; tree ensembles; structural causal models; instrumental variables
Knowledge: Regression and Classification; Causality
Data: Can be theoretical; can also use some large-scale climate data
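In a randomized experiment, the quantile treatment effect at level tau can be estimated as the difference between the tau-quantiles of the treated and control outcome distributions. The Python sketch below assumes random treatment assignment and simulated data (both hypothetical, as are the function names); note that it targets differences of marginal quantiles, not quantiles of individual treatment effects.

```python
import math
import random


def empirical_quantile(sample, tau):
    """Left-continuous inverse of the empirical CDF: smallest x with F_n(x) >= tau."""
    ordered = sorted(sample)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def quantile_treatment_effect(treated, control, tau):
    """Difference of marginal outcome quantiles, treated minus control."""
    return empirical_quantile(treated, tau) - empirical_quantile(control, tau)


rng = random.Random(4)
control = [rng.gauss(0, 1) for _ in range(5000)]
treated = [rng.gauss(0, 2) for _ in range(5000)]  # same mean, larger spread
for tau in (0.1, 0.5, 0.9):
    print(tau, round(quantile_treatment_effect(treated, control, tau), 2))
```

Both simulated groups have mean zero, so the average treatment effect is about zero, yet the estimated effect is negative at tau = 0.1 and positive at tau = 0.9; this heterogeneity is invisible in the average alone.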

Corinne Emmenegger (with Peter Bühlmann)

Contact: E-mail

Coding treatment effect estimation from network data

Methods: (Causal) treatment effect estimation targets the average effect someone experiences if they take a drug compared with when they do not. With network data, however, individual people are not independent of each other; instead, they influence each other's behaviour or outcomes. Who influences whom depends on a "closeness relationship" that is encoded by ties in a network. For example, if people interact with each other, are friends, or are close to each other in space, they share a tie in the network across which information can "flow" from one person to the other. The algorithm for estimating the treatment effect is motivated by double machine learning, uses a sample-splitting scheme, and estimates "nuisance" functions with machine learning methods such as random forests. A preprint of this work is available on arXiv.
Project: The aim of the project is to write a well-designed R package for this methodology. To get you started, R code is available that was used to generate the simulations in the above-mentioned preprint. You would first read about the method and understand how it works, and then use this knowledge to write clean code implementing it. You would test the code and assess how well the method works on different kinds of network structures and statistical models. The current algorithm might undersample units that are connected to many other units in the network; this issue could potentially also be investigated.
Type of thesis: Master thesis or Semester paper
Prerequisites: Some coding experience with R would be helpful.

Alexander Henzi (with Peter Bühlmann)

Contact: E-mail

Smooth isotonic distributional regression

Description: Isotonic distributional regression (IDR; https://doi.org/10.1111/rssb.12450, https://doi.org/10.1214/19-EJS1659) is a method for estimating the conditional distribution of an outcome given covariates under monotonicity constraints. The estimator yields discrete distributions, but one often wants an estimate of the conditional density. The goal of this project is to investigate methods for smoothing the IDR output distributions based on a kernel density estimation approach.
Methods: kernel density estimation, shape restricted regression
Knowledge: basic knowledge of kernel density estimation, nonparametric statistics, R (or Python) programming
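A naive version of the kernel approach replaces each point mass of a discrete IDR distribution with a Gaussian kernel centred at that point, giving a mixture density. The Python sketch below does this for one hypothetical IDR output (the support points and probabilities are made up, and the bandwidth is fixed arbitrarily); it does not compute IDR itself, and questions such as bandwidth selection are what the project would investigate.

```python
import math


def smoothed_density(atoms, weights, bandwidth):
    """Gaussian-kernel smoothing of a discrete distribution.

    Places a normal density with standard deviation `bandwidth` on each atom
    and mixes them with the (normalised) atom probabilities.
    """
    total = sum(weights)
    norm = total * bandwidth * math.sqrt(2.0 * math.pi)

    def density(y):
        return sum(
            w * math.exp(-0.5 * ((y - a) / bandwidth) ** 2)
            for a, w in zip(atoms, weights)
        ) / norm

    return density


# Hypothetical IDR output for one covariate value: support points and probabilities.
atoms = [1.0, 2.0, 4.0]
weights = [0.2, 0.5, 0.3]
f = smoothed_density(atoms, weights, bandwidth=0.5)

step = 0.01
grid = [-5.0 + step * i for i in range(1501)]  # covers [-5, 10]
print(f"integral over grid: {sum(f(y) for y in grid) * step:.4f}")
print(f"density at y = 2: {f(2.0):.4f}")
```

Because the result is a proper mixture of normal densities, it is non-negative and integrates to one for any bandwidth; how well it reflects the underlying conditional density depends entirely on that bandwidth.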