Lectures

Previous semesters

The websites of courses taught in previous semesters can be found here.

Question hours

In German: Ferienpräsenz

Important: This semester, the question hours are held on Zoom. Please see the corresponding emails you have received or will receive.
Lecture Date Time Room

Exam review

In German: Prüfungseinsicht

Important: Exam reviews are regulated by the official ETH Directive on "Viewing and transfer of performance assessment records", which can be found here. The English translation is for information purposes only; the German version, which can be found here, is legally binding.

Statistik und Wahrscheinlichkeitsrechnung

Mathematik IV: Statistik

Fundamentals of Mathematical Statistics

Applied ANOVA and Experimental Design

Bachelor, master and semester thesis topics

Below you can find topics for bachelor, master or semester theses that the supervisors at the Seminar for Statistics offer.
Please note: This site is still under construction.

Magali Champion

Contact: E-mail, Website

Application of L1-spectral clustering

Description: Application of L1-spectral clustering (Champion et al. 2021) to discover groups of genes associated with the development of kidney cancer
Methods: L1-spectral clustering (Champion et al. 2021) which combines spectral clustering and L1-minimization
Knowledge: basic notions of maths (spectral theory, graph theory) and R
Data: gene expression data from TCGA (kidney cancer)

Benchmark of clustering methods

Description: Benchmark of clustering methods to discover groups of genes associated with the development of kidney cancer
Methods: k-means, Markov clustering algorithm, ...
Knowledge: basic notions of clustering and R
Data: gene expression data from TCGA (kidney cancer)

Identification of genes involved in the development of ER+ breast cancer

Description: Identification of genes involved in the development of ER+ breast cancer
Methods: multiple tests, PCA, lasso
Knowledge: basic notions of machine learning and R
Data: gene expression data from TCGA (breast cancer)

Benchmark of gene regulatory network inference methods

Description: Benchmark of gene regulatory network inference methods
Methods: lasso, elastic net, random forest (-> causal networks, pcalg,...)
Knowledge: basic notions of graph theory, machine learning and R
Data: gene expression data from TCGA (kidney cancer)

Multi-layer gene networks

Description: Multi-layer gene networks: how to combine biological data to create a gene network?
Methods: multiplex algorithm from (Didier, 15), (Cantini, 21)
Knowledge: basic notions of machine learning and R
Data: multi-omics data from TCGA

Extension of the L1-spectral clustering algorithm

Description: Extension of the L1-spectral clustering algorithm to stochastic block models
Methods: L1-spectral clustering, stochastic block models
Knowledge: spectral theory, graph theory
Data: no data (more theoretical)

Markus Kalisch

Contact: E-mail

Discrete Choice Models

Description: Discrete choice models (also called qualitative choice models) are intended to explain choices between two or more discrete alternatives, such as whether or not to buy a car, or which of several occupations to choose. In this project, you will read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Extensions to linear regression motivated by economics and social sciences
Knowledge: Linear Regression

Ordinal Response Models

Description: In many applied settings, the response variable is ordinal, i.e. its values lie on an arbitrary scale on which only the relative ordering of the values is meaningful. In this project, you will read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Extensions to linear regression motivated by e.g. social sciences
Knowledge: Linear Regression
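As an illustration of how such a model can respect the ordering without assigning numeric distances between categories, one common choice is the cumulative logit (proportional odds) model. This is only one example of an ordinal response model and is not named in the topic description; the symbols Y (response with categories 1, ..., J), x (predictors), \beta (coefficients) and \theta_j (ordered thresholds) are notation assumed here.

```latex
\Pr(Y \le j \mid x) = \frac{1}{1 + \exp\bigl(-(\theta_j - x^\top \beta)\bigr)},
\qquad j = 1, \dots, J-1,
\qquad \theta_1 < \theta_2 < \dots < \theta_{J-1}
```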

Generalized Additive Models

Description: A generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables. In this project, you will read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Extensions to linear regression motivated by many applied fields of research
Knowledge: Linear Regression
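Written out, a GAM with p predictors takes the form below. The notation is assumed for illustration: g is the link function, \beta_0 an intercept, and f_1, ..., f_p the unknown smooth functions to be estimated.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)
```

Choosing every f_j to be linear recovers an ordinary generalized linear model.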

Lukas Meier

Contact: E-mail

Regression with Interval Censoring

Description: Read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Special regression models motivated by survival analysis
Knowledge: Linear regression

Dyadic Regression Models

Description: Dyadic regression is used to model pairwise interaction data (between people, countries, etc.). Some of these models are also known as "gravity models". Read publications in the area, write a summary, apply and implement methods in R, and perform simulation studies.
Methods: Regression
Knowledge: Linear regression

Nicolai Meinshausen

Contact: E-mail

Fairness in Machine Learning

Description: Read a few key publications in the area of fairness in machine learning and write a concise summary, highlighting key conceptual commonalities and differences.
Methods: Linear regression and classification; tree ensembles; structural causal models
Knowledge: Regression and classification; causality
Data: Some standard benchmark datasets can be used, but the project can also be more theoretical

Invariant Risk Minimization

Description: Implement the invariant risk minimization framework of Arjovsky et al. (2019) and write a discussion
Methods: Linear models; tree ensembles; deep networks; causal inference
Knowledge: Machine Learning; Causality
Data: Datasets in paper or some other simple simulation data; possibly some larger datasets
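To give a feel for what an implementation involves, here is a minimal sketch of the penalised objective commonly referred to as IRMv1, restricted to squared-error loss. It assumes that the predictions of a model are already available per environment; the function names (`mse`, `irmv1_penalty`, `irmv1_objective`) and the argument `lam` are hypothetical and chosen for illustration only. The penalty is the squared derivative of the environment risk with respect to a dummy scale factor w applied to the predictions, evaluated at w = 1.

```python
def mse(preds, targets):
    """Mean squared error of the predictions in one environment."""
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def irmv1_penalty(preds, targets):
    """Squared derivative of mean((w * p - t)^2) with respect to w at w = 1.

    The derivative is 2 * mean((p - t) * p); it vanishes when rescaling
    the predictions cannot reduce the risk in this environment.
    """
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and of equal length")
    grad = 2.0 * sum((p - t) * p for p, t in zip(preds, targets)) / len(preds)
    return grad * grad


def irmv1_objective(envs, lam):
    """Sum over environments of risk plus lam times the invariance penalty.

    envs is a list of (preds, targets) pairs, one pair per environment.
    """
    return sum(mse(p, t) + lam * irmv1_penalty(p, t) for p, t in envs)


if __name__ == "__main__":
    # Two toy environments with hand-picked numbers (not real data).
    env_a = ([1.0, 2.0], [1.0, 2.0])   # perfect predictions
    env_b = ([1.0, 1.0], [0.0, 0.0])   # predictions too large
    print(irmv1_objective([env_a, env_b], lam=0.5))
```

In a full implementation the predictions would come from a trainable model (linear, tree-based or a deep network) and the objective would be minimised over the model parameters, typically with automatic differentiation.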

Out-of-distribution generalization

Description: Read some recent publications on out-of-distribution generalization and write a summary of their differences, advantages and drawbacks.
Methods: Linear models; tree ensembles; structural causal models
Knowledge: Regression and Classification; Causality
Data: Some small simulation studies; if of interest, also larger ICU patient datasets

Quantile Treatment Effects

Description: Read up on quantile treatment effects, which characterize the possibly heterogeneous causal effect, and write a summary of current approaches.
Methods: Linear models; tree ensembles; structural causal models; instrumental variables
Knowledge: Regression and Classification; Causality
Data: Can be theoretical; can also use some large-scale climate data

Malte Londschien (with Peter Bühlmann)

Contact: E-mail

Integration of Change Point Detection Algorithms with Spline-Based Smoothers for Drift Correction of Metabolomics Data

Description: Metabolomics is the study of small molecules in various tissues such as blood or urine. Applications of metabolomics include the monitoring of clinical trials and the discovery of drugs and biomarkers. In a typical metabolomics experiment, samples are placed in numbered wells on plates and processed by a mass spectrometer well by well, plate by plate. The resulting data can thus be interpreted as a high-dimensional time series. Metabolomic measurements are prone to batch effects, instrumental drifts and abrupt jumps, which need to be removed in a pre-processing step. Change (or break) point detection is concerned with localizing abrupt distributional changes in time series. We propose to estimate drifts and jumps simultaneously with change point detection.
Methods: Change point detection
Data: Metabolomics
Knowledge: Statistical methods, programming in R or Python.
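For orientation, the simplest instance of change point detection can be sketched as follows: locate a single jump in the mean of a univariate series by choosing the split that most reduces the squared error. This is a deliberately basic illustration and not the method proposed in the project, which would additionally handle smooth drifts (e.g. with splines), multiple change points and high-dimensional data. The function name `best_split` is hypothetical.

```python
import random


def best_split(x):
    """Locate a single change in mean by least squares.

    Returns (k, gain): the series is split into x[:k] and x[k:], and gain
    is the reduction in the sum of squared errors compared with fitting
    one common mean to the whole series.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    best_k, best_gain = None, float("-inf")
    left_sum = 0.0
    for k in range(1, n):
        left_sum += x[k - 1]
        right_sum = total - left_sum
        # SSE reduction of a two-mean fit relative to a one-mean fit.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


if __name__ == "__main__":
    # Simulated intensities with an abrupt jump after 50 wells (toy data).
    random.seed(0)
    series = [random.gauss(0.0, 0.5) for _ in range(50)]
    series += [random.gauss(3.0, 0.5) for _ in range(50)]
    k, gain = best_split(series)
    print("estimated change point:", k, "gain:", round(gain, 2))
```

Applying the same search recursively to the two resulting segments gives binary segmentation, a standard way of extending this idea to several change points.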