Student Seminar in Statistics:
Multiple Testing for Modern Data Science

Autumn semester 2020

General information

Lecturers Matthias Löffler, Armeen Taeb
Assistants Domagoj Ćevid, Jinzhou Li
Lectures Mon 16.00-18.00 RZ F 21 Zoom link (password sent via email)
Course catalogue data VVZ

Course content

Objective

The students understand the relevance of multiple testing in modern applications. Further, they learn about two commonly used error measures, namely the family-wise error rate (FWER) and the false discovery rate (FDR), and about approaches to control them.
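As a purely illustrative sketch (not part of the official course material), the two error measures can be compared in a small simulation. It assumes independent one-sided z-tests, and the function names, the parameters (`m`, `m1`, `shift`, `reps`) and the simulation design are hypothetical choices made for this example. The Bonferroni correction targets the FWER, while the Benjamini-Hochberg (BH) step-up procedure targets the FDR.

```python
import random
from statistics import NormalDist


def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * alpha / m (controls the FDR under independence)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def simulate(m=200, m1=20, shift=3.0, alpha=0.05, reps=300, seed=0):
    """Estimate (FWER, FDR) for both procedures.

    Assumed setup: the first m1 hypotheses are false nulls with test
    statistic z ~ N(shift, 1); the remaining m - m1 are true nulls with
    z ~ N(0, 1). One-sided p-values are p = 1 - Phi(z).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    methods = {"bonferroni": bonferroni, "bh": benjamini_hochberg}
    any_false = {name: 0 for name in methods}    # repetitions with V >= 1
    fdp_sum = {name: 0.0 for name in methods}    # sum of V / max(R, 1)
    for _ in range(reps):
        z = [rng.gauss(shift if i < m1 else 0.0, 1.0) for i in range(m)]
        pvals = [1.0 - std_normal.cdf(zi) for zi in z]
        for name, procedure in methods.items():
            reject = procedure(pvals, alpha)
            r = sum(reject)            # R: total number of rejections
            v = sum(reject[m1:])       # V: rejected true nulls
            any_false[name] += v >= 1
            fdp_sum[name] += v / max(r, 1)
    return {name: (any_false[name] / reps, fdp_sum[name] / reps)
            for name in methods}


if __name__ == "__main__":
    for name, (fwer, fdr) in simulate().items():
        print(f"{name}: empirical FWER = {fwer:.3f}, empirical FDR = {fdr:.3f}")
```

In this setup BH always rejects at least the hypotheses that Bonferroni rejects, so it typically makes more discoveries at the price of a larger FWER, while its FDR stays near the nominal level.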

Literature

Please see Course material and schedule.

Prerequisites / Notice

Every lecture will consist of an oral presentation by a pair of students, highlighting the key ideas of selected papers. Another two students will be responsible for asking questions during the presentation and for leading a discussion of the pros and cons of the papers at the end. Finally, two further students will be responsible for evaluating the quality of the presentation and discussion and for providing constructive feedback for improvement.

Announcements



    14.09.2020
    The Zoom link for the lectures has been added to the website in the "Overview" tab. Passwords are sent via email; please contact one of the assistants if you did not receive it.



    14.09.2020
    Welcome to the website of the course "Student Seminar in Statistics: Multiple Testing for Modern Data Science"!
    The first class will take place on Monday, 21.09.2020.
    This will be an introductory lecture by Dr. Matthias Löffler and Dr. Armeen Taeb. We are looking forward to seeing you!

    Assignment of topics: The first two topics will be assigned by email before the start of the semester. The remaining topics will be assigned during the first class, on Monday, 21.09.2020. We will send out an anonymous Doodle poll beforehand, so that you can indicate your interests. The first student presentation will take place on Monday, 28.09.2020.

    Please let us know as soon as possible if you decide not to take part in the seminar.

Course material and schedule

The course encompasses a review of approaches to multiple testing.

The group of 24 students will be divided into 12 pairs. Everyone is expected to participate actively during all lectures. Moreover, each pair will have a special role during three different lectures: once as presenters, once to take the lead in asking questions, and once to give feedback.

The presentations should be roughly 2 x 25 minutes, with a 5-10 minute break in between. One of the assistants will meet with you twice before your presentation, to answer questions about the material and to give feedback on your planned presentation. More detailed guidelines for the presentations will be given during the first class. Please also see the FAQ for further details.

Further related resources: Topics in Selective Inference; Theory of Statistics.


Week Topics and related papers Questions and Feedback Slides
Week 1 (21.09.2020) Introductory Lecture by Dr. Matthias Löffler and Dr. Armeen Taeb.
Week 2 (28.09.2020) Group 1: Bonferroni and Simes

-A simple sequentially rejective multiple test procedure
-An improved Bonferroni procedure for multiple tests of significance

  • Students: Wayne Zeng, Skander Stephan
  • Assistant: Domagoj
  • Questions: Group 3
  • Feedback: Group 5
Week 3 (05.10.2020) Group 2: Permutation tests

-Multiple hypothesis testing in microarray experiments
-Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence

  • Students: Pascal Schwendimann, Pan Zhao
  • Assistant: Domagoj
  • Questions: Group 4
  • Feedback: Group 6
Week 4 (12.10.2020) Group 3: Hierarchical testing

-Hierarchical false discovery rate controlling methodology
-Hierarchical testing of variable importance
-A graphical approach to sequentially rejective multiple test procedures

  • Students: Lara Fratini, Kaye Iseli
  • Assistant: Domagoj
  • Questions: Group 5
  • Feedback: Group 7
Week 5 (19.10.2020) Group 4: Higher criticism

-Methodology: Higher criticism for large-scale inference, for rare and weak effects; for theoretical reference, see Higher criticism for detecting sparse heterogeneous mixtures
-Application: Higher criticism statistic: detecting and identifying non-Gaussianity in the WMAP first-year data; for further reference, see Higher criticism: theory and applications in cosmology

  • Students: Georgios Vasilakopoulos, Delia Keiser
  • Assistant: Jinzhou
  • Questions: Group 6
  • Feedback: Group 8
Week 6 (26.10.2020) Group 5: Benjamini-Hochberg (BH) with martingales

-Controlling the false discovery rate: a practical and powerful approach to multiple testing
-Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach

  • Students: Marco Hassan, Luca Pedrazzini
  • Assistant: Jinzhou
  • Questions: Group 7
  • Feedback: Group 9
Week 7 (02.11.2020) Group 6: FDR control under dependence

-The control of the false discovery rate in multiple testing under dependency
-Adaptive false discovery rate control under independence and dependence

  • Students: Ramon Stieger, Joel Widmer
  • Assistant: Jinzhou
  • Questions: Group 8
  • Feedback: Group 10
Week 8 (09.11.2020) Group 7: Empirical null distribution

-Empirical Bayes methods and false discovery rates for microarrays
-On using empirical null distributions in Benjamini-Hochberg procedure

  • Students: Davide Apolloni, Stefan Thoma
  • Assistant: Domagoj
  • Questions: Group 9
  • Feedback: Group 11
Week 9 (16.11.2020) Group 8: Bayes FDR methods

-The positive false discovery rate: a Bayesian interpretation and the q-value
-On spike and slab empirical Bayes multiple testing

  • Students: Jsmea Hug, Nathan Brack
  • Assistant: Domagoj
  • Questions: Group 10
  • Feedback: Group 12
Week 10 (23.11.2020) Group 9: SLOPE

-Adapting to unknown sparsity by controlling the false discovery rate
-SLOPE - Adaptive variable selection via convex optimization

  • Students: Denis Schaub, Pascal Kündig
  • Assistant: Jinzhou
  • Questions: Group 11
  • Feedback: Group 1
Week 11 (30.11.2020) Group 10: Knockoffs

-Controlling the false discovery rate via knockoffs
-Multi-resolution localization of causal variants across the genome

  • Students: Zoé Vaquette, Helena Obrist
  • Assistant: Domagoj
  • Questions: Group 12
  • Feedback: Group 2
Week 12 (07.12.2020) Group 11: Generalization of FWER and connections to FDR

-Generalizations of the familywise error rate
-Controlling the number of false discoveries: application to high-dimensional genomic data

  • Students: Aaron Renggli, Jakob Heimer
  • Assistant: Jinzhou
  • Questions: Group 1
  • Feedback: Group 3
Week 13 (14.12.2020) Group 12: Exploratory testing

-Multiple testing for exploratory research
-Simultaneous high-probability bounds on the false discovery proportion in structured, regression, and online settings

  • Students: Vincent Bardenhagen, Daiki Brender
  • Assistant: Jinzhou
  • Questions: Group 2
  • Feedback: Group 4

FAQ

  1. When and how will the presentations be assigned?

    We will assign the first presentation before the start of the semester. The remaining presentations will be assigned on the 21st of September. We will send out an anonymous Doodle poll beforehand so that you can indicate your interests.

  2. How long should the presentation be?

    The total presentation time is 50 minutes. Each student should present for roughly half of the time. We advise you to split the presentation into two parts of about 25 minutes each, with a 5-10 minute break in between. Please make sure to practice so that you don't go over your time! We highly encourage interaction and discussion with the audience, both during and after your talk. Time spent on such discussion during your talk will not be counted as presentation time.

  3. Do I have to present all content of the assigned materials?

    No, you should select what works best for your presentation. It is important to present and illustrate the main ideas, but you are free to choose how.

  4. Should I look at additional material beyond the assigned one?

    It is not required, but you are free to do so. Anything that is interesting and relevant to the topic is very welcome. You can also create some informative R simulations yourself.

  5. Should I use a certain template for my slides?

    You can use any template you like. We recommend using one of the ETH presentation templates.

  6. How should the presentation be structured?

    The main purpose of the presentation is to transmit knowledge to the audience. So, after reading the material, please take a step back and try to put yourself in the shoes of the audience: What do they already know? What would they find most interesting? What would be helpful examples? We will also provide further guidelines for the presentations during the first lecture.

  7. Do I need to bring my own laptop to present my slides?

    Ideally, yes. If you do not have a laptop, or you do not have a way of connecting to the projector, please let the assistants know in advance.

  8. Will my slides be published somewhere?

    Yes, all slides will be published on the course website after the presentation. Please make sure to respect copyright. In particular, if you include any images or tables not created by yourself in the presentation, make sure to include the source of the image/table as well.

  9. What is the role of the assistants?

    The assistant in charge of your group gives you guidance and feedback prior to your presentation. You will have a chance to meet with the assistant twice before your presentation. The first meeting will be on the Friday 1.5 weeks before your presentation (Friday is the default, but the meeting can be rescheduled by mutual agreement). The second meeting will typically take place on the Friday 0.5 weeks before your presentation (the same rescheduling rule applies).

  10. How should I prepare for the meetings with the assistants?

    For the first meeting: You should read all material in advance, make a list of questions you have, and make a rough plan of what you would like to present (main concepts, main examples, questions you could pose to the audience to create some interaction, an R example that you could integrate, etc.). For the second meeting: Your presentation should be fully prepared and should be sent to the assistants the day before. During the meeting, you will get feedback on your presentation, and you can clarify any remaining issues.

  11. What do the questions and feedback groups do during the presentation?

    The questions group studies the material to be presented in advance. They follow the presentation extra closely and take the lead in asking questions, for example if some terms or ideas are not clearly explained. (Of course, the rest of the class should also participate actively!) After the presentation and questions, the feedback group will give constructive comments on the strengths and weaknesses of the presentation.

  12. How will COVID-19 affect the course? Do I need to be physically present?

    The course is planned to take place in person. However, students who are not presenting will be able to follow the presentation on Zoom. It is always possible that the entire course will move online; if so, you will be kept up to date.

  13. Do I have to attend all lectures?

    Yes, attendance at all lectures is compulsory. You can attend either in person or via Zoom with your camera turned on. You can miss one lecture (in which you do not have an assignment) without giving a reason. If you have to miss any further lecture, you must contact us immediately and have a good reason.