[Statlist] Swiss Statistics Seminar - May 5, 2017 - Programme

b@rheii m@iii@g oii @im@uzh@ch
Wed Apr 12 14:24:04 CEST 2017


Dear Colleagues

In a bit more than three weeks, on Friday, May 5, the next Swiss Statistics Seminar will take place in Bern.

The speakers and topics are:

14:15-15:15
Sebastian Engelke (EPFL)
An entropy-based test for multivariate threshold exceedances

15:30-16:30
Matthias Templ (ZHAW)
Creating Public-Use Synthetic Data From Complex Surveys 

16:45-17:45
Damian Kozbur (Uni Zurich)
Targeted Undersmoothing

For the abstracts, see below my signature; for more details see
www.imsv.unibe.ch/research/talks/swiss_statistics_seminars_live/index_eng.html

With kind regards and enjoy your Easter holidays,

Barbara Hellriegel
---
www.aim.uzh.ch
Board Member of the Section "Education and Research" (SSS-ER)

------------------------
Sebastian Engelke -- An entropy-based test for multivariate threshold exceedances

Abstract:
---
Many effects of climate change seem to be reflected not in the mean
values of temperature, precipitation or other environmental variables, but rather
in the frequency and severity of the extreme events in the
distributional tails. Detecting such changes requires a statistical
methodology that efficiently uses the largest observations in the sample.
We propose a simple, non-parametric test that decides whether two
multivariate distributions exhibit the same tail behavior. The test is
based on the relative entropy, namely the Kullback-Leibler divergence,
between exceedances over a high threshold of the two multivariate random
vectors. We show that this divergence is closely related to the
divergence between Bernoulli random variables. We study the
properties of the test and further explore its effectiveness for finite
sample sizes. As an application, we use the method on precipitation
data to test whether the marginal tails and/or the extremal
dependence structure have changed over time.
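
As a quick illustration of the Bernoulli connection mentioned in the
abstract, recall that the Kullback-Leibler divergence between two
Bernoulli distributions with success probabilities p and q has the
well-known closed form (in LaTeX notation)

  \[
    \mathrm{KL}\bigl(\mathrm{Ber}(p)\,\|\,\mathrm{Ber}(q)\bigr)
      = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q} ,
  \]

and how the exceedance-based divergence relates to this form is part of
the talk.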

------------------------
Matthias Templ -- Creating Public-Use Synthetic Data From Complex Surveys  

Abstract:
---
The production of synthetic datasets has been proposed as a statistical 
disclosure control solution to generate public use files from confidential 
data. Synthetic data sets also serve as "augmented datasets" that provide
input for micro-simulation models and, more generally, for design-based
simulation studies. The performance and acceptability of such a tool
rely heavily on the
quality of the synthetic data, i.e., on the statistical similarity 
between the synthetic and the true population of interest. Multiple 
approaches and tools have been developed to generate synthetic data.
These approaches can be categorized into three main groups: synthetic 
reconstruction, combinatorial optimization, and model-based generation. 
In addition, methods have been formulated to evaluate the quality of 
synthetic data.
In this presentation, the methods are not treated from a theoretical
point of view; rather, they are introduced in an applied and generally
accessible fashion. We focus on new concepts for the model-based
generation of synthetic data that avoid disclosure problems. At the
end of the presentation, we introduce simPop, an open-source data
synthesizer. simPop is a user-friendly R package based on a modular
object-oriented concept. It provides a highly optimized S4 class 
implementation of various methods, including calibration by iterative 
proportional fitting/updating and simulated annealing, and modeling
or data fusion by logistic regression, regression trees and other
techniques. Utility functions to deal with (age) heaping are
implemented as well. An example is shown using real data from Official
Statistics. The simulated data then serves as input for agent-based 
simulation and/or microsimulation or can be used as open data for 
research and teaching.
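
To make the model-based idea concrete, here is a minimal, purely
illustrative sketch in Python (simPop itself is an R package; the toy
variables, the bootstrap of the structure and the logistic model below
are assumptions made for illustration only, not simPop's implementation):
fit a model on the confidential microdata, then release records drawn
from the fitted model instead of the original ones.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy "confidential" survey: age, household size and a binary attribute.
n = 5000
confidential = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "hsize": rng.integers(1, 6, n),
})
logit = -3.0 + 0.04 * confidential["age"] - 0.2 * confidential["hsize"]
confidential["employed"] = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Step 1: simulate the basic structure, here simply by resampling the
# design variables (a bootstrap of age and household size).
synthetic = (confidential[["age", "hsize"]]
             .sample(n, replace=True, random_state=2)
             .reset_index(drop=True))

# Step 2: model the sensitive attribute on the confidential data ...
model = LogisticRegression().fit(confidential[["age", "hsize"]],
                                 confidential["employed"])

# ... and draw synthetic values from the fitted conditional distribution,
# so that no original record's attribute is released directly.
p_emp = model.predict_proba(synthetic[["age", "hsize"]])[:, 1]
synthetic["employed"] = rng.random(n) < p_emp

print(synthetic.head())

Sampling from a fitted conditional model, rather than releasing the
observed values, is what gives model-based synthesis its disclosure
protection; the quality question discussed in the talk is how closely
such draws preserve the statistical structure of the true population.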

------------------------
Damian Kozbur -- Targeted Undersmoothing

Abstract:  
---
This talk describes a post-model selection inference procedure, called 
'targeted undersmoothing', designed to construct confidence sets for a 
broad class of functionals of high-dimensional statistical models. These 
include dense functionals, which may potentially depend on all elements 
of an unknown high-dimensional parameter. The proposed confidence sets 
are based on an initially selected model and two additionally selected 
models, an upper model and a lower model, which enlarge the initially 
selected model. The procedure is illustrated with two examples. The 
first example studies heterogeneous treatment effects in a direct mail 
marketing campaign, and the second example studies treatment effects of 
the Job Training Partnership Act of 1982.



