[RsR] The descriptive statistics/EDA group in Treviso

Christian Hennig chrish at stats.ucl.ac.uk
Tue Dec 6 22:01:13 CET 2005


Hi,

I wanted to post the minutes of the descriptive statistics/EDA group in 
Treviso but unfortunately I lost my material from this group somewhere 
between Treviso, Cyprus and London. Sorry for that. (Anybody else from 
the group is of course invited to add or correct something.)

Here are some things that I remember.

We felt that we were in a difficult position with our group's topic 
because the field of descriptive statistics and EDA is very broad. Though
some of us have written code that could be seen as belonging to this topic, 
none of the Treviso group members is currently involved in any 
project concerning the implementation, or the unification of 
implementations, of existing methods. We just implemented our own stuff to 
make it available.

Therefore our group didn't start any continuing projects.
Instead, we discussed some basic issues and tried to work 
through some existing functions and documentation.

First we tried to demarcate our topic, "robust descriptive statistics and 
EDA". Generally, more or less every technique in 
statistics can be seen as descriptive or exploratory as long as it is not 
used with a model-based background. This comprises more or less simple 
summary statistics, smoothing methods, and the whole field of data visualization.
In connection with robust statistics, model-free outlier identification 
rules and sensitivity curves can also be included.
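
To make the notion of a sensitivity curve concrete, here is a minimal 
sketch. It assumes the usual Tukey-style definition, 
SC(x) = n * (T(x_1, ..., x_{n-1}, x) - T(x_1, ..., x_{n-1})), and uses 
Python only for illustration. The function name and the data are made up 
for this example and are not taken from any package.

```python
from statistics import mean, median


def sensitivity_curve(estimator, sample, x):
    """Scaled change in the estimate when one extra point x is added.

    Assumed definition: n * (T(sample + [x]) - T(sample)),
    where n is the size of the enlarged sample.
    """
    n = len(sample) + 1
    return n * (estimator(sample + [x]) - estimator(sample))


# Made-up illustrative data, not from any real study.
sample = [2.1, 2.4, 2.5, 2.7, 3.0]
grid = [0.0, 5.0, 50.0, 500.0]

# The curve of the mean grows without bound as x moves away,
# while the curve of the median levels off.
sc_mean = [sensitivity_curve(mean, sample, x) for x in grid]
sc_median = [sensitivity_curve(median, sample, x) for x in grid]

for x, m, med in zip(grid, sc_mean, sc_median):
    print(f"x = {x:6.1f}   mean: {m:9.3f}   median: {med:6.3f}")
```

The bounded curve of the median against the unbounded curve of the mean 
is the kind of model-free comparison meant here: no distributional 
assumption is needed to compute either curve.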

It seems that some scientists use the term "robust" only in
connection with model-based settings, while they prefer the term
"resistant" for descriptive/EDA techniques. "Resistant" refers to how much 
the results of a method change under small changes in the data. Concepts such 
as minimax asymptotic variance are of course not of interest in 
non-model-based settings. It is quite difficult to decide whether some 
methods (especially graphical ones) qualify as "resistant"
or "robust".

We then excluded projection techniques such as PCA, which are usually 
discussed under the title "multivariate analysis", from our 
discussion.

Rather few resistant techniques are implemented, and they are 
difficult to find. Descriptive statistics and/or EDA are seemingly not 
official keywords (though "robust", "smooth" and something like "graphics"
are). For example, only a very small part of the techniques suggested in 
the legendary books by Tukey and co-authors seems to be implemented.

There is no unification of input/output procedures at all, and 
standard techniques like the boxplot do not use plot/summary-methods 
and the like. However, given the variety in the whole field, we agreed 
that full unification is not desirable.

The best thing to do is perhaps to give some non-binding recommendations 
such as
* read documentation and code of existing related methods first (this 
refers to methods serving the same purpose such as smoothing methods, 
projection methods, visualization of univariate data etc.) and try to make 
input and output similar,
* use print/plot/summary-methods following the standard conventions 
whenever possible,
* if your method has a built-in outlier identification, return an 
outlier identifier or score vector,
* use "robust" in the keywords if it applies (and hope that eda or 
descriptive will be introduced as an official keyword).
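
As an illustration of the recommendation on outlier output, here is a 
minimal sketch of a boxplot-type rule that returns both an outlier 
identifier and a score vector together with a short summary. It is written 
in Python only for illustration; in R the analogue would be a classed 
object with print/summary methods. All names (BoxplotRule, boxplot_rule, 
the multiplier k = 1.5) and the data are assumptions made for this example, 
and the quartile definition used is just one of several possible choices.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class BoxplotRule:
    """Hypothetical result object: fences input, scores and flags."""
    q1: float
    q3: float
    k: float
    score: list        # distance outside [q1, q3], in IQR units
    is_outlier: list   # True where score exceeds k

    def summary(self):
        """Short text summary, in the spirit of a summary method."""
        n_out = sum(self.is_outlier)
        return (f"quartiles: [{self.q1}, {self.q3}], k = {self.k}, "
                f"{n_out} of {len(self.score)} points flagged")


def boxplot_rule(data, k=1.5):
    """Flag points lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("IQR is zero; the rule is not defined here")
    # Score is 0 inside [q1, q3] and grows linearly outside it.
    score = [max(q1 - x, x - q3, 0.0) / iqr for x in data]
    return BoxplotRule(q1, q3, k, score, [s > k for s in score])


# Made-up illustrative data with one gross outlier.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 40.0]
result = boxplot_rule(data)
print(result.summary())
```

Returning the score vector alongside the flags lets users apply their own 
cut-off instead of being tied to the default one.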

Not too original, I'm afraid, but at least we took away some valuable
ideas for our own next packages.

All the best,
Christian Hennig



*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish using stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche




More information about the R-SIG-Robust mailing list