[R] Needed: Beta Testers and Summer Camp Students
claudia.beleites at ipht-jena.de
Tue Apr 23 20:47:41 CEST 2013
I skimmed over the pdf.
I have comments on the discussion about centering. I'm from a
completely different field (chemometrics). Of course, I also have to
explain centering. However, the argumentation I use is somewhat
different from the one you give in your pdf.
One argument I have in favour of (mean) centering is numerical
stability, depending on the algorithm of course.
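
A minimal sketch of the numerical-stability point. The one-pass "textbook" variance formula is picked here only as an example of an algorithm that is sensitive to centering, and the data values are invented for illustration:

```python
# Illustrative data (made up): small variation on top of a large offset.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
n = len(data)
mean = sum(data) / n

# One-pass "textbook" formula on the raw data: it subtracts two huge,
# nearly equal numbers, so the small variance is lost in rounding.
naive_var = (sum(x * x for x in data) - n * mean * mean) / (n - 1)

# Same formula after mean centering: all intermediate values are small.
centered = [x - mean for x in data]
c_mean = sum(centered) / n
centered_var = (sum(x * x for x in centered) - n * c_mean * c_mean) / (n - 1)

print(naive_var, centered_var)
```

The deviations from the mean are -6, -3, 3 and 6, so the sample variance is 30. The centered computation recovers that value, while the uncentered one cannot resolve it in double precision. Algorithms that work on deviations from the start do not have this problem, which is why the benefit depends on the algorithm.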
I generally recommend that if data is centered, there should be an
argument why the *chosen* center is *meaningful*, emphasizing that
centering actually involves decisions, and that the center can have a
While I agree that a centered model with the center chosen without any
thought about its meaning is "exactly the same in every important way"
compared to not centering, I disagree with the generality of your
A "natural" center of the data may exist. And in this case, using this
appropriate center will ease the interpretation. Examples:
- In analytical chemistry / chemometrics e.g. we can often use blanks
(samples without analyte) as coordinate origin. Centering to the
blank removes the influence of some parts of the instrumentation,
like sample holders, cuvettes, etc.
- Many of our samples (sample in the meaning of physical specimen) have
a so-called matrix (a common composition/substance in which different
other substances/things are observed), or are measured in a solvent.
- I also work with biological specimens. There we often have controls
(either control specimens/patients or for example normal tissue [vs.
diseased tissues]) which are another type of "natural" coordinate origin.
- I can even imagine problems where mean centering is meaningful:
if the problem involves modeling properties that are deviations from a
mean (I'm thinking of process analytics). However, mean centering
will always need careful attention to the sampling procedure.
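
The difference between a "natural" center and the data mean can be sketched as follows. All readings, and the helper names `center` and `column_means`, are hypothetical and only serve the illustration:

```python
# Hypothetical readings at three channels (values invented for illustration).
blank = [0.10, 0.12, 0.11]  # sample without analyte
samples = [
    [0.30, 0.52, 0.21],
    [0.50, 0.92, 0.31],
]

def center(rows, origin):
    """Subtract the chosen origin from every row, channel by channel."""
    return [[x - o for x, o in zip(row, origin)] for row in rows]

def column_means(rows):
    """Mean of each channel over all rows."""
    return [sum(col) / len(col) for col in zip(*rows)]

blank_centered = center(samples, blank)                 # origin = "no analyte"
mean_centered = center(samples, column_means(samples))  # origin = data mean
```

After centering to the blank, zero means "no signal beyond the blank", whatever other samples are in the data set. After mean centering, zero means "the average of these particular samples", so its meaning depends on how the samples were collected.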
Looking from the opposite point of view, some problems of *mean*
centering become apparent. If the data comes from different groups, the
mean may not be meaningful (I once heard a biologist arguing that the
average human has one ovary and one testicle - this gets your audience
awake and usually convinces immediately). And the mean may be
influenced by the different proportions of the groups in your data.
Which is what you do *not* want: what you want is a stable center.
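
A small sketch of how the group proportions move the mean. The two groups and their values are made up, and `grand_mean` is a hypothetical helper:

```python
from statistics import mean

# Hypothetical one-dimensional measurements for two groups.
control = [1.0, 1.2, 0.8]   # group mean 1.0
diseased = [3.0, 3.2, 2.8]  # group mean 3.0

def grand_mean(n_control, n_diseased):
    """Mean of a data set holding the given number of copies of each group."""
    return mean(control * n_control + diseased * n_diseased)

balanced = grand_mean(1, 1)     # equal proportions
unbalanced = grand_mean(1, 3)   # three times as many diseased cases
control_center = mean(control)  # does not depend on the proportions

print(balanced, unbalanced, control_center)
```

The overall mean shifts from about 2.0 to about 2.5 just because the proportions changed, and neither value describes a member of either group. The control-group center stays at about 1.0 in both cases.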
Institute of Photonic Technology
email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399