[R] Use of geometric mean .. in good data analysis
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jan 22 18:18:36 CET 2024
>>>>> Rich Shepard
>>>>> on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:
> A statistical question, not specific to R. I'm asking for
> a pointer for a source of definitive descriptions of what
> types of data are best summarized by the arithmetic,
> geometric, and harmonic means.
Although this is off-topic here:
I think it is a good question, not really only about
geo-chemistry, but about statistics in applied sciences (and
engineering for that matter).
It is something I am sure good applied statisticians in the 1980s
and 1990s would all have known the answer to:
To use the geometric mean instead of the arithmetic mean
is basically *equivalent* to first log-transforming the data
and then working with the transformed data:
not just for computing an average, but for more relevant modelling,
inference, etc.
John W Tukey (and several others of the greats of the time)
had the log transform among the "First aid transformations":
If the data for a continuous variable must all be positive it is
also typically the case that the distribution is considerably
skewed to the right.
In such a case, behave as a good human who sees another human in
health distress: apply First Aid -- do the things you learned to
do quickly without too much thought, because things must happen
fast, to hopefully save the other's life.
Here: Do log transform all such variables without further ado,
and only afterwards start your (exploratory and more) data analysis.
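
A minimal sketch of that first-aid step, assuming simulated
log-normal data in place of real measurements (the seed, the sample
size and the names y and ly are made up for illustration only):

```r
set.seed(1)                                 # arbitrary seed, for reproducibility
y  <- rlnorm(1000, meanlog = 0, sdlog = 1)  # simulated positive, right-skewed data
ly <- log(y)                                # the "first aid" log transform

# Raw scale: the long right tail pulls the mean above the median
c(mean = mean(y), median = median(y))

# Log scale: mean and median are close, the distribution is roughly symmetric
c(mean = mean(ly), median = median(ly))

# Back-transforming the mean of the logs gives the geometric mean of y
exp(mean(ly))
```

For data like these the back-transformed value sits near the median
of y and below its arithmetic mean; how well that carries over to a
real data set depends on how close it is to log-normal.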
Now, mean(log(y)) = log(geometricmean(y)),
where mean() is the arithmetic mean as in R
{mathematically; on the computer you need all.equal(), not '==' !!}
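
As a small check of this identity in R: base R has no geometric mean
function that I would rely on here, so the helper geometric_mean()
below is a hypothetical one defined just for the example, and the
numbers in y are made up:

```r
# Made-up positive values, for illustration only
y <- c(0.8, 1.5, 2.1, 3.7, 12.4)

# Hypothetical helper: n-th root of the product of the values
geometric_mean <- function(x) prod(x)^(1 / length(x))

lhs <- mean(log(y))            # arithmetic mean on the log scale
rhs <- log(geometric_mean(y))  # log of the geometric mean

# Compare with a numerical tolerance rather than with '=='
isTRUE(all.equal(lhs, rhs))

# The same identity read the other way round; for long vectors this
# form is usually preferable, since prod() can overflow or underflow
isTRUE(all.equal(exp(mean(log(y))), geometric_mean(y)))
```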
I.e., according to Tukey and all the other experienced applied
statisticians of the past, the geometric mean is the "best thing"
to do for such positive right-skewed data in the same sense
that the log-transform is the best "a priori" transformation for
such data -- with the one additional advantage that you need to
fiddle with zeroes when log-transforming, whereas the geometric
mean already works when the data contain zeroes.
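
A short illustration of the zero issue, again with made-up numbers
(the vector y0 is hypothetical):

```r
# Made-up concentrations including one exact zero
y0 <- c(0, 1.2, 3.4, 5.6)

# The geometric mean is still defined: the product, and hence its root, is 0
gm0 <- prod(y0)^(1 / length(y0))
gm0

# The log transform turns the zero into -Inf ...
log_y0 <- log(y0)
log_y0

# ... so not every transformed value is finite, and further modelling
# on the log scale needs some decision about what to do with the zeroes
is.finite(log_y0)
```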
Martin
> As an aquatic ecologist I see regulators apply the
> geometric mean to geochemical concentrations rather than
> using the arithmetic mean. I want to know whether the
> geometric mean of a set of chemical concentrations (e.g.,
> in mg/L) is an appropriate representation of the expected
> value. If not, I want to explain this to non-technical
> decision-makers; if so, I want to understand why my
> assumption is wrong.
> TIA,
> Rich