In many areas of science it is standard practice to describe a sample of data by "mean plus-minus standard deviation", for example 83 ± 52. Such a notation implicitly suggests a symmetric distribution around 83, with a spread measured by the standard deviation 52. Since many believe that the normal distribution is usually an adequate description of data, the implication would be that about 95 % of the data lie within the interval 83 ± 2*52, extending from -21 to 187.
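The arithmetic behind this interval can be checked with a short sketch. It assumes a normal distribution with the mean and standard deviation quoted above and uses the "two standard deviations" rule of thumb; the variable names are illustrative only.

```python
from statistics import NormalDist

mean, sd = 83.0, 52.0

# Rule-of-thumb 95 % range: mean plus-minus two standard deviations.
lower, upper = mean - 2 * sd, mean + 2 * sd
print(lower, upper)  # -21.0 187.0

# Under the normal assumption, the share of values that would fall below zero.
share_negative = NormalDist(mu=mean, sigma=sd).cdf(0.0)
print(round(share_negative, 3))
```

Under these assumptions a little over 5 % of the values would be negative, which is impossible for quantities that can only be positive.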
In many applications, negative values are impossible. For such data, a description whose plausible range reaches down to -21 is clearly inadequate. Even in less extreme cases, this common way to characterize the "plausible range" of data is inappropriate. Observed data are, as just mentioned, often restricted to positive values, and their distribution is usually skewed. Therefore, large deviations from the mean to the positive side are more plausible than equally large deviations towards zero. The description of the spread should reflect this property -- but the "plus-minus" convention does not.
It is an empirical fact that the distributions of many datasets are well approximated by the log-normal distribution. For such datasets, a suitable description is of the form
μ* ^{x}/ σ*
= geometric mean ^{x}/ multiplicative (or geometric) standard deviation
= interval from μ* / σ* to μ* × σ*
We therefore recommend the use of this characterization in scientific papers.
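As a minimal sketch of how such a summary might be computed, the following takes logarithms, averages them, and transforms back. The function name `times_divide_summary` and the sample values are made up for illustration, and the use of the sample (n - 1) standard deviation of the logarithms is an assumption of this sketch.

```python
import math
from statistics import mean, stdev

def times_divide_summary(values):
    """Return (geometric mean, multiplicative SD) of strictly positive values."""
    logs = [math.log(v) for v in values]   # raises ValueError for v <= 0
    mu_star = math.exp(mean(logs))         # geometric mean
    sigma_star = math.exp(stdev(logs))     # multiplicative standard deviation
    return mu_star, sigma_star

# Made-up positive measurements, for illustration only.
sample = [12.0, 35.0, 48.0, 60.0, 83.0, 110.0, 150.0, 240.0]

mu_star, sigma_star = times_divide_summary(sample)
interval = (mu_star / sigma_star, mu_star * sigma_star)
print(f"{mu_star:.1f} x/ {sigma_star:.2f}")
print(f"interval from {interval[0]:.1f} to {interval[1]:.1f}")
```

For log-normal data this interval plays the role that mean ± standard deviation plays for normal data: it covers roughly two thirds of the values, and it can never extend below zero.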
Some reviewers may be reluctant to accept this new way to describe data. Please refer them to this page. Additional hints are welcome.
To be clear, the plus-minus notation is often adequate in statistics. When the precision of estimated parameters is given, the form "estimate plus-minus standard error" is generally appropriate, since the distribution of the estimators is often approximately normal -- and not log-normal!
A different case consists of transformed data, such as the logarithms of
the observed raw values. Note that in some fields of science, it is
commonplace to report data in log-transformed form -- and scientists will not
always be aware of it: acidity in chemistry is given as pH, and sound
levels and signal powers are commonly given in decibels (dB). The pH value
is defined as the negative logarithm (to base 10) of the concentration of H₃O⁺,
and a dB value equals 10 times the base-10 logarithm of the ratio of a power
to a reference power.
The distribution of a logarithmized sample may well be symmetric and
approximately normal.
For such data, the plus-minus description is appropriate.
A solution for other fields of science might be to introduce such
conventions, too.
As long as data is reported in original
units, the characterization with ^{x}/ is clearly preferable.
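The link between the two descriptions can be sketched as follows, taking pH as the negative base-10 logarithm of the concentration. The concentration values are hypothetical and serve only to illustrate that a plus-minus summary on the log scale corresponds to a ^{x}/ summary in original units.

```python
import math
from statistics import mean, stdev

# Hypothetical H3O+ concentrations in mol/L, for illustration only.
concentrations = [2.0e-7, 5.0e-7, 1.0e-6, 3.0e-6, 8.0e-6]

# On the log scale, the usual additive summary is applied.
ph = [-math.log10(c) for c in concentrations]
ph_mean, ph_sd = mean(ph), stdev(ph)
print(f"pH = {ph_mean:.2f} ± {ph_sd:.2f}")

# Back-transformed to original units, the additive summary
# becomes a multiplicative one.
mu_star = 10 ** (-ph_mean)
sigma_star = 10 ** ph_sd
print(f"concentration = {mu_star:.2e} x/ {sigma_star:.2f} mol/L")

low, high = mu_star / sigma_star, mu_star * sigma_star
```

The interval from `low` to `high` in concentration units is exactly the interval pH mean ± pH standard deviation, read in reverse order because of the minus sign in the definition of pH.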
If the spread is small compared to the mean (or median), the plus-minus notation gives an adequate characterization of the data, as does the ^{x}/ notation; the two are then very similar. For example, 83 ^{x}/ 1.06 means almost the same as "83 ± 6 %" or 83 ± 5. Although the choice in such cases may be arbitrary, we recommend keeping the ^{x}/ notation for the sake of consistency with cases of larger spread.
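The near-equivalence for small spread can be checked numerically. In this sketch the helper name `times_divide_interval` is illustrative, and the second call uses a deliberately large, made-up σ* of 2.0 for contrast.

```python
def times_divide_interval(mu_star, sigma_star):
    """Interval from mu*/sigma* to mu* x sigma*."""
    return mu_star / sigma_star, mu_star * sigma_star

# Small spread: nearly symmetric, close to 83 ± 5.
low, high = times_divide_interval(83.0, 1.06)
print(round(low, 2), round(high, 2))                # 78.3 87.98
print(round(83.0 - low, 2), round(high - 83.0, 2))  # 4.7 4.98

# Larger (illustrative) spread: the interval is clearly asymmetric.
wide_low, wide_high = times_divide_interval(83.0, 2.0)
print(wide_low, wide_high)  # 41.5 166.0
```

With σ* = 1.06 the two distances from the centre differ by less than 0.3, whereas with σ* = 2.0 the upper distance (83) is twice the lower one (41.5).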