[R] "chi-square" | "chi-squared" | "chi squared" | "chi square" ?

Martin Maechler
Fri Oct 18 14:51:03 CEST 2019


As it's Friday ..

and I also really want to clean up help files and similar R documents,
both in R's own sources and in my new 'DPQ' CRAN package :

As a trained mathematician, I'm uneasy if a thing has
several easily confusable names .. but as a somewhat
humanistically educated person, I know that natural languages,
English in this case, are much more flexible than computer
languages or math...

Anyway, back to the question(s) .. which I had asked myself a
couple of months ago, and on which I have remained slightly undecided:

The 0-th (meta-)question of course is

  0. Is it worth using only one written form for the
     χ² - distribution, e.g. "everywhere" in R?

The answer is not obvious, as already the first few words of the
(English) Wikipedia clearly convey:

The URL is  https://en.wikipedia.org/wiki/Chi-squared_distribution
and the main title is therefore also
    "Chi-squared distribution"

Then it reads 

> This article is about the mathematics of the chi-squared
> distribution. For its uses in statistics, see chi-squared
> test. For the music [...]

> In probability theory and statistics, the chi-square
> distribution (also chi-squared or χ²-distribution) with k
> degrees of freedom is the distribution of a sum of the squares
> of k independent standard normal random variables.

> The chi-square distribution is a special case of the gamma
> distribution and is one of the most widely used probability
> distributions in inferential statistics, notably in hypothesis
> testing [........]
> [........]

So, in the title and the 1st paragraph it's "chi-squared", but then
everywhere(?) the text uses "chi-square".

Undoubtedly, Wilson & Hilferty (1931) is an important
paper, and they use "Chi-square" in the title;
also, Johnson, Kotz & Balakrishnan (1995;
see R's help page ?pchisq) use "Chi-square" in the title of
chapter 18 and then, diplomatically, for chapter 29,
 "Noncentral χ²-Distributions" as title.

So it seems that, historically and going by prestigious sources,
"chi-square" dominates (notably if we do not count "χ²" as an
alternative).

Things look a bit different when I study R's sources; on the one
hand, I find all 4 forms (see Subject); on the other, in the "R source
history", I see

  $ svn log -c11342
  ------------------------------------------------------------------------
  r11342 | <....> | 2000-11-14 ...

  Use `chi-squared'.
  ------------------------------------------------------------------------

which changed 16 (if I counted correctly) cases of 'chi-square' to 'chi-squared'.

I have not found any R-core internal (or public) reasoning about
that change, but have kept it in mind and often worked towards that "goal".

As a consequence, "statistically" speaking, much of R's own use has been
standardized to use "chi-squared"; but as I mentioned, I still
find all  4  variants even in "R base" package help files
(which of course I could now change quite quickly, using Emacs M-x grep plus a script);
but

... "as it is Friday" ... I'm interested to hear what others
think, notably if you are native English (or "American" ;-)
speaking and/or have some extra good knowledge on such
matters...
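
For those who want to count the variants in their own package sources,
here is a minimal sketch of a grep-based tally. It assumes a POSIX shell
and a grep that supports -o and -h (GNU and BSD grep do). The function
name count_chisq_variants and the demo file are hypothetical, made up for
illustration; they are not part of R's sources, and this is not the script
mentioned above.

```shell
#!/bin/sh
# Tally the spellings "chi-square", "chi-squared", "chi square" and
# "chi squared" (case-insensitively) in all *.Rd files below a directory.
# Limitation: a spelling broken across two lines is not detected.
count_chisq_variants() {
    # $1 = directory to search (hypothetical path to some package sources)
    find "$1" -type f -name '*.Rd' \
        -exec grep -hoiE 'chi[- ]squared?' {} + |
        tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sort | uniq -c |
        # Normalise the "uniq -c" layout to "<count> <spelling>"
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n, $0 }'
}

# Demo on a throw-away file (made-up content, not taken from R)
demo=$(mktemp -d)
printf '%s\n' \
    'The chi-squared test.' \
    'A Chi-square statistic and a chi squared value.' \
    'Pearson chi-squared.' > "$demo/example.Rd"
count_chisq_variants "$demo"
rm -rf "$demo"
```

Because grep -o prints the longest match at each position, "chi-squared"
is not also counted as "chi-square".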

Martin Maechler
ETH Zurich


