[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
Martin Maechler
maechler at stat.math.ethz.ch
Wed May 31 22:00:18 CEST 2017
>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>> on Wed, 31 May 2017 18:46:34 +0200 writes:
> On 31/05/2017 at 17:30, Serguei Sokol wrote:
>>
>> A more thorough reading revealed that I had overlooked this phrase in
>> line()'s doc: "left and right /thirds/ of the data" (emphasis is mine).
> Oops. I read the first reference returned by Google, and it happened to be
> TIBCO's doc, not R's. The layout is very similar, hence my mistake.
> The latter does not mention "thirds" but ...
> Anyway, here is a new patch for line() which still gives a result slightly
> different from MMline(). The slope is the same but not the intercept.
> What exactly is the intercept calculation that should be implemented?
> Serguei.
Sorry Serguei, I have had a new version of line.c since yesterday,
and will not be disturbed anymore.
Note that I *did* give the literature, and it seems most
discussants no longer have access to paper books in physical libraries.
In this case, interestingly, you need one of those, I think:
almost everything I found online did not have the exact details.
Peter Dalgaard definitely was right that Tukey did not use
quantiles at all, and notably did *not* define the three groups
via {i; x_i <= x_L} and {i; x_i >= x_R} which (as I think
you noticed) may make the groups quite unbalanced in case of duplicated x's.
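
To illustrate the balance problem on a toy example (a sketch only, not
the code in line.c): the helper names below are hypothetical, the
thresholds x_L and x_R are simply assumed to be the order statistics at
the one-third and two-thirds positions, and the count-based split is
shown only for contrast, not as Tukey's actual rule.

```python
def groups_by_threshold(x, x_left, x_right):
    """Outer groups defined via {i; x_i <= x_L} and {i; x_i >= x_R}."""
    left = [i for i, v in enumerate(x) if v <= x_left]
    right = [i for i, v in enumerate(x) if v >= x_right]
    return left, right


def groups_by_count(x):
    """Outer groups of equal size, taken by rank (hypothetical contrast)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    k = len(x) // 3  # assumption: floor(n/3) points in each outer group
    return order[:k], order[-k:]


# Heavily duplicated x values: five copies of 1, then 2, 3, 4, 5.
x = [1, 1, 1, 1, 1, 2, 3, 4, 5]
n, k = len(x), len(x) // 3
xs = sorted(x)
x_left, x_right = xs[k - 1], xs[n - k]  # assumed thresholds: 1 and 3

left_t, right_t = groups_by_threshold(x, x_left, x_right)
left_c, right_c = groups_by_count(x)

print(len(left_t), len(right_t))  # threshold rule: 5 3 (unbalanced)
print(len(left_c), len(right_c))  # count rule:     3 3
```

With ties at the threshold, every duplicate lands in the left group,
so the left group ends up with 5 of the 9 points.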
For now, though, I have decided to fix the bug (namely the wrong
computation of the x-medians, which you diagnosed correctly(!), although
your first 2 patches fixed it only partly) *and* to go at least one step in
the direction of Tukey's original, namely by allowing iteration via a new 'iter' argument.
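
As a rough picture of what iterating a three-group resistant line can
look like, here is a minimal sketch. It is not the algorithm in line.c:
the function name, the floor(n/3) group sizes, the residual-based slope
update and the median-of-residuals intercept are all assumptions made
for illustration, and the intercept rule in particular is exactly the
detail left open in this thread.

```python
from statistics import median


def resistant_line(x, y, iters=1):
    """Three-group resistant line with slope polishing (illustrative only)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    k = max(1, n // 3)  # assumption: floor(n/3) points per outer group
    left, right = order[:k], order[-k:]

    # x-medians of the outer groups stay fixed across iterations.
    x_l = median(x[i] for i in left)
    x_r = median(x[i] for i in right)
    if x_r == x_l:
        raise ValueError("outer groups share the same x-median")

    slope = 0.0
    resid = list(y)
    for _ in range(iters):
        # Slope correction from the medians of the current residuals.
        y_l = median(resid[i] for i in left)
        y_r = median(resid[i] for i in right)
        delta = (y_r - y_l) / (x_r - x_l)
        slope += delta
        resid = [r - delta * xi for r, xi in zip(resid, x)]

    # Assumption: intercept taken as the median of y - slope * x.
    intercept = median(resid)
    return intercept, slope


x = list(range(9))
y = [2 + 3 * v for v in x]
y[4] = 100  # a single outlier in the middle group
print(resistant_line(x, y, iters=2))
```

Because only group medians enter the fit, the single outlier in the
middle group does not move the fitted line in this example.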
I have also updated the help page to document what line() has
been computing all these years {apart from the bug which
typically shows for non-equidistant x[]}.
We could also consider eventually adding a new 'method = <string>'
argument to line(), one version of which would continue to
compute the current solution, while another would compute the one
corresponding to Velleman & Hoaglin (1981)'s FORTRAN
implementation (which had to be corrected for some infinite-loop
cases!)... though not in the near future.
Given all the discussion here, I think I should commit what I
currently have ASAP.
Martin
> [DELETED ATTACHMENT line.c.patch2, plain text]