[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

GlenB glnbrntt at gmail.com
Mon May 29 06:19:37 CEST 2017


Tukey divides the (x, y) points into three groups by x; he does not split
the x values and the y values separately.
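For comparison, here is a minimal Python sketch of the grouping described above. It is not R code and not a quote from Tukey. The points are sorted by x and split into three groups, and each group is summarized by the median x and the median y of its own points, so both coordinates of a summary point come from the same points. Several details are assumptions: the group-size rule when n is not a multiple of 3, the intercept (the average of the three summary-point intercepts), and the single pass with no iteration. `three_group_line` is a hypothetical name.

```python
from statistics import median


def three_group_line(x, y):
    """One-pass resistant line from three groups of (x, y) points.

    Assumptions (not from the thread): points are grouped after sorting by x;
    with n = 3k + 1 the extra point goes to the middle group, and with
    n = 3k + 2 the two extra points go to the outer groups; ties in x get no
    special treatment; the intercept averages the three summary-point
    intercepts. Needs at least 3 points and distinct outer x medians.
    """
    pts = sorted(zip(x, y))  # keep each x paired with its own y
    n = len(pts)
    k, r = divmod(n, 3)
    n_outer = k + (1 if r == 2 else 0)  # size of the left and right groups
    groups = [pts[:n_outer], pts[n_outer:n - n_outer], pts[n - n_outer:]]
    # Summary point of each group: (median of its x, median of its y).
    summary = [(median(p[0] for p in g), median(p[1] for p in g)) for g in groups]
    (xl, yl), _, (xr, yr) = summary
    slope = (yr - yl) / (xr - xl)
    intercept = sum(ys - slope * xs for xs, ys in summary) / 3
    return intercept, slope
```

On the data from the report, `three_group_line(range(1, 10), range(1, 10))` returns `(0.0, 1.0)`: the left group {1, 2, 3} and right group {7, 8, 9} give summary points (2, 2) and (8, 8).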

I'll try to get hold of the book for a direct quote; it might take a couple
of days.



On Mon, May 29, 2017 at 8:40 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 27/05/2017 9:28 PM, GlenB wrote:
>
>> Bug: stats::line() does not produce the correct Tukey line when n mod 6
>> is 2 or 3.
>>
>> Example: line(1:9,1:9) should have intercept 0 and slope 1, but it gives
>> intercept -1 and slope 1.2.
>>
>> Trying line(1:i,1:i) across a range of i makes it clear there's a cycle of
>> length 6, with four of every six correct.
>>
>> The bug has been present across many versions.
>>
>> The machine I tried it on just now has R 3.2.3:
>>
>
> If you look at the source (in src/library/stats/src/line.c), the
> explanation is clear: the x value of the lower summary point is chosen as
> the 1/6 quantile of x (according to a particular definition of quantile),
> while its y value is chosen as the median of the y values whose x is less
> than or equal to the 1/3 quantile of x (and similarly at the upper end).
> Those are different definitions (though I think they would be
> asymptotically equivalent under pretty weak assumptions), so it's not
> surprising that the x value doesn't correspond exactly to the y value and
> the line ends up "wrong".
>
> So is it a bug? Well, that depends on Tukey's definition. I don't have a
> copy of his book handy, so I can't really say. Maybe the R function is
> doing exactly what Tukey said it should, in which case it's not a bug. Or
> maybe R is wrong.
>
> Duncan Murdoch
>
>
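The mismatch described above can be reproduced with a small Python sketch. It follows that description, not the actual C source. It assumes "a particular definition of quantile" means averaging the sorted values at 0-based positions floor((n - 1) p) and ceil((n - 1) p). It also assumes the upper end mirrors the lower one, with the 5/6 quantile for x and the median of y over x >= the 2/3 quantile, and that the intercept is the median residual. Both function names are hypothetical.

```python
import math
from statistics import median


def order_quantile(sorted_vals, p):
    # Assumed quantile rule: average of the order statistics at 0-based
    # positions floor((n - 1) * p) and ceil((n - 1) * p).
    pos = (len(sorted_vals) - 1) * p
    return 0.5 * (sorted_vals[math.floor(pos)] + sorted_vals[math.ceil(pos)])


def line_as_described(x, y):
    """(intercept, slope) with summary x and y taken from DIFFERENT subsets."""
    xs = sorted(x)
    x_lo_cut = order_quantile(xs, 1 / 3)  # selects points for the lower y median
    x_hi_cut = order_quantile(xs, 2 / 3)  # selects points for the upper y median
    x_left = order_quantile(xs, 1 / 6)    # lower summary x
    x_right = order_quantile(xs, 5 / 6)   # upper summary x
    y_left = median(yi for xi, yi in zip(x, y) if xi <= x_lo_cut)
    y_right = median(yi for xi, yi in zip(x, y) if xi >= x_hi_cut)
    slope = (y_right - y_left) / (x_right - x_left)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))  # median residual
    return intercept, slope
```

Under these assumptions, `line_as_described(range(1, 10), range(1, 10))` returns `(-1.0, 1.2)`, matching the report: x_left = 2.5 and x_right = 7.5, but y_left = 2 and y_right = 8 come from the points {1, 2, 3} and {7, 8, 9}. For n = 6 to 15 with y = x, the slope comes out as exactly 1 except when n mod 6 is 2 or 3.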




More information about the R-devel mailing list