[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Feb 8 18:09:10 CET 2010
On Mon, 8 Feb 2010, Peter Dalgaard wrote:
> msa at biostat.mgh.harvard.edu wrote:
>> Full_Name: Marek Ancukiewicz
>> Version: 2.10.1
>> OS: Linux
>> Submission from: (NULL) (74.0.49.2)
>>
>>
>> Both cor() and cor.test() incorrectly handle ordered variables with
>> method="kendall", cor() incorrectly handles ordered variables for
>> method="spearman" (method="pearson" always works correctly, while
>> method="spearman" works for cor.test, but not for cor()).
>>
>> In erroneous calculations these functions ignore the inherent ordering
>> of the ordered variable (e.g., '9'<'10'<'11') and instead seem to assume
>> an alphabetic ordering ('10'<'11'<'9').
>
> Strictly speaking, not a bug, since the documentation has
>
> x: a numeric vector, matrix or data frame.
>
> respectively
>
> x, y: numeric vectors of data values. ‘x’ and ‘y’ must have the
> same length.
>
> so no one ever claimed that class "ordered" variables should work.
>
> However, the root cause is that as.vector on a factor variable (ordered
> or not) converts it to a character vector, hence
>
>> rank(as.vector(as.ordered(9:11)))
> [1] 3 1 2
>
> Looks like a simple fix would be to use as.vector(x, "numeric") inside
> the definition of cor().
That is a fix for that particular case, but the problem is that it relies
on the underlying representation. I think a better fix would be to do
either of
- test for numeric and throw an error otherwise, or
- use xtfrm, which has the advantage of being more general and
allowing methods to be written (S3 or S4 methods in R-devel).
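
To illustrate the two options, here is a minimal sketch. The helpers
check_numeric() and rank_cor() are hypothetical names made up for this
example; they are not part of R and are not the actual change to cor().
The sketch only shows how a numeric test and xtfrm() each avoid the
character conversion described above.

```r
x <- as.ordered(9:11)

# as.vector() on a factor yields a character vector, so ranks follow
# string order ("10" < "11" < "9") instead of the level order.
rank(as.vector(x))   # 3 1 2, as in the report

# Option 1 (hypothetical helper): refuse anything that is not numeric.
# is.numeric() is FALSE for factors, ordered or not.
check_numeric <- function(x) {
  if (!is.numeric(x)) stop("'x' must be numeric")
  x
}

# Option 2 (hypothetical helper): map each argument through xtfrm(),
# which returns a numeric vector that sorts in the same order as x.
# For a factor this is the vector of integer level codes.
rank_cor <- function(x, y, method = c("kendall", "spearman")) {
  method <- match.arg(method)
  cor(xtfrm(x), xtfrm(y), method = method)
}

xtfrm(x)                                  # 1 2 3
tau <- rank_cor(x, 1:3, method = "kendall")
rho <- rank_cor(x, 1:3, method = "spearman")
numeric_check_fails <- inherits(try(check_numeric(x), silent = TRUE),
                                "try-error")
```

With the xtfrm() route both tau and rho come out as 1 for this example,
matching the result for the plain integer vector 9:11. Because xtfrm()
is generic, a class with its own notion of ordering could supply a
method instead of relying on the internal codes.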
>
>
>>> cor(9:11,1:3,method="k")
>> [1] 1
>>> cor(as.ordered(9:11),1:3,method="k")
>> [1] -0.3333333
>>> cor.test(as.ordered(9:11),1:3,method="k")
>>
>> Kendall's rank correlation tau
>>
>> data: as.ordered(9:11) and 1:3
>> T = 1, p-value = 1
>> alternative hypothesis: true tau is not equal to 0
>> sample estimates:
>> tau
>> -0.3333333
>>
>>> cor(9:11,1:3,method="s")
>> [1] 1
>>> cor(as.ordered(9:11),1:3,method="s")
>> [1] -0.5
>>> cor.test(as.ordered(9:11),1:3,method="s")
>>
>> Spearman's rank correlation rho
>>
>> data: as.ordered(9:11) and 1:3
>> S = 0, p-value = 0.3333
>> alternative hypothesis: true rho is not equal to 0
>> sample estimates:
>> rho
>> 1
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595