[R] cor.test() -> p-values may be incorrect due to ties

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Mar 19 17:30:49 CET 2004


Jan Verbesselt <Jan.Verbesselt at agr.kuleuven.ac.be> writes:

> Hi R specialists,
> 
> When testing the association between two time series the cor.test gives
> the following message...-> p-values may be incorrect due to tie
> 
> What does it mean? (it is not described in the help)

It means what it says... The p-value in the test for rho=0 is based
on the assumption that the ranks are 1:n for both variables, i.e. that
there are no ties. In the presence of ties (multiple x or y having the
same value) we calculate a modified rho, but we still use the same
formula for the p-value, so the reported p-value is only approximate.
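To make the "ranks are 1:n" point concrete, here is a small sketch with
made-up data (the vectors x and y are purely illustrative, not taken from
your series). With ties, rank() hands out averaged ranks, so the ranks
are no longer a permutation of 1:n:

```r
# Toy data with one tie in x and one tie in y (illustrative values only)
x <- c(1, 2, 2, 3, 5)
y <- c(2, 1, 4, 4, 6)

rank(x)  # 1.0 2.5 2.5 4.0 5.0  (tied values share the average rank)
rank(y)  # 2.0 1.0 3.5 3.5 5.0

# Spearman's rho is the Pearson correlation of these (average) ranks
rho <- cor(rank(x), rank(y))
same <- isTRUE(all.equal(rho, cor(x, y, method = "spearman")))
same     # TRUE
```

The estimate of rho is still well defined here; it is only the null
distribution used for the p-value that assumes untied ranks.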

There are really two issues. First, there's a nice theory that allows
you to calculate the exact p-value when ties are absent; this becomes
much harder when there are ties. Second, there's also an asymptotic
approximation to a normal distribution, and I believe that would
actually be rather easy to compute in the tied case, but we don't do
that either. 
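For what it's worth, a rough sketch of such a normal approximation could
look like the following. This is not what cor.test() does; the helper
name spearman_normal_p is made up for illustration. It relies on the
fact that, under the permutation null, the correlation of two fixed
score vectors has mean 0 and variance 1/(n-1), which also holds for
averaged ranks:

```r
# Hypothetical helper: large-sample normal approximation for Spearman's rho.
# Assumes x and y are numeric vectors of equal length with no missing values.
spearman_normal_p <- function(x, y) {
  stopifnot(length(x) == length(y))
  n   <- length(x)
  rho <- cor(rank(x), rank(y))   # average ranks are used for ties
  z   <- rho * sqrt(n - 1)       # approximately N(0, 1) under rho = 0
  p   <- 2 * pnorm(-abs(z))      # two-sided p-value
  list(rho = rho, z = z, p.value = p)
}

# Illustrative call on made-up data containing ties
res <- spearman_normal_p(c(1, 2, 2, 3, 5, 7, 7, 9),
                         c(2, 1, 4, 4, 6, 5, 8, 8))
res$p.value
```

Being asymptotic, this should only be trusted for reasonably large n.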

I don't think you have anything to worry about with the example you
provide, though.

> >  cor.test(Origi[,1],Origi[,2], alternative = c("two.sided"),method =
> c("spearman"), conf.level = 0.95)
> 
>         Spearman's rank correlation rho
> 
> data:  Origi[, 1] and Origi[, 2] 
> S = 101457, p-value = < 2.2e-16
> alternative hypothesis: true rho is not equal to 0 
> sample estimates:
>       rho 
> 0.8938577 
> 
> Warning message: 
> p-values may be incorrect due to ties in: cor.test.default(Origi[, 1],
> Origi[, 2], alternative = c("two.sided"),

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



