[R] scan seems to modify the data

DJNordlund@aol.com DJNordlund at aol.com
Wed Mar 31 23:55:57 CEST 2004


Stéphane,

In the example below, the large correlation you see is not a result of the 
small variance: a correlation does not depend on the scale of either variable. 
Rather, the 3 random numbers you generated just happened to have the same 
rank ordering as the three coefficients you were correlating them with.  
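
As a rough illustration of why the small variance is not the culprit, the 
sketch below correlates the same three numbers with a second vector, once as 
they are and once shrunk by a factor of 1e-12. The vectors a and b, the seed, 
and the scaling factor are made up for this example and do not come from your 
data.

```r
# Hypothetical illustration: shrinking a variable's variance does not
# change its correlation with another variable.
set.seed(1)                  # arbitrary seed, only for reproducibility
a <- runif(3)                # three made-up values
b <- runif(3)                # three more made-up values

r_full  <- cor(a, b)         # correlation at the original scale
r_small <- cor(1e-12 * a, b) # same values, variance smaller by a factor of 1e-24

var(1e-12 * a)               # tiny variance
c(r_full, r_small)           # the two correlations
```

The two correlations should agree to within rounding error even though the 
second variable's variance is minuscule.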

Just try your example again but repeat your correlation command

cor(x[1,],runif(3))

several times in succession.  You will probably see correlations ranging from 
large and positive, through everything in between, to large (in absolute 
value) and negative.
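
If repeating the command by hand gets tedious, something like the following 
sketch does the repetition for you. The vector v is a hypothetical stand-in 
for three fixed coefficients (your x depends on an unseeded runif call, so it 
cannot be reproduced here), and the seed and the 0.9 cutoff are arbitrary 
choices.

```r
# Hypothetical illustration: correlate three fixed numbers with three
# fresh uniform random numbers, many times over.
set.seed(42)                      # arbitrary seed, only for reproducibility
v <- c(0.053, 0.133, 0.069)       # made-up stand-in for three coefficients

# Each replicate draws a new runif(3) and computes one correlation.
r <- replicate(1000, cor(v, runif(3)))

summary(r)                        # spread of the 1000 correlations
mean(abs(r) > 0.9)                # share that are large in absolute value
```

With only three points per correlation, a sizeable share of the results 
should be large in absolute value, in both directions.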

Dan Nordlund

------------------Original message----------------------
In a message dated 3/31/2004 10:53:03 AM Pacific Standard Time, 
dray at biomserv.univ-lyon1.fr writes:

>At 13:34 31/03/2004, Prof Brian Ripley wrote:
>
>>Take a look at formatReal.  scientific thinks 0.251 has 17 digits and
>>0.255 has 3.  It really doesn't make any sense to ask for more precision
>>than you have (.Machine$double.eps) and you do often get spurious
>>errors if you attempt to do so.  So 15 digits is normally safe, but no
>>more.
>>
>>Note that there are decimal -> binary -> decimal conversions and you
>>can't say which one introduced the small changes.
>
>I completely agree with you. My problem arises when I try to compute a 
>correlation. One of the variables seems to have equal values but it does 
>not. Hence, it has a very low variance and so when I try to compute the 
>correlation with another variable, this correlation is very high. I wonder 
>if it would not be good to introduce a tolerance threshold. Is it 
>meaningful to compute a correlation when a variance is very low?
>See the example below :
>
>> essai=matrix(c(0.266,.234,.005,.481,.1,.009,.4,.155,.255,.2,.34,.43),4,3)
>> essai2=sweep(essai,2,apply(essai,2,sum),"/")
>> x=coef(lm(essai2~scale(runif(4))))
>> x
>                       [,1]      [,2]       [,3]
>(Intercept)     0.25000000 0.2500000 0.25000000
>scale(runif(4)) 0.05307906 0.1330111 0.06936634
>> cor(x[1,],runif(3))
>[1] 0.932772
>> var(x)
>            [,1]        [,2]       [,3]
>[1,] 0.01938893 0.011518783 0.01778528
>[2,] 0.01151878 0.006843202 0.01056607
>[3,] 0.01778528 0.010566067 0.01631426
>> var(x[1,])
>[1] 1.92593e-33



