[R] Basic statistic (Was: (no subject))

Petr PIKAL petr.pikal at precheza.cz
Thu Nov 8 12:07:20 CET 2007


Hi

r-help-bounces at r-project.org wrote on 07.11.2007 18:23:55:

> hello,
> 
> i am a bit of a statistical neophyte and currently trying to make some sense
> of confidence intervals for correlation coefficients. i am using the
> cor.test() function. the documentation is quite terse and i am having trouble
> tying up the output from this function with stuff that i have read in the
> literature. so, for example, i make two sequences and calculate the
> correlation coefficient:
> 
> > x <- runif(20)
> > y <- jitter(x, amount = 0.7)
> > cor(x, y)
> [1] 0.5198252
> 
> now i want to establish what confidence i can attach to this value. from the
> table i retrieved from the article "Understanding Correlation" by r. j. rummel
> [online] i get that the probability of a correlation coefficient of 0.5198252
> arising by chance from two sequences of length 20 is less than 0.01. so this
> seems like i can attach some significance to the result. i still don't
> understand where the table comes from and it only goes up as far as sequences
> of length 1000. the data i am wanting to analyse has length of more than
> 70000, so i need to calculate these confidence levels myself. i assume that
> cor.test() is the way to do this. so i tried:

You should consult a basic statistics textbook. Some are listed among the 
books recommended on CRAN, but much is already explained in the output.

> 
> > cor.test(x, y, "greater", conf.level = 0.95)
> 
>         Pearson's product-moment correlation
> 
> data:  x and y 
> t = 2.5816, df = 18, p-value = 0.009405
                                 ^^^^^^^^^ 
Here is your 0.01 value: the probability of getting a correlation 
coefficient at least this large by chance.

> alternative hypothesis: true correlation is greater than 0 

positive correlation

> 95 percent confidence interval:
>  0.1753340 1.0000000 

confidence interval for the correlation coefficient

> sample estimates:
>       cor 
> 0.5198252 
> 
> > cor.test(x, y, "less", conf.level = 0.95)
> 
>         Pearson's product-moment correlation
> 
> data:  x and y 
> t = 2.5816, df = 18, p-value = 0.9906
> alternative hypothesis: true correlation is less than 0 

negative correlation

> 95 percent confidence interval:
>  -1.0000000  0.7509089 
> sample estimates:
>       cor 
> 0.5198252 
> 
> > cor.test(x, y, "two.sided", conf.level = 0.95)
> 
>         Pearson's product-moment correlation
> 
> data:  x and y 
> t = 2.5816, df = 18, p-value = 0.01881
> alternative hypothesis: true correlation is not equal to 0 

any type of correlation

> 95 percent confidence interval:
>  0.1003997 0.7823738 
> sample estimates:
>       cor 
> 0.5198252
> 
> i reckon that the first invocation of the function is closest to what i am
> looking for. now the rest of the output from the function is a total mystery
> to me. could anyone please tell me:
> 
> o what is a p-value?

Wikipedia says:

In statistical hypothesis testing, the p-value is the probability of 
obtaining a result at least as extreme as a given data point, assuming the 
data point was the result of chance alone. The fact that p-values are 
based on this assumption is crucial to their correct interpretation.
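
To tie that definition to the output above, here is a minimal sketch of how 
the three p-values can be reproduced by hand. It assumes the Pearson test, 
where the statistic r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution 
with n - 2 degrees of freedom if the true correlation is 0. The variable 
names are mine, and r and n are copied from the output above:

```r
# Values copied from the cor.test() output quoted above
r  <- 0.5198252   # sample correlation
n  <- 20          # number of pairs
df <- n - 2       # degrees of freedom of the t statistic

# t statistic for H0: true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# "At least as extreme" depends on the alternative hypothesis
p_greater <- pt(t_stat, df, lower.tail = FALSE)  # alternative = "greater"
p_less    <- pt(t_stat, df)                      # alternative = "less"
p_two     <- 2 * min(p_greater, p_less)          # alternative = "two.sided"

print(c(t = t_stat, greater = p_greater, less = p_less, two.sided = p_two))
```

The three numbers should agree with the p-values 0.009405, 0.9906 and 
0.01881 printed by cor.test(), up to rounding.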

> o how to interpret the quoted confidence interval?
> 
> i do see that as i increase the conf.level input parameter to cor.test() the
> lower bound of the confidence interval gets lower:
> 
>    0.95      ->       0.1753340   1.0000000
>    0.975     ->       0.1003997   1.0000000
>    0.995     ->      -0.04859184  1.00000000
> 
> does this mean that with 99.5% certainty the correlation coefficient should
> lie in the range -0.04859184 to 1.00000000? hmmm. i am doubtful. plus this
> doesn't really answer my question, which is more about what confidence i can
> assign to the measured correlation coefficient (0.5198252).

Why not? Those figures really are what they seem to be. In the first case, 
based on the data and on the assumption of a positive correlation, the true 
correlation coefficient lies between 0.17 and 1 with 95% confidence. If you 
want to increase the confidence that the true coefficient is inside the 
interval, you need to widen the interval (and if you want to be 100% sure 
you need to widen it to the whole possible range, -1 to 1 :-).
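
As a rough sketch of where those bounds come from: the cor.test() help page 
says the interval is an asymptotic one based on Fisher's Z transform. The 
helper below is hypothetical (lower_bound is my name, not part of R); r and n 
are copied from your output, and it only covers the one-sided "greater" 
case:

```r
# Values copied from the cor.test() output quoted above
r <- 0.5198252   # sample correlation
n <- 20          # number of pairs

# Hypothetical helper: one-sided ("greater") lower confidence bound,
# computed on Fisher's z scale and transformed back with tanh()
lower_bound <- function(r, n, conf.level) {
  z  <- atanh(r)          # Fisher's Z transform of r
  se <- 1 / sqrt(n - 3)   # approximate standard error of z
  tanh(z - qnorm(conf.level) * se)
}

levels <- c(0.95, 0.975, 0.995)
bounds <- sapply(levels, function(cl) lower_bound(r, n, cl))
print(data.frame(conf.level = levels, lower = bounds, upper = 1))
```

The lower bounds should match the 0.1753340, 0.1003997 and -0.04859184 in 
your table. Because the standard error is 1 / sqrt(n - 3), the same 
calculation with n above 70000 gives a much narrower interval for the same r.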

Regards
Petr

> 
> an alternative question would be: given two sequences and a calculated
> correlation coefficient, with what probability could i assert that the
> underlying processes are indeed correlated and that the calculated correlation
> coefficient does not simply arise by chance.
> 
> please forgive my ignorance. any help will be vastly appreciated. thanks!
> 
> best regards,
> andrew.