[R] correlation matricies: getting p-values?
Bill Venables
William.Venables at cmis.CSIRO.AU
Tue Jan 4 06:05:39 CET 2000
> I have to admit that I'm at a bit of a loss here; any pointers would be
> greatly appreciated.
>
> I've been making correlation matricies from some of my datasets, and
> have been instructed to get the probability values for each of these
> correlations.
To be precise I take it this means the significance probability,
that is the chance of getting a value of the correlation as far
from zero in absolute value or more so as the one you got...
>
> I've checked the online help for info on both the cor and cov functions,
> but I was unable to find any relevant info on finding how to obtain
> these probability values.
The connexion between r and t is known to be
t = r*sqrt(n - 2)/(1 - r^2)
that is, F = t^2 = r^2*(n-2)/(1 - r^2) ~ F(1, n-2)
This allows you to find the probabilities in a line or two of code.
Here is the calculation as a function that puts the correlations
below the diagonal and the significance probabilities above.
cor.prob <- function(X, dfr = nrow(X) - 2) {
R <- cor(X)
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr / (1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
R
}
> X <- matrix(rnorm(1000), 200, 5)
> cor.prob(X)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.000000 0.359702 0.872159 0.296850 0.07346
[2,] -0.065106 1.000000 0.111743 0.569198 0.41012
[3,] -0.011450 0.112807 1.000000 0.386409 0.47407
[4,] 0.074129 0.040488 0.061574 1.000000 0.40357
[5,] 0.126853 -0.058560 0.050906 -0.059382 1.00000
You can spot-check the results by regression, for example:
> X <- as.data.frame(X)
> names(X)
[1] "V1" "V2" "V3" "V4" "V5"
> summary(lm(V2 ~ V1, X))
....
Residual standard error: 1 on 198 degrees of freedom
Multiple R-Squared: 0.00424, Adjusted R-squared: -0.00079
F-statistic: 0.843 on 1 and 198 degrees of freedom, p-value: 0.36
-------
The final p-value compares with 0.359702 in the 1,2 position of
the matrix.
This assumes you have a common degrees of freedom (n-2) for all
correlations in the table. If you are using something like
"pairwise deletion" of missing values you have a bit more work to
do (and I hope your matrix comes out as positive definite - there
is no guarantee it will).
--
-----------------------------------------------------------------
Bill Venables, Statistician, CMIS Environmetrics Project.
Physical address: Postal address:
CSIRO Marine Laboratories, PO Box 120,
233 Middle St, Cleveland, Queensland Cleveland, Qld, 4163
AUSTRALIA AUSTRALIA
Telephone: +61 7 3826 7251 Email: Bill.Venables at cmis.csiro.au
Fax: +61 7 3826 7304
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list