[R] Partial correlations and p-values
Charles C. Berry
cberry at tajo.ucsd.edu
Sat Dec 5 18:50:50 CET 2009
On Sat, 5 Dec 2009, Juliet Hannah wrote:
> Your R code looks correct.
There are a couple of hiccups.
First the degrees of freedom for the partial correlation would be wrong
even if there was no missing data.
>
> Because this is a straightforward calculation, I would be surprised if there
> were any differences with SPSS.
There are differences. SPSS seems to use the correlation matrix computed
with a pairwise present method and compute partial correlations from that.
Following
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-matrices:part_corr
> R.pp <- cor(cbind(x,y,z1,z2),use='pair')
> R.comp <- cor(cbind(x,y,z1,z2),use='complete')
> Rinv <- solve(R.pp)
> D <- diag(1 / sqrt(diag(Rinv)))
> P <- -D %*% Rinv %*% D
> P[1,2]
[1] 0.4596122
> Rinv <- solve(R.comp)
> D <- diag(1 / sqrt(diag(Rinv)))
> P <- -D %*% Rinv %*% D
> P[1,2]
[1] 0.657214
The pairwise present value seems to be what SPSS is reporting.
The complete cases values is nearly (but not the same as) what you got.
A real issue here is how to usefully compute and test partial
correlations in the presence of missing data. If you want to persue that,
I would suggest opening a new thread with a subject line like 'partial
correlations with missing observations'
HTH,
Chuck
It may be worthwhile to check
> if SPSS gives partial correlations or semipartial correlations. For example,
> if you take the correlation between
>
> py <- resid(lm(y ~ z1 + z2,data=mydat2))
>
> and
>
> x
>
> where mydat2 has missing values removed, you get 0.47.
>
> On Tue, Dec 1, 2009 at 8:24 PM, dadrivr <dadrivr at gmail.com> wrote:
>>
>> I am trying to calculate a partial correlation and p-values. Unfortunately,
>> the results in R are different than what SPSS gives.
>>
>> Here is an example in R (calculating the partial correlation of x and y,
>> controlling for z1 and z2):
>>
>> x <- c(1,20,14,30,9,4,8)
>> y <- c(5,6,7,9,NA,10,6)
>> z1 <- c(13,8,16,14,26,13,20)
>> z2 <- c(12,NA,2,5,8,16,13)
>> fmx <- lm(x ~ z1 + z2, na.action = na.exclude)
>> fmy <- lm(y ~ z1 + z2, na.action = na.exclude)
>> yres <- resid(fmy)
>> xres <- resid(fmx)
>> cor(xres, yres, use = "p")
>> ct <- cor.test(xres, yres)
>> ct$estimate
>> ct$p.value
>>
>> R give me:
>> r = .65, p = .23
>>
>> However, SPSS calculates:
>> r = .46, p = .70
>>
>> I think something may be different with R's handling of missing data, as
>> when I replace the NA's with values, R and SPSS give the same r-values,
>> albeit different p-values still. I am doing pairwise case exclusion in both
>> R and SPSS. Any ideas why I'm getting different values? Is something wrong
>> with my formula in R? Any help would be greatly appreciated. Thanks!
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
More information about the R-help
mailing list