[R] correlation with missing values.. different answers
peter dalgaard
pdalgd at gmail.com
Mon Apr 14 15:45:57 CEST 2014
On 14 Apr 2014, at 05:02 , Paul Tanger <paul.tanger at colostate.edu> wrote:
> Thanks, I did not realize it was deleting rows! I was afraid to try
> "pairwise.complete.obs" because it said something about resulting in a
> matrix which is not "positive semi-definite" (and googling that term
> just confused me more).
It means that the result may not be a valid covariance (or correlation) matrix. I.e., it may imply that some linear combination of your variables has negative variance. It may turn out not to be a problem in practice, but that sort of thing tends to worry theoreticians.
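To make that concrete, here is a small constructed sketch. The vectors x, y and z are made up for illustration and have nothing to do with the swiss data. Each pair of variables is observed on a different set of rows, so every pairwise correlation is fine on its own, but the assembled matrix has a negative eigenvalue and the combination x - y + z gets a negative "variance":

```r
# Hypothetical toy data: each pair of variables shares only four complete rows
x <- c(1, 2, 3, 4, NA, NA, NA, NA, 1, 2, 3, 4)
y <- c(1, 2, 3, 4, 1, 2, 3, 4, NA, NA, NA, NA)
z <- c(NA, NA, NA, NA, 1, 2, 3, 4, 4, 3, 2, 1)
d <- data.frame(x, y, z)

# Each entry is computed from whichever rows are complete for that pair
R <- cor(d, use = "pairwise.complete.obs")
print(R)

# A positive semi-definite matrix has no negative eigenvalues; this one does
ev <- eigen(R, symmetric = TRUE)$values
print(ev)

# Implied "variance" of the standardized combination x - y + z
a <- c(1, -1, 1)
qf <- drop(t(a) %*% R %*% a)
print(qf)  # negative
```

With real data the violation is usually far milder than in this extreme case, and often absent altogether.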
> But I ran the dataset through JMP and got the
> same answers so I think that "pairwise.complete.obs" works for what I
> want to do.
>
Actually, JMP 10 claims to be using the REML method, which is different from pairwise correlations (you can get both, so it is easy to check that they differ). I'm not sure we have the REML method coded up anywhere; the ML counterpart is in package mvnmle, and one might hope that REML isn't all that much harder.
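A rough sketch of how one might compare the pairwise correlations with the ML estimate on the swM example from further down the thread. It assumes the mvnmle package is installed and that its mlest() function returns the estimated covariance matrix in the sigmahat component. This is ML, not REML, so it should not be expected to reproduce the JMP numbers exactly, and the optimizer may need tuning on other data:

```r
# Rebuild the example data from the cor() help page
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA

# Pairwise-complete correlations (base R)
r_pair <- cor(swM, use = "pairwise.complete.obs")

# ML estimate under multivariate normality, only if mvnmle is available
if (requireNamespace("mvnmle", quietly = TRUE)) {
  fit <- mvnmle::mlest(as.matrix(swM))
  r_ml <- cov2cor(fit$sigmahat)
  dimnames(r_ml) <- dimnames(r_pair)
  # Differences between the two estimates
  print(round(r_ml - r_pair, 4))
}
```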
> On Sun, Apr 13, 2014 at 7:36 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>
>>
>> Hi,
>>
>> I think in this case, when you use "na.or.complete", every row that has an NA in any column is removed, so the full dataset and the two-column subset are computed from different rows.
>> cor(swM[-1,1:2])
>> # Frtlty Agrclt
>> #Frtlty 1.0000000 0.3920289
>> #Agrclt 0.3920289 1.0000000
>>
>> cor(swM[-1,])[1:2,1:2]
>> #Frtlty Agrclt
>> #Frtlty 1.0000000 0.3920289
>> #Agrclt 0.3920289 1.0000000
>>
>> Maybe you can try with "pairwise.complete.obs"
>> cor(swM, use = "pairwise.complete.obs")
>> # Frtlty Agrclt Exmntn Eductn Cathlc Infn.M
>> #Frtlty 1.0000000 0.39202893 -0.6531492 -0.66378886 0.4723129 0.41655603
>> #Agrclt 0.3920289 1.00000000 -0.7150561 -0.65221506 0.4152007 -0.03648427
>> #Exmntn -0.6531492 -0.71505612 1.0000000 0.69921153 -0.6003402 -0.11433546
>> #Eductn -0.6637889 -0.65221506 0.6992115 1.00000000 -0.1791334 -0.09932185
>> #Cathlc 0.4723129 0.41520069 -0.6003402 -0.17913339 1.0000000 0.18503786
>> #Infn.M 0.4165560 -0.03648427 -0.1143355 -0.09932185 0.1850379 1.00000000
>> cor(swM[,1:2],use="pairwise.complete.obs")
>> # Frtlty Agrclt
>> #Frtlty 1.0000000 0.3920289
>> #Agrclt 0.3920289 1.0000000
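The row deletion described above can be checked directly. The following sketch rebuilds swM as in the original question and counts the complete rows in each case. The object names n_all, n_two, full and two are only illustrative:

```r
# Rebuild the example data from the cor() help page
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA

# Rows with no NA anywhere vs. rows with no NA in the first two columns
n_all <- sum(complete.cases(swM))         # rows 1, 7 and 25 are dropped
n_two <- sum(complete.cases(swM[, 1:2]))  # only row 1 is dropped
print(c(n_all, n_two))

# Full-data call: same as deleting rows 1, 7 and 25 by hand
full <- cor(swM, use = "na.or.complete")[1:2, 1:2]
by_hand_full <- cor(swM[-c(1, 7, 25), 1:2])
print(all.equal(full, by_hand_full))

# Two-column call: same as deleting row 1 only
two <- cor(swM[, 1:2], use = "na.or.complete")
by_hand_two <- cor(swM[-1, 1:2])
print(all.equal(two, by_hand_two))
```

So the two calls in the question differ because rows 7 and 25 are used in the two-column call but not in the full one.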
>>
>> A.K.
>>
>> On Sunday, April 13, 2014 9:11 PM, Paul Tanger <paul.tanger at colostate.edu> wrote:
>> Hi,
>> I can't seem to figure out why this gives me different answers. Probably
>> something obvious, but I thought they would be the same.
>> This is a minimal example from the help page of cor():
>>
>>> ## swM := "swiss" with 3 "missing"s :
>>> swM <- swiss
>>> colnames(swM) <- abbreviate(colnames(swiss), min=6)
>>> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
>>> cor(swM, use = "na.or.complete")
>> Frtlty Agrclt Exmntn Eductn Cathlc Infn.M
>> Frtlty 1.0000000 0.37821953 -0.6548306 -0.67421581 0.4772298 0.38781500
>> Agrclt 0.3782195 1.00000000 -0.7127078 -0.64337782 0.4014837 -0.07168223
>> Exmntn -0.6548306 -0.71270778 1.0000000 0.69776906 -0.6079436 -0.10710047
>> Eductn -0.6742158 -0.64337782 0.6977691 1.00000000 -0.1701445 -0.08343279
>> Cathlc 0.4772298 0.40148365 -0.6079436 -0.17014449 1.0000000 0.17221594
>> Infn.M 0.3878150 -0.07168223 -0.1071005 -0.08343279 0.1722159 1.00000000
>>> # why isn't this the same?
>>> cor(swM[,c(1:2)], use = "na.or.complete")
>> Frtlty Agrclt
>> Frtlty 1.0000000 0.3920289
>> Agrclt 0.3920289 1.0000000
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com