[R] correlation with missing values.. different answers
Paul Tanger
paul.tanger at colostate.edu
Mon Apr 14 05:02:27 CEST 2014
Thanks, I did not realize it was deleting rows! I was afraid to try
"pairwise.complete.obs" because the help page says it can produce a
matrix that is not "positive semi-definite" (and googling that term
just confused me more). But I ran the dataset through JMP and got the
same answers, so I think "pairwise.complete.obs" works for what I
want to do.
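
For anyone reading this later, here is a minimal sketch of the two points above. It rebuilds swM from the built-in swiss data as in the cor() help page example. The names full_cc, pw and ev, and the 1e-8 tolerance, are my own choices for illustration. The eigenvalue test is one common way to check positive semi-definiteness, not something cor() does for you.

```r
# Rebuild the example data (swiss ships with base R).
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA   # create 3 missing values

# Count the rows that survive casewise deletion in each call.
sum(complete.cases(swM))          # all six columns: rows 1, 7 and 25 are dropped
sum(complete.cases(swM[, 1:2]))   # first two columns: only row 1 is dropped

# "na.or.complete" on the full data matches cor() on the complete rows only.
full_cc <- cor(swM[complete.cases(swM), ])
all.equal(full_cc, cor(swM, use = "na.or.complete"))

# Pairwise deletion uses a different set of rows for each pair of columns,
# so the result is not guaranteed to be positive semi-definite.
# Check: every eigenvalue should be non-negative, up to rounding error.
pw <- cor(swM, use = "pairwise.complete.obs")
ev <- eigen(pw, symmetric = TRUE, only.values = TRUE)$values
min(ev) >= -1e-8
```

If the last line returns FALSE for some dataset, the pairwise matrix may cause trouble in methods that assume a valid correlation matrix. For simply reading off individual correlations, that should not matter.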
On Sun, Apr 13, 2014 at 7:36 PM, arun <smartpink111 at yahoo.com> wrote:
> Hi,
>
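> I think in this case, when you use "na.or.complete", every row that contains an NA in any column is removed. The full dataset therefore loses rows 1, 7 and 25, while the two-column subset loses only row 1.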
> cor(swM[-1,1:2])
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
> cor(swM[-1,])[1:2,1:2]
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
> Maybe you can try "pairwise.complete.obs":
> cor(swM, use = "pairwise.complete.obs")
> #           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
> #Frtlty  1.0000000  0.39202893 -0.6531492 -0.66378886  0.4723129  0.41655603
> #Agrclt  0.3920289  1.00000000 -0.7150561 -0.65221506  0.4152007 -0.03648427
> #Exmntn -0.6531492 -0.71505612  1.0000000  0.69921153 -0.6003402 -0.11433546
> #Eductn -0.6637889 -0.65221506  0.6992115  1.00000000 -0.1791334 -0.09932185
> #Cathlc  0.4723129  0.41520069 -0.6003402 -0.17913339  1.0000000  0.18503786
> #Infn.M  0.4165560 -0.03648427 -0.1143355 -0.09932185  0.1850379  1.00000000
> cor(swM[,1:2],use="pairwise.complete.obs")
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
> A.K.
>
> On Sunday, April 13, 2014 9:11 PM, Paul Tanger <paul.tanger at colostate.edu> wrote:
> Hi,
> I can't seem to figure out why this gives me different answers. Probably
> something obvious, but I thought they would be the same.
> This is a minimal example from the help page of cor():
>
>> ## swM := "swiss" with 3 "missing"s :
>> swM <- swiss
>> colnames(swM) <- abbreviate(colnames(swiss), min=6)
>> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
>> cor(swM, use = "na.or.complete")
>            Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
> Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
> Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
> Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
> Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
> Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
> Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
>> # why isn't this the same?
>> cor(swM[,c(1:2)], use = "na.or.complete")
>           Frtlty    Agrclt
> Frtlty 1.0000000 0.3920289
> Agrclt 0.3920289 1.0000000