[R] correlation with missing values.. different answers

Paul Tanger paul.tanger at colostate.edu
Mon Apr 14 05:02:27 CEST 2014


Thanks, I did not realize it was deleting rows!  I was afraid to try
"pairwise.complete.obs" because the help page says it can result in a
matrix which is not "positive semi-definite" (and googling that term
just confused me more).  But I ran the dataset through JMP and got the
same answers, so I think that "pairwise.complete.obs" works for what I
want to do.
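
For anyone else puzzled by the "positive semi-definite" warning: a
symmetric matrix is positive semi-definite when none of its eigenvalues
is negative, so one way to check a pairwise correlation matrix is to
look at its smallest eigenvalue.  The following is only a minimal
sketch, assuming the swM example quoted below; the names r_pair, ev,
min_ev and is_psd and the tolerance of 1e-8 are my own choices for
illustration, not anything prescribed by cor().

```r
# Rebuild the example data from the cor() help page (quoted further down).
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA   # create 3 "missing"

# Pairwise deletion: each entry uses the rows complete for that pair of columns.
r_pair <- cor(swM, use = "pairwise.complete.obs")

# Eigenvalues of the symmetric correlation matrix.
ev <- eigen(r_pair, symmetric = TRUE, only.values = TRUE)$values
min_ev <- min(ev)

# Treat tiny negative values as rounding error (the tolerance is an assumption).
is_psd <- min_ev >= -1e-8

print(min_ev)
print(is_psd)
```

If is_psd comes back FALSE, the matrix may cause trouble in methods
that assume a proper correlation matrix (factor analysis, for example);
if it is TRUE, using the pairwise result directly should be fine for
reading off individual correlations.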

On Sun, Apr 13, 2014 at 7:36 PM, arun <smartpink111 at yahoo.com> wrote:
>
> Hi,
>
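> I think in this case, when you use "na.or.complete", every row that has an NA in any column of the data you pass to cor() is removed before the correlations are computed. So the full dataset loses rows 1, 7 and 25, while the two-column subset loses only row 1.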
> cor(swM[-1,1:2])
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
> cor(swM[-1,])[1:2,1:2]
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
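
To make the row deletion explicit, here is a small sketch that uses
complete.cases() to show which rows "na.or.complete" keeps in each
call.  It assumes the swM example from the original question; the names
keep_all, keep_two, r_all, r_manual and r_two are made up for this
illustration.

```r
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA   # create 3 "missing"

# A row is kept only if it has no NA in the columns that are passed in.
keep_all <- complete.cases(swM)          # checks all six columns
keep_two <- complete.cases(swM[, 1:2])   # checks Frtlty and Agrclt only

which(!keep_all)   # rows dropped for the full matrix: 1, 7, 25
which(!keep_two)   # row dropped for the two-column subset: 1

# Full matrix: listwise deletion, then look at the Frtlty/Agrclt block.
r_all    <- cor(swM, use = "na.or.complete")[1:2, 1:2]
r_manual <- cor(swM[keep_all, ])[1:2, 1:2]   # same rows removed by hand

# Two-column subset: only row 1 is removed.
r_two <- cor(swM[, 1:2], use = "na.or.complete")

all.equal(r_all, r_manual)
```

The two calls in the question therefore work from different sets of
rows (44 versus 46), which is why the Frtlty/Agrclt correlation
differs between them.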
> Maybe you can try "pairwise.complete.obs":
> cor(swM, use = "pairwise.complete.obs")
> #           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
> #Frtlty  1.0000000  0.39202893 -0.6531492 -0.66378886  0.4723129  0.41655603
> #Agrclt  0.3920289  1.00000000 -0.7150561 -0.65221506  0.4152007 -0.03648427
> #Exmntn -0.6531492 -0.71505612  1.0000000  0.69921153 -0.6003402 -0.11433546
> #Eductn -0.6637889 -0.65221506  0.6992115  1.00000000 -0.1791334 -0.09932185
> #Cathlc  0.4723129  0.41520069 -0.6003402 -0.17913339  1.0000000  0.18503786
> #Infn.M  0.4165560 -0.03648427 -0.1143355 -0.09932185  0.1850379  1.00000000
> cor(swM[,1:2],use="pairwise.complete.obs")
> #          Frtlty    Agrclt
> #Frtlty 1.0000000 0.3920289
> #Agrclt 0.3920289 1.0000000
>
> A.K.
>
> On Sunday, April 13, 2014 9:11 PM, Paul Tanger <paul.tanger at colostate.edu> wrote:
> Hi,
> I can't seem to figure out why this gives me different answers.  Probably
> something obvious, but I thought they would be the same.
> This is a minimal example from the help page of cor():
>
>> ## swM := "swiss" with  3 "missing"s :
>> swM <- swiss
>> colnames(swM) <- abbreviate(colnames(swiss), min=6)
>> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
>> cor(swM, use = "na.or.complete")
>            Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
> Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
> Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
> Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
> Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
> Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
> Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
>> # why isn't this the same?
>> cor(swM[,c(1:2)], use = "na.or.complete")
>           Frtlty    Agrclt
> Frtlty 1.0000000 0.3920289
> Agrclt 0.3920289 1.0000000
>
