[R] why the same values cannot be judged to be the same in R

Fri Nov 13 04:52:18 CET 2009

On Nov 12, 2009, at 10:04 PM, rusers.sh wrote:

> Hi Rusers,
>  I have found that values which look the same are sometimes not judged
> to be the same in R. Does anybody know the problem? I think I have
> overlooked some minor detail. Thanks.
> Here is the example.
> ############
> data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004", 
> 2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
> data1<-data.frame(data1);names(data1)<-c("areaid","x","y","date")
> data2<-matrix(data=c(1,1.22,1.32, 1,1.53,2.34, 1,1.21,1.37, 1,1.52,2.35,
>                      2,0.21,3.33, 2,0.23,3.35, 3,1.57,1.31, 3,1.59,1.33),
>               nrow=8,ncol=3,byrow=TRUE)
> data2<-data.frame(data2);names(data2)<-c("areaid","x1","y1")
> data2$tag<-0
> data1_1<-data1[1,]
> data2_1<-data2[data2$areaid==data1_1$areaid & data2$tag==0,]
> ran_1<-sample(seq_len(nrow(data2_1)),2, replace = FALSE)
> data2_1<-data2_1[ran_1,]
> data_1<-merge(data1_1,data2_1)
> #data_1
> #       areaid  x   y      date   x1   y1  tag
> #    1      1 1.2 1.3 3/23/2004 1.52 2.35   0
> #    2      1 1.2 1.3 3/23/2004 1.53 2.34   0
> data2[data_1$x1==data2$x1 & data_1$y1==data2$y1 & data2$tag==0,]
> #data2[c(data_1$x1==data2$x1 & data_1$y1==data2$y1 & data2$tag==0),]

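I suspect your use of multiple logical tests across two different
dataframes with different numbers of rows is not giving you the
results you expect. data_1 has 2 rows and data2 has 8, so "==" recycles
the shorter vector and compares the two position by position: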
 > data_1$x1==data2$x1
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 > data_1$y1==data2$y1
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 > data2$tag==0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  # So that test is doing nothing

 > data2$y1
[1] 1.32 2.34 1.37 2.35 3.33 3.35 1.31 1.33

 > data_1$y1
[1] 2.34 2.35
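
To make the comparison explicit, here is a minimal sketch using the two y1 vectors printed above, typed in by hand (the names `long`, `short` and `recycled` are only illustrative and are not in the original code). When the lengths differ, "==" repeats the shorter vector to the length of the longer one, so each element of data2$y1 is tested against only one of the two values:

```r
long  <- c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33)  # data2$y1
short <- c(2.34, 2.35)                                       # data_1$y1

# What "==" effectively compares against: short repeated to length 8
recycled <- rep(short, length.out = length(long))
recycled

eq <- long == short   # same result as long == recycled
eq
which(eq)             # positions where a value meets its recycled partner
```

The 2.34 in position 2 of data2$y1 is paired with 2.35, so it is reported as FALSE even though 2.34 does occur in data_1$y1.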

Perhaps you ought to look at the dyadic function %in% as a substitute  
for "=="

?"%in%"
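
As a rough sketch of what that substitution might look like: the code below rebuilds data2 from the original post and types in the two sampled rows by hand as `picked` (a stand-in name, since sample() will pick different rows on each run). `%in%` tests each element on the left against the whole set on the right, so nothing depends on position. Note that testing x1 and y1 separately would also accept a row whose x1 and y1 came from different sampled rows; pasting the two columns into one key is one possible way around that, assuming the values convert to identical strings.

```r
data2 <- data.frame(
  areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
  x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
  y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33),
  tag = 0
)
# Stand-in for the two rows that sample() happened to pick
picked <- data.frame(x1 = c(1.53, 1.52), y1 = c(2.34, 2.35))

# Column-by-column membership test
hit <- data2$x1 %in% picked$x1 & data2$y1 %in% picked$y1 & data2$tag == 0
data2[hit, ]

# Stricter: treat each (x1, y1) pair as a single key
key2 <- paste(data2$x1, data2$y1)
keyp <- paste(picked$x1, picked$y1)
hit_pair <- key2 %in% keyp & data2$tag == 0
data2[hit_pair, ]
```

With these values both versions select the two rows of data2 that were sampled, rather than only the one that happened to line up.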

> #    areaid   x1    y1   tag
> #2      1    1.53  2.34   0
>  There should be two same observations between data_1 and data2, but here
> only one was identified.

If the above commentary did not clarify what was happening, then can
you explain in more detail what you were doing at each step (and why),
and what you expected to get in the end? The phrase "two same
observations between data_1 and data2" is ambiguous. I cannot think of
any natural ordering of these rows of cases, so the notion of "between"
does not seem appropriate.
-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT