[R] Big data and column correspondence problem

Daniel Malter daniel at umd.edu
Tue Jul 26 23:31:27 CEST 2011


This is much clearer. Here is what I think you want to do, in theory and in
practice.

Theory: 

Check whether AA[i] is in BB.

If AA[i] is in BB, then take the row where BB[j] == AA[i] and check whether
A1[i] and A2[i] are both among B1 to B3 of that row. Is that right? The
indicator should be 1 only if both are.
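
For a single row i, the rule above can be sketched with %in%. This is only an illustration on the vectors from the quoted example; the helper name row_matches is made up here, and it assumes that when several rows of B share the same key, their B1 to B3 values are pooled (which is what the quoted loop does).

```r
# Data from the quoted example (the all-zero fourth column of A is left out)
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 11, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 7, 3, 5, 7, 11, 12)
BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(4, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)
B  <- cbind(B1, B2, B3)

# Hypothetical helper: 1 if both A1[i] and A2[i] occur among the B1..B3
# values of the rows of B whose key equals AA[i], otherwise 0.
row_matches <- function(i) {
  candidates <- B[BB == AA[i], , drop = FALSE]  # zero rows if AA[i] is not in BB
  as.integer(all(c(A1[i], A2[i]) %in% candidates))
}

indicator <- sapply(seq_along(AA), row_matches)
indicator
# 0 0 1 0 1 0 0 0
```

Rows whose key does not appear in BB (here AA = 8 and AA = 9) get 0 because nothing can match an empty set of candidates.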

Here is how you do this:

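newdata <- merge(A, B, by.x = 'AA', by.y = 'BB', all.x = FALSE, all.y = FALSE)

# TRUE if A1 (resp. A2) equals B1, B2, or B3 in the same merged row
A1.check <- with(newdata, A1 == B1 | A1 == B2 | A1 == B3)
A2.check <- with(newdata, A2 == B1 | A2 == B2 | A2 == B3)

# comparisons against NA can give NA; count those as no match
A1.check <- replace(A1.check, which(is.na(A1.check)), 0)
A2.check <- replace(A2.check, which(is.na(A2.check)), 0)

newdata <- data.frame(newdata, A1.check, A2.check)

newdata$index <- with(newdata, ifelse(A1.check + A2.check == 2, 1, 0))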
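
One thing to keep in mind: merge() returns one row per matching (A row, B row) pair and drops rows of A whose key is not in BB, so the result does not line up with A row by row. A possible way to get back to one indicator per row of A is sketched below on the quoted example data. The id column, the in_B helper, and the use of tapply() with max are my own additions for illustration, not something from your code.

```r
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 11, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 7, 3, 5, 7, 11, 12)
BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(4, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)

# id is an added (hypothetical) label that remembers the original row of A
A <- data.frame(id = seq_along(AA), AA, A1, A2)
B <- data.frame(BB, B1, B2, B3)

# Inner join: one row per (A row, B row) pair with AA == BB
m <- merge(A, B, by.x = "AA", by.y = "BB")

# Hypothetical helper: TRUE where x equals B1, B2, or B3 in the same
# merged row; an NA comparison result counts as no match
in_B <- function(x, d) {
  hit <- x == d$B1 | x == d$B2 | x == d$B3
  !is.na(hit) & hit
}
m$index <- as.integer(in_B(m$A1, m) & in_B(m$A2, m))

# Collapse to one value per original row of A: 1 if any merged row matched
per_row <- tapply(m$index, m$id, max)
result <- integer(nrow(A))                    # unmatched keys stay 0
result[as.integer(names(per_row))] <- per_row
result
# 0 0 1 0 1 0 0 0
```

On this data the result agrees with the loop in the quoted message. The two can differ when A1 is found in one row of B and A2 in another row with the same key: the merge-based check looks at each B row separately, while the loop pools all rows that share the key.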

HTH,
Daniel


murilofm wrote:
> 
>>>I can not see A1[1]=20 in your example data. 
> 
> Sorry about the typos.... A1[1]=3.
> 
>>> Why B[3,]?
> 
> Because AA[1]=BB[3]=4.
> 
> I will reformulate the example with the code I'm running:
> 
> AA = c(4,4,4,2,2,6,8,9) 
> A1 = c(3,3,11,5,5,7,11,12) 
> A2 = c(3,3,7,3,5,7,11,12) 
> A = cbind(AA, A1, A2)
> 
> BB = c(2,2,4,6,6) 
> B1 =c(5,11,7,13,NA) 
> B2 =c(4,12,11,NA,NA) 
> B3 =c(12,13,NA,NA,NA) 
> 
> A = cbind(AA, A1, A2,0) 
> B=cbind(BB,B1,B2,B3)
> 
> for(i in 1:dim(A)[1]){
> if (!is.na(sum(match(A[i,2:3],B[B[,1]==A[i,1],2:dim(B)[2]])))) A[i,4]<-1
> }
> 
> Thanks
> 
