[R] Big data and column correspondence problem

murilofm murilofmoraes at gmail.com
Wed Jul 27 05:49:10 CEST 2011


Thanks Daniel, that helped me. Based on your suggestions I built this final
code:

library(gdata)  # for matchcols()

AA = c(4,4,4,2,2,6,8,9)
A1 = c(3,3,11,5,5,7,11,12)
A2 = c(3,3,7,3,5,7,11,12)

BB = c(2,2,4,6,6)
B1 = c(5,11,7,13,NA)
B2 = c(4,12,11,NA,NA)
B3 = c(12,13,NA,NA,NA)

# The fourth column of A is a placeholder (all zeros), renamed to "dum" below
A = cbind(AA, A1, A2, 0)
B = cbind(BB, B1, B2, B3)

# Join every row of A to every row of B that has the same key (AA == BB)
newdata <- merge(A, B, by.x='AA', by.y='BB', all.x=TRUE, all.y=FALSE)

# Columns of newdata that came from B (B1, B2, B3)
bcols <- matchcols(newdata, with=c("B"))

# (number of B columns equal to A1) * (number of B columns equal to A2)
newdata$dum <- rowSums(newdata[,bcols]==newdata$A1, na.rm = FALSE, dims = 1) *
               rowSums(newdata[,bcols]==newdata$A2, na.rm = FALSE, dims = 1)

colnames(A)[4] <- "dum"
newdata$dum1 <- newdata$dum
A_final <- merge(A, newdata, by.x=c("AA","A1","A2","dum"),
                 by.y=c("AA","A1","A2","dum"), all.x=TRUE, all.y=FALSE)

This gives me the same result as the "loop" version. Unfortunately, I can't
replicate it on the original data because I can't make the merge work: I get
the error message "Reached total allocation of 4090Mb". So I'm stuck again.

If anyone could shed some light on this problem, I would really appreciate it.
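
One direction that might avoid the memory blow-up is sketched below. It is only
a sketch, and it assumes the goal is to flag each row of A for which at least
one row of B with the same key (AA == BB) contains both A1 and A2 among its B
columns. Instead of merging, which materialises every A-by-B row pair at once,
it looks up the matching B rows one A row at a time. The names b_rows, has_both
and dum are made up for illustration, and NAs in B are treated as "no match",
which may differ from the rowSums(..., na.rm = FALSE) behaviour above.

```r
# Toy data from the post
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 11, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 7, 3, 5, 7, 11, 12)
BB <- c(2, 2, 4, 6, 6)
B  <- cbind(B1 = c(5, 11, 7, 13, NA),
            B2 = c(4, 12, 11, NA, NA),
            B3 = c(12, 13, NA, NA, NA))

# Row indices of B grouped by key, so each lookup touches only a few rows
b_rows <- split(seq_along(BB), BB)

# Hypothetical helper: does any B row with key k contain both a1 and a2?
has_both <- function(k, a1, a2) {
  idx <- b_rows[[as.character(k)]]
  if (is.null(idx)) return(FALSE)        # key k does not occur in B
  blk <- B[idx, , drop = FALSE]          # only the B rows for this key
  in1 <- rowSums(blk == a1, na.rm = TRUE) > 0
  in2 <- rowSums(blk == a2, na.rm = TRUE) > 0
  any(in1 & in2)
}

# One logical flag per row of A; no merged table is ever built
dum <- mapply(has_both, AA, A1, A2)
```

On the toy data this flags only the third and fifth rows of A. It trades speed
for memory: mapply() calls the helper once per row of A, so it may be slow on
very large inputs, but peak memory should stay close to the size of A and B
themselves.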

--
View this message in context: http://r.789695.n4.nabble.com/Big-data-and-column-correspondence-problem-tp3694912p3697557.html
Sent from the R help mailing list archive at Nabble.com.


