[R] Big data and column correspondence problem

murilofm murilofmoraes at gmail.com
Tue Jul 26 07:13:28 CEST 2011


I've been struggling for some time with a problem concerning a big database
that I have to deal with.
Since the real database is very large, I'll illustrate the problem with a
small example.
Suppose I have the following data:

AA = c(4,4,4,2,2,6,8,9)
A1 = c(3,3,5,5,5,7,11,12)
A2 = c(3,3,5,5,5,7,11,12)
A = cbind(AA, A1, A2)

BB = c(2,2,4,6,6)
B1 = c(5,11,7,13,NA)
B2 = c(3,12,11,NA,NA)
B3 = c(12,13,NA,NA,NA)
B = cbind(BB, B1, B2, B3)


I have to do the following:

1. Create a dummy (binary) variable in a new column of A that indicates, for
each row, whether:
a) the value in column AA can be found in BB, and
b) within the rows of B that correspond to that value of AA, I can find
both A1 and A2 in B1, B2 or B3.

In this example I would have

I've been able to do it with some loops; the problem is that in the
original data A has 2,936,044 rows and B has 14,965, so it's taking forever
to finish (probably because I'm doing it the wrong way).
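
For reference, here is a rough loop-free sketch of the task described above, in base R only. It rests on one assumption about condition b): that A1 and A2 may be found in any of the rows of B whose BB equals AA, not necessarily in the same row. The names long, b_pairs and dummy are made up for illustration.

```r
# Example data from the post
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A  <- cbind(AA, A1, A2)

BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(3, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)
B  <- cbind(BB, B1, B2, B3)

# Stack B into long form: one (key, value) pair per non-missing B1/B2/B3 entry
long <- data.frame(key = rep(B[, "BB"], times = 3),
                   val = c(B[, "B1"], B[, "B2"], B[, "B3"]))
long <- long[!is.na(long$val), ]

# Encode each pair as one string so a single %in% lookup tests key and value
b_pairs <- paste(long$key, long$val)

# A row gets 1 only if (AA, A1) and (AA, A2) both occur among B's pairs.
# ASSUMPTION: matches are pooled over all rows of B sharing the same BB.
# A value of AA that is absent from BB has no pairs, so condition a) is implied.
dummy <- as.integer(paste(A[, "AA"], A[, "A1"]) %in% b_pairs &
                    paste(A[, "AA"], A[, "A2"]) %in% b_pairs)
A <- cbind(A, dummy)
```

Under that reading, the example data gives dummy = 0 0 0 1 1 0 0 0. The lookup through %in% is hashed, so the cost should grow roughly with nrow(A) + nrow(B) rather than with their product, which is what the nested loops pay. If both values must instead appear in the same row of B, this sketch does not apply as written.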

I would really appreciate any help or advice on how to deal with this.

View this message in context: http://r.789695.n4.nabble.com/Big-data-and-column-correspondence-problem-tp3694912p3694912.html
Sent from the R help mailing list archive at Nabble.com.
