[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's

stephenb stenka1 at go.com
Mon Apr 27 17:37:17 CEST 2009


You are exceeding the memory available to R here, so the merge cannot
complete in memory. Dump both tables into a database such as MySQL, run the
join there, either through RMySQL or from MySQL directly, then write out the
result and import it back into R.
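
As a rough sketch of the database route: the example below uses SQLite
through the DBI and RSQLite packages instead of MySQL, only so that it runs
without a server. The toy data, the table names snp4 and snp1, the temporary
file, and the choice to join on both Animal and Marker are assumptions for
illustration, not something taken from the original post. With RMySQL, only
the dbConnect() call should need to change.

```r
library(DBI)

# Toy stand-ins for the real SNP4 and SNP1 (assumed structure).
SNP4 <- data.frame(Animal = c(1L, 1L, 2L),
                   Marker = c("P1001", "P1002", "P1001"),
                   Y = c(0.02, 0.02, 0.05),
                   stringsAsFactors = FALSE)
SNP1 <- data.frame(Animal = c(1L, 2L),
                   Marker = c("P1001", "P1001"),
                   x = c(2L, 0L),
                   stringsAsFactors = FALSE)

# On-disk database file, so the join itself runs outside R's memory.
db_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)

dbWriteTable(con, "snp4", SNP4)
dbWriteTable(con, "snp1", SNP1)

# LEFT JOIN keeps every row of snp4; rows with no match in snp1
# get NULL for x, which comes back into R as NA.
SNP5 <- dbGetQuery(con, "
  SELECT a.Animal, a.Marker, a.Y, b.x
  FROM snp4 AS a
  LEFT JOIN snp1 AS b
    ON a.Animal = b.Animal AND a.Marker = b.Marker
  ORDER BY a.Animal, a.Marker")

dbDisconnect(con)
unlink(db_file)
```

Joining on Marker alone pairs every row for a marker in one table with
every row for that marker in the other, which may be part of why the
result grows so large. Whether Animal belongs in the key depends on the
real data.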

That will take care of the merge, but I am not sure what will happen when
you actually try to run some stats on the resulting object. It is very
likely that the operation will exceed memory again.

In the end you may have to write your own code that does not attempt to
load everything into memory. It could be written in R or in a lower-level
language.
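
A minimal sketch of what such code could look like in R, assuming the data
sit in a plain-text file: read a fixed number of rows at a time from an open
connection and keep only running totals. The file contents, the column
layout, and the chunk size of 4 are made up for illustration.

```r
# Write a small example file (stand-in for a file too large to load at once).
path <- tempfile(fileext = ".txt")
write.table(data.frame(Marker = sprintf("P%04d", 1:10), Y = 1:10),
            path, row.names = FALSE, quote = FALSE)

con <- file(path, open = "r")
invisible(readLines(con, n = 1))   # skip the header line

chunk_size <- 4   # rows held in memory at any one time
total <- 0
n <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break    # end of file reached
  chunk <- read.table(text = lines, col.names = c("Marker", "Y"),
                      stringsAsFactors = FALSE)
  # Update the running sum and row count, then let the chunk go.
  total <- total + sum(chunk$Y)
  n <- n + nrow(chunk)
}
close(con)
unlink(path)

mean_Y <- total / n
```

Only statistics that can be accumulated piece by piece (sums, counts,
cross-products) fit this pattern directly.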

If you have SAS, it will probably work, as it deals well with large data
sets in long format. Depending on what you do, R may be able to deal with
the data after a reshape() to wide format.
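
For illustration, a small sketch of the reshape() step on made-up data
with the same column names as SNP1. The animal IDs and genotype values
here are invented.

```r
# Long format: one row per (Animal, Marker) pair.
long <- data.frame(Animal = rep(c(101L, 102L), each = 3),
                   Marker = rep(c("P1001", "P1002", "P1004"), times = 2),
                   x = c(2L, 1L, 2L, 0L, 2L, 0L),
                   stringsAsFactors = FALSE)

# Wide format: one row per Animal, one x.<Marker> column per marker.
wide <- reshape(long, idvar = "Animal", timevar = "Marker",
                direction = "wide")
```

The wide table has one row per animal instead of one row per animal and
marker, so it has far fewer rows, although reshape() itself still needs
the long data in memory.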


joe1985 wrote:
> 
> Hello
> 
> I have two data frames, SNP4 and SNP1:
> 
>> head(SNP4)
>           Animal     Marker        Y
> 3213 194073197  P1001 0.021088
> 1295 194073197  P1002 0.021088
> 915   194073197  P1004 0.021088
> 2833 194073197  P1005 0.021088
> 1487 194073197  P1006 0.021088
> 1885 194073197  P1007 0.021088
> 
>> head(SNP1)
>            Animal    Marker x
> 3213 194073197  P1001 2
> 1295 194073197  P1002 1
> 915   194073197  P1004 2
> 2833 194073197  P1005 0
> 1487 194073197  P1006 2
> 1885 194073197  P1007 0
> 
> I want these two data frames merged by 'Marker', but when I try 
> 
>> SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
> Error: cannot allocate vector of size 2.4 Gb
> In addition: Warning messages:
> 1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 
> An error occurs.
> 
> What I want is the column SNP1$x merged together with SNP4 by Marker, so
> some markers will have NA's in the 'x'-column in the SNP5 dataset.
> 
> I also tried this
> 
>> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE) 
> Error in fix.by(by.y, y) : 'by' must specify valid column(s)
> 
> It won't work either. 
> 
> Does anyone have any idea how to solve this?
> 
> Regards,
> 
> Johannes.
