[R] Merging data frames, or one column/vector with a data frame filling out empty rows with NA's

Gabor Grothendieck ggrothendieck at gmail.com
Wed Apr 22 13:33:22 CEST 2009


Try this (where SNP1x is the same as SNP1 from your post
but without the last line).  If the merge below does not work
on the real data set due to its size, then try the sqldf
alternative.

> SNP1x <-
+ structure(list(Animal = c(194073197L, 194073197L, 194073197L,
+ 194073197L, 194073197L), Marker = structure(1:5, .Label = c("P1001",
+ "P1002", "P1004", "P1005", "P1006", "P1007"), class = "factor"),
+     x = c(2L, 1L, 2L, 0L, 2L)), .Names = c("Animal", "Marker",
+ "x"), row.names = c("3213", "1295", "915", "2833", "1487"),
+ class = "data.frame")
>
> SNP4 <-
+ structure(list(Animal = c(194073197L, 194073197L, 194073197L,
+ 194073197L, 194073197L, 194073197L), Marker = structure(1:6,
+ .Label = c("P1001",
+ "P1002", "P1004", "P1005", "P1006", "P1007"), class = "factor"),
+     Y = c(0.021088, 0.021088, 0.021088, 0.021088, 0.021088, 0.021088
+     )), .Names = c("Animal", "Marker", "Y"), class = "data.frame",
+ row.names = c("3213", "1295", "915", "2833", "1487", "1885"))
>
> merge(SNP1x, SNP4, all = TRUE)
     Animal Marker  x        Y
1 194073197  P1001  2 0.021088
2 194073197  P1002  1 0.021088
3 194073197  P1004  2 0.021088
4 194073197  P1005  0 0.021088
5 194073197  P1006  2 0.021088
6 194073197  P1007 NA 0.021088
> library(sqldf)
> sqldf("select * from SNP4 left join SNP1x using (Animal, Marker)")
     Animal Marker        Y  x
1 194073197  P1001 0.021088  2
2 194073197  P1002 0.021088  1
3 194073197  P1004 0.021088  2
4 194073197  P1005 0.021088  0
5 194073197  P1006 0.021088  2
6 194073197  P1007 0.021088 NA
> # or, if that does not work due to size, force sqldf to create,
> #    use and then destroy an external database
> sqldf("select * from SNP4 left join SNP1x using (Animal, Marker)", dbname = "temp.db")
     Animal Marker        Y  x
1 194073197  P1001 0.021088  2
2 194073197  P1002 0.021088  1
3 194073197  P1004 0.021088  2
4 194073197  P1005 0.021088  0
5 194073197  P1006 0.021088  2
6 194073197  P1007 0.021088 NA
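
If neither merge nor sqldf is practical, a plain lookup with match()
avoids building a merged copy at all.  The following is only a sketch
and not part of the session above: the toy data (animal IDs 1 and 2) and
the names key4 and key1 are made up for illustration, and it assumes
that each Animal/Marker pair occurs at most once in SNP1x (match()
returns only the first hit).

```r
# Toy data modelled on the example above; the animal IDs 1 and 2 are hypothetical
SNP4 <- data.frame(
  Animal = rep(c(1L, 2L), each = 3),
  Marker = rep(c("P1001", "P1002", "P1004"), times = 2),
  Y      = 0.021088
)
SNP1x <- data.frame(
  Animal = c(1L, 1L, 2L),
  Marker = c("P1001", "P1002", "P1004"),
  x      = c(2L, 1L, 0L)
)

# One composite key per row, built from the two joining columns
key4 <- paste(SNP4$Animal, SNP4$Marker, sep = "|")
key1 <- paste(SNP1x$Animal, SNP1x$Marker, sep = "|")

# For each row of SNP4, match() gives the position of the first row of
# SNP1x with the same key, or NA when there is none; indexing x with
# those positions therefore fills the unmatched rows with NA
SNP5 <- SNP4
SNP5$x <- SNP1x$x[match(key4, key1)]
SNP5
```

The result keeps every row of SNP4 in its original order, which
corresponds to the left join in the sqldf call rather than to
merge(..., all = TRUE).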



On Wed, Apr 22, 2009 at 5:22 AM, Johannes G. Madsen
<JGM at dansksvineproduktion.dk> wrote:
> Hello
>
> I have two data frames, SNP4 and SNP1:
>
>> head(SNP4)
>         Animal Marker        Y
> 3213 194073197  P1001 0.021088
> 1295 194073197  P1002 0.021088
> 915  194073197  P1004 0.021088
> 2833 194073197  P1005 0.021088
> 1487 194073197  P1006 0.021088
> 1885 194073197  P1007 0.021088
>
>> head(SNP1)
>         Animal Marker x
> 3213 194073197  P1001 2
> 1295 194073197  P1002 1
> 915  194073197  P1004 2
> 2833 194073197  P1005 0
> 1487 194073197  P1006 2
> 1885 194073197  P1007 0
>
> I want these two data frames merged by 'Marker', but when I try
>
>> SNP5 <- merge(SNP4, SNP1, by = 'Marker', all = TRUE)
> Error: cannot allocate vector of size 2.4 Gb
> In addition: Warning messages:
> 1: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 2: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 3: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
> 4: In merge.data.frame(SNP4, SNP1, by = "Marker", all = TRUE) :
>  Reached total allocation of 1535Mb: see help(memory.size)
>
> an error occurs.
>
> What I want is the column SNP1$x merged together with SNP4 by Marker, so some
> markers will have NAs in the 'x' column in the SNP5 data set.
>
> I also tried this
>
>> SNP5 <- merge(SNP4, SNP1$x, by.x = 'Marker', by.y = 'Marker', all = TRUE)
> Error in fix.by(by.y, y) : 'by' must specify valid column(s)
>
> It does not work either.
>
> Does anyone have any idea how to solve this?
>
> Regards,
>
> Johannes.