[R] Merge problem
Christoph Buser
buser at stat.math.ethz.ch
Fri Sep 22 11:36:09 CEST 2006
Dear Tova
There is no reason why the merged data.frame should have
exactly 8000 or fewer rows.
The "all = FALSE" option only says that no new rows are
generated for cases that appear in only one of the two
data.frames.
Have a look at this example:
> dat1 <- data.frame(a = c(1,2,3,4), b = letters[1:4])
> dat2 <- data.frame(a = c(1,2,3,4,5,6,7,8,1), b = LETTERS[1:9])
> merge(dat1,dat2, by = "a", all = FALSE)
  a b.x b.y
1 1   a   A
2 1   a   I
3 2   b   B
4 3   c   C
5 4   d   D
Since "1" appears twice in the larger data.frame, its match is
repeated, as the help page ?merge says:
"If there is more than one match, all possible matches
contribute one row each."
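The row count of the merged data.frame can be predicted from how often
each key occurs on both sides. The following is only a rough sketch
using the sample data above; the table() bookkeeping and the names
n1, n2, common and expected_rows are my own illustration and not
something merge() exposes.

```r
dat1 <- data.frame(a = c(1, 2, 3, 4), b = letters[1:4])
dat2 <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7, 8, 1), b = LETTERS[1:9])

# Count how often each key value occurs in each data.frame
n1 <- table(dat1$a)
n2 <- table(dat2$a)

# Key values that are present in both data.frames
common <- intersect(names(n1), names(n2))

# Each common key contributes (count in dat1) * (count in dat2) rows
expected_rows <- sum(as.integer(n1[common]) * as.integer(n2[common]))
expected_rows                                      # 5

# This agrees with the number of rows merge() actually returns
nrow(merge(dat1, dat2, by = "a", all = FALSE))     # 5
```

The key "1" occurs once in dat1 and twice in dat2, so it alone
accounts for two of the five rows.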
For comparison, have a look at what the option "all = TRUE" changes:
> merge(dat1,dat2, by = "a", all = TRUE)
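For the sample data, the difference can be sketched by comparing the
row counts of the two calls. The names m_inner and m_outer are just
illustrative.

```r
dat1 <- data.frame(a = c(1, 2, 3, 4), b = letters[1:4])
dat2 <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7, 8, 1), b = LETTERS[1:9])

m_inner <- merge(dat1, dat2, by = "a", all = FALSE)
m_outer <- merge(dat1, dat2, by = "a", all = TRUE)

nrow(m_inner)   # 5: only keys found in both data.frames
nrow(m_outer)   # 9: additionally the keys 5 to 8 that occur only in dat2

# Rows added by all = TRUE have NA in the columns coming from dat1
m_outer[is.na(m_outer$b.x), ]
```

In both cases the duplicated key "1" yields two rows, so the
duplication is independent of the "all" setting.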
Probably some rows in your large data frame have identical
target ids and therefore get repeated. It should be easy to check
this with unique() or duplicated().
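A minimal sketch of such a check follows. The small TransTable below
is a made-up stand-in (I do not have your data); only the column name
Target is taken from your merge() call, and the Symbol column and
dup_ids are hypothetical.

```r
# Hypothetical stand-in for TransTable; the real one has 47296 rows
TransTable <- data.frame(Target = c("t1", "t2", "t3", "t2", "t4"),
                         Symbol = c("A", "B", "C", "B2", "D"),
                         stringsAsFactors = FALSE)

# Fewer unique ids than rows means that some ids are repeated
length(unique(TransTable$Target))
nrow(TransTable)

# The ids that occur more than once
dup_ids <- unique(TransTable$Target[duplicated(TransTable$Target)])
dup_ids

# All rows that share a repeated id
TransTable[TransTable$Target %in% dup_ids, ]
```

If the same check on your real TransTable shows repeated ids that
also occur in dat0, those would explain the additional rows.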
Hope this helps
Christoph
--------------------------------------------------------------
Credit and Surety PML study: visit our web page www.cs-pml.org
--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
Tova Fuller writes:
> Hello all,
>
> I have read as many merge issues as I possibly could tonight and
> although I presume this is a small error, I have not found the
> solution to my problem.
>
> I'm trying to merge two data sets: dat0 and TransTable. As you can
> see below, dat0 has 8000 rows, whereas TransTable has 47296 rows. I
> would expect when I merge the two data sets, with all.x=F, and
> all.y=F, that the intersection would yield 8000 rows, considering
> dat0 is a subset of TransTable.
>
> However, I get a neat little surprise when I check the dimensions of
> the resultant data frame - dat0merge, the merged data frame has 8007
> rows! How can this be? Where did these extra 7 rows come from?
> This appears to defy logic!
>
> Thank you in advance for your help. I've put my code below for
> reference.
>
> Tova Fuller
>
> > dim(dat0)
> [1] 8000 60
> > dim(TransTable)
> [1] 47296 9
> > dat0merge=merge(TransTable,dat0,
> by.x="Target",by.y="TargetID",all.x=F,all.y=F)
> > dim(dat0merge)
> [1] 8007 68