[R] Merge problem

Christoph Buser buser at stat.math.ethz.ch
Fri Sep 22 11:36:09 CEST 2006


Dear Tova

There is no reason why the merged data.frame should have
exactely 8000 or less rows.

The "all=FALSE" options only says that now new rows are
generated for cases that appear only in one of the two
data.frames.

Have a look at this sample

> dat1 <- data.frame(a = c(1,2,3,4), b = letters[1:4])
> dat2 <- data.frame(a = c(1,2,3,4,5,6,7,8,1), b = LETTERS[1:9])

> merge(dat1,dat2, by = "a", all = FALSE)
1 1    a   A
2 1    a   I
3 2    b   B
4 3    c   C
5 4    d   D


Since "1" appears twice in the large data.frame it is repeated
as the help page ?merge says:

"If there is more than one match, all possible  matches
contribute one row each."

To compare have a look what the option "all = TRUE" changes

> merge(dat1,dat2, by = "a", all = TRUE)

Probably in your large data frame some rows have identical
target ids and get repeated. It should be easy to check it with
unique()

Hope this helps

Christoph

--------------------------------------------------------------

Credit and Surety PML study: visit our web page www.cs-pml.org

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------

Tova Fuller writes:
 > Hello all,
 > 
 > I have read as many merge issues as I possibly could tonight and  
 > although I presume this is a small error, I have not found the  
 > solution to my problem.
 > 
 > I'm trying to merge two data sets: dat0 and TransTable.  As you can  
 > see below, dat0 has 8000 rows, whereas TransTable has 47296 rows.  I  
 > would expect when I merge the two data sets, with all.x=F, and  
 > all.y=F, that the intersection would yield 8000 rows, considering  
 > dat0 is a subset of TransTable.
 > 
 > However, I get a neat little surprise when I check the dimensions of  
 > the resultant data frame - dat0merge, the merged data frame has 8007  
 > rows!  How can this be?  Where did these extra 7 rows come from?   
 > This appears to defy logic!
 > 
 > Thank you in advance for your help.  I've put my code below for  
 > reference.
 > 
 > Tova Fuller
 > 
 >  > dim(dat0)
 > [1] 8000   60
 >  > dim(TransTable)
 > [1] 47296     9
 >  > dat0merge=merge(TransTable,dat0,  
 > by.x="Target",by.y="TargetID",all.x=F,all.y=F)
 >  > dim(dat0merge)
 > [1] 8007   68
 > 
 > ______________________________________________
 > R-help at stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list