[R] Duplicate rows when I combine two data.frames with merge!

Ista Zahn istazahn at gmail.com
Mon Feb 6 22:07:42 CET 2012


Hi Ryan,

You're getting this result because five of the rows in a.one.qtr are
duplicates, as a result of sampling with replacement:

length(which(duplicated(a.one.qtr)))
[1] 5
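A small toy example (made-up data frames `a` and `b`, not the data from the question) shows the same effect in isolation. `sum(duplicated(...))` is an equivalent, slightly shorter way to count the duplicated rows, and the merge returns one extra row for the id that appears twice in `b`:

```r
# Toy data: id 2 appears twice in 'b', as it can after sampling with replacement
a <- data.frame(unique = 1:3, length = c(50, 60, 70))
b <- data.frame(unique = c(2L, 2L, 3L), age = c(4, 4, 5))

sum(duplicated(b))   # 1, same count as length(which(duplicated(b)))

m <- merge(a, b, by = "unique", all = TRUE)
nrow(m)              # 4, not 3: id 2 matches two rows of 'b'
```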

The relevant section of the documentation (see ?merge) reads "The rows
in the two data frames that match on the specified columns are
extracted, and joined together. *If there is more than one match, all
possible matches contribute one row each*" (emphasis mine).
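For what you ultimately want (100 rows plus a marker for the 25), here is a minimal sketch. It assumes you intended sampling *without* replacement at both stages (both draws in your code use replace=TRUE, so c.one can contain repeated fish too), and it uses a simplified stand-in for `pop`; the names `selected`, `flag`, `in.qtr` and `merged` are just illustrative. You don't really need merge() for the flag; `%in%` does it directly, but a merge() version is included for comparison:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Stand-in population (hypothetical; use your own 'pop' instead)
pop <- data.frame(unique = 1:1000,
                  length = round(rnorm(1000, mean = 78, sd = 10)),
                  age    = sample(3:5, 1000, replace = TRUE))

# replace = FALSE is the default for sample(), so every fish appears at most once
c.one     <- pop[sample(nrow(pop), size = 100), ]
a.one.qtr <- c.one[sample(nrow(c.one), size = 25), ]

# Option 1: flag the 25 sub-sampled fish by id, no merge needed
c.one$selected <- c.one$unique %in% a.one.qtr$unique

# Option 2: the same flag via merge(); all.x = TRUE keeps all 100 rows of c.one
flag   <- data.frame(unique = a.one.qtr$unique, in.qtr = TRUE)
merged <- merge(c.one, flag, by = "unique", all.x = TRUE)
merged$in.qtr[is.na(merged$in.qtr)] <- FALSE
```

Either way you get one row per fish, because each id occurs at most once on each side of the match.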

Best,
Ista

On Mon, Feb 6, 2012 at 3:29 PM, RKinzer <ryank at nezperce.org> wrote:
> Hello all,
>
> First, I have done extensive searches on this forum and others, and nothing
> seems to work.  So I decided to post, thinking someone could point me to the
> right post or give me some help.
>
> I have drawn 100 samples from a fictitious population (N=1000), and then
> randomly selected 25% of the 100 samples.  I would like to now merge the
> data.frame from the 100 samples with the data.frame for the 25 individuals
> from the sample.  When I do this with the following code I get duplicate
> rows, when I should have at most 100.
>
> x<-mapply(rnorm,1000,c(54,78,89),c(3.5,5.5,5.9))  #sets up 1000 random numbers for age 3,4,5
> x.3<-sample(x[,1],60)  #randomly selects 60 lengths from age 3
> x.4<-sample(x[,2],740)
> x.5<-sample(x[,3],200)
> length<-c(x.3,x.4,x.5)
> length<-round(length,digits=0)  #rounds lengths to whole number
> age3<-rep(3,60)
> age4<-rep(4,740)
> age5<-rep(5,200)
> age<-c(age3,age4,age5)  #combines ages into one vector
> unique<-1:1000  #gives each fish a unique id
> pop<-data.frame(unique,length,age)
> pop<-pop[sample(1:1000,size=1000,replace=FALSE),]  #randomizes the order of pop
> c.one<-pop[sample(1:1000,size=100,replace=TRUE),]
> a.one.qtr<-c.one[sample(1:100,size=25,replace=TRUE),]
> merge<-merge(c.one,a.one.qtr,by="unique",all=TRUE)
>
> What I would ultimately like to have is one row for all 100 in the sample
> and three columns (unique, length, age).  And then some way to identify the
> 25 individual selected rows.
>
> Thank you in advance for any help.  I have been stuck for days.
>
> Ryan
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Duplicate-rows-when-I-combine-two-data-frames-with-merge-tp4362685p4362685.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


