[R] Duplicate rows when I combine two data.frames with merge!

Mon Feb 6 21:49:46 CET 2012

Hi,

Why do you need to merge them? c.one already contains the columns I think
you want, so all that's left is to randomly flag 25 of its rows, chosen
without replacement:

c.one <- cbind(c.one, a.qtr = sample(c(rep(TRUE, 25), rep(FALSE, 75))))
> head(c.one)
    unique length age a.qtr
649    649     71   4  TRUE
200    200     79   4 FALSE
410    410     82   4  TRUE
620    620     73   4 FALSE
723    723     81   4 FALSE
855    855     96   5 FALSE
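
For completeness, here is a minimal self-contained sketch of the same idea
from start to finish. The column names id and len, the set.seed() call and
the simplified single-age pop are my own assumptions for illustration, not
part of the original code. It draws the 100 rows without replacement, flags
25 of them, and pulls the flagged rows out when they are needed on their own.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Simplified stand-in for pop: 1000 fish with an id, a length and an age
pop <- data.frame(id  = 1:1000,
                  len = round(rnorm(1000, mean = 78, sd = 5.5)),
                  age = 4)

# 100 distinct fish: sample row indices WITHOUT replacement (the default)
c.one <- pop[sample(nrow(pop), size = 100), ]

# Flag exactly 25 of the 100 rows at random
c.one$a.qtr <- sample(rep(c(TRUE, FALSE), times = c(25, 75)))

# The 25 selected fish, when they are needed on their own
a.one.qtr <- c.one[c.one$a.qtr, ]
```

With this layout there is still one row per fish, and the a.qtr column
identifies the 25 selected individuals, so no merge is needed.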

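If what you mean by the subject line is that there are duplicate rows in the
merged data frame, of course there are: c.one itself contains duplicated
rows, because you sampled with replacement (replace=TRUE) when selecting
rows from pop to make c.one, and again when selecting a.one.qtr from c.one.
merge() then pairs every copy of an id in one data frame with every copy
in the other.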
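
A small toy example may make the effect easier to see. The data frames x
and y and their columns are made up for illustration; the point is only
that a key repeated on both sides of a merge is matched copy against copy.

```r
# Toy data: id 2 appears twice in x and twice in y
x <- data.frame(id = c(1, 2, 2, 3), len = c(71, 79, 79, 82))
y <- data.frame(id = c(2, 2),       len = c(79, 79))

m <- merge(x, y, by = "id", all = TRUE)
nrow(m)  # 6: ids 1 and 3 give one row each, id 2 gives 2 x 2 = 4 rows

# Sampling without replacement cannot repeat an index,
# so anyDuplicated() reports 0 (no duplicates found)
idx <- sample(1:1000, size = 100, replace = FALSE)
anyDuplicated(idx)
```

So four input rows can turn into six merged rows, and with 100 rows that
contain repeats the merged result can easily exceed 100.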

Something to be VERY careful of: length, merge and unique are all base
functions, and shouldn't be used as variable names. After you named
something merge, what happens when you try to use merge()?
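
To make the masking concrete, here is a short sketch using length (the
values are arbitrary). As far as I know, R skips non-function objects when
it looks up a name in call position, so the call itself still works, but
the bare name now refers to your data rather than the function, which is
confusing to read and easy to trip over.

```r
length <- c(71, 79, 82)          # a vector now shares its name with base::length

n <- length(length)              # 3: the call still finds the function
masked <- is.function(length)    # FALSE: the bare name is the vector

rm(length)                       # remove the masking variable
restored <- is.function(length)  # TRUE: the name means base::length again
```

Names such as len, id and merged avoid the problem entirely.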

If I'm misunderstanding the question, then please try to explain more clearly
what you are looking for.

Sarah

On Mon, Feb 6, 2012 at 3:29 PM, RKinzer <ryank at nezperce.org> wrote:
> Hello all,
>
> First I have done extensive searches on this forum and others and nothing
> seems to work.  So I decided to post thinking someone could point me to the
> right post or give me some help.
>
> I have drawn 100 samples from a fictitious population (N=1000), and then
> randomly selected 25% of the 100 samples.  I would like to now merge the
> data.frame from the 100 samples with the data.frame for the 25 individuals
> from the sample.  When I do this with the following code I get duplicate
> rows, when I should have at most 100.
>
> x<-mapply(rnorm,1000,c(54,78,89),c(3.5,5.5,5.9))  #sets up 1000 random numbers for age 3,4,5
> x.3<-sample(x[,1],60)  #randomly selects 60 lengths from age 3
> x.4<-sample(x[,2],740)
> x.5<-sample(x[,3],200)
> length<-c(x.3,x.4,x.5)
> length<-round(length,digits=0)  #rounds lengths to whole number
> age3<-rep(3,60)
> age4<-rep(4,740)
> age5<-rep(5,200)
> age<-c(age3,age4,age5)  #combines ages into one vector
> unique<-1:1000  #gives each fish a unique id
> pop<-data.frame(unique,length,age)
> pop<-pop[sample(1:1000,size=1000,replace=FALSE),]  #randomized the order of pop
> c.one<-pop[sample(1:1000,size=100,replace=TRUE),]
> a.one.qtr<-c.one[sample(1:100,size=25,replace=TRUE),]
> merge<-merge(c.one,a.one.qtr,by="unique",all=TRUE)
>
> What I would ultimately like to have is one row for all 100 in the sample
> and three columns (unique, length, age).  And then some way to identify the
> 25 individual selected rows.
>
> Thank you upfront for any help.  I have been stuck for days.
>
> Ryan

-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org