[R] drawing samples based on a matching variable
Charles C. Berry
cberry at tajo.ucsd.edu
Wed Sep 29 19:54:27 CEST 2010
On Tue, 28 Sep 2010, L Brown wrote:
> Hi, everyone. I have what I hope will be a simple coding question. It seems
> this is a common job, but so far I've had trouble finding the answer in
> searches.
>
> I have two matrices (x and y) with a different number of observations in
> each. I need to draw a random sample without replacement of observations
> from x, and then, using a matching variable, draw a sample of equal size
> from y. It is the matching variable that is hanging me up.
>
> For example--
>
>> # Example matrices. Let's assume the seed always equals 1. (Let's also assume I
> have assigned variable names A and B to my columns.)
>> set.seed(1)
>> x<-cbind(1:10,sample(1:5,10,rep=T))
>> x
> [A] [B]
> [1,] 1 2
> [2,] 2 2
> [3,] 3 3
> [4,] 4 5
> [5,] 5 2
> [6,] 6 5
> [7,] 7 5
> [8,] 8 4
> [9,] 9 4
> [10,] 10 1
>
Looks like set.seed(1) was invoked here, too.
>> y<-cbind(1:14,sample(1:5,14,rep=T))
>> y
> [A] [B]
> [1,] 1 2
> [2,] 2 2
> [3,] 3 3
> [4,] 4 5
> [5,] 5 2
> [6,] 6 5
> [7,] 7 5
> [8,] 8 4
> [9,] 9 4
> [10,] 10 1
> [11,] 11 2
> [12,] 12 1
> [13,] 13 4
> [14,] 14 2
>
>> #draw random sample of n=4 without replacement from matrix x.
>> x.samp<-x[sample(10,4,replace=F),]
>> x.samp
> [A] [B]
> [1,] 3 3
> [2,] 4 5
> [3,] 5 2
> [4,] 7 5
>
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of y$B is identical to x.samp$B.
>
> I'd appreciate any help, and sorry to post such a basic question!
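Break it down like this:
> x.levels <- sort( unique(x[,2]) )   # levels of the matching variable
> x.samp.tab <- table( factor( x.samp[,2], x.levels ) )   # counts needed per level
> y.rows <- split(1:nrow(y),factor( y[,2], x.levels ) )   # y row numbers by level
> # index into r so that a single row number k is not treated as 1:k by sample()
> pick <- function(r, n) r[ sample.int( length(r), n ) ]
> unlist( mapply( pick, y.rows, x.samp.tab, SIMPLIFY=FALSE ),use.names=FALSE )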
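As a check, the same steps can be run end to end on the values of x and y printed in the question. This is only a sketch: the matrices are typed in by hand from that printout, the column names A and B are set explicitly, and the helper name pick is made up for illustration. It indexes the row numbers with sample.int() because sample() treats a single number k as 1:k, which matters here since level 3 occurs only once in y.

```r
# x and y as printed in the question (column B is the matching variable)
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))

set.seed(1)
x.samp <- x[sample(nrow(x), 4), ]

x.levels   <- sort(unique(x[, "B"]))
x.samp.tab <- table(factor(x.samp[, "B"], x.levels))
y.rows     <- split(seq_len(nrow(y)), factor(y[, "B"], x.levels))

# 'pick' is a hypothetical helper: draw n of the row numbers held in r
pick <- function(r, n) r[sample.int(length(r), n)]

y.idx  <- unlist(mapply(pick, y.rows, x.samp.tab, SIMPLIFY = FALSE),
                 use.names = FALSE)
y.samp <- y[y.idx, , drop = FALSE]

# compare the per-level counts of B in the two samples; should be TRUE
same.counts <- identical(as.vector(x.samp.tab),
                         as.vector(table(factor(y.samp[, "B"], x.levels))))
same.counts
```

With these particular matrices every level has at least as many rows in y as in x, so the draw cannot run short whichever four rows of x are sampled.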
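The sampling step will stop with an error if, for some level i,
length( y.rows[[i]] ) < x.samp.tab[ i ]
because y then has too few rows at that level to sample without
replacement. You can trace which element that was using 'traceback()', or
write a wrapper for sample() that checks that condition.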
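One possible shape for such a wrapper is sketched below. The name safe.sample, its level argument, and the toy inputs are assumptions made for illustration, not part of the code above. It checks the condition before drawing and names the offending level in the error message.

```r
# 'safe.sample' is a hypothetical wrapper: draw n elements of r, but stop
# with a message naming the level when r holds fewer than n row numbers
safe.sample <- function(r, n, level) {
    if (length(r) < n)
        stop("level ", level, ": need ", n, " rows but y has only ", length(r))
    # index into r so that a single row number k is not expanded to 1:k
    r[sample.int(length(r), n)]
}

# toy inputs standing in for y.rows and x.samp.tab
y.rows     <- list("1" = c(10L, 12L), "2" = 3L)
x.samp.tab <- c("1" = 1L, "2" = 2L)

# level "2" has one row in y but two are requested, so this reports an error
res <- tryCatch(
    mapply(safe.sample, y.rows, x.samp.tab, names(y.rows), SIMPLIFY = FALSE),
    error = function(e) conditionMessage(e))
res
```

Passing names(y.rows) as a third argument is just one way to get the level into the message; when every level has enough rows, the same mapply() call returns a list of row numbers that can be passed to unlist().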
HTH,
Chuck
>
> LB
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901