[R] drawing samples based on a matching variable

Charles C. Berry cberry at tajo.ucsd.edu
Wed Sep 29 19:54:27 CEST 2010


On Tue, 28 Sep 2010, L Brown wrote:

> Hi, everyone. I have what I hope will be a simple coding question. It seems
> this is a common job, but so far I've had trouble finding the answer in
> searches.
>
> I have two matrices (x and y) with a different number of observations in
> each. I need to draw a random sample without replacement of observations
> from x, and then, using a matching variable, draw a sample of equal size
> from y. It is the matching variable that is hanging me up.
>
> For example--
>
>> # example matrices. Let's assume seed always equals 1. (Let's also assume I
>> # have assigned variable names A and B to my columns.)
>> set.seed(1)
>> x<-cbind(1:10,sample(1:5,10,rep=T))
>> x
>      [A] [B]
> [1,]    1    2
> [2,]    2    2
> [3,]    3    3
> [4,]    4    5
> [5,]    5    2
> [6,]    6    5
> [7,]    7    5
> [8,]    8    4
> [9,]    9    4
> [10,]   10    1
>

Looks like set.seed(1) was invoked again before creating y, too: its
first ten rows are identical to x.

>> y<-cbind(1:14,sample(1:5,14,rep=T))
>> y
>      [A] [B]
> [1,]    1    2
> [2,]    2    2
> [3,]    3    3
> [4,]    4    5
> [5,]    5    2
> [6,]    6    5
> [7,]    7    5
> [8,]    8    4
> [9,]    9    4
> [10,]   10    1
> [11,]   11    2
> [12,]   12    1
> [13,]   13    4
> [14,]   14    2
>
>> #draw random sample of n=4 without replacement from matrix x.
>> x.samp<-x[sample(10,4,replace=F),]
>> x.samp
>     [A] [B]
> [1,]    3    3
> [2,]    4    5
> [3,]    5    2
> [4,]    7    5
>
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of y$B is identical to x.samp$B.
>
> I'd appreciate any help, and sorry to post such a basic question!


Break it down like this:

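> x.levels <- sort( unique(x[,2]) )
> # counts per level in the x sample (zeros kept)
> x.samp.tab <- table( factor( x.samp[,2], x.levels ) )
> # row numbers of y, grouped by level
> y.rows <- split(1:nrow(y),factor( y[,2], x.levels ) )
> # draw by position: sample(k, n) would read a lone row number k as 1:k
> unlist( mapply( function(rows, n) rows[ sample.int( length(rows), n ) ],
+         y.rows, x.samp.tab, SIMPLIFY=FALSE ), use.names=FALSE )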
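
For what it's worth, here is a self-contained sketch of those steps run on the posted data, ending with the rows of y and a check that the counts agree. The matrices are typed in literally rather than regenerated, since set.seed(1) does not give the same draws in every R version. The helper name pick is mine, and it draws by position so that a level with a single row number k is not treated as 1:k.

```r
# Data copied from the question (typed in, not regenerated from the seed).
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]

x.levels   <- sort(unique(x[, 2]))
x.samp.tab <- table(factor(x.samp[, 2], x.levels))            # counts per level
y.rows     <- split(seq_len(nrow(y)), factor(y[, 2], x.levels))  # y row numbers per level

# pick() is a hypothetical helper: draw n of the given row numbers by position.
pick <- function(rows, n) rows[sample.int(length(rows), n)]

y.idx  <- unlist(mapply(pick, y.rows, x.samp.tab, SIMPLIFY = FALSE),
                 use.names = FALSE)
y.samp <- y[y.idx, , drop = FALSE]

# The counts of B in y.samp should equal those in x.samp.
y.samp.tab <- table(factor(y.samp[, 2], x.levels))
stopifnot(all(y.samp.tab == x.samp.tab))
```

The result of the unlist() call is a vector of row numbers of y, so the sampled observations are y[y.idx, ].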

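The draw will fail with an error whenever y has too few rows at some
level, that is, whenever

 	length( y.rows[[i]] ) < x.samp.tab[ i ]

You can use 'traceback()' to see where it failed, or write a wrapper
around the sampling step that checks that condition first and reports
the offending level.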
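
A minimal sketch of such a wrapper, in case it helps. The name checked_sample and the wording of the message are my own choices, not anything from base R, and the small y.rows and need objects below are made-up inputs just to trigger the error.

```r
# checked_sample() is a hypothetical helper: it refuses to draw when a level
# of y has fewer rows than requested, and names the level in the error.
checked_sample <- function(rows, n, level) {
  if (length(rows) < n) {
    stop(sprintf("level %s: need %d rows but y has only %d",
                 level, n, length(rows)))
  }
  rows[sample.int(length(rows), n)]   # draw by position
}

# Made-up inputs: level "3" has one row in y, but two are requested.
y.rows <- list("1" = c(10L, 12L), "3" = 3L)
need   <- c("1" = 1L, "3" = 2L)

res <- tryCatch(
  mapply(checked_sample, y.rows, need, names(y.rows), SIMPLIFY = FALSE),
  error = function(e) conditionMessage(e)
)
res
```

Passing names(y.rows) as a third argument is just one way to get the level into the message; with the real data you would pass x.samp.tab in place of need.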

HTH,

Chuck

>
> LB
>
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


