[R] drawing samples based on a matching variable

Michael Bedward michael.bedward at gmail.com
Wed Sep 29 03:40:08 CEST 2010


Hello LB,

It's one of those problems that's basic but tricky :)  I don't have an
elegant one-liner for it but here's a function that would do it...

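match.sample <- function(xs, y) {
# sample rows of matrix y (without replacement) such that col 2 of
# the sample matches col 2 of matrix xs, row for row

  used <- logical(nrow(y))   # TRUE once a row of y has been drawn
  yi <- integer(nrow(xs))    # row numbers of y chosen for the sample

  k <- 1
  for (xsval in xs[,2]) {
    # rows of y not yet drawn that carry the value we need
    i <- which( !used & y[,2] == xsval )
    if (length(i) >= 1) {
      # index into i: sample(i, 1) would draw from 1:i when i has length 1
      yi[k] <- i[sample.int(length(i), 1)]
      used[ yi[k] ] <- TRUE
      k <- k + 1
    } else {
      stop("bummer: not possible to get a matching sample")
    }
  }

  # drop = FALSE keeps the result a matrix even for a one-row sample
  y[yi, , drop = FALSE]
}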
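
A rough usage sketch, under a few assumptions: the matrices are typed in
from the values printed in your message (not regenerated with set.seed(1)),
the name match.sample is only picked here for illustration, and the
candidate row is chosen by indexing into i, because sample(i, 1) treats a
single number i as 1:i.

```r
# match.sample is a hypothetical name chosen for this sketch.
match.sample <- function(xs, y) {
  used <- logical(nrow(y))   # TRUE once a row of y has been drawn
  yi <- integer(nrow(xs))    # row numbers of y chosen for the sample
  for (k in seq_len(nrow(xs))) {
    # rows of y not yet drawn that carry the value we need
    i <- which(!used & y[, 2] == xs[k, 2])
    if (length(i) == 0) stop("not possible to get a matching sample")
    # index into i, so a single candidate row is returned as-is
    yi[k] <- i[sample.int(length(i), 1)]
    used[yi[k]] <- TRUE
  }
  y[yi, , drop = FALSE]
}

# Example data copied from the printed matrices in the question.
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]   # the four rows shown as x.samp

y.samp <- match.sample(x.samp, y)
y.samp[, "B"]                  # 3 5 2 5, the same as x.samp[, "B"]
anyDuplicated(y.samp[, "A"])   # 0: no row of y was drawn twice
```

Which rows of y come back varies from run to run; only column B is fixed
by the match.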

Note, I've assumed here that in your real data the first col won't
always contain the row index as it does in your example.
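
Because the matching works on row positions and not on the values in
col 1, the same idea can also be sketched group by group: count how often
each value occurs in col 2 of the x sample, then draw that many row numbers
from the rows of y with the same value. This is only an alternative sketch;
the name match.sample.grouped is made up for the example, and the rows come
back ordered by value, not in the order of the x sample.

```r
# match.sample.grouped is a hypothetical name chosen for this sketch.
match.sample.grouped <- function(xs, y) {
  counts <- table(xs[, 2])                  # how many of each value are needed
  vals <- names(counts)
  n <- as.vector(counts)
  rows <- split(seq_len(nrow(y)), y[, 2])   # row numbers of y, grouped by value
  picked <- Map(function(v, k) {
    pool <- rows[[v]]                       # NULL if y has no such value
    if (length(pool) < k) stop("not enough rows in y with value ", v)
    pool[sample.int(length(pool), k)]       # k distinct row numbers from the pool
  }, vals, n)
  y[unlist(picked), , drop = FALSE]
}

# Example data copied from the printed matrices in the question.
x <- cbind(A = 1:10, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
y <- cbind(A = 1:14, B = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
x.samp <- x[c(3, 4, 5, 7), ]

y.samp <- match.sample.grouped(x.samp, y)
table(y.samp[, "B"])   # same counts per value as table(x.samp[, "B"])
```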

Michael

On 29 September 2010 07:46, L Brown <missmissliss at gmail.com> wrote:
> Hi, everyone. I have what I hope will be a simple coding question. It seems
> this is a common job, but so far I've had trouble finding the answer in
> searches.
>
> I have two matrices (x and y) with a different number of observations in
> each. I need to draw a random sample without replacement of observations
> from x, and then, using a matching variable, draw a sample of equal size
> from y. It is the matching variable that is hanging me up.
>
> For example--
>
>> # example matrices. let's assume seed always equals 1. (let's also assume I
>> # have assigned variable names A and B to my columns.)
>> set.seed(1)
>> x<-cbind(1:10,sample(1:5,10,replace=TRUE))
>> x
>      [A] [B]
>  [1,]    1    2
>  [2,]    2    2
>  [3,]    3    3
>  [4,]    4    5
>  [5,]    5    2
>  [6,]    6    5
>  [7,]    7    5
>  [8,]    8    4
>  [9,]    9    4
> [10,]   10    1
>
>> y<-cbind(1:14,sample(1:5,14,replace=TRUE))
>> y
>      [A] [B]
>  [1,]    1    2
>  [2,]    2    2
>  [3,]    3    3
>  [4,]    4    5
>  [5,]    5    2
>  [6,]    6    5
>  [7,]    7    5
>  [8,]    8    4
>  [9,]    9    4
> [10,]   10    1
> [11,]   11    2
> [12,]   12    1
> [13,]   13    4
> [14,]   14    2
>
>> #draw random sample of n=4 without replacement from matrix x.
>> x.samp<-x[sample(10,4,replace=FALSE),]
>> x.samp
>     [A] [B]
> [1,]    3    3
> [2,]    4    5
> [3,]    5    2
> [4,]    7    5
>
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of column B in the y sample is
> identical to that of column B in x.samp.
>
> I'd appreciate any help, and sorry to post such a basic question!
>
> LB


