[R] drawing samples based on a matching variable
Michael Bedward
michael.bedward at gmail.com
Wed Sep 29 03:40:08 CEST 2010
Hello LB,
It's one of those problems that's basic but tricky :) I don't have an
elegant one-liner for it but here's a function that would do it...
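# (the name match_sample is just a placeholder)
match_sample <- function(xs, y) {
  # sample rows of matrix y such that col 2 of the sample matches
  # col 2 of matrix xs, row for row, without replacement
  used <- logical(nrow(y))
  yi <- integer(nrow(xs))
  k <- 1
  for (xsval in xs[, 2]) {
    # rows of y not yet used whose col 2 equals the current value
    i <- which(!used & y[, 2] == xsval)
    if (length(i) >= 1) {
      # index into i: sample(i, 1) would draw from 1:i when length(i) == 1
      yi[k] <- i[sample.int(length(i), 1)]
      used[yi[k]] <- TRUE
      k <- k + 1
    } else {
      stop("bummer: not possible to get a matching sample")
    }
  }
  y[yi, , drop = FALSE]
}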
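A caveat worth knowing when picking one element from a vector of candidate indices such as i above: if the vector has length one and holds a number n, sample() treats it as shorthand for 1:n rather than as the single candidate. The small sketch below shows the difference; pick_one is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: pick one element from a vector of candidates,
# also when there is exactly one candidate.
pick_one <- function(candidates) {
  stopifnot(length(candidates) >= 1)
  candidates[sample.int(length(candidates), 1)]
}

set.seed(1)
# With a single numeric candidate, sample(7, 1) draws from 1:7,
# so it is not guaranteed to return 7.
draws_naive <- replicate(200, sample(7, 1))
# Indexing into the candidate vector always returns the candidate itself.
draws_safe <- replicate(200, pick_one(7))
```

Here draws_safe contains only 7, while draws_naive can contain any value from 1 to 7.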
Note, I've assumed here that in your real data the first col won't
always contain the row index as it does in your example.
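If only the distribution of the matching column needs to agree (and not the row-by-row order), a count-based variant may be simpler. The following is a rough sketch under that assumption, not a replacement for the function above: the name match_by_counts is made up, the matching variable is assumed to sit in column 2, and values are compared via their character form. The data are copied from the example in the question.

```r
# Sketch: for each distinct matching value in xs, draw as many rows
# from y (without replacement) as that value occurs in xs.
match_by_counts <- function(xs, y) {
  need <- table(xs[, 2])          # rows needed per matching value
  ykey <- as.character(y[, 2])    # table() names are character strings
  rows <- integer(0)
  for (j in seq_along(need)) {
    val <- names(need)[j]
    n <- as.integer(need[j])
    cand <- which(ykey == val)    # candidate rows of y for this value
    if (length(cand) < n) {
      stop("not enough rows in y with matching value ", val)
    }
    # sample positions within cand, then map back to row numbers
    rows <- c(rows, cand[sample.int(length(cand), n)])
  }
  y[rows, , drop = FALSE]
}

# x.samp and column 2 of y as printed in the question
x.samp <- cbind(c(3, 4, 5, 7), c(3, 5, 2, 5))
y <- cbind(1:14, c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4, 2))
set.seed(1)
y.samp <- match_by_counts(x.samp, y)
```

The rows of y.samp come back grouped by matching value rather than in the order of x.samp, so this only suits the case where the counts per value are what matters.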
Michael
On 29 September 2010 07:46, L Brown <missmissliss at gmail.com> wrote:
> Hi, everyone. I have what I hope will be a simple coding question. It seems
> this is a common job, but so far I've had trouble finding the answer in
> searches.
>
> I have two matrices (x and y) with a different number of observations in
> each. I need to draw a random sample without replacement of observations
> from x, and then, using a matching variable, draw a sample of equal size
> from y. It is the matching variable that is hanging me up.
>
> For example--
>
>> # example matrices. let's assume seed always equals 1. (let's also assume I
>> # have assigned variable names A and B to my columns..)
>> set.seed(1)
>> x<-cbind(1:10,sample(1:5,10,rep=T))
>> x
> [A] [B]
> [1,] 1 2
> [2,] 2 2
> [3,] 3 3
> [4,] 4 5
> [5,] 5 2
> [6,] 6 5
> [7,] 7 5
> [8,] 8 4
> [9,] 9 4
> [10,] 10 1
>
>> y<-cbind(1:14,sample(1:5,14,rep=T))
>> y
> [A] [B]
> [1,] 1 2
> [2,] 2 2
> [3,] 3 3
> [4,] 4 5
> [5,] 5 2
> [6,] 6 5
> [7,] 7 5
> [8,] 8 4
> [9,] 9 4
> [10,] 10 1
> [11,] 11 2
> [12,] 12 1
> [13,] 13 4
> [14,] 14 2
>
>> #draw random sample of n=4 without replacement from matrix x.
>> x.samp<-x[sample(10,4,replace=F),]
>> x.samp
> [A] [B]
> [1,] 3 3
> [2,] 4 5
> [3,] 5 2
> [4,] 7 5
>
> Next, I would need to draw four observations from matrix y (without
> replacement) so that the distribution of y$B is identical to x.samp$B.
>
> I'd appreciate any help, and sorry to post such a basic question!
>
> LB