[R] random selection of samples based on group membership
Sebastian Luque
spluque at gmail.com
Mon Jul 24 21:25:54 CEST 2006
On Mon, 24 Jul 2006 11:18:10 -0400,
"Wade Wall" <wade.wall at gmail.com> wrote:
> Hi all, I have a matrix of 474 rows (samples) with 565 columns
> (variables). Each of the 474 samples belongs to one of 120 groups, with
> the groupings as a column in the above matrix. For example, the group
> column would be:
> 1 1 1 2 2 2 . . . 120 120
> I want to randomly select one from each group. Not all the groups have
> the same number of samples, some have 4, some 3 etc. Is there a
> function to do this, or would I need to write a looping statement to
> look at each successive group?
I use the following for that (some of it hacked from help("sample")):
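## sample() treats a single number n as the population 1:n, so a
## one-row group is returned as is instead of being passed to sample().
.resample <- function(x, size, ...) {
  if (length(x) <= 1) {
    if (!missing(size) && size == 0) x[FALSE] else x
  } else sample(x, size, ...)
}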
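## Pick 'size' random rows of x within each group defined by 'by';
## further arguments (e.g. replace = TRUE) are passed on to sample().
randpick <- function(x, by, size = 1, ...)
{
  nx <- seq_len(nrow(x))                               # row indices of x
  ind <- unlist(tapply(nx, by, .resample, size, ...))  # indices drawn per group
  x[nx %in% ind, , drop = FALSE]                       # chosen rows, original order
}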
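The reason for the .resample() wrapper can be seen in a few lines. This is a small illustrative sketch, not part of the original post: the names grp and draws are hypothetical, and the function is repeated so the block runs on its own. When a group holds a single row, say row 7, plain sample() draws from 1:7 instead of returning 7.

```r
## Repeated from the post so this block is self-contained.
.resample <- function(x, size, ...) {
  if (length(x) <= 1) {
    if (!missing(size) && size == 0) x[FALSE] else x
  } else sample(x, size, ...)
}

grp <- 7L                                  # hypothetical group containing only row 7
set.seed(1)
draws <- replicate(20, sample(grp, 1))     # plain sample() on a length-one vector
draws                                      # values come from 1:7, not just 7
.resample(grp, 1)                          # always returns 7
.resample(grp, 0)                          # zero-length result when size = 0
```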
So, for instance:
R> randpick(Indometh, Indometh$Subject, 3)
   Subject time conc
2        1 0.50 0.94
7        1 3.00 0.12
11       1 8.00 0.05
15       2 1.00 0.70
16       2 1.25 0.64
19       2 4.00 0.20
25       3 0.75 1.16
29       3 3.00 0.22
32       3 6.00 0.08
34       4 0.25 1.85
43       4 6.00 0.07
44       4 8.00 0.07
48       5 1.00 0.39
54       5 6.00 0.10
55       5 8.00 0.06
58       6 0.75 1.03
64       6 5.00 0.13
65       6 6.00 0.10
R> randpick(Indometh, Indometh$Subject, 2)
   Subject time conc
8        1 4.00 0.11
10       1 6.00 0.07
14       2 0.75 0.71
20       2 5.00 0.25
23       3 0.25 2.72
28       3 2.00 0.39
39       4 2.00 0.40
43       4 6.00 0.07
48       5 1.00 0.39
52       5 4.00 0.11
57       6 0.50 1.44
66       6 8.00 0.09
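The same call should work on a numeric matrix laid out like the one in the question, with the group labels stored in one of its columns. The sketch below is only an illustration under that assumption. The matrix m, its column names and the values are made up (10 rows with groups of 4, 3, 2 and 1 rows) and stand in for the real 474 x 565 data. The two functions are repeated from above, using seq_len() and drop = FALSE as small safety tweaks.

```r
.resample <- function(x, size, ...) {
  if (length(x) <= 1) {
    if (!missing(size) && size == 0) x[FALSE] else x
  } else sample(x, size, ...)
}
randpick <- function(x, by, size = 1, ...) {
  nx <- seq_len(nrow(x))
  ind <- unlist(tapply(nx, by, .resample, size, ...))
  x[nx %in% ind, , drop = FALSE]
}

## Hypothetical stand-in for the poster's matrix: a 'group' column with
## unequal group sizes (4, 3, 2, 1) plus three made-up variables.
set.seed(42)
m <- cbind(group = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
           matrix(round(rnorm(30), 2), nrow = 10,
                  dimnames = list(NULL, c("v1", "v2", "v3"))))
picked <- randpick(m, m[, "group"])   # one random row per group
picked
```

Because rows are selected with %in%, they come back in their original order rather than in the order they were drawn. The single-row group 4 always contributes row 10.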
The 'by' argument lets you sample within any desired combination of
factors; like the INDEX argument of tapply(), it can be a list of factors.
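For example, passing a list of two factors as 'by' draws one row from every observed combination of their levels. This is a hypothetical sketch. The data frame df and its columns sex, trt and y are invented for illustration, with each sex x trt cell holding two rows, and the functions are repeated so the block runs standalone.

```r
.resample <- function(x, size, ...) {
  if (length(x) <= 1) {
    if (!missing(size) && size == 0) x[FALSE] else x
  } else sample(x, size, ...)
}
randpick <- function(x, by, size = 1, ...) {
  nx <- seq_len(nrow(x))
  ind <- unlist(tapply(nx, by, .resample, size, ...))
  x[nx %in% ind, , drop = FALSE]
}

## Hypothetical data: 12 rows crossing two factors, 2 rows per cell.
df <- data.frame(sex = rep(c("F", "M"), each = 6),
                 trt = rep(c("A", "B", "C"), times = 4),
                 y   = 1:12)
set.seed(7)
picked <- randpick(df, list(df$sex, df$trt))   # one row per sex x trt cell
picked
```

tapply() skips empty level combinations, so cells with no rows simply contribute nothing.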
Cheers,
--
Seb