[R] Sample rows in data frame by subsets
Chris Stubben
stubben at lanl.gov
Mon Jan 23 21:04:06 CET 2006
Hi,
I need to resample the rows of a data frame within subsets. For example:
L3 <- LETTERS[1:3]
d <- data.frame(x=1, y=1:10, fac=factor(sample(L3, 10, replace=TRUE)))
x y fac
1 1 1 A
2 1 2 A
3 1 3 A
4 1 4 A
5 1 5 C
6 1 6 C
7 1 7 B
8 1 8 A
9 1 9 C
10 1 10 A
I have seen this used to sample rows with replacement:
d[sample(nrow(d), replace=TRUE), ]
x y fac
7 1 7 B
2 1 2 A
1 1 1 A
3 1 3 A
2.1 1 2 A
10 1 10 A
8 1 8 A
9 1 9 C
1.1 1 1 A
8.1 1 8 A
but I would like to resample within each level of fac, keeping the original number of rows in each level:
summary(d$fac)
A B C
6 1 3
rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
      subset(d, fac=="B")[sample(1, replace=TRUE), ],
      subset(d, fac=="C")[sample(3, replace=TRUE), ] )
x y fac
2 1 2 A
3 1 3 A
3.1 1 3 A
1 1 1 A
10 1 10 A
1.1 1 1 A
7 1 7 B
5 1 5 C
6 1 6 C
5.1 1 5 C
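For reference, the three hard-coded rbind(subset(...)) calls above can be generalized to any number of levels without typing the counts 6, 1 and 3 by hand. The sketch below is one base-R way to do it (an assumption, not the poster's code): split() breaks d into one data frame per level of fac, each piece is resampled with replacement to its own size, and do.call(rbind, ...) stacks the pieces again. The set.seed() value is arbitrary and only there for reproducibility.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
L3 <- LETTERS[1:3]
# factor() keeps fac a factor even in R >= 4.0.0, where data.frame()
# no longer converts character columns to factors by default
d <- data.frame(x = 1, y = 1:10,
                fac = factor(sample(L3, 10, replace = TRUE)))

# One data frame per level of fac (levels with no rows are dropped)
pieces <- split(d, d$fac, drop = TRUE)

# Resample each piece's rows with replacement, keeping its original size
pieces <- lapply(pieces, function(s)
  s[sample.int(nrow(s), replace = TRUE), , drop = FALSE])

# Stack the pieces; unname() stops rbind() from prefixing the
# row names with the level names ("A.3", "A.3.1", ...)
res <- do.call(rbind, unname(pieces))
res
```

Each level of fac keeps exactly as many rows as it had in d, which is what the manual rbind() did.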
Is there an easy way to do this in one step or with a short function? I have lots of data frames to resample.
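A minimal sketch of such a short function, assuming base R only; resample_within() is a hypothetical name, not an existing R function. It works on row numbers instead of subsets: the row numbers are split by group, resampled with replacement inside each group, and used to index the data frame once. Writing i[sample.int(length(i), replace = TRUE)] instead of sample(i, replace = TRUE) matters for groups with a single row, because sample() given a single number n draws from 1:n. The list of data frames built by the hypothetical make_df() only stands in for "lots of data frames".

```r
# Hypothetical helper (not part of base R): resample rows with replacement
# within each level of the column named `group`, so that every level keeps
# the number of rows it had in the input.
resample_within <- function(df, group) {
  # Row numbers of df, split by group level (empty levels are dropped)
  idx <- split(seq_len(nrow(df)), df[[group]], drop = TRUE)
  # Draw the same number of row numbers, with replacement, inside each level;
  # indexing i avoids sample()'s special case for a single number
  boot <- unlist(lapply(idx, function(i) i[sample.int(length(i), replace = TRUE)]),
                 use.names = FALSE)
  df[boot, , drop = FALSE]
}

# Usage on several data frames built like the example above
set.seed(1)  # arbitrary seed, only for reproducibility
L3 <- LETTERS[1:3]
make_df <- function() {
  data.frame(x = 1, y = 1:10, fac = factor(sample(L3, 10, replace = TRUE)))
}
dfs <- list(make_df(), make_df(), make_df())
resampled <- lapply(dfs, resample_within, group = "fac")
```

Because the grouping column is passed by name, the same call works for any data frame that has a grouping column, and lapply() runs it over a whole list of them.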
Thanks,
Chris
--
-----------------
Chris Stubben
Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545
More information about the R-help mailing list