[R] Stratified random sampling?

Dylan Beaudette debeaudette at ucdavis.edu
Fri Jun 19 01:17:16 CEST 2009


On Thursday 18 June 2009, Jonathan Greenberg wrote:
> Rers:
>
>     What is the preferred library/function for doing stratified random
> sampling from a dataset, given I want to control the number of samples
> (rather than the proportion of samples) per strata?  Thanks!
>
> --j

Hi Jonathan!

Check out spsample() in the 'sp' package for spatially stratified random
sampling, among other sampling designs.
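
A minimal sketch of the spsample() route, in case it helps. The unit-square study area and the sample size of 25 are made up for illustration; with type="stratified" the n argument is, as far as I know, only an approximate target, so the number of points returned can differ a little.

```r
library(sp)

set.seed(1)  # fixed seed so the sketch is reproducible

# hypothetical study area: a unit square as a SpatialPolygons object
sq <- Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))
area <- SpatialPolygons(list(Polygons(list(sq), ID = "unit")))

# spatially stratified sample: one random location per grid cell;
# n is an approximate target, not an exact count
pts <- spsample(area, n = 25, type = "stratified")

# look at the first few sampled coordinates
head(coordinates(pts))
```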

For grouped data, there may already be a dedicated function, but doing it by hand should be as simple as:

# some grouped data, with different means for clarity
d <- data.frame(x=rnorm(1000, mean=c(1,5,10,15)), g=rep(letters[1:4], 
times=250))

# sample 2 items (without replacement) from each group:
res <- by(d, d$g, function(i) {sample(i$x, size=2)} )

# print the result:
res

d$g: a
[1] 0.1931319 2.1858605
------------------------------------------------------------ 
d$g: b
[1] 6.020904 5.200289
------------------------------------------------------------ 
d$g: c
[1]  9.61317 11.14428
------------------------------------------------------------ 
d$g: d
[1] 15.26022 14.61383

# Then, parse the result with lapply or sapply. Or, use the plyr framework to 
# extend this to multi-level stratification!
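
One possible way to do that parsing step, as a rough sketch only: do.call(rbind, ...) stacks the per-group vectors into a matrix, which can then be unrolled into long format. The names m and long are just placeholders of mine, and the data are regenerated here so the snippet runs on its own.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# same kind of grouped data as above
d <- data.frame(x = rnorm(1000, mean = c(1, 5, 10, 15)),
                g = rep(letters[1:4], times = 250))

# sample 2 values (without replacement) per group, as before
res <- by(d, d$g, function(i) sample(i$x, size = 2))

# stack the per-group vectors into a matrix: one row per group
m <- do.call(rbind, res)

# unroll the matrix column by column into long format,
# repeating the group labels once per column
long <- data.frame(g = rep(rownames(m), times = ncol(m)),
                   x = as.vector(m))

# sort by group for readability
long <- long[order(long$g), ]
long
```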

library(lattice)
library(plyr)

# two levels of grouped data:
d <- data.frame(x=rnorm(1000, mean=c(1,5,100,150)), 
g=rep(letters[1:4], times=250),
gg=rep(c('A','B'), each=2, times=250))

# check:
bwplot(x ~ g | gg, data=d)

# use ddply():
res <- ddply(d, .variables=c('gg','g'), .fun=function(i) { sample(i$x, 
size=2)} )

# result looks ok:
res
  gg g          V1         V2
1  A a   0.1555472   3.196626
2  A b   4.9836106   5.559472
3  B c 100.0587593 101.723630
4  B d 150.7257066 149.865093

# might need some more work to convert that back into 'long format' for
# modeling...
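
One way around the reshaping, sketched under some assumptions: if the function passed to ddply() returns whole rows instead of a bare vector, the result should already come back in long format. This also allows a different number of samples per stratum, which I think is what the original question was after. The n_per_group lookup vector and its counts are made up for illustration.

```r
library(plyr)

set.seed(1)  # fixed seed so the sketch is reproducible

# two levels of grouped data, as above
d <- data.frame(x = rnorm(1000, mean = c(1, 5, 100, 150)),
                g = rep(letters[1:4], times = 250),
                gg = rep(c('A', 'B'), each = 2, times = 250))

# hypothetical number of samples wanted from each stratum, keyed by g
n_per_group <- c(a = 2, b = 3, c = 5, d = 1)

s <- ddply(d, .variables = c('gg', 'g'), .fun = function(i) {
  # look up the requested sample size for this stratum
  n <- n_per_group[[as.character(i$g[1])]]
  # sample row numbers (without replacement) and return whole rows;
  # sampling row numbers also behaves sensibly for a one-row group
  i[sample.int(nrow(i), size = n), ]
})

# one row per sampled observation, ready for modeling
s
```

Sampling row numbers rather than i$x directly sidesteps the sample() quirk where a single number k is treated as 1:k.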


Cheers,
Dylan

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341



