[R] sampling question
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Fri Jun 29 00:19:46 CEST 2007
Let's assume your zcta data looks like this:
set.seed(12345) ## temporary for reproducibility
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
zcta
  zipcode      prop
1       A 0.7209039
2       B 0.8757732
3       C 0.7609823
4       D 0.8861246
5       E 0.4564810
This says that 72.1% of the population in zipcode A is female, ..., and
45.6% in zipcode E is female.
Now suppose you sampled 20 people, recorded the zipcode (and other
variables) for each, and stored the result in 'samp':
samp <- data.frame( id=1:20,
                    zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ] )
Now, I am not sure what you want to do, but I can see two possible
readings of your message.
1) If you want to sample 10 observations, with each observation weighted
INDEPENDENTLY by the proportion of women in its zipcode, try something
like the following. The caveat with this option is that the result
depends on how common each zipcode is among the observations: a zipcode
with many observations carries more total weight.
comb <- merge( samp, zcta, all.x=TRUE )
comb <- comb[ order(comb$id), ]
## draw row positions rather than ids, so this still works if the ids
## are not exactly 1..nrow(comb)
comb[ sample( nrow(comb), 10, prob=comb$prop ), ]
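To make the prevalence caveat concrete, here is a rough sketch (not
part of the recipe above) that compares how many observations each
zipcode has with the share of the total sampling weight it carries at
the first draw. It rebuilds 'samp' and 'zcta' as above; the names
'share' and 'n_obs' are only introduced here for illustration.

```r
set.seed(12345)
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
samp <- data.frame( id=1:20,
                    zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ] )
comb <- merge( samp, zcta, all.x=TRUE )

## total weight per zipcode = (number of observations) * prop,
## expressed as a fraction of the sum of all weights
share <- tapply( comb$prop, comb$zipcode, sum ) / sum( comb$prop )
n_obs <- table( comb$zipcode )

data.frame( n     = as.vector(n_obs),
            share = round( as.vector(share), 3 ),
            row.names = names(share) )
```

A zipcode with a modest 'prop' but many observations can end up with a
larger share than a zipcode with a high 'prop' and few observations.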
2) If you want to sample x% of the observations in each zipcode, where
x is the proportion of women in that zipcode, then this is what I would
call stratified sampling. Try this:
tmp <- split( samp, samp$zipcode )
out <- list()
for( z in names(tmp) ){
    df <- tmp[[z]]
    p  <- zcta[ zcta$zipcode == z, "prop" ]
    ## number of rows to keep in this zipcode, rounded down
    k  <- floor( p * nrow(df) )
    out[[z]] <- df[ sample.int( nrow(df), k ), ]
}
do.call("rbind", out)
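A variant that may be closer to the original question (about 30 blocks
per ZIP code area, with blocks more likely to be chosen when their
proportion of women is high) could look like the sketch below. The
'blocks' data frame, its column names and the values in it are made up
for illustration, since I do not know how your data are laid out.

```r
set.seed(1)  ## only so the made-up data are reproducible

## hypothetical data: 3 ZCTAs with 50 blocks each and an invented
## proportion of women per block
blocks <- data.frame( zcta  = rep( c("Z1", "Z2", "Z3"), each=50 ),
                      block = 1:150,
                      prop  = runif(150) )

n.per.zcta <- 30   ## target number of blocks per ZCTA

picked <- lapply( split( blocks, blocks$zcta ), function(df) {
    k <- min( n.per.zcta, nrow(df) )   ## cannot draw more than exist
    ## weighted draw without replacement within this ZCTA
    df[ sample.int( nrow(df), k, prob=df$prop ), ]
})
out <- do.call( "rbind", picked )
table( out$zcta )
```

Note that when sampling without replacement, the chance that a block
ends up in the sample is not exactly proportional to its weight; the
weights only make high-proportion blocks more likely to be picked.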
You probably need a variant of these, but if you need further help you
will need to provide more information and, better yet, examples.
Regards, Adai
Kirsten Beyer wrote:
> I am interested in locating a script to implement a sampling scheme
> that would basically make it more likely that a particular observation
> is chosen based on a weight associated with the observation. I am
> trying to select a sample of ~30 census blocks from each ZIP code area
> based on the proportion of women in a ZCTA living in a particular
> block. I want to make it more likely that a block will be chosen if
> the proportion of women in a patient's age group in a particular block
> is high. Any ideas are appreciated!