[R] Questions about generating samples in R
Charles C. Berry
cberry at tajo.ucsd.edu
Tue Nov 28 02:26:11 CET 2006
On Mon, 27 Nov 2006, Mark Na wrote:
> Further to Alexander's question ... could anyone provide assistance
> with random stratified sampling? Let's say we have Alex's dataframe
> and we want to stratify the random selection by group membership
> (which is contained in one of the eight columns).
>
> We might want to randomly select:
>
> 1) a constant number (e.g., 5) of rows from each group, or
> 2) a percentage (e.g. 10%) of rows from each group resulting in groups
> being represented proportionally in the sample (with respect to the
> population).
>
> I am aware of stratsrs but this function does not seem to allow the
> second of the above two options.
>
> Any ideas how to achieve this in R?
Suppose 'grp.numbers' holds the group identities (one entry per row of the data frame).
Define wrappers for sample():
sample.just.5 <- function(x) sample(x, size = 5)
sample.10.pct <- function(x) sample(x, size = round(0.10 * length(x)))
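A caveat on these wrappers, with a hedged sketch of a more defensive version: sample(x, size) treats a single positive number x as 1:x, so a group containing only one row would be sampled from the wrong indices, and sample.just.5() stops with an error for any group with fewer than 5 rows. The names sample.up.to.5 and sample.10.pct.safe below are hypothetical; they index x with sample.int() and cap the request with min():

```r
# Hypothetical defensive variants of the wrappers (names are illustrative).
# Indexing x with sample.int() avoids the sample() gotcha where a
# length-one numeric x is treated as 1:x; min() caps the request so a
# group with fewer than 5 rows returns all of its rows instead of an error.
sample.up.to.5 <- function(x) {
  x[sample.int(length(x), size = min(5, length(x)))]
}
sample.10.pct.safe <- function(x) {
  x[sample.int(length(x), size = round(0.10 * length(x)))]
}

# A group consisting of the single row index 7:
sample(7, size = 1)   # draws from 1:7, not from c(7)
sample.up.to.5(7)     # always 7
```

Either pair of wrappers can be passed to tapply() in the same way.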
Then use tapply:
samples.of.5 <- tapply(seq_along(grp.numbers), grp.numbers, sample.just.5)
Check this with:
table(grp.numbers[unlist(samples.of.5)])
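tapply() returns the sampled row numbers grouped by stratum, so flattening them with unlist() and using them as row indices yields the stratified sample itself. A self-contained sketch, where the simulated data frame dat and its grouping column grp are hypothetical stand-ins for Alex's data:

```r
# Hypothetical stand-in for Alex's data: 8 columns, 5000 rows, with a
# grouping column 'grp'. Names and sizes are illustrative only.
set.seed(1)
dat <- data.frame(grp = sample(c("a", "b", "c"), 5000, replace = TRUE),
                  matrix(rnorm(5000 * 7), ncol = 7))
grp.numbers <- dat$grp

sample.just.5 <- function(x) sample(x, size = 5)
samples.of.5 <- tapply(seq_along(grp.numbers), grp.numbers, sample.just.5)

# unlist() flattens the per-group index vectors; use them as row indices
strat.5 <- dat[unlist(samples.of.5), ]
table(strat.5$grp)   # 5 rows per group
```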
Again use tapply:
samples.of.10.pct <- tapply(seq_along(grp.numbers), grp.numbers, sample.10.pct)
Check this with:
table(grp.numbers[unlist(samples.of.10.pct)])
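To see that the 10% version keeps the groups in proportion, here is a small sketch with deliberately unequal, made-up group sizes of 100, 250 and 650 rows; each group contributes a tenth of its own rows:

```r
# Illustrative groups of unequal size (made-up data): 100, 250, 650 rows
grp.numbers <- rep(c("small", "medium", "large"), times = c(100, 250, 650))

sample.10.pct <- function(x) sample(x, size = round(0.10 * length(x)))
samples.of.10.pct <- tapply(seq_along(grp.numbers), grp.numbers, sample.10.pct)

# Each group contributes 10% of its own size, so group shares are preserved
counts <- table(grp.numbers[unlist(samples.of.10.pct)])
counts
```

For group sizes that are not multiples of 10, round() makes each group's share only approximately 10%.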
There are loads of variations ...
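One such variation, sketched here under the assumption of a data frame dat with a grouping column grp (both hypothetical), splits the data frame by group with split(), samples rows within each piece, and stacks the results with do.call(rbind, ...), so the output is a data frame rather than a set of indices. The helper strat.sample() is illustrative, not from any package:

```r
# Hypothetical data frame 'dat' with grouping column 'grp' (illustrative)
set.seed(42)
dat <- data.frame(grp = rep(c("a", "b", "c"), times = c(40, 60, 100)),
                  value = rnorm(200))

# Hypothetical helper: keep round(frac * n) randomly chosen rows of d
strat.sample <- function(d, frac) {
  d[sample.int(nrow(d), size = round(frac * nrow(d))), , drop = FALSE]
}

# Split the rows by group, sample within each piece, then stack the pieces
pieces <- lapply(split(dat, dat$grp), strat.sample, frac = 0.10)
strat.10.pct <- do.call(rbind, pieces)
table(strat.10.pct$grp)
```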
>
> Thanks, Mark
>
>
>
> On 11/26/06, Alexander Geisler <alexander.geisler at gmail.com> wrote:
>> Hello!
>>
>> I have a data set with 8 columns and about 5000 rows. What I want to
>> do is to generate samples of this data set.
>>
>> Samples of a specific size, for example 200.
>>
>> What is the easiest way to do this? No special things are needed, only
>> the random selection of 200 rows of the data set.
>>
>> Thanks
>> Alex
>>
>> --
>> Alexander Geisler * Kaltenbach 151 * A-6272 Kaltenbach
>> email: alexander.geisler at gmx.at | alexander.geisler at gmail.com
>> phone: +43 650 / 811 61 90 | skype: al1405ex
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
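For Alex's original question, which needs no stratification, a simple random sample of rows can be drawn by sampling row numbers without replacement and indexing the data frame. A minimal sketch, with a simulated 5000 x 8 data frame standing in for the real data (dat is a hypothetical name):

```r
# Hypothetical stand-in for Alex's data set: 8 columns, 5000 rows
set.seed(123)
dat <- as.data.frame(matrix(rnorm(5000 * 8), ncol = 8))

# nrow(dat) is a single number, so sample() draws from 1:nrow(dat);
# replace = FALSE (the default) gives 200 distinct rows
srs.200 <- dat[sample(nrow(dat), 200), ]
dim(srs.200)   # 200 rows, 8 columns
```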
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717