[R] bootstrap resampling - simplified

Bert Gunter gunter.berton at gene.com
Wed Mar 2 20:42:40 CET 2011


Folks:

On Wed, Mar 2, 2011 at 10:32 AM, Jonathan P Daily <jdaily at usgs.gov> wrote:
> I will point out again that sampling a five-fold replicate of 1:20 is not
> the same as resampling with replacement,

-- Correct. In sampling with replacement from 1:20 there is positive
probability of getting all 1's or all 2's, etc. The poster
specifically said that he wanted 0 probability of such results. So,
obviously, the poster does NOT want to "sample with replacement from
1:20." What he does want (I think) is a re-sample of size n from the
set of all **vectors** of length 20 in which each element is an
integer from 1 to 20 and no individual value occurs more than 5
times. Of course I'm just interpreting/paraphrasing the original post
(if I got it right), but I think doing so makes the nature of the task
clearer: one needs to find some way to sample with replacement from
the space of all such **sequences**.

I think it is now clear that one may do so by rejection sampling: i.e.
draw 20 values with replacement from 1:20 and throw away any sequence
that fails the at-most-5 criterion. Each sequence that remains is a
sample of size 1 from the population of sequences that satisfy the
poster's criteria (in theory, anyway; this might tax a pseudo RNG in
practice). A collection of n such sequences is a bootstrap sample from
this population. I **think** that's what the poster wants -- and what
others have already provided. However, maybe this clarifies why it
works.
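
The rejection step described above might be sketched in R roughly as
follows. This is only a minimal illustration, not code from the thread:
the function name draw_capped and its arguments are hypothetical, and
1000 replicates is an arbitrary choice.

```r
# Hypothetical helper (not from the thread): draw one length-`size`
# sequence from `values`, rejecting any draw in which some value
# occurs more than `cap` times.
draw_capped <- function(values = 1:20, size = 20, cap = 5) {
  repeat {
    x <- sample(values, size, replace = TRUE)
    counts <- tabulate(match(x, values), nbins = length(values))
    if (max(counts) <= cap) return(x)   # accept; otherwise redraw
  }
}

set.seed(42)                                    # arbitrary seed
boot_samples <- replicate(1000, draw_capped())  # 20 x 1000 matrix

# Largest count of any single value within each accepted sequence
max_counts <- apply(boot_samples, 2, function(col) max(table(col)))
```

Each column of boot_samples is one accepted sequence, and max_counts
should never exceed 5.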

If I have made any error in this, **Please** post a message pointing
out my error. I sometimes get confused about this stuff, too.

Cheers,
Bert





> although I made an error in
> reporting probabilities - the P(x2 = 1 | x1 = 1) = 4/99 and not 4/100.
> When sampling with replacement, P(x2 = 1 | x1 = 1) = P(x2 = 1 | x1 != 1) =
> 1/20.
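
The two conditional probabilities quoted above can be checked by direct
counting. A small sketch, assuming the "five-fold replicate" means the
100-element pool rep(1:20, 5) drawn without replacement (the variable
names are illustrative only):

```r
pool <- rep(1:20, times = 5)     # five copies of each of 1..20 (100 items)
rest <- pool[-match(1, pool)]    # pool after the first draw removed one 1

# Without replacement: chance the second draw is a 1, given the first was
p_without <- mean(rest == 1)     # 4 ones left among 99 items

# With replacement from 1:20 the first draw does not matter
p_with <- mean(1:20 == 1)        # 1 value out of 20
```

Here p_without equals 4/99 and p_with equals 1/20, matching the
figures quoted above.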
> --------------------------------------
> Jonathan P. Daily
> Technician - USGS Leetown Science Center
> 11649 Leetown Road
> Kearneysville WV, 25430
> (304) 724-4480
> "Is the room still a room when its empty? Does the room,
>  the thing itself have purpose? Or do we, what's the word... imbue it."
>     - Jubal Early, Firefly
>
> r-help-bounces at r-project.org wrote on 03/02/2011 01:05:01 PM:
>
>> Re: [R] bootstrap resampling - simplified
>> Vokey, John
>> to: r-help
>> 03/02/2011 01:07 PM
>> Sent by: r-help-bounces at r-project.org
>>
>> On 2011-03-02, at 4:00 AM, r-help-request at r-project.org wrote:
>>
>> > Hello there,
>> >
>> > I have a problem concerning bootstrapping in R - especially
>> focusing on the resampling part of it. I try to sum it up in a
>> simplified way so that I would not confuse anybody.
>> >
>> > I have a small database consisting of 20 observations (basically
>> numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
>> >
>> > I would like to resample this database many times for the
>> bootstrap process with the following conditions. Firstly, every
>> resampled database should also include 20 observations. Secondly,
>> when selecting a number from the above-mentioned 20 numbers, you can
>> do this selection with replacement. The difficult part comes now:
>> one number can be selected only maximum 5 times. In order to make
>> this clear I show you a couple of examples. So the resampled
>> databases might be like the following ones:
>> >
>> > (1st database)          1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
>> > 4 different numbers are chosen (1, 2, 3, 4), each selected - for
>> the maximum possible - 5 times.
>> >
>> > (2nd database)          1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
>> > Two numbers - 8 and 6 - selected 5 times (the maximum possible
>> times), number 1 selected 4 times, the others selected less than 4
> times.
>> >
>> > (3rd database)          1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
>> > Number 9 chosen for the maximum possible 5 times, number 10, 3, 2,
>> 1 chosen for 3 times, number 4 selected twice and number 13 selected only
>> once.
>> >
>> > ...
>> >
>> > Anybody knows how to implement my "tricky" condition into one of
>> the R functions - that one number can be selected only 5 times at
>> most? Are 'boot' and 'bootstrap' packages capable of managing this?
>> I guess they are, I just couldn't figure it out yet...
>> >
>> > Thanks very much! Best regards,
>> > Laszlo Bodnar
>>
>> Laszlo,
>>   Create a vector consisting of 5 of each number.  Then, for each
>> sample, scramble the order of the items in the vector, and select
>> the first 20.
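
The shuffle suggestion quoted just above could look something like the
following in R. This is only a sketch: the function name
capped_shuffle_sample and its arguments are hypothetical. As noted
elsewhere in this thread, drawing from a five-fold replicate is not the
same as resampling with replacement, so the sequences it produces do
not follow the same distribution.

```r
# Hypothetical helper (not from the thread): build a pool holding `cap`
# copies of each value, shuffle it, and keep the first `size` items.
capped_shuffle_sample <- function(values = 1:20, cap = 5, size = 20) {
  pool <- rep(values, each = cap)   # 5 copies of each of 1..20
  sample(pool)[seq_len(size)]       # random permutation, first 20 kept
}

set.seed(1)                         # arbitrary seed
x <- capped_shuffle_sample()
```

Because the pool contains only five copies of each number, no value can
appear in x more than five times.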
>>
>>
>> --
>> Please avoid sending me Word or PowerPoint attachments.
>> See <http://www.gnu.org/philosophy/no-word-attachments.html>
>>
>> -Dr. John R. Vokey
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml


