[R-sig-eco] Using "sample()" in Null model construction

Wed Aug 25 12:01:16 CEST 2010

On 25/08/10 12:09 PM, "Bernard Coetzee" <bwtcoetzee at gmail.com> wrote:

> Hi all,
> 
> I have written a simple function, based on sample(), to create Null models
> from actual data - a sample (?!) is below. My question is 1. How exactly
> does the probability weighting work? I find the help file a bit convoluted,
> which states "If *replace* is false, these probabilities are applied
> sequentially, that is the probability of choosing the next item is
> proportional to the weights amongst the remaining items." So does it (a) use
> the probability weighting for each value sequentially, and that's why length
> of both sample and probability file needs to be equal (so in the example
> dataset below, the probability of sampling value "7.04" at each iterative
> sample() selection is equal to probability "0.912", weight "9.45" by
> "0.735" and so forth), or does it (b) use each probability value once and
> average the remaining probability for the next sample() selection?
>
Bernard,

You may have to look at the C source code to see how this works. I looked
at the code a long time ago, and at that time the probabilities were
updated after each pick when replace = FALSE (if my memory serves me
correctly). To be sure, you should get the source files and check how this
is implemented. I looked at this before R 2.2.0, when the handling of prob
with replace = TRUE changed, and this part may have changed as well.
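To make the "update after each pick" behaviour concrete, here is a minimal sketch in plain R. The helper `sample_seq_noreplace()` is hypothetical (it is not part of base R and is not R's internal C code); it simply assumes the rule described above: pick one item with probability proportional to its weight, remove it, rescale the remaining weights, and repeat. It need not reproduce the same random numbers as sample() for a given seed.

```r
# Hypothetical helper (not base R): draw `size` items from `x` without
# replacement, one at a time, assuming weights are rescaled after each pick.
sample_seq_noreplace <- function(x, size, prob) {
  stopifnot(length(x) == length(prob), size <= length(x), all(prob >= 0))
  picked <- x[0]                              # empty vector of the same type as x
  for (k in seq_len(size)) {
    p <- prob / sum(prob)                     # rescale remaining weights to sum to one
    i <- sample.int(length(x), 1, prob = p)   # a single weighted pick
    picked <- c(picked, x[i])
    x <- x[-i]                                # drop the picked item ...
    prob <- prob[-i]                          # ... and its weight
  }
  picked
}

set.seed(1)
sample_seq_noreplace(c("a", "b", "c", "d"), 2, prob = c(4, 3, 2, 1))
```

An item with zero weight is never drawn, and no item can be drawn twice, because it is removed from both `x` and `prob` once picked.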

I don't know if I explained myself clearly: I find the help page clear, but
I can't understand your two alternatives. I understand the text (and
understood the C code to work) so that the second of 'n' items, with
probability vector 'p', is picked like this:

first  <- sample(n, 1, prob = p)                                  # first pick
second <- seq_len(n)[-first][sample(n - 1, 1, prob = p[-first])]  # pick among the rest

That is, you remove the first picked item from 'p' and rescale the
remaining probabilities so that they again sum to one.
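Under that rule the probability of any particular ordered sequence of picks is a product of rescaled probabilities. A small sketch, using a hypothetical helper `seq_pick_prob()` and made-up weights `p` chosen only for illustration:

```r
# Hypothetical helper: exact probability that sampling without replacement,
# with weights rescaled after every pick, returns the items at positions
# `idx` in exactly that order.
seq_pick_prob <- function(prob, idx) {
  stopifnot(!anyDuplicated(idx), all(idx %in% seq_along(prob)))
  total <- 1
  remaining <- prob
  for (i in idx) {
    total <- total * remaining[i] / sum(remaining)  # chance of i among those left
    remaining[i] <- 0                               # i can no longer be drawn
  }
  total
}

p <- c(0.5, 0.3, 0.2)       # made-up weights
seq_pick_prob(p, c(1, 2))   # 0.5 * 0.3/0.5 = 0.3
seq_pick_prob(p, c(2, 1))   # 0.3 * 0.5/0.7, about 0.214
```

Note that the order matters: picking item 1 then item 2 is not as likely as picking item 2 then item 1, because the second pick is taken from a different remaining set.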

One more clarification: sum(prob) == 1. You can give any non-negative
weights, but they are rescaled to unit sum (prob <- prob/sum(prob)) so that
they give the probability of picking each item. But check the C code to be
sure.
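A quick way to see the rescaling from the R side: two proportional weight vectors should rescale to the same probabilities (here 0.2, 0.6, 0.2), so with the same seed sample() should return the same draw. The weights below are arbitrary example values.

```r
# Proportional weights: w_b is just w_a doubled.
w_a <- c(1, 3, 1)
w_b <- c(2, 6, 2)

set.seed(123)
draw_a <- sample(3, 2, prob = w_a)
set.seed(123)
draw_b <- sample(3, 2, prob = w_b)

identical(draw_a, draw_b)   # TRUE: only the relative weights matter
w_a / sum(w_a)              # the unit-sum probabilities both calls work with
```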

> I am really looking to do method "a", and preliminary tests with dummy data
> suggest it is in fact "a", but I need to be 100% sure.

How do you tell the difference?
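One empirical check is to simulate many draws and compare the observed frequencies of ordered pairs with what the "remove and rescale" rule predicts, P(first = i, second = j) = p[i] * p[j] / (1 - p[i]). This is only a sketch: the weights `p`, the number of simulations and the seed are made-up choices.

```r
p <- c(0.5, 0.3, 0.2)                # made-up weights that already sum to one
n_sim <- 1e5

set.seed(2010)
draws <- t(replicate(n_sim, sample(3, 2, prob = p)))   # n_sim x 2 matrix

# Observed frequency of each ordered pair (first pick = row, second = column)
observed <- table(factor(draws[, 1], levels = 1:3),
                  factor(draws[, 2], levels = 1:3)) / n_sim

# Expected frequency under the sequential rule: row i divided by (1 - p[i])
expected <- outer(p, p) / (1 - p)
diag(expected) <- 0                  # an item cannot be picked twice

round(observed, 3)
round(expected, 3)
max(abs(observed - expected))        # should be small, of the order of 0.001
```

If the two tables agree to within simulation error, the draws are consistent with the sequential rescaling described above.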

Cheers, Jari Oksanen