[R] Random assignment

John Haart another83 at me.com
Fri Oct 15 14:18:14 CEST 2010


Hi Michael,

Thanks for this - the reason I am following this approach is that it appeared in a paper I was reading, and I thought it was an interesting angle to take.

The paper is 

Vamosi & Wilson (2008). Nonrandom extinction leads to elevated loss of angiosperm evolutionary history. Ecology Letters 11: 1047–1053.

and the specific method I am following states:

> We calculated the number of species expected to be at risk in each family under a random binomial distribution in 10 000 randomizations [generated using R version 2.6.0 (R Development Team 2007)] assuming every species has a 7.48% chance of being at risk. 
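
As a rough illustration of what that sentence describes, here is a minimal sketch for a single family. It assumes the randomization is simply 10,000 independent binomial draws per family; the seed and the variable names (n_spp, n_sims, sim_at_risk) are my own choices, not anything taken from the paper.

```r
# Sketch only: one family, 10,000 binomial randomizations.
set.seed(1)            # arbitrary seed, only so the run is repeatable

p      <- 0.0748       # per-species chance of being at risk (from the paper)
n_spp  <- 80           # family size; 80 is the Bromeliaceae figure used in this thread
n_sims <- 10000        # number of randomizations

# Each draw is the number of at-risk species in one randomized family
sim_at_risk <- rbinom(n_sims, size = n_spp, prob = p)

mean(sim_at_risk)      # should land close to p * n_spp
var(sim_at_risk)       # should land close to p * (1 - p) * n_spp
```

The simulated mean and variance should settle near the analytic values, which is why the simulation and the formula give the same expectation.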

I guess the reason I am doing the simulation is that I am not hugely statistically minded, and the paper was asking the same question I am interested in answering :).

So following your approach -

> if family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn.


Family F = Bromeliaceae, with Fn = 80, p = 0.0748
random expectation = p * Fn = 0.0748 * 80 = 5.984
variance = p * (1-p) * Fn = 0.0748 * 0.9252 * 80 = 5.5363968

So the random expectation is that the Bromeliaceae will have about 6 species at risk, if risk is assigned randomly?
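
The arithmetic above can be checked directly in R. This is only a sketch: the variable names are mine, and the qbinom call is an extra I have added to show the spread around the expectation, not something suggested earlier in the thread.

```r
p  <- 0.0748   # per-species chance of being at risk
Fn <- 80       # number of species in the family (Bromeliaceae)

expected <- p * Fn              # 5.984
variance <- p * (1 - p) * Fn    # 5.5363968
std_dev  <- sqrt(variance)

# Central 95% range of the count under Binomial(Fn, p)
range95 <- qbinom(c(0.025, 0.975), size = Fn, prob = p)

expected
variance
std_dev
range95
```

The expectation is not a whole number, so "6 species" is a rounded figure; the range from qbinom gives a feel for how far a random family of this size could plausibly sit from it.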

So if I do this for all the families, will it be the same as doing the simulation experiment outlined in the method above?
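
One way to see this for several families at once is to put the analytic values and the simulated ones side by side. The sketch below makes some assumptions: "FamilyA" and "FamilyB" and their sizes are made-up placeholders (only the Bromeliaceae row comes from this thread), and the column names are my own. With real data the table would presumably come from something like read.table(..., header = TRUE) instead of being typed in.

```r
set.seed(42)   # arbitrary seed, only so the run is repeatable
p <- 0.0748

# Hypothetical family table; only Bromeliaceae (80 species) is from the thread
fam <- data.frame(
  family = c("Bromeliaceae", "FamilyA", "FamilyB"),
  n_spp  = c(80, 1, 150),
  stringsAsFactors = FALSE
)

# Analytic expectation and variance, computed for every family at once
fam$expected <- p * fam$n_spp
fam$variance <- p * (1 - p) * fam$n_spp

# Simulation: 10,000 binomial draws per family
# (rows = randomizations, columns = families)
sims <- sapply(fam$n_spp, function(n) rbinom(10000, size = n, prob = p))

fam$sim_mean <- colMeans(sims)
fam$sim_var  <- apply(sims, 2, var)

fam
```

The simulated columns should agree with the analytic ones up to simulation noise, so for the expected counts alone the formula is enough.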

Thanks

John




On 15 Oct 2010, at 12:49, Michael Bedward wrote:

Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation that's the result you'll get.  Perhaps to
initially identify families with disproportionate observed extinction
rates all you need is the dbinom function?
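
A minimal sketch of that idea for one family follows. The observed count of 12 is a made-up number for illustration, and the variable names are hypothetical. The upper-tail probability is computed from dbinom, and then again with pbinom and binom.test as cross-checks.

```r
p        <- 0.0748
n_spp    <- 80     # family size (Bromeliaceae figure from this thread)
observed <- 12     # hypothetical observed number of at-risk species

# P(X >= observed) under Binomial(n_spp, p), summing dbinom over the upper tail
p_upper_dbinom <- sum(dbinom(observed:n_spp, size = n_spp, prob = p))

# Same quantity from the cumulative distribution function
p_upper_pbinom <- pbinom(observed - 1, size = n_spp, prob = p,
                         lower.tail = FALSE)

# Same quantity again as a one-sided exact binomial test
p_upper_test <- binom.test(observed, n_spp, p = p,
                           alternative = "greater")$p.value

c(p_upper_dbinom, p_upper_pbinom, p_upper_test)
```

A small value suggests the family has more at-risk species than random assignment would give. With many families tested this way, some small values are expected by chance alone, so a multiple-testing adjustment such as p.adjust may be worth considering.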

Michael


On 15 October 2010 22:29, John Haart <another83 at me.com> wrote:
> Hi Denis and list
> 
> Thanks for this, and sorry for not providing enough information.
> 
> First let me put the study into a bit more context:
> 
> I know the number of species at risk in each family. What I am asking is "Is risk random according to family, or do certain families have a disproportionate number of at-risk species?"
> 
> My idea was to randomly allocate risk to the families based on the criteria below (binomial(nspecies, 0.0748)) and then compare this to the "true data" and see if there was a significant difference.
> 
> So in answer to your questions, (assuming my method is correct !)
> 
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
> 
> Within a particular family - this is because I am looking to see if risk in the "observed" data set is random with respect to family, so this will provide the baseline to compare against.
> 
>> I guess you've stated the p, but what's the n? The number of species in each
>> family?
> 
> This varies widely: for instance, I have some families that are monotypic (with 1 species) and others with 100+ species.
> 
> 
>> Assuming you have multiple families, do you want separate simulations per
>> family, or do you want to do some sort of weighting (perhaps proportional to
>> size) over all families?
> 
> I am assuming I want some sort of weighting. This is because I want to calculate the number of species expected to be at risk in EACH family under the random binomial distribution (assuming every species has a 7.48% chance of being at risk).
> 
> Thanks
> 
> John
> 
> 
> 
> 
> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
> 
> Hi:
> 
> I don't believe you've provided quite enough information just yet...
> 
> On Fri, Oct 15, 2010 at 2:22 AM, John Haart <another83 at me.com> wrote:
> 
>> Dear List,
>> 
>> I am doing some simulation in R and need basic help!
>> 
>> I have a list of animal families for which I know the number of species in
>> each family.
>> 
>> I am working under the assumption that a species has a 7.48% chance of
>> being at risk.
>> 
> 
> Is this over all families, or within a particular family? If the former, why
> does a distinction of family matter?
> 
>> 
>> I want to simulate the number of species expected to be at risk under a
>> random binomial distribution with 10,000 randomizations.
>> 
> 
> I guess you've stated the p, but what's the n? The number of species in each
> family? If you're simulating on a family by family basis, then it would seem
> that a binomial(nspecies, 0.0748) distribution would be the reference.
> Assuming you have multiple families, do you want separate simulations per
> family, or do you want to do some sort of weighting (perhaps proportional to
> size) over all families? The latter is doable, but it would require a
> two-stage simulation: one to randomly select a family and then to randomly
> select a species.
> 
> Dennis
> 
> 
>> 
>> I am relatively new to this field and would greatly appreciate an
>> "idiot-proof" response, i.e. how should the data be entered into R? I was
>> thinking of using read.table, header = T, where the table has F = Family
>> Name, and SP = Number of species in that family?
>> 
>> John
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
