[R-sig-eco] Error distribution for fractional response

Scott Foster scott.foster at csiro.au
Thu Jan 30 11:46:17 CET 2014


Dear Adhara,

I just saw that Bob O'Hara already answered this query.  Here are some alternatives.  Note that there are similarities between Bob's suggestion and 
number 2) below due to the relatedness of the Poisson and the Binomial.

There are a number of ways to answer this question.

First, if you really must work with the regeneration variable, then you could use a beta distribution, which is quite flexible and is defined on the 
zero to one interval.  This is not what I would recommend though.

Second, if you want to know about the number of saplings per adult (and not about the number of saplings irrespective of the number of adults), then 
you should use a binomial model.  Formally, you would assume that the number of saplings is drawn from a binomial distribution in which the 
probability of success is the parameter of interest and the number of trials is the number of adults.  Without covariates, you are assuming that the 
success probability is the same for every observation.  There might be overdispersion, which would give you some heartache.  A beta-binomial might 
help there.
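
As a rough sketch of the binomial model (not an analysis of the full data set), the intercept-only version could be set up in base R as below.  The 
data frame name plots is just an illustrative choice, and the eleven rows are the first eleven from the data quoted further down, all of which 
happen to have saplings <= adults.  The last line is one common informal check for overdispersion; treat it as a guide rather than a formal test.

```r
# First eleven rows of the posted data (all have saplings <= adults)
plots <- data.frame(
  saplings = c(0, 0, 450, 2416, 6, 0, 0, 61, 6, 5, 55),
  adults   = c(1, 2, 4399, 25340, 72, 6, 8, 95, 98, 59, 88)
)

# Binomial GLM: successes = saplings, failures = adults - saplings.
# Intercept only, i.e. one success probability shared by all plots.
fit <- glm(cbind(saplings, adults - saplings) ~ 1,
           family = binomial, data = plots)

# Back-transform the logit-scale intercept to a probability
p_hat <- unname(plogis(coef(fit)))
p_hat

# Informal overdispersion check: Pearson chi-square over residual df.
# Values well above 1 suggest more variation than the binomial allows.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```

Covariates would go on the right-hand side of the formula in place of the 1.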

Third, if you are interested in the number of saplings (irrespective of the number of adults) then you will need to take into account the variation in 
the number of adults _and_ the number of saplings per adult.  Note that the binomial solution earlier accounts for variation of the number of saplings 
_conditional_ on the number of adults.  One way to do this is to assume that the number of adults, which is the number of trials for the binomial 
number of saplings, is Poisson.  Let lambda be the parameter for the Poisson and pi be the parameter for the binomial.  Then the marginal distribution 
of the number of saplings is, surprisingly(?), also a Poisson with parameter lambda*pi.  I don't know of an already built R function that will do 
this.  Others on the list might.  It wouldn't be that hard to build one if you wanted to go down that route.
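
The Poisson result can be checked numerically.  The sketch below, in base R, sums the binomial probabilities over the Poisson distribution of the 
number of adults and compares the answer with a Poisson with mean lambda*pi.  The values lambda = 20 and pi = 0.3, the function name marginal_pmf and 
the truncation point n_max are all arbitrary choices for illustration; pi is called p in the code so that R's built-in constant is not masked.

```r
# Hypothetical parameter values, chosen only for illustration
lambda <- 20    # Poisson mean for the number of adults
p      <- 0.3   # binomial success probability (pi in the text)

# Marginal probability of s saplings: sum over the possible numbers of
# adults n >= s, truncated at n_max (the Poisson tail beyond is negligible)
marginal_pmf <- function(s, lambda, p, n_max = 200) {
  n <- s:n_max
  sum(dpois(n, lambda) * dbinom(s, size = n, prob = p))
}

s_vals  <- 0:15
mixture <- sapply(s_vals, marginal_pmf, lambda = lambda, p = p)
direct  <- dpois(s_vals, lambda * p)

# Largest absolute difference between the two sets of probabilities
max_diff <- max(abs(mixture - direct))
max_diff
```

The difference should be zero up to floating-point error and the truncation at n_max.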

Just had a thought -- does the number of saplings ever exceed the number of adults?  This would give regeneration > 1.  It would mean that all of the 
above is meaningless.
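
This is easy to check in R.  A minimal sketch, using five consecutive rows copied from the data quoted further down (the data frame name plots is 
just an illustrative choice; with the full data set read in, the same two lines apply):

```r
# Five consecutive rows copied from the posted data
plots <- data.frame(
  saplings = c(216, 6, 72, 26, 6),
  adults   = c(19, 1, 178, 42, 4)
)

# TRUE where there are more saplings than adults (regeneration > 1)
exceeds <- plots$saplings > plots$adults

sum(exceeds)       # how many plots exceed
plots[exceeds, ]   # which ones
```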

Have fun with it.

Scott
On 30/01/14 20:58, Adhara Pardo wrote:
> Dear R users,
>
> I would like to fit a GLM to some plant regeneration data (see bottom
> of this e-mail). The dependent variable, an index of regeneration, was
> obtained by dividing the number of saplings by the number of adult
> plants present in each plot. The result is a highly skewed variable and
> thus, specifying, for instance, a
> Gaussian distribution does not seem to be appropriate. Data
> transformation does not help either. Do you have any suggestion on the
> best distribution to choose?
>
>
> any help would be greatly appreciated!
>
> Best wishes,
>
> Adara
>
> "saplings","adults","regeneration"
> 0,1,0
> 0,2,0
> 450,4399,0.1
> 2416,25340,0.1
> 6,72,0.08
> 0,6,0
> 0,8,0
> 61,95,0.64
> 6,98,0.06
> 5,59,0.08
> 55,88,0.63
> 216,19,11.37
> 6,1,6
> 72,178,0.4
> 26,42,0.62
> 6,4,1.5
> 0,2,0
> 0,1,0
> 229,533,0.43
> 0,43,0
> 5,27,0.19
> 28,86,0.33
> 0,102,0
> 0,2,0
> 1,5,0.2
> 0,1,0
> 2,26,0.08
> 0,4,0
> 11,13,0.85
> 0,59,0
> 0,73,0
> 223,100,2.23
> 0,2,0
> 6,5,1.2
> 0,16,0
> 104,170,0.61
> 0,1,0
> 2,69,0.03
> 4,88,0.05
> 51,180,0.28
> 3,1,3
> 12,30,0.4
> 78,807,0.1
> 1,65,0.02
> 2,29,0.07
> 87,1102,0.08
> 19,2,9.5
> 18,20,0.9
> 22,23,0.96
> 0,1,0
> 20,417,0.05
> 29,64,0.45
> 0,9,0
> 0,3,0
> 0,11,0
> 51,42,1.21
> 22,17,1.29
> 15,25,0.6
> 0,32,0
> 0,13,0
> 0,7,0
> 59,710,0.08
> 0,20,0
> 0,25,0
> 2,77,0.03
> 0,37,0
> 174,882,0.2
> 50,1069,0.05
> 1,5,0.2
> 17,10,1.7
> 0,3,0
> 0,16,0
> 3,967,0
> 8,150,0.05
> 0,1,0
> 6,18,0.33
> 53,122,0.43
> 0,1,0
> 42,74,0.57
> 128,1607,0.08
> 18,114,0.16
> 0,1,0
> 13,31,0.42
> 50,123,0.41
> 11,79,0.14
> 0,28,0
> 25,106,0.24
> 106,1197,0.09
> 4,6,0.67
> 11,22,0.5
> 394,213,1.85
> 4,16,0.25
> 222,776,0.29
> 4,468,0.01
> 0,76,0
> 3,549,0.01
> 17,199,0.09
> 70,2880,0.02
> 8,396,0.02
> 0,15,0
> 14,332,0.04
> 51,318,0.16
> 2,515,0
> 14,1519,0.01
> 0,78,0
> 9,326,0.03
> 11,481,0.02
> 0,266,0
> 6,768,0.01
> 0,8,0
> 6,519,0.01
> 2,38,0.05
> 1,51,0.02
> 0,7,0
> 235,2310,0.1
> 7,521,0.01
> 0,94,0
> 3,174,0.02
> 0,8,0
> 11,205,0.05
> 0,4,0
> 0,15,0
> 4,40,0.1
> 0,28,0
> 75,208,0.36
> 7,166,0.04
> 0,15,0
> 12,143,0.08
> 0,974,0
> 160,614,0.26
> 76,85,0.89
> 0,39,0
> 0,121,0
> 304,699,0.43
> 50,48,1.04
> 11,17,0.65
> 16,211,0.08
> 2,2,1
> 140,2138,0.07
> 0,1,0
> 6,11,0.55
> 0,6,0
> 0,2,0
> 0,2,0
> 1,44,0.02
> 0,65,0
> 42,2,21
> 67,198,0.34
> 98,89,1.1
> 13,44,0.3
> 0,1,0
> 0,2,0
> 0,6,0
> 0,1,0
> 46,231,0.2
> 22,130,0.17
> 0,3,0
> 13,47,0.28
> 0,1,0
> 0,2,0
> 60,304,0.2
> 543,294,1.85
> 7,15,0.47
> 206,475,0.43
> 1,30,0.03
> 91,86,1.06
> 0,15,0
> 49,98,0.5
> 9,7,1.29
> 23,35,0.66
> 27,449,0.06
> 5,53,0.09
> 5,9,0.56
> 40,134,0.3
> 0,10,0
> 0,1,0
> 13,13,1
> 150,165,0.91
> 14,4,3.5
> 0,7,0
> 67,48,1.4
> 0,2,0
> 2,18,0.11
> 1,14,0.07
> 0,6,0
> 8,765,0.01
> 20,2860,0.01
> 1,182,0.01
> 65,146,0.45
> 1,86,0.01
> 0,4,0
> 0,1,0
> 0,17,0
> 0,8,0
> 3,38,0.08
> 188,412,0.46
> 13,1899,0.01
> 9,855,0.01
> 0,27,0
> 1,163,0.01
> 0,15,0
> 10,43,0.23
> 4,22,0.18
> 17,306,0.06
> 2,62,0.03
> 0,3,0
> 0,106,0
> 0,26,0
> 0,53,0
> 40,15,2.67
> 2,18,0.11
> 0,1,0
>

-- 
Scott Foster
Computational Informatics
CSIRO
E scott.foster at csiro.au T +61 3 6232 5178
Postal address: CSIRO Computational Informatics, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO Computational Informatics, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au


