[R] Error distribution question
Peter Dunn
dunn at usq.edu.au
Fri Mar 9 00:44:40 CET 2007
> > I was wondering if somebody could offer me some advice on which
> > error distribution would be appropriate for the type of data I have.
> > I'm studying what continuous predictor variables such as grooming
> > received, rank, etc. affect the amount of grooming given. This
> > response variable is continuous with many zeros, and so positively
> > skewed.
>
> This kind of variable is very common in prospecting (oil, mining)
> industries, and also in medical research. It's neither continuous
> nor discrete, because of the weight on zero. Basically, it is a
> combination of _two_ variables:
>
> X: a Bernoulli trial, such that p(X = 0) = 1 - p (failure) and
> p(X = 1) = p (success)
>
> Y: the continuous variable that represents numerically the success
>
> So, we have the final variable as X * Y.
Indeed, the Tweedie distribution may be just what you are
after.
> I realized in the Tweedie help page that one can use a specific response
> distribution (Normal, Poisson, Compound Poisson, etc) by setting the
> variance power = to a specific number. I'm a beginner, so I really don't
> follow then,
This sounds like you have the tweedie package.
And yes, the variance.power tells you which distribution you have.
Tweedie distributions have a variance of the form var[Y] = phi * mu^p,
where mu is the mean, phi > 0 is a dispersion parameter and p is the
variance power. (Note that Tweedie distributions belong to the
exponential family, so they can be used in the generalized linear model
framework.)
The mixed distributions you talk about (continuous, plus a positive
mass at zero) correspond to Tweedie distributions with 1 < p < 2.
(p=2 is the gamma; p=0 is Normal; p=3 is inverse Gaussian; p=1
and phi=1 is Poisson).
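
To see where the zeros come from: for 1 < p < 2 a Tweedie variable can
be written as a Poisson number of gamma-distributed amounts added
together, so Y is exactly zero whenever the Poisson count is zero.
A minimal base-R sketch follows. The values of p, mu and phi and all
variable names are made up for illustration, and the parameter
conversion is the standard compound Poisson-gamma one, not code taken
from the tweedie package.

```r
set.seed(1)

# Illustrative values only (assumptions, not estimates from any data)
p   <- 1.6   # variance power, must satisfy 1 < p < 2
mu  <- 2     # mean
phi <- 1.5   # dispersion

# Compound Poisson-gamma parameters implied by (p, mu, phi)
lambda <- mu^(2 - p) / (phi * (2 - p))   # Poisson mean (number of events)
alpha  <- (2 - p) / (p - 1)              # gamma shape of each amount
gam    <- phi * (p - 1) * mu^(p - 1)     # gamma scale of each amount

n <- 1e5
N <- rpois(n, lambda)                    # number of events per observation

# A sum of N gammas with a common scale is gamma with shape N * alpha;
# observations with N == 0 stay exactly zero.
y   <- numeric(n)
pos <- N > 0
y[pos] <- rgamma(sum(pos), shape = N[pos] * alpha, scale = gam)

# Compare simulated moments and zero proportion with the theory
c(sim.mean = mean(y),      theory = mu)
c(sim.var  = var(y),       theory = phi * mu^p)
c(sim.zero = mean(y == 0), theory = exp(-lambda))
```

The simulated mean and variance should come out close to mu and
phi * mu^p, and the proportion of exact zeros close to exp(-lambda).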
> which response distribution to use (i.e. what variance power) that would
> be appropriate for continuous response data with many zeros.
If you want to use a Tweedie distribution in practice, you first need to
know *which* Tweedie distribution you need; that is, what value of p is
appropriate. To do that, use the tweedie.profile function in
package tweedie. That will tell you what value of p is appropriate
for your data. For the sake of an example, suppose you wish to fit
a model something like Y ~ x1 + x2, and that calling tweedie.profile
as follows gives p = 1.6:
tweedie.profile(Y ~ x1 + x2, p.vec=seq(1.1, 1.9, length=10),
do.plot=TRUE)
Then, you can fit the appropriate generalized linear model if you wish
as follows, using the tweedie family from package statmod:
glm( Y ~ x1 + x2, family=tweedie(var.power=1.6, link.power=0) )
(link.power=0 means a log link, which is commonly used.)
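
Since the zeros are the point of interest here, it may help to see what
proportion of exact zeros a given model implies. For 1 < p < 2 the
probability of a zero is exp(-mu^(2-p) / (phi * (2-p))). A small base-R
sketch, where tweedie_zero_prob is a hypothetical helper (not a function
from tweedie or statmod) and the inputs are illustrative numbers:

```r
# Hypothetical helper: P(Y = 0) for a Tweedie distribution with 1 < p < 2.
# mu may be a vector of means; phi and p are single values.
tweedie_zero_prob <- function(mu, phi, p) {
  stopifnot(p > 1, p < 2, all(mu > 0), phi > 0)
  exp(-mu^(2 - p) / (phi * (2 - p)))
}

# Illustrative means only; larger means give fewer exact zeros
tweedie_zero_prob(mu = c(0.5, 2, 10), phi = 1.5, p = 1.6)
```

For a fitted glm object, say fit, mu could be taken from fitted(fit) and
phi from summary(fit)$dispersion, bearing in mind that the latter is
only an estimate of the dispersion.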
Hope that's of some help.
P.
--
Dr Peter Dunn | dunn <at> usq.edu.au
Faculty of Sciences, USQ; http://www.sci.usq.edu.au/staff/dunn
Aust. Centre for Sustainable Catchments: www.usq.edu.au/acsc