[R] finding an unknown distribution
Rubén Roa-Ureta
rroa at udec.cl
Mon Apr 21 21:43:09 CEST 2008
andrea previtali wrote:
> Hi,
> I need to analyze the influences of several factors on a variable that
> is a measure of fecundity, consisting of 73 observations ranging from
> 0 to 5. The variable is continuous and highly positively skewed, and
> none of the typical transformations was able to normalize the data.
> Thus, I was thinking of analyzing these data using a generalized
> linear model where I can specify a distribution other than normal.
> I'm thinking it may fit a gamma or exponential distribution. But I'm
> not sure if the data meets the assumptions of those distributions
> because their definitions are too complex for my understanding!
Roughly, the exponential distribution is the model of a random variable
describing the time/distance between consecutive independent events that
occur at a constant rate. The gamma distribution is the model of a
random variable that can be thought of as the sum of independent
exponential random variables with the same rate. I don't think fecundity
data, the count of reproductive cells, qualifies as a random variable to
be modeled by either of these distributions. If the count of
reproductive cells is very large, and you are modeling this count as a
function of animal size, such as length, you should consider the
lognormal distribution, since the count of cells grows multiplicatively
(volumetrically) with the increase in length. In that case you can model
the log of your response with an ordinary linear model, lm(log(y) ~ x),
where y is your response and x your predictor. A related option is glm
with family=gaussian(link="log"), which keeps the response on its
original scale and assumes normal, rather than lognormal, errors around
a log-linear mean.
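
A minimal sketch of how the two fits might look in R, on simulated data
only. The names len and fec and all numeric values are made up for
illustration and do not come from the original data set. It fits a
linear model to the log of the response (the lognormal model) next to a
Gaussian glm with a log link, so the two sets of coefficients can be
compared.

```r
# Simulated illustration only: 'len', 'fec' and all parameter values are hypothetical.
set.seed(1)
n   <- 73
len <- runif(n, min = 10, max = 30)                # hypothetical body length
fec <- exp(-2 + 0.1 * len + rnorm(n, sd = 0.3))    # multiplicative errors, strictly positive

# Lognormal model: normal errors on the log scale
fit_lognormal <- lm(log(fec) ~ len)

# Gaussian GLM with log link: normal errors on the original scale,
# with the mean a log-linear function of length
fit_loglink <- glm(fec ~ len, family = gaussian(link = "log"))

coef(fit_lognormal)
coef(fit_loglink)
```

Both fits need care when the response contains zeros, as in the data
described above: log(0) is -Inf, so the lm() version requires strictly
positive values, and the log-link glm() will likely need starting values
supplied by hand.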
Rubén