[R-sig-eco] Log transforming zero value data
Scott
scott.foster at csiro.au
Thu Jun 25 01:18:17 CEST 2009
Hi Nate,
Here is my 2 cents worth after coming in late to this discussion.
The fact that your data are proportions is important as it suggests how
the data may vary. Do you have the numerator and denominator used to
calculate the proportions? If so then I would suggest that you should be
performing a binomial GLM with these data.
If you don't have these data, or are disinclined to use them for some
reason (why?), then I would strongly suggest considering an
asin(sqrt(p)) transformation, where p is in [0,1]. There is some
justification for this: namely, that this transformation approximately
stabilises the variance of a binomial proportion. That is, it makes the
use of un-weighted least squares more appropriate but, of course, the
distributional assumptions leading to tests of significance etc. may
still require checking.
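
As a rough check of the variance-stabilising claim, the sketch below
computes the exact variance of a binomial proportion before and after
the transformation by summing over all possible counts. The helper
`binom_var()`, the choice of n = 100 animals per night, and the three
values of p are assumptions made for illustration.

```r
# Exact variance of f(X / n) when X ~ Binomial(n, p),
# obtained by summing over every possible count (no simulation).
binom_var <- function(n, p, f = identity) {
  x <- 0:n
  w <- dbinom(x, n, p)   # probability of each count
  v <- f(x / n)          # (transformed) proportion for each count
  sum(w * v^2) - sum(w * v)^2
}

n <- 100                   # assumed number of animals per night
p <- c(0.10, 0.20, 0.35)   # true proportions in the range Nate reports

raw_var    <- sapply(p, function(pp) binom_var(n, pp))
arcsin_var <- sapply(p, function(pp)
  binom_var(n, pp, function(q) asin(sqrt(q))))

# Raw variance changes with p; transformed variance stays near 1 / (4 * n)
cbind(p, raw_var, arcsin_var, target = 1 / (4 * n))
```

For a vector of observed proportions, the transformation itself is just
asin(sqrt(Y)); it maps 0 to 0, so zeros need no added constant.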
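The log transformation has a similar motivation, but for a different
situation: it stabilises the variance when the standard deviation is
proportional to the mean (a constant coefficient of variation, as for
log-normal data). For Poisson data, where the variance equals the mean,
the variance stabilising transformation is the square root.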
I find it interesting that these pieces of info were passed down to me
by my PhD supervisor, who (like Carsten's supervisor) was right about so
many things.
HTH,
Scott
Nate Upham wrote:
> Thanks very much indeed Carsten and Philippe!
> Lots to consider. I should have specified this before, but the
> variable with zero values that I would like to log (ln) transform does
> consist of many small values. The range is between 0.00 and 0.35,
> since this variable is the percentage abundance of bipedal rodents
> captured on a given night of trapping:
>
> Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
> 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
> 0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
> 0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
> 0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
> 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
> 0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)
>
> From your "rules of thumb" advice, it sounds like adding 1 to this
> data through log1p() might be quite distorting to the analyses. This
> would deal with the issue of zero values (log(0+1)=0), but small
> positive values such as 0.01 would go from -4.605 to 0.00995 by log(x
> +1). Adding 0.5 is only slightly better (log(0.01+0.5)= -0.6733).
> Should I assume that this effect will "even out" over all values since
> the log(x+1) transformation is applied to the entire variable?
>
> Or, is it best to go with one of these alternatives for the c in log(x
> +c):
> 1. c <- signif(0.5*sort(unique(Y))[2], 2) #c=0.0035
> 2. c <- (quantile(Y)[2]^2)/quantile(Y)[4] #c=0.0048
>
> Does anyone have English references for alternatives 1 or 2?
> This is super helpful, many thanks!
> --Nate
>
>
> On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
>
>
>> Many thanks to Carsten, Philippe, and Nate for a very informative
>> and entertaining discussion of something I have always wondered
>> about, having heard suggestions for both approaches. At least now I
>> have a better understanding of the rationale for each!
>>
>> Matt
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Matthew Landis
>> Dept. Biology
>> Middlebury College
>> Middlebury VT 05753
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>
> _________________________________
> Nathan S. Upham
> Ph.D. student
> Committee on Evolutionary Biology
> University of Chicago
> 1025 E. 57th St., Culver 402
> Chicago, IL 60637
> nsupham at uchicago.edu
> _________________________________
>
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
--
Scott Foster
CSIRO Mathematical and Information Sciences
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania
Australia
Phone: (03) 6232 5178
Fax: (03) 6232 5000
Email: scott.foster at csiro.au