[R-sig-eco] Log transforming zero value data

Scott scott.foster at csiro.au
Thu Jun 25 01:18:17 CEST 2009


Hi Nate,

Here is my 2 cents worth after coming in late to this discussion.

The fact that your data are proportions is important as it suggests how 
the data may vary. Do you have the numerator and denominator used to 
calculate the proportions? If so then I would suggest that you should be 
performing a binomial GLM with these data.

If you don't have these data, or are disinclined to use them for some 
reason (why?), then I would strongly suggest considering an 
asin(sqrt(p)) transformation, where p is in [0,1]. There is some 
justification for this: the transformation approximately stabilises the 
variance of a binomial proportion. That is, it makes the use of 
un-weighted least squares more appropriate but, of course, the 
distributional assumptions leading to tests of significance etc. may 
still require checking.
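
The variance-stabilising property can be checked with a small simulation. This is only a sketch: the sample size n, the true proportions p and the seed are assumptions chosen for illustration. The delta-method approximation says the variance of asin(sqrt(p-hat)) should be roughly 1/(4n) whatever p is.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n    <- 100                 # assumed number of animals per trapping night
p    <- c(0.1, 0.3, 0.5)    # assumed true proportions
nsim <- 10000               # simulated nights per value of p

var_raw  <- numeric(length(p))
var_asin <- numeric(length(p))
for (i in seq_along(p)) {
  phat        <- rbinom(nsim, size = n, prob = p[i]) / n
  var_raw[i]  <- var(phat)               # depends on p: p * (1 - p) / n
  var_asin[i] <- var(asin(sqrt(phat)))   # roughly 1 / (4 * n) for every p
}

# Compare raw and transformed variances with the approximate theory value
round(rbind(p, var_raw, var_asin, theory = 1 / (4 * n)), 5)
```

The approximation is weakest for proportions very close to 0 or 1 and for small n, which is one more reason to prefer the binomial GLM when the counts are available.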

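The log transformation has similar motivation, but for a different 
situation. It is the variance stabilising transformation for when the 
standard deviation is proportional to the mean (constant coefficient of 
variation, e.g. log-normal data). For Poisson data the variance 
stabilising transformation is the square root.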
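
A quick simulation is one way to check which transformation stabilises the variance for which kind of data. The sketch below rests on assumed means, an assumed log-scale standard deviation of 0.3 and an arbitrary seed. The standard delta-method results are that the square root stabilises the variance of Poisson counts (at about 1/4), while the log stabilises it when the standard deviation is proportional to the mean, as for log-normal data.

```r
set.seed(2)               # arbitrary seed, only for reproducibility
nsim <- 10000
mu   <- c(20, 80, 320)    # assumed means; large enough that Poisson zeros
                          # (which would give log(0) = -Inf) are essentially absent

# Poisson counts: variance equals the mean
pois <- sapply(mu, function(m) {
  x <- rpois(nsim, m)
  c(var_sqrt = var(sqrt(x)), var_log = var(log(x)))
})

# Log-normal data: standard deviation proportional to the mean
lnorm <- sapply(mu, function(m) {
  x <- rlnorm(nsim, meanlog = log(m), sdlog = 0.3)
  c(var_sqrt = var(sqrt(x)), var_log = var(log(x)))
})

colnames(pois) <- colnames(lnorm) <- paste0("mu=", mu)
round(pois, 4)    # var_sqrt roughly constant, var_log shrinks as mu grows
round(lnorm, 4)   # var_log roughly constant, var_sqrt grows with mu
```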

I find it interesting that these pieces of info were passed down to me 
by my PhD supervisor, who (like Carsten's supervisor) was right about so 
many things.

HTH,

Scott

Nate Upham wrote:
> Thanks very much indeed Carsten and Philippe!
> Lots to consider.  I should have specified this before, but the  
> variable with zero values that I would like to log (ln) transform does  
> consist of many small values.  The range is between 0.00 and 0.35,  
> since this variable is the percentage abundance of bipedal rodents  
> captured on a given night of trapping:
>
> Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,  
> 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,  
> 0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,  
> 0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,  
> 0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,  
> 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,  
> 0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)
>
> From your "rules of thumb" advice, it sounds like adding 1 to this  
> data through log1p() might be quite distorting to the analyses.  This  
> would deal with the issue of zero values (log(0+1)=0), but small  
> positive values such as 0.01 would go from -4.605 to 0.00995 by  
> log(x+1).  Adding 0.5 is only slightly better (log(0.01+0.5)= -0.6733).  
> Should I assume that this effect will "even out" over all values since  
> the log(x+1) transformation is applied to the entire variable?
>
> Or, is it best to go with one of these alternatives for the c in  
> log(x+c):
> 1.  c <- signif(0.5*sort(unique(Y))[2], 2)   #c=0.0035
> 2.  c <- (quantile(Y)[2]^2)/quantile(Y)[4]   #c=0.0048
>
> Does anyone have English references for alternatives 1 or 2?
> This is super helpful, many thanks!
> --Nate
>
>
> On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
>
>   
>> Many thanks to Carsten, Philippe, and Nate for a very informative  
>> and entertaining discussion of something I have always wondered  
>> about, having heard suggestions for both approaches.  At least now I  
>> have a better understanding of the rationale for each!
>>
>> Matt
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Matthew Landis
>> Dept. Biology
>> Middlebury College
>> Middlebury VT 05753
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>     
>
> _________________________________
> Nathan S. Upham
> Ph.D. student
> Committee on Evolutionary Biology
> University of Chicago
> 1025 E. 57th St., Culver 402
> Chicago, IL 60637
> nsupham at uchicago.edu
> _________________________________

-- 
Scott Foster
CSIRO Mathematical and Information Sciences
GPO Box 1538
Castray Esplanade
Hobart 7001
Tasmania 
Australia

Phone:     (03) 6232 5178
Fax:       (03) 6232 5000
Email:     scott.foster at csiro.au


