[R-sig-eco] Log transforming zero value data

Nate Upham nsupham at uchicago.edu
Thu Jun 25 01:21:51 CEST 2009


Hey there Ben,
I was just checking out your book actually.  When you say that I should do this as a binomial
analysis, is that because this variable is distributed similarly to a "zero-inflated binomial
distribution"?

Since all my data are non-normal, and my comparisons have heterogeneous variances, I have been using
quantile regression to assess a habitat variable as a limiting factor in the upper quantiles (in the
spirit of Cade et al. 1999, 2003).  I am log transforming this response variable for quantile
regression, and then back-transforming the fitted values when I plot the lines at different
quantiles.
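
A minimal sketch of why back-transforming works for quantiles but not for means: quantiles
are preserved by any increasing transformation, so exp(q) - c recovers the raw-scale
quantile. The data, the offset c0, and the chosen quantiles below are assumptions made up
for illustration; they are not the trapping data from this thread.

```r
# Hypothetical proportions in [0, 0.35]; NOT the data from this thread.
set.seed(1)
y <- round(runif(50, 0, 0.35), 3)
y[1:5] <- 0                      # include some zeros, as in the real problem
c0 <- 0.005                      # assumed offset for log(y + c0)

taus <- c(0.5, 0.75, 0.9)

# Quantiles on the raw scale (type = 1 returns observed order statistics).
q_raw <- quantile(y, probs = taus, type = 1, names = FALSE)

# Quantiles on the log scale, then back-transformed to the raw scale.
q_log  <- quantile(log(y + c0), probs = taus, type = 1, names = FALSE)
q_back <- exp(q_log) - c0

# The mean does not survive the round trip: this is a shifted geometric mean.
mean_raw  <- mean(y)
mean_back <- exp(mean(log(y + c0))) - c0

print(cbind(taus, q_raw, q_back))
print(c(mean_raw = mean_raw, mean_back = mean_back))
```

One caveat under the same assumptions: a line fitted on the log scale becomes a curve once
it is back-transformed, so the plotted quantile lines will not be straight on the raw scale.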

I do have access to the "denominator" as the number of traps set per night, but is it possible (or
desirable) to incorporate the binomial distribution here when my focus is on quantile regression?
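
For reference, a binomial analysis with a known denominator might be sketched as below.
Every number is invented for illustration, and the names traps, captures and habitat are
hypothetical. Note that this models the mean capture probability rather than an upper
quantile, so it does not by itself answer the limiting-factor question.

```r
# Hypothetical trapping data: every value here is invented for illustration.
traps    <- c(100, 100, 120, 80, 100, 150, 100, 90)    # traps set per night
captures <- c(0, 4, 6, 0, 12, 30, 9, 1)                # bipedal rodents caught
habitat  <- c(0.1, 0.3, 0.4, 0.1, 0.6, 0.9, 0.5, 0.2)  # made-up habitat covariate

# Two-column response: successes and failures. Zeros need no added constant.
fit <- glm(cbind(captures, traps - captures) ~ habitat, family = binomial)

# Predicted capture probability across the habitat gradient.
new_hab <- data.frame(habitat = seq(0.1, 0.9, by = 0.2))
p_hat   <- predict(fit, newdata = new_hab, type = "response")

print(coef(summary(fit)))
print(round(p_hat, 3))
```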

Thanks much, best,
--Nate


---- Original message ----
>Date: Wed, 24 Jun 2009 17:13:07 -0400
>From: Ben Bolker <bolker at ufl.edu>  
>Subject: Re: [R-sig-eco] Log transforming zero value data  
>To: Nate Upham <nsupham at uchicago.edu>
>Cc: Matthew Landis <rlandis at middlebury.edu>, "r-sig-ecology at r-project.org" <r-sig-ecology at r-project.org>
>
>  If you have percentage abundances (i.e. you can specify the
>"denominator" for each data point) then it would probably
>be best (if you can manage it) to do this as a binomial
>analysis -- the data are unlikely to be normal (if your
>analysis depends on this) after transformation. Based on the
>data below, plot(table(Y)) shows that zero is in fact the
>most common observation.  If you can't get denominators
>(although this would surprise me), you can try a beta regression,
>although it will be harder to incorporate e.g. block
>effects in the model.
>
>  Ben Bolker
>
>Nate Upham wrote:
>> Thanks very much indeed Carsten and Philippe!
>> Lots to consider.  I should have specified this before, but the  
>> variable with zero values that I would like to log (ln) transform does  
>> consist of many small values.  The range is between 0.00 and 0.35,  
>> since this variable is the percentage abundance of bipedal rodents  
>> captured on a given night of trapping:
>> 
>> Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,  
>> 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,  
>> 0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,  
>> 0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,  
>> 0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,  
>> 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,  
>> 0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)
>> 
>> From your "rules of thumb" advice, it sounds like adding 1 to these
>> data through log1p() might be quite distorting to the analyses.  This
>> would deal with the issue of zero values (log(0+1)=0), but small
>> positive values such as 0.01 would go from -4.605 under log(x) to
>> 0.00995 under log(x+1).  Adding 0.5 is only slightly better
>> (log(0.01+0.5)= -0.6733).
>> Should I assume that this effect will "even out" over all values since
>> the log(x+1) transformation is applied to the entire variable?
>> 
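
The arithmetic in the paragraph above can be checked directly. The sketch below uses a few
illustrative values spanning the reported 0 to 0.35 range (the vector x is an assumption,
not the data). It also shows that log(1 + x) is close to x for values this small, so
log1p() leaves the variable nearly untransformed rather than log-scaled.

```r
x <- c(0, 0.01, 0.10, 0.35)      # illustrative values in the reported range

shifted <- data.frame(
  x        = x,
  log_x    = log(x),             # -Inf at zero
  log1p_x  = log1p(x),           # log(x + 1)
  log_half = log(x + 0.5)
)
print(round(shifted, 4))

# Spread between the smallest positive value and the maximum on each scale.
spread_log   <- log(0.35) - log(0.01)
spread_log1p <- log1p(0.35) - log1p(0.01)

# Largest gap between log1p(x) and x itself over this range.
max_gap <- max(abs(log1p(x) - x))
print(c(spread_log = spread_log, spread_log1p = spread_log1p, max_gap = max_gap))
```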
>> Or, is it best to go with one of these alternatives for the c in
>> log(x+c):
>> 1.  c <- signif(0.5*sort(unique(Y))[2], 2)   #c=0.0035
>> 2.  c <- (quantile(Y)[2]^2)/quantile(Y)[4]   #c=0.0048
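
Both candidate constants can be reproduced from the Y vector posted above. This sketch
relies on quantile()'s default settings (quartiles, type 7); the names c1, c2, q and
min_pos are introduced here for illustration only.

```r
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
       0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
       0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
       0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
       0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
       0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
       0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)

# Alternative 1: half the smallest nonzero value, to 2 significant digits.
# sort(unique(Y))[2] is the smallest nonzero value only because Y contains
# zeros; min(Y[Y > 0]) states the same intent without that assumption.
min_pos <- min(Y[Y > 0])
c1 <- signif(0.5 * sort(unique(Y))[2], 2)

# Alternative 2: first quartile squared, divided by the third quartile.
q  <- quantile(Y, names = FALSE)  # 0%, 25%, 50%, 75%, 100%
c2 <- q[2]^2 / q[4]

print(c(c1 = c1, c2 = round(c2, 4), zeros = sum(Y == 0), n = length(Y)))
```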
>> 
>> Does anyone have English references for alternatives 1 or 2?
>> This is super helpful, many thanks!
>> --Nate
>> 
>> 
>> On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
>> 
>>> Many thanks to Carsten, Philippe, and Nate for a very informative  
>>> and entertaining discussion of something I have always wondered  
>>> about, having heard suggestions for both approaches.  At least now I  
>>> have a better understanding of the rationale for each!
>>>
>>> Matt
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Matthew Landis
>>> Dept. Biology
>>> Middlebury College
>>> Middlebury VT 05753
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 
>> _________________________________
>> Nathan S. Upham
>> Ph.D. student
>> Committee on Evolutionary Biology
>> University of Chicago
>> 1025 E. 57th St., Culver 402
>> Chicago, IL 60637
>> nsupham at uchicago.edu
>> _________________________________
>
>
>-- 
>Ben Bolker
>Associate professor, Biology Dep't, Univ. of Florida
>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>


