[R-sig-eco] Log transforming zero value data
Nate Upham
nsupham at uchicago.edu
Thu Jun 25 01:21:51 CEST 2009
Hey there Ben,
I was just checking out your book actually. When you say that I should do this as a binomial
analysis, is that because this variable is distributed similarly to a "zero-inflated binomial
distribution"?
Since all my data are non-normal, and my comparisons have heterogeneous variances, I have been using
quantile regression to tease apart the influence of a habitat variable in upper quantiles as a
limiting factor (in the spirit of Cade et al. 1999, 2003). I am log transforming this response
variable for quantile regression, and then back-transforming it when I plot the lines at different
quantiles.
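
A minimal sketch of why back-transforming quantile fits is reasonable, using base R only. Quantiles are equivariant under monotone transformations such as log, whereas means are not. The vector `y` and the offset `c0` are made up for illustration and are not the trapping data; `type = 1` is chosen so that sample quantiles are plain order statistics and the identity holds exactly rather than approximately.

```r
# Hypothetical response values (made up for illustration; note the zero).
y  <- c(0, 0.01, 0.02, 0.03, 0.05, 0.08, 0.13, 0.21, 0.34)
c0 <- 0.005                      # assumed offset so that log() is defined at zero
probs <- c(0.5, 0.75, 0.9)

# Quantiles taken on the original scale (type = 1 uses order statistics only).
direct <- quantile(y, probs, type = 1)

# Quantiles taken on the log scale, then back-transformed.
back <- exp(quantile(log(y + c0), probs, type = 1)) - c0

# Means do not behave this way: the back-transformed mean of logs
# is a shifted geometric mean, not the arithmetic mean.
mean_direct <- mean(y)
mean_back   <- exp(mean(log(y + c0))) - c0

print(rbind(direct, back))
print(c(mean_direct = mean_direct, mean_back = mean_back))
```

The two rows of quantiles agree, while the two means differ. With the default interpolating quantile types the agreement is only approximate.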
I do have access to the "denominator" as the number of traps set per night, but is it possible (or desirable)
to incorporate the binomial distribution here when my focus is on quantile regression?
Thanks much, best,
--Nate
---- Original message ----
>Date: Wed, 24 Jun 2009 17:13:07 -0400
>From: Ben Bolker <bolker at ufl.edu>
>Subject: Re: [R-sig-eco] Log transforming zero value data
>To: Nate Upham <nsupham at uchicago.edu>
>Cc: Matthew Landis <rlandis at middlebury.edu>, "r-sig-ecology at r-project.org"
> <r-sig-ecology at r-project.org>
>
> If you have percentage abundances (i.e. you can specify the
>"denominator" for each data point) then it would probably
>be best (if you can manage it) to do this as a binomial
>analysis -- the data are unlikely to be normal (if your
>analysis depends on this) after transformation. Based on the
>data below, plot(table(Y)) shows that zero is in fact the
>most common observation. If you can't get denominators
>(although this would surprise me), you can try a beta regression,
>although it will be harder to incorporate e.g. block
>effects in the model.
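
A minimal sketch of what a binomial analysis with a known denominator could look like in base R. It is one common way to set such a model up, not necessarily the exact model meant in the reply above. Everything here is simulated: `traps`, `habitat`, `captures` and the coefficients behind `p_true` are hypothetical stand-ins for the real trapping records.

```r
set.seed(1)                                   # reproducible made-up data
n_nights <- 60
traps    <- sample(80:120, n_nights, replace = TRUE)  # hypothetical traps set per night
habitat  <- runif(n_nights)                   # hypothetical habitat covariate in [0, 1]
p_true   <- plogis(-3 + 1.5 * habitat)        # assumed true capture probability
captures <- rbinom(n_nights, size = traps, prob = p_true)

# Successes and failures go in as a two-column matrix, so nights with more
# traps carry more weight and zero-capture nights need no special handling.
fit <- glm(cbind(captures, traps - captures) ~ habitat, family = binomial)

# Predictions on the response scale are proportions and stay inside (0, 1).
new_hab <- data.frame(habitat = c(0, 0.5, 1))
pred <- predict(fit, newdata = new_hab, type = "response")

print(summary(fit)$coefficients)
print(pred)
```

Because the response is counts out of a total, no log(x + c) offset is needed for the zeros. If the real data turn out to be overdispersed, `family = quasibinomial` is a drop-in alternative in `glm()`.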
>
> Ben Bolker
>
>Nate Upham wrote:
>> Thanks very much indeed Carsten and Philippe!
>> Lots to consider. I should have specified this before, but the
>> variable with zero values that I would like to log (ln) transform does
>> consist of many small values. The range is between 0.00 and 0.35,
>> since this variable is the percentage abundance of bipedal rodents
>> captured on a given night of trapping:
>>
>> Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
>> 0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
>> 0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
>> 0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
>> 0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
>> 0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
>> 0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)
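
A quick base-R check of the vector above, as a sketch. It repeats the posted values unchanged and only tabulates them; `tab` and `n_zero` are names introduced here for illustration.

```r
Y <- c(0.040, 0.040, 0.030, 0.000, 0.030, 0.055, 0.120, 0.050, 0.160,
       0.130, 0.150, 0.040, 0.080, 0.130, 0.150, 0.110, 0.280, 0.170, 0.000,
       0.230, 0.140, 0.340, 0.000, 0.000, 0.000, 0.150, 0.020, 0.093, 0.065,
       0.043, 0.030, 0.030, 0.055, 0.100, 0.007, 0.010, 0.030, 0.000, 0.140,
       0.025, 0.090, 0.015, 0.078, 0.160, 0.010, 0.100, 0.000, 0.010, 0.050,
       0.010, 0.000, 0.043, 0.087, 0.040, 0.020, 0.057, 0.107, 0.110, 0.190,
       0.110, 0.055, 0.030, 0.091, 0.090, 0.020, 0.350, 0.200, 0.177, 0.350)

tab    <- table(Y)               # frequency of each distinct value
n_zero <- sum(Y == 0)            # number of nights with no bipedal rodents

print(range(Y))                  # smallest and largest proportion
print(names(tab)[which.max(tab)])  # the most frequent value
print(n_zero / length(Y))        # share of observations that are exactly zero

# plot(tab) draws the same counts as vertical lines, one per distinct value.
```

Zero comes out as the single most frequent value, which is why the choice of offset matters so much for a log transformation of these data.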
>>
>> From your "rules of thumb" advice, it sounds like adding 1 to this
>> data through log1p() might be quite distorting to the analyses. This
>> would deal with the issue of zero values (log(0+1)=0), but small
>> positive values such as 0.01 would go from -4.605 to 0.00995 by
>> log(x+1). Adding 0.5 is only slightly better (log(0.01+0.5) = -0.6733).
>> Should I assume that this effect will "even out" over all values since
>> the log(x+1) transformation is applied to the entire variable?
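
A small sketch of how the offset changes the transformed scale, using three values that span the range of the data (zero, 0.01 and 0.35). The distortion does not even out: over such a narrow range log(x + 1) is nearly linear in x, so it hardly transforms the data at all, whereas a small offset stretches the low end strongly. The column names and `spread` are introduced here for illustration; 0.0035 is alternative 1 from this message.

```r
x <- c(0, 0.01, 0.35)             # zero, a small positive value, roughly the maximum

cmp <- data.frame(
  x           = x,
  log_x       = log(x),           # -Inf at zero, hence the need for an offset
  log1p_x     = log1p(x),         # same as log(x + 1)
  log_x_half  = log(x + 0.5),
  log_x_small = log(x + 0.0035)   # offset taken from alternative 1
)
print(round(cmp, 4))

# Width of the transformed scale between the smallest and the largest value.
spread <- sapply(cmp[c("log1p_x", "log_x_half", "log_x_small")],
                 function(v) diff(range(v)))
print(round(spread, 3))
```

The smaller the offset, the wider the transformed scale and the more the result depends on exactly which constant is chosen.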
>>
>> Or, is it best to go with one of these alternatives for the c in
>> log(x+c):
>> 1. c <- signif(0.5*sort(unique(Y))[2], 2) #c=0.0035
>> 2. c <- (quantile(Y)[2]^2)/quantile(Y)[4] #c=0.0048
>>
>> Does anyone have English references for alternatives 1 or 2?
>> This is super helpful, many thanks!
>> --Nate
>>
>>
>> On Jun 24, 2009, at 6:59 AM, Matthew Landis wrote:
>>
>>> Many thanks to Carsten, Philippe, and Nate for a very informative
>>> and entertaining discussion of something I have always wondered
>>> about, having heard suggestions for both approaches. At least now I
>>> have a better understanding of the rationale for each!
>>>
>>> Matt
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Matthew Landis
>>> Dept. Biology
>>> Middlebury College
>>> Middlebury VT 05753
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> _________________________________
>> Nathan S. Upham
>> Ph.D. student
>> Committee on Evolutionary Biology
>> University of Chicago
>> 1025 E. 57th St., Culver 402
>> Chicago, IL 60637
>> nsupham at uchicago.edu
>> _________________________________
>
>--
>Ben Bolker
>Associate professor, Biology Dep't, Univ. of Florida
>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsupham at uchicago.edu