[R-sig-eco] Log transforming zero value data

Nate Upham nsupham at uchicago.edu
Thu Jun 25 04:25:25 CEST 2009


Hey Ben,
In most of my data plots there is a "factor-ceiling"-type distribution, which I have been describing
as "wedge-shaped" or "triangular."  The data look like a right triangle, with both the central
tendency and the variation decreasing as a linear function of the habitat variable (X).  This pattern
seems to be fairly common in ecology (either increasing or decreasing), mostly as a result of
unmeasured habitat factors influencing the response variable.  Either way, Cade and others have been
advocating the use of quantile regression in ecology for several years, so I thought I would give it
a try with my data.  In the quantreg package in R, applying log() to the response and exp() to the
fitted values lets you model the different quantiles with (in my case) a log-decay type function.
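
To make the log()/exp() step concrete, here is a minimal sketch on simulated wedge-shaped data (not
my real data).  The variable names, the quantiles chosen, and the additive constant (half the
smallest nonzero value) are all assumptions for illustration; the constant in particular is just one
possible "small" choice.  Back-transforming works because quantiles are preserved under monotone
transformations.

```r
library(quantreg)

set.seed(1)
n <- 200
habitat <- runif(n, 0, 10)

# Simulated wedge: a ceiling that decays with habitat, with values
# scattered anywhere between zero and that ceiling.
ceiling_y <- 50 * exp(-0.3 * habitat)
density   <- ceiling_y * runif(n)
density[sample(n, 20)] <- 0          # some exact zeros, as in trapping data

dat <- data.frame(habitat = habitat, density = density)

# Hypothetical additive constant: small relative to the nonzero values.
const <- 0.5 * min(dat$density[dat$density > 0])

# Quantile regression on the log scale for several quantiles at once.
taus <- c(0.50, 0.75, 0.90, 0.95)
fit  <- rq(log(density + const) ~ habitat, tau = taus, data = dat)

# Fitted lines on a grid, computed from the coefficient matrix
# (rows: intercept and slope; columns: one per tau).
grid     <- data.frame(habitat = seq(0, 10, length.out = 50))
pred_log <- cbind(1, grid$habitat) %*% coef(fit)

# Back-transform to the original scale for plotting.
pred <- exp(pred_log) - const

plot(density ~ habitat, data = dat)
for (j in seq_along(taus)) lines(grid$habitat, pred[, j], lty = j)
```

Each back-transformed line is a log-decay curve on the original scale, one per quantile.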

Is the idea that, since Y has a binomial distribution, a GLM-based approach could tease apart the
specific effects of the habitat variable on average densities in Y?
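
For concreteness, this is how I picture such a model, as a rough sketch on simulated data.  The names
(captures, traps, habitat) and all parameter values are made up for illustration; traps stands in
for the "denominator" (# of traps set per night).  The last part compares the observed fraction of
zeros with what the fitted, non-inflated binomial would predict.

```r
set.seed(2)
n <- 100
habitat <- runif(n, 0, 10)
traps   <- sample(20:60, n, replace = TRUE)    # traps set per night (simulated)
p_true  <- plogis(-1 - 0.4 * habitat)          # capture probability falls with habitat
captures <- rbinom(n, size = traps, prob = p_true)

dat <- data.frame(habitat, traps, captures)

# Binomial GLM: successes and failures as a two-column response.
fit <- glm(cbind(captures, traps - captures) ~ habitat,
           family = binomial, data = dat)
print(summary(fit)$coefficients)

# Predicted average capture probability at a few habitat values.
new   <- data.frame(habitat = c(0, 5, 10))
p_hat <- predict(fit, newdata = new, type = "response")
print(p_hat)

# Zeros: observed fraction vs. the fraction an ordinary binomial expects,
# P(Y = 0) = (1 - p)^N, averaged over the observations.
obs_zero <- mean(dat$captures == 0)
exp_zero <- mean(dbinom(0, size = dat$traps, prob = fitted(fit)))
print(c(observed = obs_zero, expected = exp_zero))
```

If the observed and expected zero fractions are close, the zeros are consistent with a plain
binomial with small p rather than with zero inflation.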

Thanks,
--Nate


---- Original message ----
>Date: Wed, 24 Jun 2009 21:25:36 -0400
>From: Ben Bolker <bolker at ufl.edu>  
>Subject: Re: [R-sig-eco] Log transforming zero value data  
>To: Nate Upham <nsupham at uchicago.edu>
>Cc: "r-sig-ecology at r-project.org" <r-sig-ecology at r-project.org>
>
>Nate Upham wrote:
>> Hey there Ben,
>> I was just checking out your book actually.  When you say that I should do this as a binomial
>> analysis, is that because this variable is distributed similarly to a "zero-inflated binomial
>> distribution"?
>
>  It's not necessarily zero-inflated -- a moderately high proportion of
>zeros is a natural property of (non-inflated) binomial distributions
>with small p and small/moderate N.
>
>> 
>> Since all my data are non-normal, and my comparisons have heterogeneous variances, I have been using
>> quantile regression to tease apart the influence of a habitat variable in upper quantiles as a
>> limiting factor (in the spirit of Cade et al 1999, 2003).  I am log transforming this response
>> variable for quantile regression, and then back-transforming it when I plot the lines at different
>> quantiles.  
>> 
>> I do have access to the "denominator" as # of traps set per night, but is it possible (or desirable)
>> to incorporate the binomial distribution here when my focus is on quantile regression?
>
>  If you're focusing on quantile regression because you're really
>interested in limiting factors or "ceilings" then I'd say stick with
>log-transforming (but you should definitely use additive constants that
>are "small" with respect to the variation in your data, according to one
>of the recipes specified earlier in this thread -- log(1+x) and
>log(0.5+x) really don't make sense for your data set).  [I don't really
>know what assumptions are required for inference in quantile regression,
>although I'm guessing they're pretty loose -- in particular, no
>assumption of normality -- but it may implicitly assume continuous
>distributions?]
>   On the other hand, if you turned to QR as a way to get around
>heterogeneous variances etc., and you would really prefer to be able to
>draw conclusions about the average densities etc., then you might
>well be able to get farther with a binomial/GLM-based approach.  Do your
>data look like "factor-ceiling" distributions as in Cade et al?
>
>-- 
>Ben Bolker
>Associate professor, Biology Dep't, Univ. of Florida
>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsupham at uchicago.edu


