[R-sig-eco] Zero-inflated model inquiry

Thu Sep 27 18:25:14 CEST 2012

Hi Peter,
With a simple model you may get satisfaction with glht() in multcomp, 
but once your models get more complicated, success with glht tends to 
wane - at least in my experience. [If anyone can provide code to use 
glht for multiple factors or whether addressing each factor one at a 
time works in this situation, please post.]

One approach to getting those 'missing' comparisons is to change the 
base in the contrast treatments. For example:

contrasts(data$impact_period)=contr.treatment(levels(data$impact_period),base="BEFORE")

Someone can correct me if I am wrong, but because these are treatment 
contrasts (mean effect), moving the reference level has no effect on 
coefficients.

Often I put the line above in a loop and change the base accordingly.

cheers,
trevor

On 9/27/2012 7:00 AM, r-sig-ecology-request at r-project.org wrote:
> Send R-sig-ecology mailing list submissions to
> 	r-sig-ecology at r-project.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> or, via email, send a message with subject or body 'help' to
> 	r-sig-ecology-request at r-project.org
>
> You can reach the person managing the list at
> 	r-sig-ecology-owner at r-project.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-sig-ecology digest..."
>
>
> Today's Topics:
>
>     1. Re: Zero-inflated model inquiry (Drew Tyre)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 26 Sep 2012 08:38:27 -0500
> From: Drew Tyre <atyre2 at unl.edu>
> To: Peter Houk <peterhouk at gmail.com>
> Cc: r-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] Zero-inflated model inquiry
> Message-ID:
> 	<CAE-6tvwbp7e=ABLYK_HdbVR6R_5do=-RVjUm=fyaP47K622uFA at mail.gmail.com>
> Content-Type: text/plain; charset="ISO-8859-1"
>
> Hi Peter,
>
> Your assumption that Before and During are contrasted with After is
> correct. By default R parameterizes categorical variables using
> treatment contrasts which compare each level to the first one, and the
> default sorting is lexicographic, so AFTER becomes the first level.
> Your model is indicating that the average abundance both BEFORE and
> DURING are significantly different from the AFTER. It sounds like what
> you'd like to know is also BEFORE different from DURING. I see a
> couple things you could try
> 1) Make predictions of the average urchin_abundance from the model for
> each period along with confidence intervals. Use the confidence
> intervals to decide what is the same and different.
> 2) Change your formula to urchin_density~impact_period-1. This will
> give you a distinct estimate for each period, and make construction of
> the confidence intervals in 1 very easy, but still won't give you all
> the pairwise comparisons.
> 3) Check the package multcomp and use it to find the appropriate
> contrasts for all three levels. I'm not sure this will work for models
> from the pscl package.
>
> hth
>
> On Tue, Sep 25, 2012 at 10:50 PM, Peter Houk <peterhouk at gmail.com> wrote:
>> Greetings -
>>
>> I have a question regarding the use of zero-inflated models for count
>> data.  I have a very basic count dataset consisting of sea urchin density
>> estimates conducted across 20 sites (random: pooled for this example)
>> during three timeframes (fixed: 1-before disturbance, 2-during disturbance,
>> and 3-after disturbance).  For this example, I'm simply looking to
>> interpret significant differences across timeframes.  After initial
>> examinations, the data lend themselves well to an overdispersed, negative
>> binomial distribution (i.e., hurdle approach using the R package pscl).
>>
>> Using the code:
>>
>>> f1<-formula(urchin_density~impact_period)
>>> H1<-hurdle(f1, dist="negbin", link="logit")
>>> summary(H1)
>> provides:
>>
>> Count model coefficients (truncated negbin with log link):
>>                      Estimate Std. Error z value Pr(>|z|)
>> (Intercept)           0.7212     0.1546   4.664 3.10e-06 ***
>> impact_periodBefore   0.6374     0.1713   3.720 0.000199 ***
>> impact_periodDuring   0.6850     0.1696   4.039 5.37e-05 ***
>> Log(theta)           -0.6671     0.2262  -2.949 0.003184 **
>> Zero hurdle model coefficients (binomial with logit link):
>>                      Estimate Std. Error z value Pr(>|z|)
>> (Intercept)          0.51904    0.12824   4.048 5.18e-05 ***
>> impact_periodBefore  0.01869    0.20111   0.093    0.926
>> impact_periodDuring -0.03353    0.19718  -0.170    0.865
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Theta: count = 0.5132
>> Number of iterations in BFGS optimization: 11
>> Log-likelihood: -1377 on 7 Df
>>
>>
>>
>> Before moving to more complex models, my question is regarding whether or
>> not this is the right approach, and if so, why are there no results for the
>> "after" impact period.  My assumption is that both the "before" and
>> "during" time periods are being contrasted against the "after" here, but
>> how can one contrast all three groups to look for significance?  Last, how
>> does one logically translate the two parts of the results?
>>
>>
>> Insight appreciated, I'm aware there are extensive textbooks on the
>> subject, but trying to get an initial feel for things.
>>
>> Peter
>>
>>
>>
>> --
>> Peter Houk, PhD
>> Chief Biologist
>> Pacific Marine Resources Institute
>> www.pacmares.com
>> www.micronesianfishing.com
>>
>>          [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>
>