Hi Paul,
Thanks for your response - it's much appreciated! Yes, I confess I'm new to
R and regression modeling, so it has been a challenge trying to decipher the
'R- and modeling-speak' of the experts! I do understand what you've said
below (which is great because it confirms that i'm on the right track...),
however, my question relates more to how one decides which terms to drop.
As i understand, one can use the p-values to determined which terms to drop
first (i.e. drop the term with the highest p-value, then re-run the model
and compare the two results with anova; if there's a significant difference,
then accept the new model. Correct?) This is done since one wants a model
that explains the data well, and non-significant terms don't contribute to
explaining the data, right?
The challenge with using glmer with quasipoisson is that it does not give
p-values for each term or interaction - so how does one decide which term to
drop...if any. From what i've gleaned from previous advice from others is
that one can drop any term and compare the results with the full model by
anova to get the p-value for that term. So should one do that for all the
terms first (one at a time), then drop the term with the highest p-value,
and then repeat the whole process again?
Any further advice would be greatly appreciated.
Thanks,
Raldo
PS. I had attached the data in a Text file in the previous e-mail; is that
not acceptable for R Help? (I've attached the file again and it is also
available here
).
On Tue, Oct 6, 2009 at 10:19 PM, Paul Johnson wrote:
> On Wed, Sep 30, 2009 at 2:40 PM, Raldo Kruger
wrote:
>> Chris,
>> Thanks for that - I should probably have mentioned that I'm using
>> family=quasipoisson in glmer since my data has Poisson distribution
>> as well as being overdispersed. I'm unsure how one decides which term
>> to drop without being informed by p-values, and so don't quite
>> understand how the "Likelihood ratio test using anova()" , or the AIC
>> or BIC model comparison will work in this case (I thought one's
>> supposed to remove the term with the highest p-value from the model,
>> and compare it with the model with the term included to see if there's
>> a difference, not so?).
>
> Dear Raldo:
>
> Finally there is a question that I can help with! It appears to me
> you don't have much experience with regression modeling in R and the
> other people who answer you are talking a bit "over your head".
>
> In your case, call this fit the "full" or "unrestricted" model:
>
> ex4o_r2<-glmer(Counts~N+G+Year+N:Year+G:Year+N:G:Year+(Year|Site),
> data=ex4o, family=quasipoisson)
>
> (I'm not commenting the specification).
>
> Suppose you wonder "Should I leave out the multiplicative effect
> between N and Year?" THen you fit the model that excludes that one
> element:
>
> ex4o_new <-glmer(Counts~N+G+Year +G:Year+N:G:Year+(Year|Site),
> data=ex4o, family=quasipoisson)
>
> And then use the anova function to compare the 2 models
>
> anova(ex4o_r2, ex4o_new)
>
> This is a "pretty standard" R regression thing, similar to what people
> do with all sorts of models in R. It is what the "drop1" function
> does for lm models, I believe.
>
> You may have seen regression models where an F test is done comparing
> a full model against a restricted model? This is following the same
> line of thought.
>
> To test your question about the factor levels, here is what you should
> do. SUppose the initial factor has 5 levels, and you wonder "do I
> really need 5 levels, or can I drop out the separate estimations for 3
> of the levels?" Create a new factor with the simpler structure, run
> it through the model in place of the original factor, and run anova to
> compare the 2 models. I wrote down some of those ideas in a paper
> last spring (http://pj.freefaculty.org/Papers/MidWest09/Midwest09.pdf),
> but when I was done it seemed so obvious to me (& my friends) that I
> did not try to publish it.
>
> With anova, there is a test= option where you can specify if you want
> a chisq or F test.
>
> And, for future reference, when the experts ask you for a data example
> to work on, they do not mean a copy of your printout, although that
> may help. What they want is the actual data and commands that you
> use. Usually, you have to upload the data somewhere for us to see,
> along with the code.
>
> Good luck with your project.
>
> pj
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
--
Raldo
On Tue, Oct 6, 2009 at 10:19 PM, Paul Johnson wrote:
> On Wed, Sep 30, 2009 at 2:40 PM, Raldo Kruger
> wrote:
> > Chris,
> > Thanks for that - I should probably have mentioned that I'm using
> > family=quasipoisson in glmer since my data has Poisson distribution
> > as well as being overdispersed. I'm unsure how one decides which term
> > to drop without being informed by p-values, and so don't quite
> > understand how the "Likelihood ratio test using anova()" , or the AIC
> > or BIC model comparison will work in this case (I thought one's
> > supposed to remove the term with the highest p-value from the model,
> > and compare it with the model with the term included to see if there's
> > a difference, not so?).
>
> Dear Raldo:
>
> Finally there is a question that I can help with! It appears to me
> you don't have much experience with regression modeling in R and the
> other people who answer you are talking a bit "over your head".
>
> In your case, call this fit the "full" or "unrestricted" model:
>
> ex4o_r2<-glmer(Counts~N+G+Year+N:Year+G:Year+N:G:Year+(Year|Site),
> data=ex4o, family=quasipoisson)
>
> (I'm not commenting the specification).
>
> Suppose you wonder "Should I leave out the multiplicative effect
> between N and Year?" THen you fit the model that excludes that one
> element:
>
> ex4o_new <-glmer(Counts~N+G+Year +G:Year+N:G:Year+(Year|Site),
> data=ex4o, family=quasipoisson)
>
> And then use the anova function to compare the 2 models
>
> anova(ex4o_r2, ex4o_new)
>
> This is a "pretty standard" R regression thing, similar to what people
> do with all sorts of models in R. It is what the "drop1" function
> does for lm models, I believe.
>
> You may have seen regression models where an F test is done comparing
> a full model against a restricted model? This is following the same
> line of thought.
>
> To test your question about the factor levels, here is what you should
> do. SUppose the initial factor has 5 levels, and you wonder "do I
> really need 5 levels, or can I drop out the separate estimations for 3
> of the levels?" Create a new factor with the simpler structure, run
> it through the model in place of the original factor, and run anova to
> compare the 2 models. I wrote down some of those ideas in a paper
> last spring (http://pj.freefaculty.org/Papers/MidWest09/Midwest09.pdf),
> but when I was done it seemed so obvious to me (& my friends) that I
> did not try to publish it.
>
> With anova, there is a test= option where you can specify if you want
> a chisq or F test.
>
> And, for future reference, when the experts ask you for a data example
> to work on, they do not mean a copy of your printout, although that
> may help. What they want is the actual data and commands that you
> use. Usually, you have to upload the data somewhere for us to see,
> along with the code.
>
> Good luck with your project.
>
> pj
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
--
Raldo
[[alternative HTML version deleted]]