[R-sig-ME] mixed model testing

Thu Nov 8 12:54:11 CET 2007

Dear John,

Thank you for your reply. Well then I agree, a random effect should be added if one suspects that it should be there. Neat example with pirates!

Pernicious is too strong a word, but I would still say dangerous, since, as I said, many of the results found in the literature are based on screening for covariates of interest and including only those that have a p-value less than 0.30 (say) in a forward selection model, using simple linear or, at best, (fractional) polynomial bases to represent covariates, and almost always ignoring interactions (which are difficult to present). I suppose that it would be OK to be inspired by them, but with a good amount of distrust (if Harrell is right, which seems to be the case). This distrust could perhaps be lessened when the literature concerns randomised trials, although covariates may find their way into this area too (and still no interactions...).

Best regards,

Fredrik

-----Original message-----
From: John Maindonald [mailto:John.Maindonald at anu.edu.au]
Sent: 8 November 2007 12:17
To: Nilsson Fredrik X
Cc: r-sig-mixed-models at r-project.org
Subject: Re: SV: [R-sig-ME] mixed model testing

For simplicity, I limited attention to a rather small class
of models.  I assumed that the only fixed effect that the
data would support is a linear term, and I do not mind
adding an intercept.  That is realistic, I believe, for data
of this type.

One should not limit oneself to a single random effect
if indeed the data are sampled in a manner (e.g., lawns
within soil types) that makes it natural to expect some
further random effect.  In my example as stated, the data
have no such structure. (NB, the random sample comment,
meaning simple random sample).
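
As a minimal sketch of the small class of models discussed here (a linear fixed effect plus either a random intercept or a random intercept and slope per lawn), one could simulate data and compare the two fits. This assumes the lme4 package is installed; the data frame `d`, the roller weights, and all variance values are hypothetical and do not come from the thread.

```r
library(lme4)

set.seed(1)
n_lawns <- 6                       # deliberately few lawns (hypothetical)
weights <- c(2, 4, 6, 8, 10)       # hypothetical roller weights

lawn   <- factor(rep(seq_len(n_lawns), each = length(weights)))
weight <- rep(weights, times = n_lawns)
idx    <- as.integer(lawn)

# Simulated "truth": each lawn has its own intercept and its own slope
b0 <- rnorm(n_lawns, sd = 1)       # random intercepts
b1 <- rnorm(n_lawns, sd = 0.3)     # random slopes
depression <- (1 + b0[idx]) + (2 + b1[idx]) * weight +
  rnorm(length(weight), sd = 1)

d <- data.frame(lawn, weight, depression)

# Random intercept only, versus random intercept and slope.
# ML fits (REML = FALSE) so the likelihoods are directly comparable.
m_int   <- lmer(depression ~ weight + (1 | lawn),      data = d, REML = FALSE)
m_slope <- lmer(depression ~ weight + (weight | lawn), data = d, REML = FALSE)

# Likelihood ratio comparison of the two random-effect structures
lrt <- anova(m_int, m_slope)
print(lrt)
```

With so few lawns the comparison may well fail to favour the random slope even though the data were simulated with one, and lme4 may report a singular fit. The p-value printed by `anova()` is also only approximate here, because the null hypothesis places a variance on the boundary of its parameter space.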

I agree that I have used Box's example outside of the
context in which he used it.  A better example might be
use of a reconnaissance that has only a 20% chance
of detecting such pirates as may be present on the high
seas, before deciding whether a valuable cargo that will
venture into those seas should have an escort.

Removing insignificant terms can help understanding
and interpretation.  But if one wants to make anything
of the coefficients, it is necessary to check that the
remaining coefficients have not changed substantially.
If there are more than a few terms to consider, and data
are not from a designed experiment, any attempt to
interpret coefficients is likely to be hazardous. You
mentioned the Harrell book. Rosenbaum's "Observational
Studies" (2nd edn, Springer, 2002) merits careful attention.
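
The check on the remaining coefficients can be sketched in base R as follows. The data are simulated purely for illustration; the variables `x1`, `x2`, `y` and the effect sizes are assumptions, not anything from the thread. The two predictors are made correlated, which is the situation in which dropping a term is most likely to shift the others.

```r
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)   # correlated with x1 (hypothetical)
y  <- 1 + 0.5 * x1 + 0.15 * x2 + rnorm(n)

full    <- lm(y ~ x1 + x2)
reduced <- update(full, . ~ . - x2)   # drop the weaker term

# Put the coefficients shared by both models side by side
shared <- names(coef(reduced))
change <- cbind(full    = coef(full)[shared],
                reduced = coef(reduced)[shared])

# Relative change in each remaining coefficient after dropping x2
change <- cbind(change,
                rel_change = (change[, "reduced"] - change[, "full"]) /
                  abs(change[, "full"]))
print(round(change, 3))
```

A large relative change in the coefficient of `x1` would be a warning that its interpretation depends on whether `x2` is in the model.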

Why do you think Harrell's suggestion pernicious?  Models
that are in the literature can be a good starting point, and
the accompanying discussion an aid to understanding
the science. They may turn out to be more or less right
(as far as one can tell), or to require modification, or the
data may demolish them.  But at least one has a starting
point, rather than an almost unlimited choice of models.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 8 Nov 2007, at 8:00 PM, Nilsson Fredrik X wrote:

> Dear John,
>
> Forgive me for putting my nose out, I hope that I'm not rude, but I  
> am a bit bewildered by your mail (and by statistical modelling).
>
> I agree that if your model is:
>
> Lawndepression~lawn.roller.weight +(1|lawn.id),
>
> When, in fact, it *should be* (because you simulated the data or  
> you're God):
>
> Lawndepression~lawn.roller.weight +(lawn.roller.weight|lawn.id),
>
> then you might erroneously fail to reject the null hypothesis that  
> the random effect for slope is zero. But in real life one does not  
> know the true model, and there are an infinite number of (functional  
> forms for) random effects that such tests may fail upon. What should  
> one do, why stop at the linear term? Why not saturate the model with  
> random effects? (And still, you don't know whether you have the  
> right model).
>
> In your example one would perhaps like to think that VERY light lawn
> rollers did not cause any depression at all, and that there is a
> maximum depression that a lawn roller could cause, so there should at
> least be an f(lawn.roller.weight), or we do as Venables suggests in
> http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf : we centre our
> lawn roller weights, stay in the middle of this interval where
> things look linear, and fit a linear, Gaussian model.
>
> Then I don't quite get your point from Box's quote, since that
> concerned heterogeneous variances, right? Here the situation is
> quite the opposite to rejecting the null: using a test for
> heterogeneous variances that is more sensitive than the ANOVA to
> departures from homogeneity is akin to sending out a rowing boat on
> a rough sea where the ocean liner (ANOVA) would safely fare. In our
> case we failed to reject the null hypothesis of zero slope.
>
> So what is your suggestion of a sensible modelling strategy?  
> (Without being biased by what you see in your dataset; I used to  
> like inspecting the data fitting lmList if possible, then fitting a  
> rather complex model, and then removing insignificant terms, then  
> checking assumptions. After having read Harrell's book (Regression  
> modelling strategies) I'm a bit uncertain what to do when people ask  
> me to analyze their data, since they don't like to think too much  
> about it. Harrell's suggestion that one could check the literature  
> for a sensible model seems pernicious to me since these old results  
> are based on the very same modelling strategy that he rejects.  
> Should one use the Bayesian framework with flat priors?)
>
>
> Best regards,
>
> Fredrik Nilsson
>
> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of John Maindonald
> Sent: 7 November 2007 22:47
> To: Irene Mantzouni
> Cc: r-sig-mixed-models at r-project.org; r-help at stat.math.ethz.ch
> Subject: Re: [R-sig-ME] mixed model testing
>
> Whether or not you need a mixed model, e.g. random versus
> fixed slopes, depends on how you intend to use results.
>
> Suppose you have lines of depression vs lawn roller weight
> calculated for a number of lawns. If the data will always be
> used to make predictions for one of those same lawns, a
> fixed slopes model is fine.
>
> If you want to use the data to make a prediction for another
> lawn from the same "population" (the population from which
> this lawn is a random sample, right?), you need to model
> the slope as a random effect.
>
> Now for a more subtle point:
>
> In the prediction for another lawn situation, it is possible that
> the slope random effect can be zero, and analysts do very
> commonly make this sort of assumption, maybe without
> realizing that this is what they are doing.  You can test whether
> the slope random effect is zero but, especially if you have data
> from a few lawns only, failure to reject the null (zero random
> effect) is not a secure basis for inferences that assume that
> the slope is indeed zero. The "test for zero random effect, then
> infer" is open to Box's pithy objection that
> "... to make preliminary tests on variances is rather like putting to
> sea in a rowing boat to find out whether conditions are sufficiently
> calm for an ocean liner to leave port".
>
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
>
> On 8 Nov 2007, at 1:55 AM, Irene Mantzouni wrote:
>
>> Is there a formal way to prove the need of a mixed model, apart from
>> e.g. comparing the intervals estimated by lmList fit?
>> For example, should I compare (with AIC ML?) a model with separately
>> (unpooled) estimated fixed slopes (i.e. using an index for each
>> group) with a model that treats this parameter as a random effect
>> (both models treat the remaining parameters as random)?
>>
>> Thank you!
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>