[R] question in using nlme and lme4 for unbalanced data

Mike Marchywka marchywka at hotmail.com
Tue Nov 2 13:17:54 CET 2010

> Date: Mon, 1 Nov 2010 17:38:54 -0700
> From: djmuser at gmail.com
> To: cyuan at email.arizona.edu
> CC: r-help at r-project.org
> Subject: Re: [R] question in using nlme and lme4 for unbalanced data
> Hi:
> On Mon, Nov 1, 2010 at 3:59 PM, Chi Yuan wrote:
> > Hello:
> > I need some help using a mixed model for unbalanced data. I
> > have a two-factorial random block design. It's an ecology
> Unbalanced data is not a problem in either package. However, five blocks is
> rather at the boundary of whether or not one can compute reliable variance
> components and random effects. Given that the variance estimate of blocks in
> your models was nearly zero, you're probably better off treating them as
> fixed rather than random and analyzing the data with a fixed effects model
> instead.
> > Another question is about p-values.
> > I kind of heard the p-value does not matter that much in the mixed
> > model because it's not calculated properly.
> No. p-values are not calculated in lme4 (as I understand it) because,
> especially in the case of severely unbalanced data, the true sampling
> distributions of the test statistics in small to moderate samples are not
> necessarily close to the asymptotic distributions used to compute the
> corresponding p-values. It's the (sometimes gross) disparity between the
> small-sample and asymptotic distributions that makes the reported p-values
> based on the latter unreliable, not an inability to calculate the p-value
> properly. I can assure you that Prof. Bates knows how to compute a p-value.

To add my own question on terminology [even the statements here should be taken
as questions]: assuming the null hypothesis is true and you have some underlying
population distribution of various attributes, you get some distribution for your
test statistic over repeated experiments. The asymptotic distribution, I take it,
is the true population distribution, which may not be well reflected in your
(small) sample? Usually people justify non-parametrics by saying they help in the
small-sample/outlier cases. Alternatively, if you have some reasonable basis for
knowing the true population distributions, you could use that for p-value
calculation and/or do Monte Carlo and just measure the number of times you
incorrectly reject the null hypothesis, etc. Of course, Monte Carlo code needs to
be debugged too, so nothing will be a sure thing. Introducing new things like an
independently known population distribution may not be statistically rigorous by
some criteria (comments welcome LOL), but you are free to examine it for analysis.
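
As a rough illustration of the Monte Carlo idea, here is a minimal sketch, not
anything from lme4 or nlme. It assumes a plain one-sample t-test with n = 5
rather than a mixed model, and the populations (standard normal, exponential
with mean 1), the replicate count, and the function names are all made up for
the example. It draws repeated small samples under a true null and counts how
often the test rejects at the nominal 5% level.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (n = 5).
T_CRIT_DF4 = 2.776


def t_statistic(sample, mu0):
    """One-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def rejection_rate(draw, mu0, n=5, reps=20000, seed=1):
    """Fraction of simulated experiments in which a true H0 is rejected.

    `draw` is a hypothetical helper: it takes a random.Random instance and
    returns one observation from the assumed population.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > T_CRIT_DF4:
            rejections += 1
    return rejections / reps


# Null is true in both cases: mu0 equals the population mean.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0)
expo_rate = rejection_rate(lambda rng: rng.expovariate(1.0), mu0=1.0)

print("normal population, rejection rate:", normal_rate)
print("skewed population, rejection rate:", expo_rate)
```

With normal data the rejection rate should sit near the nominal 0.05, since the
t distribution is exact there. With the skewed population it tends to come out
higher, which is the kind of gap between nominal and actual error rates being
discussed. The same recipe would apply to a mixed model, only with data
simulated from the fitted null model instead of these toy populations.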

> > Is there any other way I can
> > tell whether the treatment has an effect or not? I know AIC is for model
> > comparison,

Get more data? In this case, it would seem the goal of statistical analysis
is to make some guesses about causality. Presumably this is one piece of
evidence in a larger "case" that includes theory and other observations.
To paraphrase the popular legal phrase,
"if the model doesn't fit you must not quit." Or, as someone at the US FDA
is quoted as saying, "A p-value is no substitute for a brain."

> > do I report this in formal publication?

I guess that depends on the journal (LOL). Personally I'd be more worried about
getting a convincing story together than playing to a specific audience. However,
many questions of detail do relate to the audience and journal: you want to
use the math to determine reality, but what you present depends on the publication.
There is nothing wrong with presenting novel analyses with enough detail to
the right audience, but it may not be for everyone :)

> >
> As mentioned above, I would suggest analyzing this as a fixed effects
> problem. Since the imbalance is not too bad, and it is not unusual in field
> experiments to have more control EUs than treatment EUs within each level of
> treatment, a fixed effects analysis may be sufficient. It wouldn't hurt to
> consult with a local statistician to discuss the options.

