Hi Jarrod, 
I'm not completely sure if I got you right.

It is a combination of variance inhomogeneity and the unbalanced treatment variable. In your case the bigger group has the smaller variance, which makes the overall residual variance small and the effects significant. If you switch the group sizes (small group with small variance, big group with large variance), the effect is no longer significant. The first two models also differ in R^2, which is another indication of the variance inhomogeneity.
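One way to check this over many simulated data sets, rather than a single draw, is sketched below. It reuses your simulation; the function name sim_once and its swap argument are only illustrative, and swap = TRUE is my reading of "switching the group sizes" (the non-treated individuals keep their own x and the treated ones get the group mean).

```r
# Sketch only: sim_once() and 'swap' are illustrative names, not from the original code.
sim_once <- function(n = 100, swap = FALSE) {
  x <- rnorm(n)
  y <- rnorm(n, x)
  z <- rnorm(n, y)
  treatment <- rbinom(n, 1, plogis(x - 2))
  cuty <- cut(y, 10)                               # 10 groups defined by y
  xbar <- as.vector(tapply(x, cuty, mean)[cuty])   # group mean of x per individual
  # individuals that keep their own x; everyone else gets the group mean
  own <- if (swap) treatment == 0 else treatment == 1
  xmix <- ifelse(own, x, xbar)
  fit <- summary(lm(z ~ xmix + treatment))$coefficients
  # NA if the treatment coefficient could not be estimated (e.g. nobody treated)
  if ("treatment" %in% rownames(fit)) fit["treatment", "Pr(>|t|)"] else NA_real_
}

set.seed(1)  # arbitrary seed, only for reproducibility
p_orig <- replicate(500, sim_once())
p_swap <- replicate(500, sim_once(swap = TRUE))

# share of replicates with p < 0.05; a value near 0.05 is expected if the test is valid
reject <- c(original = mean(p_orig < 0.05, na.rm = TRUE),
            swapped  = mean(p_swap < 0.05, na.rm = TRUE))
reject
```

Comparing the two entries of reject shows whether the excess of significant results depends on which group keeps its individual x values.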

A plot:
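# the values that enter xbar2: x for the treated, xbar for the non-treated
plot(z[which(treatment==1)], x[which(treatment==1)])
plot(z[which(treatment==0)], xbar[which(treatment==0)])
# or the complementary values, which do not enter xbar2
plot(z[which(treatment==0)], x[which(treatment==0)])
plot(z[which(treatment==1)], xbar[which(treatment==1)])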
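To put numbers on what the plots show, a minimal sketch is to compare the group sizes with the residual variance of the third model within each treatment group. The larger n and the seed are my assumptions, chosen only so that both groups are reasonably populated; m3 and group_var are illustrative names.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1000     # assumption: larger than the original 100 so both groups are populated
x <- rnorm(n)
y <- rnorm(n, x)
z <- rnorm(n, y)
treatment <- rbinom(n, 1, plogis(x - 2))
cuty <- cut(y, 10)                               # 10 groups defined by y
xbar <- as.vector(tapply(x, cuty, mean)[cuty])   # group mean of x per individual
xbar2 <- xbar
xbar2[treatment == 1] <- x[treatment == 1]       # treated keep their own x

m3 <- lm(z ~ xbar2 + treatment)

table(treatment)                                 # group sizes (unbalanced)
group_var <- tapply(resid(m3), treatment, var)   # residual variance per group
group_var
```

If the variances in group_var differ clearly while the group sizes are very unequal, the pooled residual variance used for the test is dominated by the larger group.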
 
Regards
Jonas

Hi,

This is not really a specific question for mixed models but I was hoping someone might know the answer anyway.

To make things simple, imagine you have a chain of causality x->y->z with some error at each step:

x ~ N(0,1), y ~ N(x,1) and z ~ N(y,1).

Observations are made on individuals who are grouped by (this is the important bit) the intervals their y values fall between.

z is observed for each individual in a group. Although all x values are observed, it is not possible to say which individual within a group the values belong to. Therefore, xbar is a vector the same length as x where each individual has the x value of its group mean.

We also have a treatment which does not have a causal effect on x, y or z, but is associated with extreme values of x.

Both lm(z~x+treatment) and lm(z~xbar+treatment) give an average treatment effect of zero and uniform p-values as expected.

However, imagine for individuals in the treated group that x values can be assigned such that xbar2 takes on values of xbar for the non-treated individuals and x for the treated individuals. In this case lm(z~xbar2+treatment) provides strong evidence for a treatment effect!

I had an idea why this would be the case (based on differences in variances between xbar and x). However, the problem completely disappears if the groups are defined by which interval of x they occur in, rather than which interval of y, yet differences in variances between xbar and x persist under this scenario.

Some code is below. If anyone has any ideas what this type of problem is called, why it occurs and if there are known solutions I would be very glad to know.

Cheers,

Jarrod

x<-rnorm(100)
y<-rnorm(100, x)
z<-rnorm(100, y)

treatment<-rbinom(100,1, plogis(x-2))

cuty<-cut(y,10) # get 10 groups defined by y

xbar<-tapply(x, cuty, mean)[cuty]

xbar2<-xbar
xbar2[which(treatment==1)]<-x[which(treatment==1)]

summary(lm(z~x+treatment))
summary(lm(z~xbar+treatment))
summary(lm(z~xbar2+treatment))

# the treatment effect in model 3 is consistently negative and has high type I error.

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC0053

______________________________________________________

Jonas Klasen
PhD student
Genome Plasticity and Computational Genetics
Max Planck Institute for Plant Breeding Research
______________________________________________________

