Hi Jarrod,
I'm not completely sure if I got you right.
It is a combination of variance inhomogeneity and the unbalanced treatment variable. In your case the bigger group has the smaller variance, which makes the overall variance small and the effect significant. If you switch the group sizes (small group with small variance, big group with large variance), the effect is no longer significant. The first two models also differ in R^2, which is another indication of the variance inhomogeneity.
A plot:
# treated individuals against x, untreated individuals against xbar (as in xbar2)
plot(z[treatment == 1], x[treatment == 1])
plot(z[treatment == 0], xbar[treatment == 0])
# or, with the roles switched
plot(z[treatment == 0], x[treatment == 0])
plot(z[treatment == 1], xbar[treatment == 1])
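One rough way to check the variance argument is to repeat your simulation many times and record, for the xbar2 model, the treatment p-value, the treatment estimate, and the residual variance within each treatment group. The following is only a sketch under the assumptions of your setup; the names sim_once and out, the seed, and the 500 replicates are my own choices. If your observation holds, the rejection rate should come out well above 0.05 and the mean estimate should be negative; the last line shows whether the residual variances really differ between the two groups.

```r
# Sketch: repeat the simulation and summarise the xbar2 model.
set.seed(1)  # arbitrary seed, only for reproducibility

sim_once <- function(n = 100) {
  x <- rnorm(n)
  y <- rnorm(n, x)
  z <- rnorm(n, y)
  treatment <- rbinom(n, 1, plogis(x - 2))
  # guard against replicates with too few treated individuals
  if (sum(treatment) < 2) {
    return(c(p = NA_real_, est = NA_real_, var0 = NA_real_, var1 = NA_real_))
  }
  cuty <- cut(y, 10)                              # 10 groups defined by y
  xbar <- as.vector(tapply(x, cuty, mean)[cuty])  # group mean of x per individual
  xbar2 <- xbar
  xbar2[treatment == 1] <- x[treatment == 1]      # treated individuals keep their own x
  fit <- lm(z ~ xbar2 + treatment)
  res <- residuals(fit)
  c(p    = summary(fit)$coefficients["treatment", "Pr(>|t|)"],
    est  = unname(coef(fit)["treatment"]),
    var0 = var(res[treatment == 0]),
    var1 = var(res[treatment == 1]))
}

out <- t(replicate(500, sim_once()))              # one row per replicate

mean(out[, "p"] < 0.05, na.rm = TRUE)             # rejection rate (nominal level 0.05)
mean(out[, "est"], na.rm = TRUE)                  # average treatment estimate
colMeans(out[, c("var0", "var1")], na.rm = TRUE)  # residual variance by group
```

Replacing cut(y, 10) with cut(x, 10) inside sim_once gives the comparison for groups defined by intervals of x.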
Regards
Jonas
Hi,

This is not really a specific question for mixed models, but I was hoping someone might know the answer anyway.

To make things simple, imagine you have a chain of causality x->y->z with some error at each step: x ~ N(0,1), y ~ N(x,1) and z ~ N(y,1). Observations are made on individuals who are grouped by (this is the important bit) the intervals their y values fall between. z is observed for each individual in a group. Although all x values are observed, it is not possible to say which individual within a group the values belong to. Therefore, xbar is a vector the same length as x where each individual has the x value of its group mean.

We also have a treatment which does not have a causal effect on x, y or z, but is associated with extreme values of x. Both lm(z~x+treatment) and lm(z~xbar+treatment) give an average treatment effect of zero and uniform p-values, as expected.

However, imagine that for individuals in the treated group x values can be assigned, such that xbar2 takes on the values of xbar for the non-treated individuals and of x for the treated individuals. In this case lm(z~xbar2+treatment) provides strong evidence for a treatment effect! I had an idea why this would be the case (based on differences in variances between xbar and x). However, the problem completely disappears if the groups are defined by which interval of x they occur in, rather than which interval of y, yet differences in variances between xbar and x persist under this scenario.

Some code is below. If anyone has any ideas what this type of problem is called, why it occurs and if there are known solutions, I would be very glad to know.
Cheers,
Jarrod

x <- rnorm(100)
y <- rnorm(100, x)
z <- rnorm(100, y)
treatment <- rbinom(100, 1, plogis(x - 2))
cuty <- cut(y, 10)  # get 10 groups defined by y
xbar <- tapply(x, cuty, mean)[cuty]
xbar2 <- xbar
xbar2[which(treatment == 1)] <- x[which(treatment == 1)]
summary(lm(z ~ x + treatment))
summary(lm(z ~ xbar + treatment))
summary(lm(z ~ xbar2 + treatment))
# the treatment effect in model 3 is consistently negative and has high type I error.

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC0053
______________________________________________________
Jonas Klasen
PhD student
Genome Plasticity and Computational Genetics
Max Planck Institute for Plant Breeding Research
______________________________________________________