[R-sig-ME] within-group averaging

Jarrod Hadfield j.hadfield at ed.ac.uk
Thu Nov 29 10:02:13 CET 2012

Hi Jonas,

Thanks for your reply. Initially, I also thought it was solely to do
with inhomogeneity, and I agree that it plays a role. However, if you
replace cuty<-cut(y,10) with cuty<-cut(x,10) (i.e. define groups by bins
of x rather than y) then the problems with bias and type-I error rate
disappear, despite the same pattern of inhomogeneity existing. This
really surprised me. At the moment it is the bias that I would most
like to fix, rather than the high type-I error rate.
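
A minimal sketch of that comparison, reusing the simulation from the original post (quoted below) with n = 100 and 10 bins. The helper names sim_once and summarise_sims, the 0.05 threshold and the number of replicates are assumptions made for illustration, not part of the original code:

```r
# One replicate of the x -> y -> z chain. Groups are bins of y or of x;
# returns the treatment estimate and p-value from lm(z ~ xbar2 + treatment).
sim_once <- function(by = c("y", "x"), n = 100, nbins = 10) {
  by <- match.arg(by)
  x <- rnorm(n)
  y <- rnorm(n, x)
  z <- rnorm(n, y)
  treatment <- rbinom(n, 1, plogis(x - 2))

  # Group means of x, expanded back to one value per individual
  grp <- cut(if (by == "y") y else x, nbins)
  xbar <- as.vector(tapply(x, grp, mean)[as.integer(grp)])

  # Treated individuals keep their own x, the rest get the group mean
  xbar2 <- ifelse(treatment == 1, x, xbar)

  coefs <- summary(lm(z ~ xbar2 + treatment))$coefficients
  if (!("treatment" %in% rownames(coefs))) {
    # Treatment was constant in this replicate, so no estimate exists
    return(c(Estimate = NA_real_, `Pr(>|t|)` = NA_real_))
  }
  coefs["treatment", c("Estimate", "Pr(>|t|)")]
}

# Average treatment estimate (bias) and rejection rate at the 5% level
summarise_sims <- function(by, nsim = 500) {
  res <- replicate(nsim, sim_once(by))  # 2 x nsim matrix
  c(mean_estimate = mean(res[1, ], na.rm = TRUE),
    type1_rate    = mean(res[2, ] < 0.05, na.rm = TRUE))
}

set.seed(1)
print(rbind(groups_by_y = summarise_sims("y"),
            groups_by_x = summarise_sims("x")))
```

Since the treatment has no causal effect, an unbiased analysis should give a mean estimate near zero and a rejection rate near 0.05, so the two rows can be read against those targets.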



Quoting Jonas Klasen <klasen at mpipz.mpg.de> on Wed, 28 Nov 2012 22:17:04 +0100:

> Hi Jarrod,
> I'm not completely sure if I got you right.
> It is a combination of variance inhomogeneity and the unbalanced
> treatment variable. In your case the bigger group has the smaller
> variance, which makes the overall variance small and the effects
> significant. If you switch the group sizes (small group with small
> variance, big group with large variance), the effect is no longer
> significant. The first two models differ in R^2, which is also an
> indication of the variance inhomogeneity.
> A plot:
> plot(z[which(treatment==1)], x[which(treatment==1)])
> plot(z[which(treatment==0)], xbar[which(treatment==0)])
> #or
> plot(z[which(treatment==0)], x[which(treatment==0)])
> plot(z[which(treatment==1)], xbar[which(treatment==1)])
> Regards
> Jonas
> Hi,
>
> This is not really a specific question for mixed models but I was
> hoping someone might know the answer anyway.
>
> To make things simple, imagine you have a chain of causality x->y->z
> with some error at each step: x ~ N(0,1), y ~ N(x,1) and z ~ N(y,1).
> Observations are made on individuals who are grouped by (this is the
> important bit) the intervals their y values fall between. z is
> observed for each individual in a group. Although all x values are
> observed, it is not possible to say which individual within a group
> the values belong to. Therefore, xbar is a vector the same length as
> x where each individual is assigned the mean x value of its group.
>
> We also have a treatment which does not have a causal effect on x, y
> or z, but is associated with extreme values of x. Both
> lm(z~x+treatment) and lm(z~xbar+treatment) give an average treatment
> effect of zero and uniform p-values as expected. However, imagine
> for individuals in the treated group that x values can be assigned
> such that xbar2 takes on values of xbar for the non-treated
> individuals and x for the treated individuals. In this case
> lm(z~xbar2+treatment) provides strong evidence for a treatment
> effect!
>
> I had an idea why this would be the case (based on differences in
> variances between xbar and x). However, the problem completely
> disappears if the groups are defined by which interval of x they
> occur in, rather than which interval of y, yet differences in
> variances between xbar and x persist under this scenario.
>
> Some code is below. If anyone has any ideas what this type of problem
> is called, why it occurs and if there are known solutions I would be
> very glad to know.
>
> Cheers,
>
> Jarrod
>
> x<-rnorm(100)
> y<-rnorm(100, x)
> z<-rnorm(100, y)
>
> treatment<-rbinom(100,1, plogis(x-2))
>
> cuty<-cut(y,10) # get 10 groups defined by y
>
> xbar<-tapply(x, cuty, mean)[cuty]
>
> xbar2<-xbar
> xbar2[which(treatment==1)]<-x[which(treatment==1)]
>
> summary(lm(z~x+treatment))
> summary(lm(z~xbar+treatment))
> summary(lm(z~xbar2+treatment))
>
> # the treatment effect in model 3 is consistently negative and has
> # high type I error.
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC0053
> ______________________________________________________
> Jonas Klasen
> PhD student
> Genome Plasticity and Computational Genetics
> Max Planck Institute for Plant Breeding Research
> ______________________________________________________

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
