[R-sig-ME] discrepancy between paired test and lmer()

i white i.m.s.white at ed.ac.uk
Mon Aug 18 12:47:45 CEST 2014


You should not be using lm() here. The data are balanced, so you could use 
aov() with an Error() term, which should agree with lmer().

aov(Y ~ exposure + Error(family_id), data = dataset)

The analysis of variance (source, d.f.) is

Concordant versus Discordant       1
Between families within SibTypes  98
Within families                  100

lm() will test 'concordant versus discordant' against the 'within 
families' mean square; aov() and lmer() test it against the 'between 
families within SibTypes' mean square.
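The distinction can be checked in base R alone. A minimal sketch, reconstructing the balanced design from the simulation in the quoted message below (no lme4 needed; aov() exposes both error strata directly):

```r
## Balanced sibling design: 100 families x 2 sibs,
## families 1-50 exposure-discordant, families 51-100 concordant (unexposed).
set.seed(20289457)
Y <- rnorm(200, mean = 0.25, sd = 0.06) +
  c(rnorm(50, mean = 0.05, sd = 0.03), rep(0, 150))
family_id <- factor(rep(1:100, 2))
exposure  <- c(rep(1, 50), rep(0, 150))
dataset   <- data.frame(Y, family_id, exposure)

## lm(): exposure is tested against the residual (within-families) mean
## square, on 200 - 101 = 99 residual df (intercept + exposure + 99 dummies).
fit_lm <- lm(Y ~ exposure + family_id, data = dataset)

## aov() with Error(family_id) splits exposure into a between-families
## component ('concordant versus discordant', tested on 1 and 98 df) and a
## within-families component (tested on 1 and 99 df).
fit_aov <- aov(Y ~ exposure + Error(family_id), data = dataset)
summary(fit_aov)
```

The 98-df residual line of the family_id stratum is the 'between families within SibTypes' term in the table above; that stratum, not lm()'s 99-df within-families residual, is the relevant comparison for the between-family contrast.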

I don't see how you get a t statistic for 'concordant versus discordant' 
if you only analyse discordant pairs.
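For the discordant pairs by themselves, the equivalence reported in the quoted message below is easy to verify without lme4. A sketch, assuming the poster's simulated data: the paired t-test and the family-fixed-effects lm() give the same 49-df t statistic (the statistic a random-intercept lmer fit reproduces on this subset):

```r
## Rebuild the poster's simulated data set.
set.seed(20289457)
Y <- rnorm(200, mean = 0.25, sd = 0.06) +
  c(rnorm(50, mean = 0.05, sd = 0.03), rep(0, 150))
family_id  <- factor(rep(1:100, 2))
exposure   <- c(rep(1, 50), rep(0, 150))
siblingtype <- rep(rep(1:0, each = 50), 2)  # 1 = discordant pair
dataset <- data.frame(Y, family_id, exposure, siblingtype)

## Paired t-test: exposed sibs (rows 1-50) vs their unexposed sibs (101-150).
tt <- t.test(Y[1:50], Y[101:150], paired = TRUE)

## Same contrast as a within-family (fixed-effects) regression on the
## discordant pairs only; family dummies absorb the pair means.
fit  <- lm(Y ~ exposure + family_id, data = dataset,
           subset = siblingtype == 1)
t_lm <- coef(summary(fit))["exposure", "t value"]

c(t_paired = unname(tt$statistic), t_lm = t_lm)
```

Both tests use the 49 within-pair degrees of freedom, which is why they agree exactly; the disagreement only appears once concordant pairs enter the data.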


On 08/18/2014 10:20 AM, E.W.Tobi at lumc.nl wrote:
> Dear All, as my previous question was posted at the height of the vacation period, I would like to repost the following issue I am encountering:
>
>
> I am trying to test a continuous, normally distributed variable (Y) within sibling pairs that are discordant, or concordant, for an exposure, with exposed or unexposed singletons also to be included down the line.
> The inclusion of singletons rules out a simple lm() model such as
> lm(Y ~ exposure + family_id)
> to account for the family relationship.
>
> I tried starting simple, by making the model comparable to the simple lm model when including only the discordant and concordant pairs, leaving the singletons out of the mix for now, but this has already proven surprisingly hard.
> If I test the discordant sibling pairs only, the following model yields exactly the same outcome as a paired t-test and lm(Y ~ exposure + family_id), as it should:
> lmer(Y~exposure+(1|family_id))
> However, adding exposure-concordant pairs to the discordant-pairs set (non-exposed & non-exposed: "0"-"0" pairs) yields completely different results between lmer and lm.
>
> Overall, the t-values are (much) higher than with lm(). I am testing many Y's (my field is genomics), so this inflation is very apparent when plotting the expected versus observed t-statistics. Typically the 'lambda' (the coefficient describing the relation between observed and expected statistics) is 1.2-1.6.
>
> Does anybody have an idea why lm() and lmer() differ in their t-value estimates when exposure-concordant pairs are added?
> And how to build a model resembling the simpler paired test?
>
> To make things more complicated, we also have strong evidence that not all exposed siblings within a discordant pair will respond with the same amount of Y, but that the response is variable.
> Therefore I also added exposure status as a random slope, to keep things 'maximal'.
>
> A standardized example:
> library(lme4.0)  # I am using the old stable version, but the inflation is also apparent in the new version.
>
> set.seed(20289457)
> # normally distributed variable, with added 'effect' to part of the individuals (exposed within certain sibling pairs)
> Y <- rnorm(n=200, mean=0.25, sd=0.06) + c(rnorm(n=50, mean=0.05, sd=0.03), rep(0,150))
> family_id <- as.factor(rep(seq(1,100,by=1),2))
> exposure <- c(rep(1,50),rep(0,150))
> siblingtype <- c(rep(1,50),rep(0,50),rep(1,50),rep(0,50))  # this variable is added to distinguish discordant exposure from concordant exposure pairs
>
> dataset <- data.frame(Y,family_id,exposure, siblingtype)
>
> # first we take a look at the outcome in the exposure discordant pairs.
> coefficients(summary(lm(Y~exposure+family_id, data=dataset, subset= siblingtype==1)))[c(1,2),]
> t.test(Y[1:50], Y[101:150], paired=TRUE)
> lmer(Y~exposure+(1|family_id), data=dataset, subset= siblingtype==1)
> lmer(Y~exposure+(1+exposure|family_id), data=dataset, subset= siblingtype==1)
> # all have identical t-values; paired tests are identical to the mixed models
>
> # the unexposed-unexposed pairs have:
> t.test(Y[51:100], Y[151:200], paired=TRUE)
> # no evidence for a difference, which should boost confidence in the observed difference?
>
> coefficients(summary(lm(Y~exposure+family_id, data=dataset)))[c(1,2),]
> # indeed, in a simple lm() the t-value increases.
> lmer(Y~exposure+(1|family_id), data=dataset)
> # also in lmer, but now with a much greater t-value: of note, after testing 100,000s of Y's I can definitely say that this is an inflated statistic!
> lmer(Y~exposure+(1+exposure|family_id), data=dataset)
> # adding the random slope somewhat downplays this increase. But the outcome is not identical. The inflation remains!
> # of note, in my real data the Corr of the random effects is ~0.2-0.3, close to the 'theoretical' 0.25 for full siblings in genetic studies (however, Y is not a genetic marker and may thus deviate somewhat from 0.25)
>
>
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
> [4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] coxme_2.2-3       bdsmatrix_1.3-1   survival_2.37-7   gee_4.13-18       nlme_3.1-117      lme4.0_0.999999-4 lattice_0.20-29
> [8] Matrix_1.1-4      plyr_1.8.1
>
> loaded via a namespace (and not attached):
> [1] grid_3.0.2   Rcpp_0.11.2  stats4_3.0.2 tools_3.0.2
>
>
> Any insight is highly appreciated.
>
>
> Best wishes,
> Tobi
>
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


