[R-sig-ME] mcmcpvalue and contrasts
Steven McKinney
smckinney at bccrc.ca
Tue Feb 26 03:22:46 CET 2008
Hi Hank,
>
> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org on behalf of Ken Beath
> Sent: Mon 2/25/2008 4:05 PM
> To: Hank Stevens
> Cc: Help Mixed Models
> Subject: Re: [R-sig-ME] mcmcpvalue and contrasts
>
> On 26/02/2008, at 9:42 AM, Hank Stevens wrote:
>
> > Hi Folks,
> > I wanted to double check that my intuition makes sense.
> >
> > Examples of mcmcpvalue that I have seen use treatment "contrast"
> > coding.
> > However, in more complex designs, testing overall effects of a factor
> > might be better done with other contrasts, such as sum or Helmert
> > contrasts.
> >
> > My Contention:
> > Different contrasts test different hypotheses, and therefore result in
> > different P-values. This consequence of contrasts differs from
> > analysis of variance, as in anova( lm(Y ~ X1*X2) ).
> >
> > *** This is right, isn't it? ***
> >
>
> The main problem is testing for a main effect in the presence of an
> interaction. While it looks like it gives sensible results in some
> cases, such as balanced ANOVA, those results really aren't sensible,
> and the effect of parameterisation in other cases makes that clear.
>
> The difference for the interaction is probably just sampling
> variation; increasing the number of samples fixes this.
>
> Ken
Ken is correct - testing some of the main-effect terms will yield
different results under different parameterizations arising from the
differing contrast structures (and such tests are in general somewhat
meaningless when the corresponding interaction term is in the model
and the design is not balanced and orthogonal).
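Here is a minimal sketch of that contrast-dependence, using the lm1
(treatment coding) and lm2 (Helmert coding) fits shown further below.
waldF() is just an illustrative helper, not an existing function - it
performs a joint Wald test that a block of coefficients is zero.

waldF <- function(fit, idx) {
  ## joint Wald F test that the coefficients indexed by 'idx' are all zero
  b <- coef(fit)[idx]
  V <- vcov(fit)[idx, idx]
  Fstat <- drop(t(b) %*% solve(V, b)) / length(idx)
  pf(Fstat, length(idx), df.residual(fit), lower.tail = FALSE)
}
waldF(lm1, 2:4)  # "ET main effect" with the interaction in the model, treatment coding
waldF(lm2, 2:4)  # same columns under Helmert coding - generally a different p-value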
In general, the choice of contrast *** SHOULD NOT *** yield different
p-values for logically equivalent hypothesis tests, so the answer to
Hank's question is 'no - this is not right'. Testing an interaction
term (with all main-effect terms comprising the interaction included
in the model, as is in general necessary) should not depend on the
choice of contrasts. Here is an example. (Unfortunately I cannot
illustrate using your lmer example, as my lmer produces output that
won't print - some minor bug - but that's another issue.)
I'll illustrate the statistical principle using regular linear
models. First with the usual treatment 'contrasts'.
> options(contrasts=c("contr.treatment","contr.poly"))
> lm1 <- lm(resistance ~ ET * position, data = Semiconductor)
> summary(lm1)
Call:
lm(formula = resistance ~ ET * position, data = Semiconductor)
Residuals:
Min 1Q Median 3Q Max
-1.01333 -0.25750 0.04333 0.28333 0.74667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.61333 0.26891 20.874 <2e-16 ***
ET2 0.38000 0.38030 0.999 0.325
ET3 0.52333 0.38030 1.376 0.178
ET4 0.72667 0.38030 1.911 0.065 .
position2 -0.16333 0.38030 -0.429 0.670
position3 -0.06000 0.38030 -0.158 0.876
position4 0.27333 0.38030 0.719 0.478
ET2:position2 0.35667 0.53782 0.663 0.512
ET3:position2 0.37333 0.53782 0.694 0.493
ET4:position2 0.37667 0.53782 0.700 0.489
ET2:position3 -0.16667 0.53782 -0.310 0.759
ET3:position3 -0.30333 0.53782 -0.564 0.577
ET4:position3 -0.38333 0.53782 -0.713 0.481
ET2:position4 -0.35000 0.53782 -0.651 0.520
ET3:position4 -0.31667 0.53782 -0.589 0.560
ET4:position4 -0.07333 0.53782 -0.136 0.892
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4658 on 32 degrees of freedom
Multiple R-squared: 0.4211, Adjusted R-squared: 0.1498
F-statistic: 1.552 on 15 and 32 DF, p-value: 0.1449
Now with true contrasts - the Helmert parameterization:
> options(contrasts=c("contr.helmert","contr.poly"))
> lm2 <- lm(resistance ~ ET * position, data = Semiconductor)
> summary(lm2)
Call:
lm(formula = resistance ~ ET * position, data = Semiconductor)
Residuals:
Min 1Q Median 3Q Max
-1.01333 -0.25750 0.04333 0.28333 0.74667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.00292 0.06723 89.292 < 2e-16 ***
ET1 0.17000 0.09507 1.788 0.08324 .
ET2 0.09722 0.05489 1.771 0.08606 .
ET3 0.10986 0.03881 2.830 0.00797 **
position1 0.05667 0.09507 0.596 0.55535
position2 -0.11000 0.05489 -2.004 0.05360 .
position3 0.03542 0.03881 0.912 0.36834
ET1:position1 0.08917 0.13446 0.663 0.51197
ET2:position1 0.03250 0.07763 0.419 0.67826
ET3:position1 0.01667 0.05489 0.304 0.76337
ET1:position2 -0.05750 0.07763 -0.741 0.46427
ET2:position2 -0.03528 0.04482 -0.787 0.43700
ET3:position2 -0.02444 0.03169 -0.771 0.44617
ET1:position3 -0.05167 0.05489 -0.941 0.35363
ET2:position3 -0.01111 0.03169 -0.351 0.72818
ET3:position3 0.01125 0.02241 0.502 0.61909
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4658 on 32 degrees of freedom
Multiple R-squared: 0.4211, Adjusted R-squared: 0.1498
F-statistic: 1.552 on 15 and 32 DF, p-value: 0.1449
Things to notice: The two model fits above show the same R-squared,
the same F-statistic and its associated p-value. This is good - the
two models are mathematically equivalent, so the overall model fit
statistics should not differ.
The two model parameter estimate sets do differ, as the
parameterization is different (though the vector spaces in which the
data and models live are the same - a linear algebra book will show
the details). Though the variables 'ET' and 'position' each have 4
levels, we only ever see 3 parameter estimates per factor, due to the
linear constraints imposed by projecting from the high-dimensional
space containing the data to the lower-dimensional space containing
the model.
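One quick check of that equivalence (a sketch, using the lm1 and lm2
fits above): the fitted values from the two parameterizations agree to
machine precision even though the coefficients differ.

all.equal(fitted(lm1), fitted(lm2))   # TRUE - same fitted values
max(abs(fitted(lm1) - fitted(lm2)))   # essentially zero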
Now to test the interaction term:
> options(contrasts=c("contr.treatment","contr.poly"))
> lmr1 <- lm(resistance ~ ET + position, data = Semiconductor) # Fit reduced model
> anova(lm1, lmr1)
Analysis of Variance Table
Model 1: resistance ~ ET * position
Model 2: resistance ~ ET + position
Res.Df RSS Df Sum of Sq F Pr(>F)
1 32 6.9421
2 41 7.7515 -9 -0.8095 0.4146 0.9178
> options(contrasts=c("contr.helmert","contr.poly"))
> lmr2 <- lm(resistance ~ ET + position, data = Semiconductor)
> anova(lm2, lmr2)
Analysis of Variance Table
Model 1: resistance ~ ET * position
Model 2: resistance ~ ET + position
Res.Df RSS Df Sum of Sq F Pr(>F)
1 32 6.9421
2 41 7.7515 -9 -0.8095 0.4146 0.9178
>
Things to notice: The ANOVA tables for the two different contrast
parameterizations are identical. Dropping the interaction term
results in the model space now occupying 9 fewer dimensions (indicated
by the '-9' in the degrees of freedom column). As measured by the
F-statistic, not much predictive capability was sacrificed by this
reduction in model dimension - the simpler model fits the data about
as well as the more complex model, so we would in general opt for the
simpler model (caveat: random-effects issues are being ignored in
this illustration).
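For what it's worth, drop1() gives the same interaction test in a
single call; it respects marginality, so the only term it considers
dropping here is ET:position, and the F test should match the anova()
comparisons above whatever the contrast setting:

drop1(lm1, test = "F")
drop1(lm2, test = "F")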
If these two logically equivalent tests had produced non-identical
results, I would next look for numerical instability in the fitting
algorithms. Numerical instability is a hallmark of 'treatment
contrasts' (but not in R - see below *) and is the reason that those
new to statistics are reminded not to hand-fit regression models using
the formula
Beta_hat = (X'X)^(-1) X'Y
which is mathematically valid but does not perform well on
finite-precision computing devices.
(* 'Treatment contrasts' computed in R are fine because under the
covers R always uses numerically stable algorithms, and the results
are then converted to the user-requested contrast format.)
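A small sketch of that numerical point (an illustration only, not a
recipe): solve the normal equations directly and compare with the
QR-based solution that lm() uses under the hood. On this
well-conditioned design the two agree; with a nearly collinear design
the normal-equations route is the first to lose precision.

X <- model.matrix(lm1)                      # design matrix from the fit above
y <- Semiconductor$resistance
b_normal <- solve(t(X) %*% X, t(X) %*% y)   # textbook (X'X)^(-1) X'Y route
b_qr     <- qr.coef(qr(X), y)               # numerically stable QR route
max(abs(b_normal - coef(lm1)))              # tiny here; can blow up when X'X is ill-conditioned
max(abs(b_qr - coef(lm1)))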
>
>
>
> > I provide a self-contained example.
> >
> > library(lme4) # v. 0.99875-9
> > library(coda)
> > library(SASmixed)
> >
> > mcmcpvalue <- function(samp) {
> > ##From Bates pers comm. September 14, 2006 2:59:23 PM EDT
> > ## elementary version that creates an empirical p-value for the
> > ## hypothesis that the columns of samp have mean zero versus a
> > ## general multivariate distribution with elliptical contours.
> > # samp <- rnorm(10000, m=3)
> > ## differences from the mean standardized by the observed
> > ## variance-covariance factor
> > std <- backsolve(chol(var(samp)),
> > cbind(0, t(samp)) - colMeans(samp),
> > transpose = TRUE)
> > sqdist <- colSums(std * std)
> > sum(sqdist[-1] > sqdist[1])/nrow(samp)
> > }
> >
> >
> > options(contrasts=c("contr.treatment","contr.poly"))
> > print(fm1 <- lmer(resistance ~ ET * position + (1|Grp),
> > Semiconductor),
> > corr=F)
> > c1 <- mcmcsamp(fm1, 50000)
> >
> > options(contrasts=c("contr.helmert","contr.poly"))
> > print(fm2 <- lmer(resistance ~ ET * position + (1|Grp),
> > Semiconductor),
> > corr=F)
> > c2 <- mcmcsamp(fm2, 50000)
> >
> > mcmcpvalue(c1[, 2:4 ])
> > [1] 0.32458
> >> mcmcpvalue(c2[, 2:4 ])
> > [1] 0.0249
> >> mcmcpvalue(c1[, 5:7 ])
> > [1] 0.45376
> >> mcmcpvalue(c2[, 5:7 ])
> > [1] 0.16302
> >> mcmcpvalue(c1[, 8:16 ])
> > [1] 0.6373
> >> mcmcpvalue(c2[, 8:16 ])
> > [1] 0.88808
> >>
> >> sessionInfo()
> > R version 2.6.2 (2008-02-08)
> > i386-apple-darwin8.10.1
> >
> > locale:
> > C
> >
> > attached base packages:
> > [1] stats graphics utils datasets grDevices methods
> > [7] base
> >
> > other attached packages:
> > [1] SASmixed_0.4-2 xtable_1.5-2 reshape_0.8.0
> > [4] coda_0.13-1 lme4_0.99875-9 Matrix_0.999375-4
> > [7] lattice_0.17-4 CarbonEL_0.1-4
> >
> > loaded via a namespace (and not attached):
> > [1] grid_2.6.2
> >>
> >
> > Dr. Hank Stevens, Associate Professor
> > 338 Pearson Hall
> > Botany Department
> > Miami University
> > Oxford, OH 45056
> >
> > Office: (513) 529-4206
> > Lab: (513) 529-4262
> > FAX: (513) 529-4243
> > http://www.cas.muohio.edu/~stevenmh/
> > http://www.cas.muohio.edu/ecology
> > http://www.muohio.edu/botany/
> >
> > "If the stars should appear one night in a thousand years, how would
> > men
> > believe and adore." -Ralph Waldo Emerson, writer and philosopher
> > (1803-1882)
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
>
>
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
My lmer problem:
> options(contrasts=c("contr.treatment","contr.poly"))
> fm1 <- lmer(resistance ~ ET * position + (1|Grp), Semiconductor)
> print(fm1, corr = FALSE)
Error in .local(x, ...) :
no slot of name "status" for this object of class "table"
This is probably fixed in newer configurations and is likely of no
concern.
> sessionInfo()
R version 2.6.2 (2008-02-08)
powerpc-apple-darwin8.10.1
locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Hmisc_3.4-3 SASmixed_0.4-2 coda_0.13-1 lme4_0.99875-9 Matrix_0.999375-4 lattice_0.17-4 reshape_0.8.0
[8] R.utils_0.9.8 R.oo_1.4.1 R.methodsS3_1.0.0 RMySQL_0.6-0 DBI_0.2-4 RODBC_1.2-3 CGIwithR_0.72
[15] GDD_0.1-8
loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.2 tools_2.6.2
>