[R-sig-ME] Conflicting p-values from pvals.fnc
Levy, Roger
rlevy at ucsd.edu
Thu Mar 29 04:36:32 CEST 2012
Hi all,
This example has me pretty confused about how the "weights" argument to lmer() works. I had always assumed that setting a weight of k for an observation would make lmer act as if it had seen k replicates of that observation (i.e., that the observation's contribution to the likelihood would be raised to the k-th power). After reading this query, though, I found the following post stating that they are "precision weights, not sampling weights":
http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2099.html
I'm not sure what that means -- was my interpretation the "sampling weights" one?
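To make the distinction concrete, here is a minimal sketch with ordinary lm() and toy data (both hypothetical, just for illustration): under a sampling-weights reading, a weight of k should behave like k verbatim replicates of the row, whereas under a precision-weights reading it means Var(e_i) = sigma^2 / k.

set.seed(10)
d     <- data.frame(x = 1:6, y = 1:6 + rnorm(6))
m.w   <- lm(y ~ x, data = d, weights = rep(4, 6))   ## "precision" weights
m.rep <- lm(y ~ x, data = d[rep(1:6, each = 4), ])  ## actual replicates
coef(m.w); coef(m.rep)        ## identical point estimates, but ...
summary(m.w)$coefficients     ## ... the standard errors differ:
summary(m.rep)$coefficients   ## replicating rows adds degrees of freedom

At least in lm() the two notions clearly come apart, so presumably something analogous is going on in lmer().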
Regardless, I'm seeing what looks like inconsistent behavior between how the weights argument affects the t statistic in the lmer() output and how it affects the output of pvals.fnc(). The example below has a fixed effect of "x" and a random intercept for "a": the higher I set the weights, the larger the t statistic for x becomes (which is what I'd originally expected, given my assumptions about the weights semantics), but also the broader the MCMC posterior on x becomes. That doesn't seem right, does it? (I also don't understand why the reported estimate of the random-intercept variance changes in the lmer output while the residual variance does not.)
> set.seed(1)
> dat <- data.frame(x=rep(c(0,1),each=4),a=factor(rep(c("a","b"),4)),w=1)
> dat$y <- with(dat,10*x+10*(a=="a") + rnorm(8))
> print(m1 <- lmer(y~x+(1|a),dat,weights=1*w))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
   Data: dat
   AIC   BIC logLik deviance REMLdev
 32.32 32.64 -12.16    29.97   24.32
Random effects:
 Groups   Name        Variance Std.Dev.
 a        (Intercept) 44.09718 6.64057
 Residual              0.87763 0.93682
Number of obs: 8, groups: a, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)   5.0792     4.7186   1.076
x            10.1045     0.6624  15.254

Correlation of Fixed Effects:
  (Intr)
x -0.070
> pvals.fnc(m1)
$fixed
            Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)    5.079    5.037     -1.553      11.32 0.1004   0.3231
x             10.104   10.110      4.917      15.28 0.0038   0.0000

$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1        a (Intercept)   6.6406     2.6325   3.3750     0.0000     7.6330
2 Residual                0.9368     2.8649   3.2042     0.8809     6.0746
> print(m2 <- lmer(y~x+(1|a),dat,weights=100*w))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
   Data: dat
   AIC   BIC logLik deviance REMLdev
 69.17 69.48 -30.58    66.81   61.17
Random effects:
 Groups   Name        Variance Std.Dev.
 a        (Intercept) 0.44097  0.66406
 Residual             0.87763  0.93682
Number of obs: 8, groups: a, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.07921    0.47185   10.76
x           10.10449    0.06624  152.54

Correlation of Fixed Effects:
  (Intr)
x -0.070
> pvals.fnc(m2)
$fixed
            Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)    5.079     5.38    -23.816      38.45 0.3434        0
x             10.104    10.11      4.115      16.06 0.0068        0

$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1        a (Intercept)   0.6641     5.8216  15.7201     0.0000    61.7422
2 Residual                0.9368    23.8808  30.8847    5.2868    71.7694
> print(m3 <- lmer(y~x+(1|a),dat,weights=w/100))
Linear mixed model fit by REML
Formula: y ~ x + (1 | a)
   Data: dat
    AIC    BIC logLik deviance REMLdev
 -4.516 -4.199  6.258   -6.868  -12.52
Random effects:
 Groups   Name        Variance   Std.Dev.
 a        (Intercept) 4409.71731 66.40570
 Residual                0.87763  0.93682
Number of obs: 8, groups: a, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)    5.079     47.187   0.108
x             10.104      6.624   1.525

Correlation of Fixed Effects:
  (Intr)
x -0.070
> pvals.fnc(m3)
$fixed
            Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
(Intercept)    5.079    5.053    -0.8904      11.28 0.085   0.9178
x             10.104   10.107     5.3060      15.21 0.002   0.1780

$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1        a (Intercept)  66.4057     2.6379   3.1098     1.1580     6.5406
2 Residual               0.9368     0.2828   0.3121     0.1214     0.5694
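In case it's useful for diagnosis, the pattern above looks perfectly systematic. Here is a minimal check (a sketch reusing dat from above; m.c1 and m.c100 are just names I made up): scaling every weight by c = 100 leaves the reported residual variance fixed, divides the random-intercept variance by c, and divides the fixed-effect standard errors by sqrt(c).

m.c1   <- lmer(y ~ x + (1 | a), dat, weights = rep(1,   nrow(dat)))
m.c100 <- lmer(y ~ x + (1 | a), dat, weights = rep(100, nrow(dat)))
VarCorr(m.c1)    ## intercept variance 44.097, residual 0.878
VarCorr(m.c100)  ## intercept variance  0.441, residual 0.878
sqrt(diag(vcov(m.c1))) / sqrt(diag(vcov(m.c100)))  ## both ratios equal 10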
I'd be glad to be enlightened...!
Best
Roger
On Mar 26, 2012, at 10:35 AM PDT, Geoff Brookshire wrote:
> Hi listers,
>
> If you get p-values from likelihood-ratio chi-square tests by comparing
> nested lmer fits with anova(), they look pretty close to the pMCMC
> output:
> fm.1 <- lmer(y ~ block * condition + (1 | as.factor(subject)),
>              weights = weights, REML = FALSE)
> fm.2 <- lmer(y ~ block + condition + (1 | as.factor(subject)),
>              weights = weights, REML = FALSE)
> anova(fm.1, fm.2)
>
> This gives p = .35 for the block*condition interaction. For comparison,
> pvals.fnc gave pMCMC = .3 and Pr(>|t|) < .001. So it looks like the
> p-values derived from t-tests are just way off.
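>
> As a sanity check, the chi-square p-value that anova() reports can be
> reproduced by hand (a sketch, assuming fm.1 and fm.2 were fit with
> REML = FALSE as above; the dropped interaction accounts for one df):
>
> lrt <- as.numeric(2 * (logLik(fm.1) - logLik(fm.2)))
> pchisq(lrt, df = 1, lower.tail = FALSE)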
>
> It seems to me like we should just ignore the Pr(>|t|) output entirely.
> Does anyone who knows more about how these values are computed think
> otherwise?
>
> -- geoff
>
>
> On Mon, Mar 19, 2012 at 3:25 PM, Tom Gijssels <tom.gijssels at gmail.com> wrote:
>
>> Dear R-listers,
>>
>> I'm trying to fit a mixed-effects model with the lmer() function and
>> have run into some issues interpreting the p-values generated by
>> pvals.fnc(). The design is between-subjects, with two fixed effects
>> (condition and block, each with two levels) and one random effect
>> (subject). Additionally, I have a set of weights that I want to include.
>>
>> Looking at the pvals.fnc() output, there appears to be a large
>> discrepancy between the pMCMC values and the t-statistic p-values.
>> Whereas one of the main effects and the interaction are far from
>> significant judging by the pMCMC values, they are highly significant
>> according to the t-statistic p-values (e.g. condition: pMCMC = 0.2294,
>> Pr(>|t|) = 0.0000; condition*block: pMCMC = 0.3296, Pr(>|t|) = 0.0000).
>> I have read that the t-statistic-based p-values are less conservative,
>> but the difference between these two values seems really extreme.
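>>
>> If I understand correctly (an assumption on my part, not something I
>> have verified in the languageR sources), Pr(>|t|) is computed from the
>> t value with df = (number of observations) - (number of fixed-effect
>> coefficients), which with 220 observations makes even modest t values
>> look highly significant. A sketch:
>>
>> t.val <- 5.2        ## hypothetical t statistic from a model summary
>> n <- 220; p <- 4    ## observations and fixed-effect coefficients here
>> 2 * pt(abs(t.val), df = n - p, lower.tail = FALSE)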
>>
>> Below is some code that simulates the design and the data. The original
>> data set has two particular characteristics that might influence the
>> results, so I tried to build them into the mock data: 1) there are fewer
>> observations in block A than in block B; and 2) the weights for
>> observations in block A are generally lower than those for block B.
>>
>> Running this code reproduces the original observation of conflicting
>> pMCMC and t-test p-values. However, when the weights argument is left
>> out of the lmer() call, the two sets of values seem to converge,
>> suggesting that the weights specification is what underlies the problem
>> (see the comparison fit after the code below).
>>
>> In short, my question is whether anyone knows why these values diverge and
>> what I could do to address this issue.
>>
>> Many thanks in advance!
>>
>> Tom
>>
>> library(lme4)
>> library(languageR)
>> set.seed(1)  ## fix the seed so the simulation is reproducible
>>
>> block <- as.factor(c(rep('a', times = 20), rep('b', times = 200)))
>> condition <- as.factor(c(rep(c('x', 'y'), each = 10),
>>                          rep(c('x', 'y'), each = 100)))
>> contrasts(block) <- c(-0.5, 0.5)
>> contrasts(condition) <- c(-0.5, 0.5)
>>
>> subject <- c(rep(1:4, each = 5), rep(1:4, each = 50))
>>
>> ## true effect sizes and error; block A gets low weights, block B high
>> intercept <- 100
>> block.me <- 20
>> condition.me <- 30
>> err <- rnorm(length(block), sd = 20)
>> weights <- c(rep(1, times = 20), rep(10, times = 200))
>>
>> y <- intercept + ifelse(block == 'a', block.me, 0) +
>>      ifelse(condition == 'x', condition.me, 0) +
>>      ifelse(block == 'a' & condition == 'x', 30, 0) +
>>      (subject * 10) + err
>>
>> fm.1 <- lmer(y ~ block * condition + (1 | as.factor(subject)),
>>              weights = weights, REML = FALSE)
>> fm.1.mcmc <- pvals.fnc(fm.1, addPlot = FALSE)
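>>
>> For comparison (a sketch; fm.0 is just a name I made up), the same
>> model without the weights argument, for which the pMCMC and Pr(>|t|)
>> values roughly agree:
>>
>> fm.0 <- lmer(y ~ block * condition + (1 | as.factor(subject)),
>>              REML = FALSE)
>> fm.0.mcmc <- pvals.fnc(fm.0, addPlot = FALSE)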
>>
--
Roger Levy Email: rlevy at ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://idiom.ucsd.edu/~rlevy