[R] Different output from lm() and lmPerm lmp() if categorical variables are included in the analysis

Rolf Turner r.turner at auckland.ac.nz
Wed Nov 13 09:27:28 CET 2013


RTFM!!! :-)

The help explicitly says "The default contrasts are set internally to 
(contr.sum, contr.poly) ....".

Set

     options(contrasts=c("contr.sum","contr.poly"))

before your call to lm() and "atest" will agree with "aptest" all down 
the line.

     cheers,

     Rolf Turner

On 11/08/13 21:35, Agustin Lobo wrote:
> I've found a problem when using
> categorical variables in lmp() from package lmPerm
>
> According to help(lmp): "This function will behave identically to lm()
> if the following parameters are set: perm="", seq=TRUE,
> center=FALSE.")
> But not in the case of including categorical variables:
>
> require(lmPerm)
> set.seed(42)
> testx1 <- rnorm(100,10,5)
> testx2 <- c(rep("a",50),rep("b",50))
> testy <- 5*testx1 + 3 + runif(100,-20,20)
> test <- data.frame(x1=testx1,x2=
> testx2,y=testy)
> atest <- lm(y ~ x1*x2,data=test)
> aptest <- lmp(y ~ x1*x2,data=test,perm = "", seqs = TRUE, center = FALSE)
> summary(atest)
>
> Call:
> lm(formula = y ~ x1 * x2, data = test)
> Residuals:
>      Min       1Q   Median       3Q      Max
> -17.1777  -9.5306  -0.9733   7.6840  22.2728
>
> Coefficients:
>          Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -2.0036     3.2488  -0.617    0.539
> x1            5.3346     0.2861  18.646   <2e-16 ***
> x2b           2.4952     5.2160   0.478    0.633
> x1:x2b       -0.3833     0.4568  -0.839    0.404
>
> summary(aptest)
>
> Call:
> lmp(formula = y ~ x1 * x2, data = test, perm = "", seqs = TRUE,
> center = FALSE)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -17.1777  -9.5306  -0.9733   7.6840  22.2728
>
> Coefficients:
>     Estimate Std. Error t value Pr(>|t|)
> x1       5.1429     0.2284  22.516   <2e-16 ***
> x21     -1.2476     2.6080  -0.478    0.633
> x1:x21   0.1917     0.2284   0.839    0.404
>
> It looks like lmp() is internally coding dummy variables in a different way, so
> lmp results are for "a" (named "1" by lmp) while lm results are for
> "b" ?



More information about the R-help mailing list