# [R] Help - lm, glm, aov results inconsistent with other statistical package

Berwin A Turlach berwin at maths.uwa.edu.au
Wed Mar 1 09:01:31 CET 2006

>>>>> "BR" == Ben Ridenhour <snake8mynewt at yahoo.com> writes:

BR> Alright, I'll try to give some sample code.
BR> # create A with 2 levels - 2 and 4

> set.seed(1)

to make things reproducible.

BR> > A<-c(rep(2,times=30),rep(4,times=42))

BR> # make A a factor
BR> > A<-as.factor(A)

BR> #generate 72 random x points
BR> > x<-rnorm(72)

BR> #create different slopes & intercepts for A=2 and A=4
BR> #add a random error term
BR> > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)

BR> #use model y~A*x for lm (or glm)
BR> > test.lm<-lm(y~A*x)

BR> This essentially creates something like my data set and uses
BR> the same model.  In response to (1), I was just using 0/1
BR> because these are codings for the 2 levels, correct?
Actually, no, the internal coding of the factor is with 1 and 2:
> str(A)
Factor w/ 2 levels "2","4": 1 1 1 1 1 1 1 1 1 1 ...
> unique(as.numeric(A))
[1] 1 2

and how factors are internally coded is documented on the help page of
`factor'.
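For the data in this example, both the internal codes and the dummy coding that `lm` will use by default can be inspected directly; a small sketch using only base R:

```r
# Recreate the factor from the example above
A <- as.factor(c(rep(2, times = 30), rep(4, times = 42)))

# The internal integer codes are 1 and 2, not the labels "2" and "4"
unique(as.integer(A))
# [1] 1 2

# Default treatment contrasts: level "2" is the baseline, "A4" the dummy
contrasts(A)
#   4
# 2 0
# 4 1
```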

BR> In response to (2), yes, I do want different slopes for the
BR> two categories (that is what I am interested in testing).
Whether you get estimates of different slopes does not depend on how
the factor is internally coded, but on how your design matrix is
constructed given your model specification and how the factor is coded
when the design matrix is created.  R has plenty of options here.

In your example above, you created data from 1+x for one group and
-2+2*x for the other group.  Assuming that you have not changed the
way R treats factors, some possibilities would be:

> lm(y~x*A)

Call:
lm(formula = y ~ x * A)

Coefficients:
(Intercept)            x           A4         x:A4
     0.9515       1.2449      -3.1825       0.8744

In this case, the first and second values are estimates for the
intercept and the slope, respectively, of the regression line in the
first group.  The other two values are the estimates for the
*differences* in intercept and slope for the two groups.
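With this parameterisation, each group's own line is recovered by adding the difference terms back onto the baseline; a sketch using the simulated data from above (coefficient names as printed by `lm`):

```r
# Simulated data as in the example above
set.seed(1)
A <- as.factor(c(rep(2, times = 30), rep(4, times = 42)))
x <- rnorm(72)
y <- c(x[1:30] + 1, 2 * x[31:72] - 2) + rnorm(72, mean = 0, sd = 2)

b <- coef(lm(y ~ x * A))

# Group "2" (the baseline level): read off directly
intercept2 <- b["(Intercept)"]
slope2     <- b["x"]

# Group "4": add the *difference* terms to the baseline
intercept4 <- b["(Intercept)"] + b["A4"]
slope4     <- b["x"] + b["x:A4"]
```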

> lm(y~A/x)

Call:
lm(formula = y ~ A/x)

Coefficients:
(Intercept)           A4         A2:x         A4:x
     0.9515      -3.1825       1.2449       2.1193

In this case, the first value is the estimate for the intercept in the
first group.  The second value is the *difference* in intercept for
the two regression lines.  The last two values are estimates for the
slopes in the two groups.
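Because the full-interaction model fits a separate line in each group, those nested slope terms agree with simple regressions run group by group; a quick check on the simulated data from above:

```r
set.seed(1)
A <- as.factor(c(rep(2, times = 30), rep(4, times = 42)))
x <- rnorm(72)
y <- c(x[1:30] + 1, 2 * x[31:72] - 2) + rnorm(72, mean = 0, sd = 2)

fit <- lm(y ~ A/x)

# A2:x and A4:x are within-group slopes, so they match
# simple regressions run separately in each group
slope_in_2 <- coef(lm(y[A == "2"] ~ x[A == "2"]))[2]
slope_in_4 <- coef(lm(y[A == "4"] ~ x[A == "4"]))[2]
```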

> lm(y~A/(x-1))

Call:
lm(formula = y ~ A/(x - 1))

Coefficients:
     A2       A4     A2:x     A4:x
 0.9515  -2.2309   1.2449   2.1193

In this case, the first two values are the estimates for the
intercepts for the two regression lines and the last two values give
the two slopes.  Quite reasonable estimates in my book.
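All three specifications are reparameterisations of the same model, so the fitted values agree; a quick sanity check on the simulated data:

```r
set.seed(1)
A <- as.factor(c(rep(2, times = 30), rep(4, times = 42)))
x <- rnorm(72)
y <- c(x[1:30] + 1, 2 * x[31:72] - 2) + rnorm(72, mean = 0, sd = 2)

f1 <- lm(y ~ x * A)
f2 <- lm(y ~ A/x)
f3 <- lm(y ~ A/(x - 1))

# The three design matrices span the same column space,
# so the fits are identical even though the coefficients differ
stopifnot(all.equal(fitted(f1), fitted(f2)),
          all.equal(fitted(f1), fitted(f3)))
```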

BR> If I export the data created above to JMP and run what I
BR> believe to be the same model, I get a different answer for my
BR> equations :(
Presumably because you are using different parameterisations and you
interpret the estimates incorrectly.  To interpret the output, you
have to know what parameters your estimates are estimating.  I do not
know what JMP is doing or what kind of parameterisation it uses given
a certain model specification.
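One common alternative in other packages is sum-to-zero (effect) coding; whether JMP actually uses it is an assumption here, but R can produce that parameterisation for comparison via the `contrasts` argument of `lm`:

```r
set.seed(1)
A <- as.factor(c(rep(2, times = 30), rep(4, times = 42)))
x <- rnorm(72)
y <- c(x[1:30] + 1, 2 * x[31:72] - 2) + rnorm(72, mean = 0, sd = 2)

# Same model, but with sum-to-zero contrasts for the factor:
# the intercept and slope now refer to the average over the two groups,
# and the "A1" terms are the first level's deviations from that average
# (it is an assumption that this matches JMP's coding)
fit_sum <- lm(y ~ x * A, contrasts = list(A = "contr.sum"))
coef(fit_sum)
```

The fit itself is unchanged; only the meaning of the individual coefficients differs, which is why two packages can report different numbers for the "same" model.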

>> [...] Why can't I get the same answer from JMP as in R?  This
>> is very disturbing to me!
Rest assured, no matter how disturbing this might be for you, it is
much more disturbing for us to see people using statistical packages
without knowing what they are doing. :)
And I would find it particularly disturbing if there were no statistician
(or someone with sufficient statistical training) at Washington State
University who could explain all this to you.

Best wishes,

Berwin