[R] Help - lm, glm, aov results inconsistent with other stati stical package

Christoph Buser buser at stat.math.ethz.ch
Wed Mar 1 14:43:33 CET 2006


Dear Prof. Ripley

Please appologize. I was a little too short in my first
answer. My guess is that the line equations have been produced
by Ben (since JMP does not give me that line equation as
output), using the coefficients from JMP with the wrong assumption
that JMP uses the constraint that one coefficient for each
factor is set on zero instead of the true constraint that the
sum of coefficients is equal to zero.

The following example shows what JMP does:


## create A with 2 levels - 2 and 4
A <-factor(c(rep(2,times=30),rep(4,times=42)))     
set.seed(1)
## generate 72 random x points
x <-rnorm(72)                                               
## create different slopes & intercepts for A=2 and A=4
## add a random error term 
y <-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)           
dat <- data.frame(A, x, y)

options(contrasts = c("contr.sum", "contr.poly" ))
## y_1 = mu + Beta*x    (alpha_1 = 0, Beta_int_1 = 0)  [1]
## y_2 = mu + alpha + Beta*x + Beta_int*x              [2]
## ([1] + [2])/2 = mu + alpha/2 + Beta + Beta_int/2*x    [3]
test.lm1 <-lm(y~A*x, dat = dat)
## check the output
summary(test.lm1)
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)    
> (Intercept)  -0.6397     0.2051  -3.119  0.00266 ** 
> A1            1.5912     0.2051   7.759 5.99e-11 ***
> x             1.6821     0.2226   7.555 1.41e-10 ***
> A1:x         -0.4372     0.2226  -1.964  0.05367 .  
> ---
> Signif. codes:  0 $,1rx(B***$,1ry(B 0.001 $,1rx(B**$,1ry(B 0.01 $,1rx(B*$,1ry(B 0.05 $,1rx(B.$,1ry(B 0.1 $,1rx(B $,1ry(B 1 

In JMP when I calculate a linear model I get:

> Term			Estimate   Std error   t Ratio Prob>|t|
> Intercept             -0.639704  0.205071   -3.12   0.0027
> A[2]                  1.5260508  0.203069    7.51   <.001*
> x                     1.6821048  0.22264     7.56   <.001*
> A[2]*(x-0.14909)      -0.437177  0.22264    -1.96   0.0537
## y_1 = mu + alpha/2 + Beta*x + Beta_int/2*(x - mean(x))    (1a)
## y_2 = mu - alpha/2 - Beta*x - Beta_int/2*(x - mean(x))    (2a)

mean(dat[,"x"])
> 0.1490923

1.5912 = 1.5260508 - (-0.437177)*0.1490923


Going back to the original post with the line equations:

 y = 7072.09-1024.94 x (for A=0) and
 y = 7875.58-970.088 x (for A=1)
 
from JMP. And from R I get

 y = 6276.7-1259.8 x (for A=0) and
 y = 7867.5-1150.1 x (for A=1).


If one tries to calculate the line equations in belief that the
coefficients are estimates, using treatment contrasts, but in reality
the coefficients are estimates, using sum contrasts, I would expect
that the difference of the slopes of his wrong equations are half of
the difference of the true line equations:

1024.94 - 970.088   = 54.852
(1259.8 - 1150.1)/2 = 54.85
  
Furthermore if I expect to get the intercept of the first wrong
equation if I take the mean of the two correct ones:

(7867.5 + 6276.7)/2 = 7072.1

Therefore I assume that not JMP generated the wrong equations rather than Ben misinterpreted the coefficients.

Regards,

Christoph Buser

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------


Prof Brian Ripley writes:
 > Similar results for what?
 > 
 > The original posting quoted fitted lines for the two groups, which do not 
 > depend on the parametrization.  If that is what you want, it is best to 
 > parametrize the model directly to give them, which was one of Berwin's 
 > alternatives.  I don't have JMP, so do not know how to do that in JMP 
 > (and I still think the solution here is to seem help on using JMP).
 > 
 > On Wed, 1 Mar 2006, Christoph Buser wrote:
 > 
 > > Dear Ben
 > >
 > > Berwin is correct in his answer about different
 > > parameterizations.
 > >
 > > After changing the contrasts in R from treatment to sum
 > >
 > > options(contrasts = c("contr.sum", "contr.poly" ))
 > > test.lm<-lm(y~A*x)
 > > summary(test.lm)
 > >
 > > I got similar results to JMP. Be careful in doing a correct
 > > interpretation of your coefficients, especially when you have an
 > > interaction in your model.
 > >
 > > Regards,
 > >
 > > Christoph Buser
 > >
 > > --------------------------------------------------------------
 > > Christoph Buser <buser at stat.math.ethz.ch>
 > > Seminar fuer Statistik, LEO C13
 > > ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
 > > phone: x-41-44-632-4673		fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > > --------------------------------------------------------------
 > >
 > >
 > >
 > >
 > > Ben Ridenhour writes:
 > > > Okay, I took the data to SAS and it gives me the same answer that R does.  I don't why JMP is giving me an incorrect answer, but it seems to be.  (Either that or I have made the same mistake in SAS and R.) Any ideas what JMP might be doing?
 > > >
 > > >  Ben
 > > >
 > > > -------------------------------
 > > > Benjamin Ridenhour
 > > > School of Biological Sciences
 > > > Washigton State University
 > > > P.O. Box 644236
 > > > Pullman, WA 99164-4236
 > > > Phone (509)335-7218
 > > > --------------------------------
 > > > "Nothing in biology makes sense except in the light of evolution."
 > > > -T. Dobzhansky
 > > >
 > > >
 > > > ----- Original Message ----
 > > >
 > > > To: "Liaw, Andy" <andy_liaw at merck.com>; r-help at stat.math.ethz.ch
 > > > Sent: Tuesday, February 28, 2006 10:50:18 PM
 > > > Subject: Re: [R] Help - lm, glm, aov results inconsistent with other stati stical package
 > > >
 > > > Alright, I'll try to give some sample code.
 > > >
 > > >  # create A with 2 levels - 2 and 4
 > > >  > A<-c(rep(2,times=30),rep(4,times=42))
 > > >
 > > >  # make A a factor
 > > >  > A<-as.factor(A)
 > > >
 > > >    #generate 72 random x points
 > > >  > x<-rnorm(72)
 > > >
 > > >   #create different slopes & intercepts for A=2 and A=4
 > > >  #add a random error term
 > > >  > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)
 > > >
 > > >  #use model y~A*x for lm (or glm)
 > > >  > test.lm<-lm(y~A*x)
 > > >
 > > >  #check the output
 > > >  > summary(test.lm)
 > > >
 > > >  This essentially creates something like my data set and uses the same model.  In response to (1), I was just using 0/1 because these are codings for the 2 levels, correct? (i.e., when A=2 the dummy variable=0, when A=4 the dummy variable=1?).  In response to (2), yes, I do want different slopes for the two categories (that is what I am interested in testing).
 > > >
 > > >  If I export the data created above to JMP and run what I believe to be the same model, I get a different answer for my equations :(
 > > >
 > > >
 > > > -------------------------------
 > > > Benjamin Ridenhour
 > > > School of Biological Sciences
 > > > Washigton State University
 > > > P.O. Box 644236
 > > > Pullman, WA 99164-4236
 > > > Phone (509)335-7218
 > > > --------------------------------
 > > > "Nothing in biology makes sense except in the light of evolution."
 > > > -T. Dobzhansky
 > > >
 > > >
 > > > ----- Original Message ----
 > > > From: "Liaw, Andy" <andy_liaw at merck.com>
 > > >
 > > > Sent: Tuesday, February 28, 2006 5:14:57 PM
 > > > Subject: RE: [R] Help - lm, glm, aov results inconsistent with other stati stical package
 > > >
 > > > 1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and
 > > > A=1?
 > > >
 > > > 2. y = A + X + A*X means you're allowing the different groups of A to have
 > > > different slopes.  Probably not what you intended.
 > > >
 > > > 3. It's probably best to provide a small sample of the data (and R code) so
 > > > we know how you got what you got.
 > > >
 > > > Andy
 > > >
 > > > From: Ben Ridenhour
 > > > >
 > > > > Hello,
 > > > >
 > > > >  I 'm sure there must a be a simple explanation for what I'm
 > > > > doing wrong but I am stumped.  I am a novice R user and this
 > > > > has shaken my confidence in what I'm doing! I am trying to
 > > > > run a simple ANCOVA using the model y~A*x, where y and x are
 > > > > continuous and A has two levels.  Everything seems fine until
 > > > > I compare the output with what I get from another statistical
 > > > > package (JMP).  JMP has the model y=A+x+A*x (this should be
 > > > > the same as what I specified to R, correct?).  In comparison
 > > > > I get the line equations
 > > > >
 > > > >  y = 7072.09-1024.94 x (for A=0) and
 > > > >  y = 7875.58-970.088 x (for A=1)
 > > > >
 > > > >  from JMP.  And from R I get
 > > > >
 > > > >  y = 6276.7-1259.8 x (for A=0) and
 > > > >  y = 7867.5-1150.1 x (for A=1).
 > > > >
 > > > >  Obviously, these aren't even close to the same answer.  I've
 > > > > tried this using glm(), lm(), and aov() and as expected they
 > > > > all give the same answer.  If I do
 > > > >
 > > > >  >levels(A)
 > > > >  [1] "2" "4"
 > > > >
 > > > >  which are the two levels of A.  Why can't I get the same
 > > > > answer from JMP as in R?  This is very disturbing to me!
 > > > >
 > > > >  Thanks,
 > > > >  Ben
 > > > >
 > > > >
 > > > > -------------------------------
 > > > > Benjamin Ridenhour
 > > > > School of Biological Sciences
 > > > > Washigton State University
 > > > > P.O. Box 644236
 > > > > Pullman, WA 99164-4236
 > > > > Phone (509)335-7218
 > > > > --------------------------------
 > > > > "Nothing in biology makes sense except in the light of evolution."
 > > > > -T. Dobzhansky
 > > > >
 > > > >
 > > > >
 > > > >     [[alternative HTML version deleted]]
 > > > >
 > > > > ______________________________________________
 > > > > R-help at stat.math.ethz.ch mailing list
 > > > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > > > PLEASE do read the posting guide!
 > > > > http://www.R-project.org/posting-guide.html
 > > > >
 > > > >
 > > >
 > > >
 > > > ------------------------------------------------------------------------------
 > > >
 > > > ------------------------------------------------------------------------------
 > > >
 > > >
 > > >
 > > >
 > > >     [[alternative HTML version deleted]]
 > > >
 > > > ______________________________________________
 > > > R-help at stat.math.ethz.ch mailing list
 > > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > > >
 > > >
 > > >
 > > >
 > > > 	[[alternative HTML version deleted]]
 > > >
 > > > ______________________________________________
 > > > R-help at stat.math.ethz.ch mailing list
 > > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > >
 > > ______________________________________________
 > > R-help at stat.math.ethz.ch mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > >
 > 
 > -- 
 > Brian D. Ripley,                  ripley at stats.ox.ac.uk
 > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 > University of Oxford,             Tel:  +44 1865 272861 (self)
 > 1 South Parks Road,                     +44 1865 272866 (PA)
 > Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list