[R] test the significances of two regression lines

Mon Aug 6 13:16:09 CEST 2007

On 06-Aug-07 10:32:50, Luis Ridao Cruz wrote:
> R-help,
> 
> I'm trying to test the significance of two regression lines,
> i.e. the significance of the slopes from two samples
> originating from the same population.
> 
> Is it correct if I fit a linear model for each sample and
> then test the slope significance with 'anova'? Something like this:
> 
> lm1 <- lm(Y~ a1 + b1*X)    # sample 1
> lm2 <- lm(Y~ a2 + b2*X)    # sample 2
> 
> anova(lm1, lm2)

No, this will not work. From "?anova":

Warning:
  The comparison between two or more models will only be valid if
  they are fitted to the same dataset.

which is not the case in your example. One way to proceed is to
merge the two datasets, and introduce a factor which identifies
the dataset. For example:

  x1<-rnorm(100) ; x2<-rnorm(100)
  y1 <- 0.2 + 0.1*x1 + 0.05*rnorm(100)
  y2 <- 0.2 + 0.12*x2 + 0.05*rnorm(100)
  x <- c(x1,x2)
  y <- c(y1,y2)
  S <- factor(c(rep(0,100),rep(1,100)))
  lm12 <- lm(y ~ x*S)
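
For reference, the formula y ~ x*S is R shorthand for y ~ x + S + x:S.
A minimal sketch of the design matrix it produces, using small made-up
values (the names x_demo and S_demo are illustrative only and are not
part of the data above):

```r
# Illustrative values only; x_demo and S_demo are hypothetical names
x_demo <- c(1, 2, 3, 4)
S_demo <- factor(c(0, 0, 1, 1))

# Design matrix for the right-hand side of a model like y ~ x*S
mm <- model.matrix(~ x_demo * S_demo)
print(colnames(mm))

# The interaction column equals x for sample 1 and 0 for sample 0,
# so its coefficient is the extra slope of sample 1 over sample 0
print(mm[, "x_demo:S_demo1"])
```

With the default treatment contrasts, the plain x coefficient is then
the slope for sample 0, and the interaction coefficient is the amount
by which the slope for sample 1 differs from it.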

First look at the fit of y1~x1:
  summary(lm(y1~x1))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.206042   0.004647   44.34   <2e-16 ***
x1          0.091382   0.004768   19.16   <2e-16 ***

Then the fit of y2~x2:
  summary(lm(y2~x2))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.208216   0.005171   40.26   <2e-16 ***
x2          0.118840   0.005009   23.73   <2e-16 ***

so the estimated slopes differ by 0.118840 - 0.091382 = 0.027458.
But what is the "significance" of this difference?

Now:
  summary(lm12)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.206042   0.004923  41.852  < 2e-16 ***
x           0.091382   0.005052  18.088  < 2e-16 ***
S1          0.002174   0.006953   0.313 0.754926    
x:S1        0.027457   0.006939   3.957 0.000106 ***

so the "x:S1" value is the same (to within rounding) as the difference
in slopes estimated from the two separate fits; but now we have a
standard error and a P-value for it. You can also use anova now:

  anova(lm12)
Response: y
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
x           1 2.26537 2.26537 946.2702 < 2.2e-16 ***
S           1 0.00015 0.00015   0.0614 0.8045253    
x:S         1 0.03749 0.03749  15.6599 0.0001060 ***
Residuals 196 0.46922 0.00239                       

so you get the same P-value, though with anova() you do not see
the actual estimate of the difference between the slopes.
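
If the estimate is wanted alongside the test, both can be pulled from
the same fit. The following is a sketch only: it regenerates data in
the same way as above but with an arbitrary seed (set.seed(1) is an
assumption, so the numbers will not match the output shown), and the
name lm_common is introduced here for illustration. It also compares
the models with and without the interaction term, which is a valid use
of anova() because both are fitted to the same merged dataset.

```r
set.seed(1)  # arbitrary seed, not used in the original example
x1 <- rnorm(100); x2 <- rnorm(100)
y1 <- 0.2 + 0.1*x1  + 0.05*rnorm(100)
y2 <- 0.2 + 0.12*x2 + 0.05*rnorm(100)
x <- c(x1, x2)
y <- c(y1, y2)
S <- factor(c(rep(0, 100), rep(1, 100)))

lm_common <- lm(y ~ x + S)   # common slope, separate intercepts
lm12      <- lm(y ~ x * S)   # separate slopes and intercepts

# Estimate, standard error, t value and P-value for the slope difference
slope_diff <- coef(summary(lm12))["x:S1", ]
print(slope_diff)

# 95% confidence interval for the slope difference
print(confint(lm12, "x:S1"))

# Nested comparison of the two models fitted to the same data
cmp <- anova(lm_common, lm12)
print(cmp)

# With one numerator degree of freedom, F is the square of the t value
print(all.equal(unname(slope_diff["t value"])^2, cmp$F[2]))
```

This is why the t test in summary(lm12) and the F test for x:S in
anova(lm12) give the same P-value: in the output above, 3.957^2 is
about 15.66, which agrees with the F value of 15.6599.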

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Aug-07                                       Time: 12:16:06
------------------------------ XFMail ------------------------------