[R] Comparison of linear models

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Fri Jul 28 23:15:30 CEST 2006


I have one addition to Rolf's thorough advice: if your goal is to
find evidence that the two procedures are equivalent, then the tests
you should consider are equivalence tests.  These do not come from
lm().

The most popular approach is TOST, the two one-sided tests procedure,
and it doesn't really require a package to implement.  Briefly, an
alpha=0.05 test might proceed as follows.

1) You establish a subjective interval around the value that you wish
   to test.  In the case of trying to assess the evidence that two
   population means for measured heights are the same, for example,
   you might say that the subjective interval for the difference
   between the two means is 0, +/- 2 cm.  The magnitude of the
   interval depends on what you think is an important deviation.

2) Compute two one-sided 1-alpha confidence intervals for the
   difference between the two means, one upper, and one lower.  Take
   the intersection of the two intervals.  (NB in this example it is
   mathematically equivalent to a single, two-sided 1-2*alpha
   confidence interval but this is only true in simple cases).

3) If the intersection is entirely within the subjective interval
   established in step 1), then you reject the null hypothesis that
   the population means differ by more than that margin; that is, you
   conclude that they are equivalent for your purposes.
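
For concreteness, here is a minimal base-R sketch of steps 1)-3) for
the height example.  The function name tost.ci, the simulated samples,
and the 2 cm margin are purely illustrative, not from any package:

   tost.ci <- function(x, y, eps = 2, alpha = 0.05) {
     ## one-sided 1 - alpha bounds for the difference in means
     lo <- t.test(x, y, alternative = "greater",
                  conf.level = 1 - alpha)$conf.int[1]
     hi <- t.test(x, y, alternative = "less",
                  conf.level = 1 - alpha)$conf.int[2]
     ## their intersection (lo, hi); in this simple case it coincides
     ## with the two-sided 1 - 2*alpha interval
     list(interval = c(lo, hi),
          equivalent = (lo > -eps) && (hi < eps))
   }

   set.seed(1)
   x <- rnorm(30, mean = 170, sd = 5)    # heights, procedure 1
   y <- rnorm(30, mean = 170.5, sd = 5)  # heights, procedure 2
   tost.ci(x, y, eps = 2)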

There is not very much literature on the question.  The originating
articles are:

@Article{schuirmann-1981,
  author =       {D. L. Schuirmann},
  title =        {On hypothesis testing to determine if the mean of a
                  normal distribution is contained in a known interval},
  journal =      {Biometrics},
  year =         1981,
  volume =       37,
  pages =        {617}
}

@Article{westlake-1981,
  author =       {W. J. Westlake},
  title =        {Response to {T.B.L. Kirkwood}: bioequivalence
                  testing--a need to rethink},
  journal =      {Biometrics},
  year =         1981,
  volume =       37,
  pages =        {589--594}
}
 
I also recommend:

@Article{BH96:equivalence,
  author =       {R. L. Berger and J. C. Hsu},
  title =        {Bioequivalence trials, intersection-union tests and
                  equivalence confidence sets},
  journal =      {Statistical Science},
  year =         1996,
  volume =       11,
  number =       4,
  pages =        {283--319}
}

Finally, there is a nice recent book: 

@Book{W03:equivalence,
  author =       {S. Wellek},
  title =        {Testing statistical hypotheses of equivalence},
  publisher =    {Chapman and Hall/CRC},
  year =         2003
}

There is also an equivalence package on CRAN, which has some other
tests, graphical procedures, and references to some expository
articles (mine and others).
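
To connect this with the regression setting of the original question,
one rough possibility (purely a sketch, with made-up variable names
and a made-up margin) is to refit Rolf's combined-data model, quoted
below, and ask whether a 1-2*alpha confidence interval for each term
involving the grouping factor lies inside a region of differences you
are prepared to ignore:

   ## my.data holds both data sets, a predictor x, and the factor gp
   ## from Rolf's advice below; eps is the subjective margin
   fit <- lm(y ~ x + gp, data = my.data)
   eps <- 2
   ci  <- confint(fit, "gp2", level = 0.90)  # 1 - 2*alpha, alpha = 0.05
   all(ci > -eps & ci < eps)                 # TRUE suggests equivalence
   ## any interaction terms involving gp would need the same check,
   ## each with its own margin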

Cheers

Andrew

On Fri, Jul 28, 2006 at 09:10:21AM -0300, Rolf Turner wrote:
> 
> Fabien Lebugle wrote:
> 
> > I am a master's student currently doing an internship.  I would
> > like to get some advice about the following issue: I have 2 data
> > sets, both containing the same variables, but the data were measured
> > using two different procedures. I want to know if the two procedures
> > are equivalent.  Up to now, I have built one linear model for each
> > dataset. The two models have the same form. I would like to compare
> > these two models: are they identical? Are they different? By how
> > much?
> > 
> > Please, could you tell me which R procedure I should use? I have been 
> > searching the list archive, but without success...
> 
> 	This is not a question of ``which R procedure'' but rather a
> 	question of understanding a bit about statistics and linear
> 	models.  You say you are a ``master's student''; I hope you
> 	are not a master's student in *statistics*, given that you
> 	lack this (very) basic knowledge!  If you are a student in
> 	some other discipline, I guess you may be forgiven.
> 
> 	The ``R procedure'' that you need to use is just lm()!
> 
> 	Briefly, what you need to do is combine your two data
> 	sets into a *single* data set (using rbind should work),
> 	add in a grouping variable (a factor with two levels,
> 	one for each measurement procedure), e.g.
> 
> 		my.data$gp <- factor(rep(c(1,2),c(n1,n2)))
> 
> 	where n1 and n2 are the sample sizes for procedure 1 and
> 	procedure 2 respectively.
> 
> 	Then fit linear models with formulae involving the
> 	grouping factor (``gp'') as well as the other predictors,
> 	and test for the ``significance'' of the terms in
> 	the model that contain ``gp''.  You might start with
> 
> 		fit <- lm(y~.*gp,data=my.data)
> 		anova(fit)
> 
> 	where ``y'' is (of course) your response.
> 
> 	You ought to study up on the underlying ideas of inference
> 	for linear models, and the nature of ``factors''.  John Fox's
> 	book ``Applied Regression Analysis, Linear Models, and
> 	Related Methods'' might be a reasonable place to start.
> 
> 	Bonne chance.
> 
> 				cheers,
> 
> 					Rolf Turner
> 					rolf at math.unb.ca
> 

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au


