[R] Help with predict.lm

Tue Apr 19 16:20:26 CEST 2005

On 19-Apr-05 Mike White wrote:
> Hi
> I have measured the UV absorbance (abs) of 10 solutions
> of a substance at known concentrations (conc) and have
> used a linear model to plot a calibration graph with
> confidence limits.  I now want to predict the concentration
> of solutions with UV absorbance results given in the 
> new.abs data.frame, however predict.lm only appears to work
> for new "conc" variables not new "abs" variables.
> 
> I have search the help files and did find a similar problem
> in June 2000, but unfortunately no solution was offered.
> Any help and how to use predict.lm with the new "abs" data
> to predict "conc" with confidence limits would be appreciated.
> 
>     conc<-seq(100, 280, 20) #  mg/l
>     abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744,
>            1.852, 1.936,2.046) # absorbance units
>     lm.calibration<-lm(abs ~ conc)
>     pred.w.plim <- predict(lm.calibration,  interval="prediction")
>     pred.w.clim <- predict(lm.calibration,  interval="confidence")
>     matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
>        lty=c(1,2,2,3,3), type="l", ylab="abs", xlab= "conc mg/l")
>     points(conc, abs, pch=21, col="blue")
> 
>     new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> 
>     predict(calibration.lm, new.abs) # does not work

Apart from the apparent typo "calibration.lm" which Matthias pointed
out, this will not work anyway since "predict" predicts in the
same direction as the model was fitted in the first place.

You have fitted "abs ~ conc", so the lm object lm.calibration
contains a representation of the model equivalent to

  abs = a + b*conc

When you use predict(calibration.lm, new.abs) it will expect
the dataframe "new.abs" to contain values in a list named "conc".
Since you have supplied a list named "abs", this is apparently
ignored, and as a result the function uses the default dataframe
which is the data encapsulated in the calibration.lm object.

You can verify this by comparing the outputs of

  predict(lm.calibration, new.abs)

and

  predict(lm.calibration)

You will see that they are identical!

Either way, "predict" predicts "ans" from "conc" by using the
above equation. It does not attempt to invert the equation
even if you call the "new" data "abs".

Given the apparently extremely close fit, in your data, to
a straight line, you are likely to get perfectly adequate
results simply by doing the regression the other way round,
i.e.

> lm.calibration2<-lm(conc ~ abs)
> new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> predict(lm.calibration2,new.abs)
       1        2        3 
131.3813 144.7453 168.1782 

which looks about right!

Strictly speaking, this approach is somewhat invalid, since
under the model

   abs = a + b*conc + error

the "inverse regression"

   conc = (abs - a)/b = A + B*abs

does not have the standard statistical properties because
of the theoretical possiblity (P > 0) that the estimated b
can be arbitrarily close to 0 (with the result that, theoretically,
the estimate of B = 1/b has neither expectation nor variance),
and the estimates of A and B could be a long way out.

Also, the confidence intervals you would get by using the
residuals from this "inverse regression" in the usual way
would not be valid, since they would be finite for all
values of the confidence level. In reality, for a sufficiently
high confidence level (but still short of 100%) you will get
a confidence "interval" consisting of several parts with
gaps between them, or an infinite confidence interval.

This takes place at confidence levels corresponding to
the significance level at which you can no longer reject
the hypothesis "b = 0" in the first euqation. In your case
this significance level would be extremely small (I get
2.71e-12, so you can get confidence intervals up to
at least 99.999999999% confidence level before you need to
worry much about the confidence interval!

In your case where there seems to be a very tight linear
relationship, the possible misrepresentation of the
inverse relationship due to this effect is very unlikely
to have occurred.

The correct approach has in the past been the subject of
at times quite controversial discussion, under the title
indeed of "The Calibration Problem". Nowadays this problem
would be approached by making the concentrations to be
"predicted" additional unknown parameters, and evaluating
likelihood ratios for possible values of these.

I don't have time at the moment to go into this approach,
but will try to write something later.

It seems there is nothing in R at present for this kind of
general (and basically simple) use, though maybe there is
something usable in the package mscalib (which I have not
studied); however, by the look of the contents, the routines
therein may be too elaborately specialised towards a specific
type of application to be convenient for general use in
conjunction with lm.

Till later,
Ted.

 it might be
worth spelling it out for
people who would like to implement it themselves. Later.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Apr-05                                       Time: 15:20:26
------------------------------ XFMail ------------------------------