[Rd] non-linear fitting (nls) and confidence limits
Joerg van den Hoff
j.van_den_hoff at fzd.de
Tue Sep 25 15:56:14 CEST 2007
dear list members,
my question concerns computation of confidence intervals in nonlinear
fits with `nls' when weigthing the fit. the seemingly correct procedure
does not work as (I) expected. I'm posting this here since: (A) the
problem might suggest a modification to the `m' component in the return
argument of `nls' (making this post formally OK for this list) and (B) I
got no response on r-help (my "secret" motivation for posting here,
hoping for some clarifications...):
for _unweighted_ fits using `nls' one can compute confidence intervals for the
fitted model function via
#-------------------
se.fit <- sqrt(apply(res$m$gradient(), 1, function(x) sum(vcov(res)*outer(x,x))))
luconf <- yfit + outer(se.fit, qnorm(c(probex, 1 - probex)))
#-------------------
where `res' contains an `nls' object, `x' is the independent variable vector,
`yfit' the corresponding model prediction (`fitted(res)'), `se.fit' the
corresponding standard error and `luconf' the lower and upper confidence
limits at some level specified by `probex'.
when using the same approach with _weighted_ fits (even if all weights are equal
(but not equal to one)), I noted that the confidence limits depend on the
weights in an counter-intuitive, probably erroneous way.
I have tracked my problem down to the fact that `res$m$gradient()' does
_not_ contain the actual gradients w.r.t. the parameters but rather the
gradients multiplied by the sqrt of the weights.
question 1:
is the fact that `m$gradient' in an `nls' object does compute the "scaled"
gradients documented somewhere (I did'nt find any remark)? if no,
can someone please give me a hint what the rationale behind this approach is?
question 2:
am I right to understand, that the above approach to compute `se.fit'
(in this compact form proposed by p. daalgaard on r-help some time ago)
is erroneous (for weighted nls) that I have to undo the multiplication
by sqrt(weights) which is included in `m$gradient' (or compute the
true gradients of my model function otherwise, e.g. with `deriv')?
here is a simple example demonstrating the problem (yes, I know that
this actually is a linar model :-)):
#-----------------------------------------------------------------------------------
probex <- 0.05
x <- 1:10
y <- rnorm(x, x, .8)
w1 <- rep(1, 10)
w2 <- w1; w2[7:10] <- 0.01 * w2[7:10]
res1 <- nls(y ~ a*x + b, data = list(x = x, y = y), start = list(a = 1, b = 0), weights = w1)
res2 <- nls(y ~ a*x + b, data = list(x = x, y = y), start = list(a = 1, b = 0), weights = w2)
yfit1 <- fitted(res1)
yfit2 <- fitted(res2)
se.fit1 <- sqrt(apply(res1$m$gradient(), 1, function(x) sum(vcov(res1)*outer(x,x))))
luconf1 <- yfit1 + outer(se.fit1, qnorm(c(probex, 1 - probex)))
se.fit2 <- sqrt(apply(res2$m$gradient(), 1, function(x) sum(vcov(res2)*outer(x,x))))
luconf2 <- yfit2 + outer(se.fit2, qnorm(c(probex, 1 - probex)))
op <- par(mfrow = c(2,1))
matplot(x, cbind(y, yfit1,luconf1), type = 'blll', pch = 1, col = c(1,2,4,4), lty = 1)
matplot(x, cbind(y, yfit2,luconf2), type = 'blll', pch = 1, col = c(1,2,4,4), lty = 1)
par(op, no.readonly = T)
#-----------------------------------------------------------------------------------
the first fit uses unit weights for all data points (i.e. effectively is
unweighted) and yields reasonable confidence limits.
the second fit uses unequal weights (where the last 4
points have very small weights and are next to irrelevant for the fit result).
the computed confidence intervals in this case are only fine up to point no. 6, but
nonsense afterwards.
question 3:
computing confidence limits for the fitted model is a rather frequent
requirement (and it occures by and then on r-help). would it not be a
reasonable (and small) modification to `nls' to add a further argument
"conf = NA" so that if `nls' is called, e.g. with "conflim = 0.95",
the confidence limits are computed and are returned in a new
component `res$m$conf' of the `m' component of an nls object?
(alternatively, computation of `se.fit' (in the notation used above)
would suffice. this could also be achieved by making the `se.fit' flag in
`predict.nls' operational. either way would seem a valuable improvement
to the nls functionality, I believe.)
any feedback appreciated.
thanks,
joerg
More information about the R-devel
mailing list