[Rd] problem with zero-weighted observations in predict.lm?
William Dunlap
wdunlap at tibco.com
Tue Jul 27 20:48:56 CEST 2010
In modelling functions some people like to use
a weight of 0 to drop an observation instead of
using a subset value of FALSE. E.g.,
weights=c(0,1,1,...)
instead of
subset=c(FALSE, TRUE, TRUE, ...)
to drop the first observation.
lm() and summary.lm() appear to treat these in the
same way, decrementing the number of degrees of
freedom for each dropped observation. However,
predict.lm() does not treat them the same. It
doesn't seem to diminish the df to account for the
0-weighted observations. E.g., the last printout
from the following script is as follows, where
predw is the prediction from the fit that used
0-weights and preds is from using FALSE's in the
subset argument. Is this difference proper?
predw preds
$fit $fit
fit lwr upr fit lwr upr
1 1.544302 1.389254 1.699350 1 1.544302 1.194879 1.893724
2 1.935504 1.719482 2.151526 2 1.935504 1.448667 2.422341
$se.fit $se.fit
1 2 1 2
0.06723657 0.09367810 0.1097969 0.1529757
$df $df
[1] 8 [1] 3
$residual.scale $residual.scale
[1] 0.1031362 [1] 0.1684207
### start of script ###
data <- data.frame(x=1:10, y=log(1:10))
fitw <- lm(data=data, y~x, weights=as.numeric(x<=5))
fits <- lm(data=data, y~x, subset=x<=5)
fitw$df.residual == 3 && fits$df.residual == 3 # TRUE
identical(coef(fitw), coef(fits)) # TRUE
sumw <- summary(fitw)
sums <- summary(fits)
identical(sumw$df, sums$df) # TRUE
predw <- predict(fitw, interval="confidence",
se.fit=TRUE, newdata=data.frame(x=c(4.5,5.5)))
preds <- predict(fits, interval="confidence",
se.fit=TRUE, newdata=data.frame(x=c(4.5,5.5)))
all.equal(predw, preds) # lots of differences
sideBySide <- function (a, b, argNames)
{
# print objects a and b side by side
oldWidth <- options(width = getOption("width")/2 - 4)
on.exit(options(oldWidth))
if (missing(argNames)) {
argNames <- c(deparse(substitute(a))[1],
deparse(substitute(b))[1])
}
pa <- capture.output(print(a))
pb <- capture.output(print(b))
nlines <- max(length(pa), length(pb))
length(pa) <- nlines
length(pb) <- nlines
pb[is.na(pb)] <- ""
pa[is.na(pa)] <- ""
retval <- cbind(pa, pb, deparse.level = 0)
dimnames(retval) <- list(rep("",nrow(retval)), argNames)
noquote(retval)
}
# lwr, upr, se.fit, df, residual.scale differ
sideBySide(predw, preds)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
More information about the R-devel
mailing list