[R] newdata for predict.lm() ??
Boris Steipe
boris.steipe at utoronto.ca
Wed Nov 4 10:50:38 CET 2020
I can't get data from a data frame into predict() without a detour that seems quite unnecessary ...
Reprex:
# Data frame with simulated data in columns "h" (independent) and "w" (dependent)
DAT <- structure(list(h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118,
1.755, 2.060, 2.136, 2.126, 1.792, 1.574,
2.117, 1.741, 2.295, 1.526, 1.666, 1.581,
1.522, 1.995),
w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429,
87.176, 90.318, 76.873, 84.183, 57.890, 62.005,
84.258, 78.317,101.304, 64.982, 71.237, 77.124,
65.010, 81.413)),
row.names = c( "1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20"),
class = "data.frame")
myFit <- lm(DAT$w ~ DAT$h)
coef(myFit)
# (Intercept) DAT$h
# 11.76475 35.92002
# Create 50 x-values with seq() to plot confidence intervals
myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))
pc <- predict(myFit, newdata = myNew, interval = "confidence")
# Warning message:
# 'newdata' had 50 rows but variables found have 20 rows
# Problem: predict() was not able to take the single column in myNew
# as the independent variable.
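A minimal sketch of what seems to be happening. It assumes the standard stats behaviour that predict.lm() builds its predictors with model.frame(), which evaluates each formula variable inside newdata and falls back to the formula's environment for anything newdata does not contain. The data are the 20 rows from the reprex; `x_used` is a hypothetical name introduced only for this illustration.

```r
# The 20 observations from the reprex
DAT <- data.frame(
  h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118, 1.755, 2.060, 2.136, 2.126,
        1.792, 1.574, 2.117, 1.741, 2.295, 1.526, 1.666, 1.581, 1.522, 1.995),
  w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429, 87.176, 90.318,
        76.873, 84.183, 57.890, 62.005, 84.258, 78.317, 101.304, 64.982,
        71.237, 77.124, 65.010, 81.413)
)
myFit <- lm(DAT$w ~ DAT$h)
myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))

# The predictor recorded in the model is the whole expression DAT$h
labels(terms(myFit))   # "DAT$h"

# Evaluate that expression roughly the way model.frame() does: look in
# newdata first, then in the environment where the formula was created.
# myNew has no column called DAT, so the original DAT (20 rows) is used.
x_used <- eval(quote(DAT$h), envir = myNew,
               enclos = environment(formula(myFit)))
length(x_used)         # 20, not 50
```

That mismatch between the 50 rows of myNew and the 20 values actually found is what the warning reports.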
# Ugly workaround, but with it everything works as expected:
xx <- DAT$h
yy <- DAT$w
myFit <- lm(yy ~ xx)
coef(myFit)
myNew <- data.frame(seq(min(DAT$h), max(DAT$h), length.out = 50))
colnames(myNew) <- "xx" # This fixes it!
pc <- predict(myFit, newdata = myNew, interval = "confidence")
str(pc)
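Since the stated goal was to plot confidence intervals, here is a short sketch of using the result of the workaround. It assumes the default 95% level of predict.lm() and uses base graphics (plot() and matlines()); the colours and line types are arbitrary choices.

```r
# Re-create the reprex data and the workaround fit
DAT <- data.frame(
  h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118, 1.755, 2.060, 2.136, 2.126,
        1.792, 1.574, 2.117, 1.741, 2.295, 1.526, 1.666, 1.581, 1.522, 1.995),
  w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429, 87.176, 90.318,
        76.873, 84.183, 57.890, 62.005, 84.258, 78.317, 101.304, 64.982,
        71.237, 77.124, 65.010, 81.413)
)
xx <- DAT$h
yy <- DAT$w
myFit <- lm(yy ~ xx)
myNew <- data.frame(xx = seq(min(xx), max(xx), length.out = 50))
pc <- predict(myFit, newdata = myNew, interval = "confidence")

# pc is a 50 x 3 matrix with columns "fit", "lwr" and "upr"
dim(pc)        # 50 3
colnames(pc)   # "fit" "lwr" "upr"

# Data points, then the fitted line (solid) and the confidence band (dashed)
plot(xx, yy, xlab = "h", ylab = "w")
matlines(myNew$xx, pc, lty = c(1, 2, 2), col = c("black", "grey40", "grey40"))
```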
# So giving the column in newdata the same name as the coefficient
# should work, right?
# Back to the original ...
myFit <- lm(DAT$w ~ DAT$h)
colnames(myNew) <- "`DAT$h`"
# ... same error
colnames(myNew) <- "h"
# ... same error again.
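A sketch of why renaming the column does not help, assuming standard R evaluation rules: the formula term `DAT$h` is a call to the `$` function with arguments DAT and h, not a variable whose name happens to be "DAT$h". Whatever the column is called, newdata still contains no object named DAT, so the lookup falls through to the original DAT. The names `xs`, `term` and `n_found` are hypothetical helpers for this illustration.

```r
# Re-create the reprex data and the original fit
DAT <- data.frame(
  h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118, 1.755, 2.060, 2.136, 2.126,
        1.792, 1.574, 2.117, 1.741, 2.295, 1.526, 1.666, 1.581, 1.522, 1.995),
  w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429, 87.176, 90.318,
        76.873, 84.183, 57.890, 62.005, 84.258, 78.317, 101.304, 64.982,
        71.237, 77.124, 65.010, 81.413)
)
myFit <- lm(DAT$w ~ DAT$h)
xs <- seq(min(DAT$h), max(DAT$h), length.out = 50)

# The term is a call, and its function is `$`
term <- quote(DAT$h)
is.call(term)             # TRUE
as.character(term[[1]])   # "$"

# Try the column names from the post; each time DAT is found outside
# newdata, so the expression still yields the 20 original values
n_found <- sapply(c("`DAT$h`", "DAT$h", "h"), function(nm) {
  nd <- data.frame(xs)
  colnames(nd) <- nm
  length(eval(term, envir = nd, enclos = environment(formula(myFit))))
})
n_found                   # 20 for every name
```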
Bottom line: how can I properly specify newdata? The documentation is opaque. It seems that predict() tries to match the text of the formula's RHS EXACTLY, which is unlikely to give a useful column name unless I first assign the data to an intermediate variable. There must be a better way ...
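For comparison, a sketch of the approach commonly used with stats::lm(), assuming its standard behaviour: pass the data frame through the `data` argument and write the formula with bare column names. The predictor is then the plain variable h, so any newdata with a column named h is matched, and the coefficients are the same as those of `lm(DAT$w ~ DAT$h)`.

```r
# Re-create the reprex data
DAT <- data.frame(
  h = c(2.174, 2.092, 2.059, 1.952, 2.216, 2.118, 1.755, 2.060, 2.136, 2.126,
        1.792, 1.574, 2.117, 1.741, 2.295, 1.526, 1.666, 1.581, 1.522, 1.995),
  w = c(90.552, 89.518, 84.124, 94.685, 94.710, 82.429, 87.176, 90.318,
        76.873, 84.183, 57.890, 62.005, 84.258, 78.317, 101.304, 64.982,
        71.237, 77.124, 65.010, 81.413)
)

# Bare column names in the formula, data frame supplied via `data =`
myFit <- lm(w ~ h, data = DAT)
names(coef(myFit))   # "(Intercept)" "h"

# newdata only needs a column with the predictor's name, h
myNew <- data.frame(h = seq(min(DAT$h), max(DAT$h), length.out = 50))
pc <- predict(myFit, newdata = myNew, interval = "confidence")
dim(pc)              # 50 3
```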
Thanks!
Boris