[Rd] Model forecasts with new factor levels - predict.warn
Prof J C Nash
nashjc at uottawa.ca
Wed Aug 24 22:56:54 CEST 2005
predict.warn() -- a function to display factor levels in new data
for linear model prediction that do not exist in the
estimating data.
Date: 2005-8-24
From: John C. Nash (with thanks to Uwe Ligges for suggestions)
nashjc at uottawa.ca
Motivation: In computing predictions from a linear model using factors,
it is possible to introduce new factor levels. This was encountered on
a practical level in forecasting retail sales where certain events
had not occurred during the estimation period but did occur during
the validation (i.e., trial forecast) period. predict() then gives
an error message that it cannot continue because the new data has
new levels of a factor. Indeed, it will give the new level of the
first such factor found. predict.warn is designed to give more
detailed information. for example, if there are other factor levels
that are also "new".
The underlying interest is to find a suitable method for introducing
estimates of the effects of such "new" factor levels into the model,
and I welcome exchanges on how that may be done. Indeed, one of my
main interests in submitting this tool is to learn how that may be
done appropriately and with suitable information to the user to avoid
egregious errors. For example, a new sales promotion may have many
similarities with existing promotions, so a rough estimate for the
effect may be possible, allowing for rudimentary validation testing
and for "working" values of forecasts.
Unfortunately, the lack of some facility to provide "guesstimates"
in this way leads users to ignore R or other tools and to work with
Excel. I have experience of this situation arising in practise, and
we may anticipate that these users will be difficult to convert to
using better tools. I believe we must find ways to allow users to
carry out such operations in as safe and informed manner possible.
Ideally, the follow-on to predict.warn should be a function called
something like predict.expand or predict.force that asks the user
to supply values (or possibly sets or distributions) of the "new"
factor level estimates.
Sample function:
predict.warn<-function (object, newdata,
...)
# this is function predict.warn.R
# Tries to warn if we do not have appropriate factor levels in
estimation data
# that appear in the newdata.
# It does NOT yet test for non-factor variables, but that would also be
nice.
# However, we usually assume that numerical variables can be
interpolated and
# extrapolated.
{ message("predict.warn -- check for factor levels not present")
message("in linear models with factors")
message(" ")
if (missing(newdata) || is.null(newdata)) {
# Without newdata, issue warning and return
message("You do not have any new data.")
}
else {
xlev <- object$xlevels
nn<-names(xlev)
dnn<-length(nn)
message("there are ",dnn," factor names ")
dostop<-FALSE # Ensure the return value is defined
for (i in seq(along=nn)) {
message("Factor ",nn[i]," in estimation data")
ofac <- object[["model"]][[nn[i]]]
print(ofac) # diagnostic
oldlev<-levels(factor(ofac))
nfac <- newdata[[nn[i]]] # ?? note need for double brackets. Why?
message("Factor ",nn[i], " in new data")
print(nfac) # diagnostic
message("about to try levels on newfac")
newlev<-levels(factor(nfac))
for (j in seq(along=newlev)) {
if (!(newlev[j] %in% oldlev)) {
message("New level ", newlev[j]," not found in estimation levels of
factor ", nn[i])
dostop = TRUE
}
}
} # end loop over names
} # end part where we DO have new data
return(dostop) # TRUE if we should stop
}
# end of predict.warn code
Test example:
# A simple test of predict.warn.R for 2 factors
# J C Nash 20050822
rm(list=ls())
source("predict.warn.R")
x<-c(1, 2, 3, 4, 5, 6, 7, 8)
fac1<-c("A", "A", "", "", "", "B", "")
fac1<-c(fac1, "A")
fac1
y<-c( 2, 4, 5, 8, 9.5, 12, 13, 17)
fac2<-c(1,1,1,2,2,1,2,3)
fac2<-as.factor(fac2) # Note: or else defaults to num
joe<-data.frame(x, y, fac1, fac2)
bill<-subset(joe, x<=5)
mary<-subset(joe, x>5)
message("bill")
str(bill)
print(bill)
message("mary")
str(mary)
print(mary)
est<-lm(y ~ fac1 + fac2 + x, bill)
print(est)
test<-predict.warn(est, newdata = mary)
message('predict.warn returns ',test)
message("\n")
message("Now try predict()")
predict(est, newdata = mary)
# ======================= end of msg ===========================
--
John C. Nash, School of Management, University of Ottawa,
Vanier Hall 451, 136 Jean-Jacques Lussier Private,
P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada
email: nashjc on mail server uottawa.ca, voice mail: 613 562 5800 X 4796
fax 613 562 5164, Web URL = http://macnash.admin.uottawa.ca
"Practical Forecasting for Managers" web site is at
http://www.arnoldpublishers.com/support/nash/
More information about the R-devel
mailing list