[R] [Fwd: Re: [Fwd: failure delivery]]

Prof J C Nash nashjc at uottawa.ca
Thu May 26 01:30:35 CEST 2005

I appear to have hit one of the "drop" issues raised in some discussions
a couple of years ago by Frank Harrell. They don't seem to have been
fixed, and I'm under some pressure to get a quick solution for a
forecasting task I'm doing.

I have been modelling some retail sales data, and the days just after
Thanksgiving (US version!) are important. So I created some dummy
variables via a factor called "events" with (really ugly!!) levels TG,
TG+1, TG+2, etc. I also have DEC1, and the calendar and data are such
that the period I'm forecasting contains TG+3, but this level does
NOT occur in the estimation data. There is also a weekday factor (wdf) and
there are some interaction (cross) terms; for example, Saturday combined
with some special days is highly significant.
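
A small reproduction of this situation with made-up data may help. The names estndta, fcstdta, events, daynumber and Sales are hypothetical stand-ins, and the model is cut down to Sales ~ daynumber + events. In current R, lm() drops factor levels that do not occur in the estimation data, so the fit records no coefficient for TG+3, and predict() stops when the forecast data actually contain that level.

```r
# Made-up data: 'events' stands in for the post-Thanksgiving factor.
# The level "TG+3" exists but occurs only in the forecast period.
set.seed(1)
lv <- c("none", "TG", "TG+1", "TG+2", "TG+3")
estndta <- data.frame(
  daynumber = 1:40,
  events = factor(c(rep("none", 34), "TG", "TG+1", "TG+2", rep("none", 3)),
                  levels = lv))
estndta$Sales <- 100 + 0.5 * estndta$daynumber + rnorm(40)
fcstdta <- data.frame(
  daynumber = 41:44,
  events = factor(c("TG+3", "none", "none", "none"), levels = lv))

fit <- lm(Sales ~ daynumber + events, data = estndta)

# lm() drops unused levels, so "TG+3" is absent from the recorded levels
print(fit$xlevels$events)

# Predicting a row that actually contains "TG+3" therefore fails
res <- tryCatch(predict(fit, newdata = fcstdta), error = function(e) e)
print(conditionMessage(res))
```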

The model is   Sales ~ daynumber + wdf*events + wdf*specialevents

where daynumber is the day sequence in the year and specialevents is a
set of factors indicating when the business has promotional activities.
The entire model has about 330 coefficients (it seriously needs some
economizing), but only about 140 of these are estimated.
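
One common reason only part of such a model can be estimated is that crossing two factors creates a column for every level combination, and any combination that never occurs in the estimation data gives an all-zero column whose coefficient lm() reports as NA. A toy illustration with made-up data, where a hypothetical "promo" level occurs only on Thursdays and Fridays:

```r
# Made-up data: 10 weeks of days, a weekday factor and a promotion factor.
set.seed(2)
n <- 70
wdf <- factor(rep(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                  length.out = n))
events <- rep("none", n)
events[c(25, 26, 32, 33)] <- "promo"   # these days are Thursdays and Fridays
events <- factor(events)
Sales <- 50 + rnorm(n)

fit <- lm(Sales ~ wdf * events)

# 1 intercept + 6 weekday + 1 event + 6 interaction columns = 14 coefficients
print(length(coef(fit)))
# Interaction cells with no observations cannot be estimated and come back NA
print(names(coef(fit))[is.na(coef(fit))])
```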

I'm using lm() to do the estimation. I plan to change the model and possibly
the method once I've seen whether forecasting works. The current model "works"
moderately well for in-sample fits, though I suspect there is too
much variability generally.

I want to advance 1 week at a time, re-estimate, and iterate. This is
a test case where we know the "future". I can get this to work for a few
weeks starting at 20041101, but then get the error message

		"new factor levels in 'events' ...".

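The advance-one-week loop can be sketched as follows, assuming a data frame ordered by day with one row per day. The data, the column names and the cut-down model Sales ~ daynumber + wdf are hypothetical; the events factors are left out so that the loop itself is easy to see. Each pass fits on all rows before the forecast origin, predicts the next seven days, and then moves the origin forward a week.

```r
# Made-up daily data: 20 weeks with a trend and a Saturday effect.
set.seed(3)
n <- 140
dta <- data.frame(
  daynumber = 1:n,
  wdf = factor(rep(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                   length.out = n)))
dta$Sales <- 200 + 0.3 * dta$daynumber + 10 * (dta$wdf == "Sat") +
  rnorm(n, sd = 5)

# Fit on rows 1..(origin - 1), forecast 'horizon' rows, advance, repeat.
rolling_forecast <- function(dta, first_origin, horizon = 7) {
  out <- list()
  origin <- first_origin
  while (origin + horizon - 1 <= nrow(dta)) {
    est  <- dta[seq_len(origin - 1), ]
    fcst <- dta[origin:(origin + horizon - 1), ]
    fit  <- lm(Sales ~ daynumber + wdf, data = est)
    out[[length(out) + 1]] <- data.frame(
      origin   = origin,
      actual   = fcst$Sales,
      forecast = unname(predict(fit, newdata = fcst)))
    origin <- origin + horizon
  }
  do.call(rbind, out)
}

res <- rolling_forecast(dta, first_origin = 85)
res$abs_err <- abs(res$actual - res$forecast)
# Mean absolute error for each forecast week
print(aggregate(abs_err ~ origin, data = res, FUN = mean))
```

With the full model, each pass through the loop is where a new factor level in the forecast week would make predict() stop.
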
I have tried passing drop.factor.levels = TRUE to predict(), but it had
no visible effect. I also tried a suggestion from the web:

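          # factor columns (assumes estndta and fcstdta have the same columns)
          ifac <- sapply(estndta, is.factor)
          # re-create those factors in the forecast data, dropping unused levels
          fcstdta[ifac] <- lapply(fcstdta[ifac], factor)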

I still get the same error.

I've tried a couple of dozen variants on this with no joy.
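
Re-creating the factors as in the snippet above only removes levels that do not occur in the forecast data; it cannot help when a level such as TG+3 does occur there but never occurred in the estimation data. One possible workaround, sketched here with made-up data, is to compare the forecast data against the levels stored in the fitted object (the xlevels component of an lm fit) and predict only the rows whose levels were all seen, returning NA for the rest. The helper predict_seen_levels is hypothetical, not part of R, and whether NA, a baseline level, or some other substitute is the right forecast for an unseen event day is a modelling decision.

```r
# Hypothetical helper: predict only the rows of 'newdata' whose factor
# values were all seen when 'fit' was estimated; other rows get NA.
predict_seen_levels <- function(fit, newdata) {
  ok <- rep(TRUE, nrow(newdata))
  for (v in names(fit$xlevels)) {
    ok <- ok & (as.character(newdata[[v]]) %in% fit$xlevels[[v]])
  }
  pred <- rep(NA_real_, nrow(newdata))
  if (any(ok)) {
    # droplevels() removes the now-unused unseen levels before predicting
    seen <- droplevels(newdata[ok, , drop = FALSE])
    pred[ok] <- predict(fit, newdata = seen)
  }
  pred
}

# Made-up example: "TG+3" is absent from the estimation data
est <- data.frame(
  events = factor(c(rep("none", 9), "TG", "TG+1", "TG+2")),
  Sales  = c(10, 11, 12, 11, 10, 12, 11, 10, 12, 30, 25, 20))
fit <- lm(Sales ~ events, data = est)

new <- data.frame(events = factor(c("TG+3", "none", "TG")))
pred <- predict_seen_levels(fit, new)
print(pred)   # NA for the "TG+3" row, fitted values for the others
```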

Finally, I tried using the full data set in lm(), with weights of 1 for
the estimation period and 0 for the forecast period. This
"computes", but the results include NAs at a point where there seems to be no
reason for them.
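
One possible source of such NAs, shown with made-up data (the names are hypothetical stand-ins): lm() leaves zero-weight rows out of the least-squares fit, so a dummy column that is non-zero only in the forecast period, such as the one for TG+3 or for an interaction cell that occurs only there, is all zero in the rows actually fitted. The design matrix is then rank deficient and lm() reports that coefficient as NA. Listing the NA coefficients, as below, is a quick way to check whether this is what is happening.

```r
# Made-up data: rows 1-40 are the estimation period, rows 41-44 the forecast
# period, and the level "TG+3" occurs only in the forecast period.
set.seed(4)
lv <- c("none", "TG", "TG+1", "TG+2", "TG+3")
dta <- data.frame(
  daynumber = 1:44,
  events = factor(c(rep("none", 34), "TG", "TG+1", "TG+2", rep("none", 3),
                    "TG+3", rep("none", 3)), levels = lv))
dta$Sales <- 100 + 0.5 * dta$daynumber + rnorm(44)

# Weight 1 for the estimation period, 0 for the forecast period
w <- ifelse(dta$daynumber <= 40, 1, 0)
fit <- lm(Sales ~ daynumber + events, data = dta, weights = w)

# The "TG+3" column is zero in every row with non-zero weight, so its
# coefficient cannot be estimated and is returned as NA
print(names(coef(fit))[is.na(coef(fit))])
```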

I'm starting to suspect that there's some sort of bug somewhere in the R

  Any advice welcome.

John C. Nash, School of Management, University of Ottawa,
Vanier Hall 451, 136 Jean-Jacques Lussier Private,
P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada
email: nashjc on mail server uottawa.ca, voice mail: 613 562 5800 X 4796
fax 613 562 5164,  Web URL = http://macnash.admin.uottawa.ca
"Practical Forecasting for Managers" web site is at
