[R] [Fwd: Re: [Fwd: failure delivery]]

Uwe Ligges ligges at statistik.uni-dortmund.de
Thu May 26 16:59:25 CEST 2005

Can you please provide a small reproducible example?

Uwe Ligges

Prof J C Nash wrote:

> I appear to have hit one of the "drop" issues raised in some discussions
> a couple of years ago by Frank Harrell. They don't seem to have been
> fixed, and I'm under some pressure to get a quick solution for a
> forecasting task I'm doing.
> I have been modelling some retail sales data, and the days just after
> Thanksgiving (US version!) are important. So I created dummy variables
> via a factor called "events" with (really ugly!!) levels TG, TG+1,
> TG+2, etc. I also have DEC1, and the calendar and data are such that
> the period I'm forecasting contains TG+3, but this level does NOT occur
> in the estimation data. There are also weekday factors (wdf) and some
> interaction terms (Saturday combined with some special days is highly
> significant).
> The model is   Sales ~ daynumber + wdf*events + wdf*specialevents
> where daynumber is the day sequence in the year and specialevents is a
> set of factors to tell when the business has promotional activities.
> The entire model has about 330 coefficients (it seriously needs some
> economizing), but only about 140 of these are estimated.
> I'm using lm() to do the estimation. I plan to change the model and
> possibly the method once I've seen if forecasting works. The current
> model "works" moderately well for in-sample fits, though I suspect there
> is too much variability generally.
> I want to advance 1 week at a time, reestimate, and iterate. This is
> a test case where we know the "future". I can get this to work for a few
> weeks starting at 20041101, but then get an error msg
>         "new factor levels in 'events' ...".
> I have tried putting drop.factor.levels = TRUE in predict(), but this
> didn't seem to register. Also tried suggestion from web to use
>          ifac <- sapply(estndta,is.factor)
>          fcstdta[ifac] <- lapply(fcstdta[ifac],factor)
> Still get same error.
> I've tried a couple of dozen variants on this with no joy.
> Finally, I have tried using the full data set in lm() but with the weights
> for the estimation period set to 1 and those for the forecast period set
> to 0. This "computes", but the results include NAs at a point where there
> seems to be no reason for them.
> I'm starting to suspect that there's some sort of bug somewhere in the R
> internals.
> Any advice welcome.
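A minimal sketch of the behaviour described above, on a hypothetical toy data set (the names `full`, `estn`, `fcst`, `recode_unseen` and the choice of "none" as fallback level are illustrative assumptions, not taken from the post). It shows that predict() stops when the forecast data contain a factor level that never occurred in the estimation data, and that re-applying factor() only drops unused levels, so a level that is actually present (here TG+3) survives. It also shows one possible workaround, which maps unseen levels onto a level the fit does know before calling predict(). Note that predict.lm() has no drop.factor.levels argument, so that setting is absorbed by `...` and has no effect. The last lines suggest a likely source of the NAs in the zero-weight approach: lm() discards zero-weight cases before the QR decomposition, so the TG+3 dummy column is all zeros, that coefficient is aliased, and it is reported as NA.

```r
# Hypothetical toy data: 8 estimation days followed by 2 forecast days.
# As in the post, level "TG+3" occurs only in the forecast period.
full <- data.frame(
  sales  = c(10, 12, 30, 25, 20, 11, 13, 9, 12, 40),
  events = factor(c("none", "none", "TG", "TG+1", "TG+2",
                    "none", "none", "none", "none", "TG+3"),
                  levels = c("none", "TG", "TG+1", "TG+2", "TG+3"))
)
estn <- droplevels(full[1:8, ])             # estimation period
fcst <- full[9:10, "events", drop = FALSE]  # forecast period (predictors only)

fit <- lm(sales ~ events, data = estn)

# 1. predict() stops because "TG+3" has no column in the fitted model.
msg <- tryCatch({ predict(fit, newdata = fcst); "no error" },
                error = function(e) conditionMessage(e))

# 2. Re-applying factor() drops only *unused* levels; "TG+3" is used, so it stays.
refactored <- factor(fcst$events)
levels(refactored)   # "none" "TG+3"

# 3. Hypothetical helper: map levels the fit never saw onto a fallback level
#    it did see, then rebuild each factor with exactly the fit's levels.
recode_unseen <- function(newdata, fit, fallback) {
  for (v in names(fit$xlevels)) {
    known  <- fit$xlevels[[v]]
    x      <- as.character(newdata[[v]])
    unseen <- !is.na(x) & !(x %in% known)
    if (any(unseen)) {
      # the fallback must be a single level that was present at estimation time
      stopifnot(length(fallback[[v]]) == 1, fallback[[v]] %in% known)
      x[unseen] <- fallback[[v]]
    }
    newdata[[v]] <- factor(x, levels = known)
  }
  newdata
}

# Assumption: treat TG+3 like an ordinary day ("none").
fcst_ok <- recode_unseen(fcst, fit, fallback = list(events = "none"))
pred <- predict(fit, newdata = fcst_ok)   # both rows get the "none" mean, 11

# 4. Zero-weight variant: fit on all 10 rows, forecast rows weighted 0.
#    lm() drops zero-weight cases before the QR step, so the TG+3 dummy is
#    all zeros, the coefficient is aliased, and it is reported as NA.
w    <- c(rep(1, 8), rep(0, 2))
fit0 <- lm(sales ~ events, data = full, weights = w)
coef(fit0)[["eventsTG+3"]]   # NA
```

Which fallback level is appropriate (an ordinary day, or perhaps TG+2) is a modelling decision. The recoding only makes predict() run; no fit can estimate an effect for a day type that never appears in the estimation data.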
