[R] Out-of-sample predictions with boosting model
Benjamin Hofner
Benjamin.Hofner at imbe.med.uni-erlangen.de
Fri Jul 30 11:48:44 CEST 2010
Hi Travis,
I'll try to give you some hints that might bring you closer to a solution.
The key to your problem (as far as I understand it) might simply be to
use the predict function of mboost appropriately. You can specify a new
data set (e.g. a part of your original data set not used for estimation)
and call
> predict(model, newdata = newdata)
which gives you a vector of predictions as you wanted. Thus, you could,
for example, specify newdata such that you get your one-step ahead
predictions.
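
A rolling one-step-ahead scheme for a single country could be sketched
roughly as follows. This is only an illustration under assumptions: the
data frame d is simulated, and the names window, targets and one_step as
well as the window length of 100 rows are made up for the example. Note
that a new observation may lie outside the range of the training data,
in which case the spline base-learners have to extrapolate.

```r
library(mboost)

# Simulated stand-in for one country's monthly data (hypothetical values)
set.seed(1)
n <- 120
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- sin(d$x1) + 0.5 * d$x2 + rnorm(n, sd = 0.3)

window  <- 100               # length of the estimation window (assumption)
targets <- (window + 1):n    # rows to predict, one at a time

# For each target row: fit on the preceding `window` rows, predict that row
one_step <- sapply(targets, function(i) {
  train <- d[(i - window):(i - 1), ]
  fit <- mboost(y ~ x1 + x2 + x3, data = train)
  as.numeric(predict(fit, newdata = d[i, , drop = FALSE]))
})
```

one_step is then a plain numeric vector with one prediction per target
row. Wrapping this in a function of the data frame would allow the same
lapply(split(dat, dat$country), ...) pattern as in your example.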
To estimate the model only on a subset of the data you could either use
> mboost(y ~ x1 + x2 + x3, data = some_part_of_your_dataset)
or you can apply weights
> model <- mboost(y ~ x1 + x2 + x3, data = data,
+ weights = c(rep(1, 100), rep(0, nrow(data) - 100)))
> predict(model) ## gives you predictions for all observations in data
Now you can extract the subset of out-of-bag predictions, i.e.,
predictions for observations with weight 0.
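
As a small sketch of that extraction step (simulated data; the names d,
w, pred_all and pred_oob are hypothetical and only serve the example):

```r
library(mboost)

# Simulated data standing in for the real data set
set.seed(2)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1^2 + d$x3 + rnorm(n, sd = 0.3)

# Weight 1 = used for fitting, weight 0 = held out
w <- c(rep(1, 100), rep(0, n - 100))
model <- mboost(y ~ x1 + x2 + x3, data = d, weights = w)

pred_all <- as.numeric(predict(model))  # predictions for every row of d
pred_oob <- pred_all[w == 0]            # keep only the held-out rows
```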
One further thing to mention:
You call your model a black box; however, you should note that you do NOT
fit a black-box model but an additive model using P-splines (which is the
default). You can see this if you type, e.g.,
> coef(model)
and look at the names.
Another idea for your data problem might be to fit ONE model with
country as an effect modifier, specified via the "by" argument in all
base-learners. A call could look like
> mboost(y ~ bbs(x1, by = country) + bbs(x2, by = country)
+ + bbs(x3, by = country), data = data)
Or you could use random effects via brandom() base-learners. Oh, and
please note that you need to tune your mstop value (e.g. via cvrisk)!
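
Tuning mstop could look roughly like this. Again only a sketch on
simulated data: the initial mstop of 200 is an arbitrary upper bound
chosen for the example, and cvrisk is used with its default resampling
scheme.

```r
library(mboost)

# Simulated data standing in for the real data set
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- sin(d$x1) + d$x2 + rnorm(n, sd = 0.3)

# Fit with a generous number of iterations (200 is an assumption)
model <- mboost(y ~ x1 + x2 + x3, data = d,
                control = boost_control(mstop = 200))

cvr  <- cvrisk(model)   # resampling-based estimate of the risk per iteration
best <- mstop(cvr)      # iteration with the lowest estimated risk
mstop(model) <- best    # set the model to that number of iterations
```

If best ends up at the upper bound, the initial mstop was probably too
small and should be increased before running cvrisk again.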
HTH
Benjamin
Travis Berge <travisrhelp at gmail.com> wrote:
> Hi UseRs -
>
> I am new to R, and could use some help making out-of-sample predictions
> using a boosting model (the mboost command). The issue is complicated by the
> fact that I have panel data (time by country), and am estimating the model
> separately for each country. FYI, this is monthly data and I have 1986m1 -
> 2009m12 for 9 countries.
>
> To give you a flavor of what I am doing, here is a simple example to show
> how I make in-sample predictions:
>
> # data has following columns: country year month y x1 x2 x3
> dat = read.csv("data.csv")
>
> # Create function that estimates model, produces in-sample predictions
> bbox = function(df)
> {
> blackbox = mboost(y ~ x1 + x2 + x3, data = df)
> predict(blackbox)
> }
>
> # Use lapply to estimate by country
> bycountry = lapply(split(dat, dat$country), bbox)
>
>
> So that in the end I have an object bycountry that contains the in-sample
> predictions of the model, estimated for each country separately. What I
> would like to do is take this model and estimate it for each country using
> some initial data. I.e., estimate Australia with 1986m1-2003m12, make
> prediction about 2004m1, roll data forward. Estimate AUS with 1986m2-2004m1,
> predict 2004m2, etc for all data points. Now do the same for Canada,
> Denmark, etc.
>
> So I guess my problem is twofold. 1) How to make these out-of-sample
> predictions, by country, when my data has not been declared as time-series?
> (I do not think that mboost can handle time-series data...x1 x2 and x3 have
> been lagged appropriately). 2) How to save the one-step ahead predictions
> into a vector?
>
> Any thoughts would be greatly appreciated. Many thanks!
>
> -Travis
>