[R] Re: Load prediction
ripley at stats.ox.ac.uk
Sun Jun 23 08:50:16 CEST 2002
On Sat, 22 Jun 2002, Johanus Dagius wrote:
> I have received no reply to my previous query, so I
> will try again.
> I have tried glm on this problem with the default
> parameters and it produced a model with mean absolute
> error of approx 300 MWhrs. (The data is roughly
> normally distributed with a mean of 1700 MWhrs and
> SD=500). I know very little about R and so I am not
> sure what parameter needs to be tweaked from here.
> Using Cubist (www.rulequest.com) I have created a
> predictive model whose mean error is around 100 MWhrs.
> Cubist builds a recursively partitioned tree using
> piecewise linear regression. Cubist also outputs a
> nice set of rules which explain the model in terms of
> feature splits.
> I think R should give a comparable result. Does R have
> a method of piecewise approximation like this? I would
> like to compare R against Cubist. What method(s) in R
> must I learn to do this?
R is an extensible software system, not a set of model-building
techniques. You really didn't tell us anything like enough (either time)
about your data. (E.g. Cubist is designed for thousands of records and
tens to hundreds of variables: you showed five records and around seven
variables.) But as
a general principle, this looks as if glm (as distinct from lm) is not
needed, and the currently most promising prediction techniques for
continuous quantities are thought to be neural networks (in the VR bundle)
and SVMs (in package e1071). R also has several packages for tree-building
(see the FAQ), and you could implement something very like Cubist in R.
So `to compare R against Cubist' is not well-defined, both for `R' and for
the criteria to be used.
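
To make the piecewise-linear idea concrete, here is a minimal sketch of a
one-split "model tree": a single cut on one predictor with a separate
least-squares line on each side, scored by mean absolute error on held-out
data. It is not Cubist's algorithm, and it is written in Python with only the
standard library so that it is self-contained. The data are synthetic (a
V-shaped load/temperature relation invented for illustration), and every
function name below is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(line, xs, ys):
    """Sum of squared residuals of a fitted line."""
    a, b = line
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))


def fit_one_split(xs, ys, min_leaf=10):
    """Exhaustively try each cut point; keep the one with lowest training SSE."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[i - 1] == sx[i]:
            continue  # cannot cut between identical x values
        left = fit_line(sx[:i], sy[:i])
        right = fit_line(sx[i:], sy[i:])
        total = sse(left, sx[:i], sy[:i]) + sse(right, sx[i:], sy[i:])
        if best is None or total < best[0]:
            best = (total, (sx[i - 1] + sx[i]) / 2, left, right)
    if best is None:
        raise ValueError("not enough distinct points for a split")
    return best[1], best[2], best[3]  # (cut, left line, right line)


def predict_split(model, x):
    cut, left, right = model
    a, b = left if x <= cut else right
    return a + b * x


def mae(preds, ys):
    """Mean absolute error."""
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


def make_data(n, rng):
    # SYNTHETIC, assumed relation: load rises on both sides of about 62 degrees.
    temps = [rng.uniform(30, 95) for _ in range(n)]
    loads = [1500 + 20 * abs(t - 62) + rng.gauss(0, 50) for t in temps]
    return temps, loads


rng = random.Random(1)
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(300, rng)  # held out: never used for fitting

a, b = fit_line(train_x, train_y)
model = fit_one_split(train_x, train_y)
cut = model[0]

train_mean = sum(train_y) / len(train_y)
mae_mean = mae([train_mean] * len(test_y), test_y)
mae_linear = mae([a + b * x for x in test_x], test_y)
mae_piecewise = mae([predict_split(model, x) for x in test_x], test_y)

print("cut point:", round(cut, 1))
print("MAE mean-only:", round(mae_mean, 1))
print("MAE one line:", round(mae_linear, 1))
print("MAE two lines:", round(mae_piecewise, 1))
```

The point of scoring on held-out data and including a mean-only baseline is
that an error figure on its own says little: the comparison only means
something once the criterion and the test set are fixed for every method.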
My advice would be to engage a statistical consultant to guide you.
> At 12:13 PM 6/21/02 -0700, I wrote:
> > Hello,
> >This is perhaps more of a regression question than R,
> >but I am learning both, so would appreciate your
> >wisdom here.
> >I have some data which reflects power load for an
> >electrical generating system, with some temporal
> >features. The data fields look like this:
> >4455 5 13 92 13 4 70 63 1617
> >4456 3 9 92 13 2 73 57 1397
> >4457 10 5 92 8 2 58 58 1501
> >4458 11 24 92 18 3 56 56 1885
> >4459 9 27 92 8 1 65 65 1402
> >What R methodology is likely to produce the most
> >accurate load forecast for a given date and
> >temperatures in problems like this?
> >Thank you,
> >Johanus Dagius
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595