[R] loess crash
Liaw, Andy
andy_liaw at merck.com
Tue Sep 17 14:21:18 CEST 2002
Actually, I forgot there's the `locfit' package:
library(locfit)
> fit1 <- locfit(y~x1*x2*x3*x4*x5, data=data2)
> fit1
Call:
locfit(formula = y ~ x1 * x2 * x3 * x4 * x5, data = data2)
Number of observations: 500
Family: Gaussian
Fitted Degrees of freedom: 32.179
Residual scale: 0.954
> summary(fit1)
Estimation type: Local Regression
Call:
locfit(formula = y ~ x1 * x2 * x3 * x4 * x5, data = data2)
Number of data points: 500
Independent variables: x1 x2 x3 x4 x5
Evaluation structure: Rectangular Tree
Number of evaluation points: 32
Degree of fit: 2
Fitted Degrees of Freedom: 32.179
The default settings might be different from loess, though.
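
For what it's worth, and untested: if one wanted settings closer to loess's
defaults of span = 0.75 and degree = 2, my understanding is that locfit's
lp() term is the place to set them, along the lines of

fit2 <- locfit(y ~ lp(x1, x2, x3, x4, x5, nn = 0.75, deg = 2), data = data2)

though nn is a nearest-neighbour fraction and need not correspond exactly to
loess's span, so check ?lp.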
Andy
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Monday, September 16, 2002 4:17 PM
> To: 'John Fox'; jdeke2 at comcast.net
> Cc: r-help at stat.math.ethz.ch
> Subject: RE: [R] loess crash
>
>
> I agree with John mostly. For a model as complicated as the one you're
> trying to fit with loess, you might as well try things like ppr (in the
> `modreg' package), MARS (in the `mda' package) or neural nets (in the
> `nnet' package), or even randomForest... Actually, MARS might offer a bit
> more interpretability than the others, because of its hierarchical
> construction.
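>
> For example (a rough, untested sketch; the tuning values here are arbitrary
> illustrations, not recommendations):
>
> library(modreg)        # ppr() lives here in R 1.5.x
> library(mda)
> library(nnet)
> library(randomForest)
> ## projection pursuit regression with a handful of ridge terms
> fit.ppr  <- ppr(y ~ x1 + x2 + x3 + x4 + x5, data = data2, nterms = 3)
> ## MARS, allowing two-way interactions between basis functions
> fit.mars <- mars(as.matrix(data2[, c("x1","x2","x3","x4","x5")]),
>                  data2$y, degree = 2)
> ## single-hidden-layer neural net with linear output and a little decay
> fit.nnet <- nnet(y ~ x1 + x2 + x3 + x4 + x5, data = data2,
>                  size = 5, linout = TRUE, decay = 0.01)
> fit.rf   <- randomForest(y ~ x1 + x2 + x3 + x4 + x5, data = data2)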
>
> If you do care about `marginal effects' of the predictors, then aren't you
> sort of assuming additivity? In that case an additive model is more
> appropriate; if not, the `marginal effects' can be misleading.
>
> In terms of comparing a loess fit with 5 terms to a less complicated model,
> I think it needs to be pointed out that (AFAIK) this can only be done on a
> more or less qualitative level, as the models are not nested.
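>
> One rough way to put non-nested fits on a common footing is out-of-sample
> prediction error; here is an untested sketch of a simple K-fold
> cross-validation (the function and its name are made up for illustration,
> not taken from any package):
>
> cv.mse <- function(fit.fun, data, K = 10) {
>   fold <- sample(rep(1:K, length.out = nrow(data)))   # random fold labels
>   err <- numeric(K)
>   for (k in 1:K) {
>     fit  <- fit.fun(data[fold != k, ])                # fit on K-1 folds
>     pred <- predict(fit, newdata = data[fold == k, ]) # predict held-out fold
>     err[k] <- mean((data$y[fold == k] - pred)^2, na.rm = TRUE)
>   }
>   mean(err)
> }
> ## e.g. cv.mse(function(d) loess(y ~ x1 + x2 + x3 + x4, d, degree = 1), data2)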
>
> Cheers,
> Andy
>
> > -----Original Message-----
> > From: John Fox [mailto:jfox at mcmaster.ca]
> > Sent: Monday, September 16, 2002 1:59 PM
> > To: jdeke2 at comcast.net
> > Cc: r-help at stat.math.ethz.ch
> > Subject: RE: [R] loess crash
> >
> >
> > Dear John,
> >
> > It's true that the gam function in mgcv fits with splines while loess uses
> > local regression, but an even more fundamental difference is that gam fits
> > additive models (though, with some care, you can include
> > higher-dimensional terms). Given your description of what you plan to do
> > with the fitted model, an additive model might be what you want.
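> >
> > For instance (an untested sketch, with the smoothing parameters left at
> > mgcv's defaults):
> >
> > library(mgcv)
> > ## additive model: a separate smooth term for each predictor
> > fit.gam <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = data2)
> > summary(fit.gam)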
> >
> > More generally, a model that fits five-way interactions may be useful as a
> > point of comparison for simpler models, but I doubt that it will provide a
> > digestible description of the data.
> >
> > I hope that this helps,
> > John
> >
> > At 10:45 AM 9/16/2002 -0400, you wrote:
> > >Thanks for the suggestion. I've only used splines for density estimation
> > >before -- I've never used them for regression (although I'm aware that
> > >people do). I'll look into it...
> > >
> > >
> > >-----Original Message-----
> > >From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
> > >Sent: Monday, September 16, 2002 10:17 AM
> > >To: jdeke2 at comcast.net
> > >Cc: 'r-help at stat.math.ethz.ch'
> > >Subject: RE: [R] loess crash
> > >
> > >
> > >I would suggest looking at the package mgcv. You can fit generalized
> > >additive models, which are useful for what you describe below.
> > >
> > >On Mon, 16 Sep 2002, John Deke wrote:
> > >
> > > > Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm
> > > > quite happy to use local linear regression.
> > > >
> > > > To answer your question -- perhaps I'm off base, but my reason for
> > > > wanting to do this is that I have a set of explanatory variables that
> > > > most likely influence my dependent variable in ways that are difficult
> > > > to model parametrically. That is, I suspect that there are all sorts of
> > > > complementary relationships between these variables, and it's not at
> > > > all clear that there's a satisfying theoretical model that would
> > > > suggest a clear-cut parametric relationship. So, rather than using
> > > > parametric regression, I'd like to try something non-parametric.
> > > >
> > > > My plan for summarizing the results is to find the average marginal
> > > > effect of each explanatory variable of interest, holding all else
> > > > constant. Also, I would calculate predicted outcomes for combinations
> > > > of the explanatory variables that are most likely to occur in "the
> > > > real world".
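> > > >
> > > > For the predicted-outcomes part, something along these lines (untested,
> > > > and assuming a fitted object such as the loess fit, result1, shown
> > > > further down) might do, using predict() over a grid of "typical"
> > > > values:
> > > >
> > > > grid <- expand.grid(x1 = seq(0.1, 0.9, by = 0.2),
> > > >                     x2 = 0.5, x3 = 0.5, x4 = 0.5, x5 = 0.5)
> > > > grid$yhat <- predict(result1, newdata = grid)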
> > > >
> > > > John
> > > >
> > > > -----Original Message-----
> > > > From: John Fox [mailto:jfox at mcmaster.ca]
> > > > Sent: Monday, September 16, 2002 9:31 AM
> > > > To: John Deke
> > > > Cc: r-help at stat.math.ethz.ch
> > > > Subject: Re: [R] loess crash
> > > >
> > > >
> > > > Dear John,
> > > >
> > > > Out of curiosity, I tried your example under R 1.5.1 on an 800 MHz PC
> > > > with 512 Mb of memory running Windows 2000. The results were just as
> > > > you described: the four-predictor problem ran essentially instantly,
> > > > and the five-predictor problem crashed R, again instantly.
> > > >
> > > > I also tried making the problem less computationally demanding by
> > > > specifying locally linear, rather than quadratic, fits; this appears
> > > > to work:
> > > >
> > > > > loess(y~x1+x2+x3+x4+x5, data2, degree=1)
> > > > Call:
> > > > loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)
> > > >
> > > > Number of Observations: 500
> > > > Equivalent Number of Parameters: 13.5
> > > > Residual Standard Error: 1.012
> > > > >
> > > >
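> > > > Another thing that might be worth trying (purely a guess on my part,
> > > > and untested -- it would also be much slower) is to bypass the
> > > > interpolation surface, in case the fixed-size structures Peter
> > > > mentions below sit in that part of the code:
> > > >
> > > > loess(y ~ x1 + x2 + x3 + x4 + x5, data2, degree = 1,
> > > >       control = loess.control(surface = "direct"))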
> > > >
> > > > Although something is obviously wrong here, I wonder whether it makes
> > > > sense to fit a local regression with so many predictors (unless the
> > > > object is to compare the general nonparametric fit with some more
> > > > constrained model): how would you describe the five-dimensional
> > > > surface that's produced?
> > > >
> > > > John
> > > >
> > > > At 07:36 AM 9/16/2002 -0400, John Deke wrote:
> > > > >Here's a simple example that yields the crash:
> > > > >
> > > > >library(modreg)
> > > > >data1 <- array(runif(500*5),c(500,5))
> > > > >colnames(data1) <- c("x1","x2","x3","x4","x5")
> > > > >y <- 3+2*data1[,"x1"]+15*data1[,"x2"]+13*data1[,"x3"]-8*data1[,"x4"]+14*data1[,"x5"]+rnorm(500)
> > > > >data2 <- cbind(y,data1)
> > > > >data2 <- as.data.frame(data2)
> > > > >result1 <- loess(y~x1+x2+x3+x4,data2)
> > > > >
> > > > >To get the crash, I just add x5--
> > > > >
> > > > >result1 <- loess(y~x1+x2+x3+x4+x5,data2)
> > > > >
> > > > >And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I
> > > > >mean really crashes -- the program is terminated, and I get the little
> > > > >Windows dialogue saying that a log file is being generated -- the
> > > > >whole dramatic death scene.
> > > > >
> > > > >I know it's a computationally intensive thing, but the one that
> > > > >doesn't crash (with four explanatory variables) runs almost instantly.
> > > > >It's hard to see how adding a fifth could be so catastrophic. But I am
> > > > >somewhat new to this particular methodology....
> > > > >
> > > > >John
> > > > >
> > > > >At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
> > > > >>John Deke <jdeke2 at comcast.net> writes:
> > > > >>
> > > > >> > Hmm... if I reduce the number of observations to just 500, I still
> > > > >> > get the error.
> > > > >> >
> > > > >> > I don't think it's an issue of collinearity, because I've tried
> > > > >> > several different combinations of variables, all of which work
> > > > >> > just fine in an OLS or logistic regression.
> > > > >> >
> > > > >> > I'm probably doing something stupid, but I'm not seeing it...
> > > > >> >
> > > > >> > At 02:00 PM 9/15/2002, John Deke wrote:
> > > > >> > >Hi,
> > > > >> > >
> > > > >> > > I have a data frame with 6563 observations. I can run a
> > > > >> > > regression with loess using four explanatory variables. If I add
> > > > >> > > a fifth, R crashes. There are no missings in the data, and if I
> > > > >> > > run a regression with any four of the five explanatory
> > > > >> > > variables, it works. It's only when I go from four to five that
> > > > >> > > it crashes.
> > > > >>
> > > > >>Hmm... I wouldn't try loess with more than one or two descriptors. I
> > > > >>mean, it's a smoothing method, and representing a smooth function of
> > > > >>many variables can be computationally demanding.
> > > > >>
> > > > >>The Fortran source code for loess is one of the more obfuscated
> > > > >>pieces of R, but I can see that some structures inside of it are of
> > > > >>fixed size, which might explain it (BTW: does R really crash, or just
> > > > >>say memory exhausted?).
> > > > >>
> > > > >>Do you have a simple example that reproduces the crash (using random
> > > > >>numbers, e.g.)?
> > > >
> >
> > ____________________________
> > John Fox
> > Department of Sociology
> > McMaster University
> > email: jfox at mcmaster.ca
> > web: http://www.socsci.mcmaster.ca/jfox
> > ____________________________
> >
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._