[R] loess crash

John Fox jfox at mcmaster.ca
Mon Sep 16 19:59:01 CEST 2002


Dear John,

It's true that the gam function in mgcv fits with splines while loess uses 
local regression, but an even more fundamental difference is that gam fits 
additive models (though, with some care, you can include higher-dimensional 
terms). Given your description of what you plan to do with the fitted 
model, an additive model might be what you want.
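
To make the contrast concrete, here is a minimal sketch (not part of the original exchange) of an additive fit with gam() from mgcv, using data simulated like the five-predictor example quoted below. The seed, the object names fit_add and fit_2d, and the evaluation point at 0.5 are assumptions chosen only for illustration.

```r
library(mgcv)  # provides gam() and the smooth constructor s()

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility

# Simulated data mirroring the example quoted later in this thread
data1 <- matrix(runif(500 * 5), 500, 5,
                dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 + 2 * data1[, "x1"] + 15 * data1[, "x2"] + 13 * data1[, "x3"] -
  8 * data1[, "x4"] + 14 * data1[, "x5"] + rnorm(500)
data2 <- data.frame(y = y, data1)

# Additive model: one univariate smooth per predictor, no interactions
fit_add <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = data2)

# Predicted outcome at one combination of predictor values (all at 0.5)
mid <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5, x4 = 0.5, x5 = 0.5)
pred_mid <- as.numeric(predict(fit_add, newdata = mid))

# A higher-dimensional term: x1 and x2 enter through a joint smooth
fit_2d <- gam(y ~ s(x1, x2) + s(x3) + s(x4) + s(x5), data = data2)

# Compare the purely additive fit with the one containing a joint term
print(AIC(fit_add, fit_2d))
```

Because the simulated response is additive in the predictors, the joint term should buy little here; with real data, a comparison of this kind is one way to judge whether a higher-dimensional term is worth its cost.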

More generally, a model that fits five-way interactions may be useful as a 
point of comparison for simpler models, but I doubt that it will provide a 
digestible description of the data.

I hope that this helps,
  John

At 10:45 AM 9/16/2002 -0400, you wrote:
>Thanks for the suggestion. I've only used splines for density estimation
>before -- I've never used them for regression (although I'm aware that
>people do). I'll look into it...
>
>
>-----Original Message-----
>From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
>Sent: Monday, September 16, 2002 10:17 AM
>To: jdeke2 at comcast.net
>Cc: 'r-help at stat.math.ethz.ch'
>Subject: RE: [R] loess crash
>
>
>I would suggest looking at the package mgcv.
>You can fit generalized additive models, which are useful for what
>you describe below.
>
>On Mon, 16 Sep 2002, John Deke wrote:
>
> > Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm quite
> > happy to use local linear regression.
> >
> > To answer your question -- perhaps I'm off base, but my reason for wanting
> > to do this is that I have a set of explanatory variables that most likely
> > influence my dependent variable in ways that are difficult to model
> > parametrically. That is, I suspect that there are all sorts of complementary
> > relationships between these variables, and it's not at all clear that there's
> > a satisfying theoretical model that would suggest a clear-cut parametric
> > relationship. So, rather than using parametric regression, I'd like to try
> > something non-parametric.
> >
> > My plan for summarizing the results is to find the average marginal effect
> > of each explanatory variable of interest, holding all else constant. Also, I
> > would calculate predicted outcomes for combinations of the explanatory
> > variables that are most likely to occur in "the real world".
> >
> > John
> >
> > -----Original Message-----
> > From: John Fox [mailto:jfox at mcmaster.ca]
> > Sent: Monday, September 16, 2002 9:31 AM
> > To: John Deke
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] loess crash
> >
> >
> > Dear John,
> >
> > Out of curiosity, I tried your example under R 1.5.1 on an 800 MHz PC with
> > 512 MB of memory running Windows 2000. The results were just as you described:
> > The four-predictor problem ran essentially instantly, and the
> > five-predictor problem crashed R, again instantly.
> >
> > I also tried making the problem less computationally demanding by
> > specifying locally linear, rather than quadratic, fits; this appears to
> > work:
> >
> >  > loess(y~x1+x2+x3+x4+x5, data2, degree=1)
> > Call:
> > loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)
> >
> > Number of Observations: 500
> > Equivalent Number of Parameters: 13.5
> > Residual Standard Error: 1.012
> >  >
> >
> >
> > Although something is obviously wrong here, I wonder whether it makes sense
> > to fit a local regression with so many predictors (unless the object is to
> > compare the general nonparametric fit with some more constrained model):
> > how would you describe the five-dimensional surface that's produced?
> >
> > John
> >
> > At 07:36 AM 9/16/2002 -0400, John Deke wrote:
> > >Here's a simple example that yields the crash:
> > >
> > >library(modreg)
> > >data1 <- array(runif(500*5),c(500,5))
> > >colnames(data1) <- c("x1","x2","x3","x4","x5")
> > >y <- 3 + 2*data1[,"x1"] + 15*data1[,"x2"] + 13*data1[,"x3"] -
> > >     8*data1[,"x4"] + 14*data1[,"x5"] + rnorm(500)
> > >data2 <- cbind(y,data1)
> > >data2 <- as.data.frame(data2)
> > >result1 <- loess(y~x1+x2+x3+x4,data2)
> > >
> > >To get the crash, I just add x5--
> > >
> > >result1 <- loess(y~x1+x2+x3+x4+x5,data2)
> > >
> > >And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I mean
> > >really crashes -- the program is terminated, I get the little Windows
> > >dialogue saying that a log file is being generated -- the whole dramatic
> > >death scene.
> > >
> > >I know it's a computationally intensive thing, but the one that doesn't
> > >crash (with four explanatory variables) runs almost instantly. It's hard to
> > >see how adding a fifth could be so catastrophic. But I am somewhat new to
> > >this particular methodology....
> > >
> > >John
> > >
> > >At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
> > >>John Deke <jdeke2 at comcast.net> writes:
> > >>
> > >> > Hmm... if I reduce the number of observations to just 500, I still get
> > >> > the error.
> > >> >
> > >> > I don't think it's an issue of collinearity, because I've tried several
> > >> > different combinations of variables, all of which work just fine in an
> > >> > OLS or logistic regression.
> > >> >
> > >> > I'm probably doing something stupid, but I'm not seeing it...
> > >> >
> > >> > At 02:00 PM 9/15/2002, John Deke wrote:
> > >> > >Hi,
> > >> > >
> > >> > > I have a data frame with 6563 observations. I can run a regression
> > >> > > with loess using four explanatory variables. If I add a fifth, R
> > >> > > crashes. There are no missing values in the data, and if I run a
> > >> > > regression with any four of the five explanatory variables, it
> > >> > > works. It's only when I go from four to five that it crashes.
> > >>
> > >>Hmm... I wouldn't try loess with more than one or two descriptors. I
> > >>mean, it's a smoothing method and representing a smooth function of
> > >>many variables can be computationally demanding.
> > >>
> > >>The Fortran source code for loess is one of the more obfuscated pieces
> > >>of R, but I can see that some structures inside of it are of fixed
> > >>size, which might explain it (BTW: Does R really crash, or just say
> > >>memory exhausted?).
> > >>
> > >>Do you have a simple example that reproduces the crash (using random
> > >>numbers, e.g.)?
> >
> > -----------------------------------------------------
> > John Fox

____________________________
John Fox
Department of Sociology
McMaster University
email: jfox at mcmaster.ca
web: http://www.socsci.mcmaster.ca/jfox
____________________________



