[R] loess crash

John Deke JDeke at mathematica-mpr.com
Mon Sep 16 15:51:36 CEST 2002


Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm quite
happy to use local linear regression.

To answer your question -- perhaps I'm off base, but my reason for wanting
to do this is that I have a set of explanatory variables that most likely
influence my dependent variable in ways that are difficult to model
parametrically. That is, I suspect that there are all sorts of complementary
relationships between these variables, and it's not at all clear that there's
a satisfying theoretical model that would suggest a clear-cut parametric
relationship. So, rather than using parametric regression, I'd like to try
something non-parametric. 

My plan for summarizing the results is to find the average marginal effect
of each explanatory variable of interest, holding all else constant. Also, I
would calculate predicted outcomes for combinations of the explanatory
variables that are most likely to occur in "the real world". 
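
A minimal sketch of that summary plan, under stated assumptions: the data
are simulated, the helper name `ame` and the step size `h` are hypothetical,
and only four predictors are used because, as far as I understand the current
loess documentation, it describes one to four numeric predictors. The
marginal effect is approximated by central finite differences on the fitted
surface, which is one possible approach rather than an established recipe.

```r
# Illustrative sketch only: simulated data, not the data discussed above.
set.seed(1)
n <- 500
d <- as.data.frame(matrix(runif(n * 4), n, 4))
names(d) <- c("x1", "x2", "x3", "x4")
d$y <- 3 + 2 * d$x1 + 15 * d$x2 + 13 * d$x3 - 8 * d$x4 + rnorm(n)

# Locally linear fit (degree = 1) with four predictors.
fit <- loess(y ~ x1 + x2 + x3 + x4, data = d, degree = 1)

# Hypothetical helper: average marginal effect of one predictor, holding the
# others at their observed values, via central finite differences.
ame <- function(fit, data, var, h = 0.01) {
  up <- data
  dn <- data
  up[[var]] <- data[[var]] + h
  dn[[var]] <- data[[var]] - h
  slopes <- (predict(fit, newdata = up) - predict(fit, newdata = dn)) / (2 * h)
  # Shifted points that fall outside the fitted region predict as NA.
  mean(slopes, na.rm = TRUE)
}

effects <- sapply(c("x1", "x2", "x3", "x4"), function(v) ame(fit, d, v))

# Predicted outcomes for a few combinations of interest (made-up values).
scenarios <- data.frame(x1 = 0.5, x2 = c(0.25, 0.75), x3 = 0.5, x4 = 0.5)
scenario_pred <- predict(fit, newdata = scenarios)
```

Here `effects` holds one averaged slope per predictor and `scenario_pred`
the fitted value for each row of `scenarios`.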

John

-----Original Message-----
From: John Fox [mailto:jfox at mcmaster.ca]
Sent: Monday, September 16, 2002 9:31 AM
To: John Deke
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] loess crash


Dear John,

Out of curiosity, I tried your example under R 1.5.1 on an 800 MHz PC with
512 MB of memory running Windows 2000. The results were just as you
described: the four-predictor problem ran essentially instantly, and the
five-predictor problem crashed R, again instantly.

I also tried making the problem less computationally demanding by 
specifying locally linear, rather than quadratic, fits; this appears to
work:

 > loess(y~x1+x2+x3+x4+x5, data2, degree=1)
Call:
loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)

Number of Observations: 500
Equivalent Number of Parameters: 13.5
Residual Standard Error: 1.012
 >


Although something is obviously wrong here, I wonder whether it makes sense 
to fit a local regression with so many predictors (unless the object is to 
compare the general nonparametric fit with some more constrained model): 
how would you describe the five-dimensional surface that's produced?
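
One way such a comparison might look, as a rough sketch: fit a general
locally linear model and a more constrained one, then compare them with the
approximate F test that anova() provides for loess objects. The data are
simulated, the object names are hypothetical, and "constrained" here simply
means omitting one predictor; a linear or additive model would be another
reasonable choice of constraint.

```r
# Illustrative sketch only: simulated data with four predictors.
set.seed(2)
n <- 500
d <- as.data.frame(matrix(runif(n * 4), n, 4))
names(d) <- c("x1", "x2", "x3", "x4")
d$y <- 3 + 2 * d$x1 + 15 * d$x2 + 13 * d$x3 - 8 * d$x4 + rnorm(n)

# General fit: all four predictors, locally linear.
general <- loess(y ~ x1 + x2 + x3 + x4, data = d, degree = 1)

# Constrained fit: x4 is left out, i.e. its effect is forced to zero.
constrained <- loess(y ~ x1 + x2 + x3, data = d, degree = 1)

# Approximate F test comparing the two loess fits.
comparison <- anova(constrained, general)
print(comparison)
```

A small p-value in the second row suggests the constrained model fits
noticeably worse than the general one.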

John

At 07:36 AM 9/16/2002 -0400, John Deke wrote:
>Here's a simple example that yields the crash:
>
>library(modreg)
>data1 <- array(runif(500*5),c(500,5))
>colnames(data1) <- c("x1","x2","x3","x4","x5")
>y <- 3+2*data1[,"x1"]+15*data1[,"x2"]+13*data1[,"x3"]-8*data1[,"x4"]+
>     14*data1[,"x5"]+rnorm(500)
>data2 <- cbind(y,data1)
>data2 <- as.data.frame(data2)
>result1 <- loess(y~x1+x2+x3+x4,data2)
>
>To get the crash, I just add x5--
>
>result1 <- loess(y~x1+x2+x3+x4+x5,data2)
>
>And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I mean 
>really crashes -- the program is terminated, I get the little Windows 
>dialogue saying that a log file is being generated -- the whole dramatic 
>death scene.
>
>I know it's a computationally intensive thing, but the one that doesn't 
>crash (with four explanatory variables) runs almost instantly. It's hard to 
>see how adding a fifth could be so catastrophic. But I am somewhat new to 
>this particular methodology....
>
>John
>
>At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
>>John Deke <jdeke2 at comcast.net> writes:
>>
>> > Hmm... if I reduce the number of observations to just 500, I still get
>> > the error.
>> >
>> > I don't think it's an issue of collinearity, because I've tried several
>> > different combinations of variables, all of which work just fine in an
>> > OLS or logistic regression.
>> >
>> > I'm probably doing something stupid, but I'm not seeing it...
>> >
>> > At 02:00 PM 9/15/2002, John Deke wrote:
>> > >Hi,
>> > >
>> > > I have a data frame with 6563 observations. I can run a regression
>> > > with loess using four explanatory variables. If I add a fifth, R
>> > > crashes. There are no missings in the data, and if I run a
>> > > regression with any four of the five explanatory variables, it
>> > > works. It's only when I go from four to five that it crashes.
>>
>>Hmm... I wouldn't try loess with more than one or two descriptors. I
>>mean, it's a smoothing method and representing a smooth function of
>>many variables can be computationally demanding.
>>
>>The Fortran source code for loess is one of the more obfuscated pieces
>>of R, but I can see that some structures inside of it are of fixed
>>size, which might explain it (BTW: Does R really crash, or just say
>>memory exhausted?).
>>
>>Do you have a simple example that reproduces the crash (using random
>>numbers, e.g.)?

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------