[R] What is the most useful way to detect nonlinearity in lo
Liaw, Andy
andy_liaw at merck.com
Mon Dec 6 03:26:02 CET 2004
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Ted.Harding at nessie.mcc.ac.uk
> Sent: Sunday, December 05, 2004 7:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] What is the most useful way to detect
> nonlinearity in lo
>
>
> On 05-Dec-04 Peter Dalgaard wrote:
> > (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
> >
> >> >> x <- runif(500)
> >> >> y <- rbinom(500,size=1,p=plogis(x))
> >> >> xx <- predict(loess(resid(glm(y~x,binomial))~x),se=T)
> >> >> matplot(x,cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit),pch=20)
> >> >>
> >> >> Not sure my money isn't still on the splines, though.
> > .....
> >> > Serves me right for posting way beyond my bedtime...
> >>
> >> Hi Peter,
> >>
> >> Yes, the above is certainly misleading (try it with 2000 instead
> >> of 500)! But what would you suggest instead?
> >
> > (I did and this little computer came tumbling down...).
>
> So did mine -- but at 5000 (which is the value I first tried):
> lots of disk grinding and then it went "prprprprp" and wrote
> words to the effect "Calloc cannot allocate (18790050 times 4)"
> i.e. it needed 72MB, which bankrupted my 192MB baby.
>
> 2000 was OK, however, but I had plenty of time for a meal etc.
> before it finished.
>
> Which brings up that predict(loess(....)) seems to be very
> memory-hungry.
locfit to the rescue, perhaps?
> library(locfit)
> n <- 5000
> x <- sort(runif(n))
> y <- rbinom(n, size=1, p=plogis(x))
> system.time(xx <- predict(locfit(resid(glm(y~x, binomial))~x), where="data",
+                           se=TRUE), gcFirst=TRUE)
[1] 0.79 0.00 0.84 NA NA
> matplot(x, cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit), pch=20)
[The plot looks strange...]
This is on my laptop (mobile Pentium 1.6GHz, 512MB RAM). With loess at
n=5000 it also ran out of memory here; at n=2000, the loess route took
just under 3 seconds.
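
If locfit is not an option, one thing that might ease the memory load with
loess is to ask for standard errors only on a thinned set of x values rather
than at every data point. The sketch below is untested on the machines
mentioned in this thread. It rests on my assumption that the cost of
predict(..., se=TRUE) grows with the number of prediction points. The seed,
the choice of 100 points and the names r, fit and idx are only for
illustration.

```r
set.seed(1)                       # illustrative seed, not from the thread
n <- 2000
x <- sort(runif(n))
y <- rbinom(n, size = 1, prob = plogis(x))

# Default (deviance) residuals from the linear-logit fit, as in the thread
r <- resid(glm(y ~ x, family = binomial))
fit <- loess(r ~ x)

# Predict with standard errors at about 100 of the sorted x values only
idx <- round(seq(1, n, length.out = 100))
xx <- predict(fit, newdata = data.frame(x = x[idx]), se = TRUE)

# Same display as above: smooth plus a +/- 2 SE band drawn around zero
matplot(x[idx], cbind(xx$fit, 2 * xx$se.fit, -2 * xx$se.fit), pch = 20)
```

The smooth itself is still fitted to all n points; only the pointwise
standard errors are computed on the thinned set.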
Cheers,
Andy
> > Basically, I'd reconsider the type= option to residuals.glm. As I
> > said, at least type="response" should have the right mean. Ideally,
> > you'd want to take advantage of the fact that the variance of the
> > residuals is known too, rather than have the smoother estimate it.
> > The more I think, the more I like the splines...
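
For what it's worth, here is a minimal sketch of how I read those two
suggestions; it is not Peter's code. It takes response-type residuals from
the linear-logit fit, then nests that fit inside a natural-spline fit and
compares the two by a likelihood-ratio test. The seed, df = 4 and the object
names are my own assumptions.

```r
library(splines)

set.seed(1)                       # illustrative seed, not from the thread
n <- 2000
x <- sort(runif(n))
y <- rbinom(n, size = 1, prob = plogis(x))

fit_lin <- glm(y ~ x, family = binomial)

# Response residuals are y minus the fitted probability, so they average
# to zero when the model includes an intercept
r_resp <- residuals(fit_lin, type = "response")

# Spline alternative: the linear-logit model is nested in the natural-spline
# model, so the two can be compared with a chi-squared test on the deviance
fit_ns <- glm(y ~ ns(x, df = 4), family = binomial)
a <- anova(fit_lin, fit_ns, test = "Chisq")
print(a)
```

With data simulated from a logit that really is linear in x, as here, there
is no true nonlinearity for the test to find.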
>
> I'll have a go at your suggestions (if I can get the syntax
> right ... )
>
> Thanks,
> Ted.
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
> Date: 06-Dec-04 Time: 00:13:53
> ------------------------------ XFMail ------------------------------
>