[R] unexpected GAM result - at least for me!

Wed Apr 2 13:30:14 CEST 2008

You may want to plot your smooth terms:

plot(can3.gam, residuals = TRUE, pch = 1)

The roughly 4 and 7 estimated degrees of freedom on the two middle terms,
s(crr) and s(ch), can give you quite curvy smooths, and you might be
overfitting the data (as somebody else already mentioned). You may also want
to look at the correlation between the smoothing variables. Compute the
correlation matrix as a first step, then plot each variable against the
others, which makes it easier to spot nonlinear dependencies. If one of these
relationships is nearly perfect, you may face serious problems due to
multicollinearity (often called concurvity in the GAM setting).
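
A minimal sketch of that check, using simulated stand-ins for the four lidar
metrics since the real data are not shown in this thread. The data frame name
lidar and all simulated values are assumptions; only the variable names be,
crr, ch and home come from the original model.

```r
# Simulated stand-ins for the lidar metrics (hypothetical values, n as in the thread).
set.seed(1)
n  <- 148
ch <- runif(n, 0, 30)
lidar <- data.frame(
  be   = rnorm(n),
  crr  = runif(n),
  ch   = ch,
  home = 0.8 * ch + rnorm(n, sd = 1)  # deliberately made to track ch closely
)

cm <- cor(lidar)                      # Pearson correlation matrix
print(round(cm, 2))

# Rank correlation is less tied to linearity and catches monotone relationships.
print(round(cor(lidar, method = "spearman"), 2))

# Scatterplot matrix: each variable plotted against all the others.
pairs(lidar)
```

With real data you would replace the simulated data frame by the actual
predictors; a pair that lies almost on a line or curve in the pairs() plot is
the one to worry about.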

I am sorry if I am duplicating somebody's earlier response.

Cheers,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Monica Pisica
Sent: Tuesday, April 01, 2008 2:44 PM
To: Duncan Murdoch
Cc: r-help at r-project.org
Subject: Re: [R] unexpected GAM result - at least for me!

Hi,

I've compared observed and predicted and they match 100%.

For 90% probability of occurrence:

table(can>0,fitted(can3.gam)>0.9)

        FALSE TRUE
  FALSE    23    0
  TRUE      0  125

So I guess it is a valid result, but a very unexpected one for me.
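
For illustration, perfect classification combined with huge standard errors
and p-values near 1 is what complete separation tends to produce in a logistic
model. The following is only a minimal sketch with made-up data and a plain
glm rather than the gam fit from this thread; x, y and fit are hypothetical
names.

```r
# Made-up data: the predictor x separates the two classes completely.
set.seed(42)
x <- c(runif(23, -2, -0.5), runif(125, 0.5, 2))
y <- as.integer(x > 0)                # 23 absences, 125 presences

# glm() warns about fitted probabilities of 0 or 1 here; the warnings are
# suppressed only to keep the sketch quiet.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Observed vs. predicted at the 0.9 cut-off used above.
tab <- table(observed = y == 1, predicted = fitted(fit) > 0.9)
print(tab)

# Coefficient table: look at the size of the standard errors.
print(summary(fit)$coefficients)
```

The classification is perfect, yet the Wald tests in the coefficient table
are not informative, because the likelihood has no finite maximum under
separation.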

Thank you again for all the help,

Monica

> Date: Mon, 31 Mar 2008 09:30:01 -0400
> From: murdoch at stats.uwo.ca
> To: pisicandru at hotmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] unexpected GAM result - at least for me!
>
> On 3/31/2008 9:01 AM, Monica Pisica wrote:
>> Thanks Duncan.
>>
>> Yes, I do have variation in the lidar metrics (be, ch, crr, and home),
>> although I have quite a high correlation between ch and home. But
>> even if I eliminate one metric (either ch or home) I end up with a
>> deviance explained of 99.99%. The species has values of 0 and 1 since
>> I am trying to predict presence / absence.
>>
>> Do you think it is still a valid result?
>
> I repeat: look at the data. Compare the observed and predicted. That's 
> the only way to know whether this is reasonable or not.
>
> If you're getting reasonable predictions, then it's a valid fit. (The 
> tests and approximations used in the reported p-values may not be at 
> all valid. I don't know what the requirements are for those in a GAM, 
> but if you're getting a perfect fit, then they probably aren't being 
> met.)
>
> Duncan Murdoch
>
>
>>
>> Thanks again,
>>
>> Monica
>>
>>> Date: Mon, 31 Mar 2008 08:47:48 -0400
>>> From: murdoch at stats.uwo.ca
>>> To: pisicandru at hotmail.com
>>> CC: r-help at r-project.org
>>> Subject: Re: [R] unexpected GAM result - at least for me!
>>>
>>> On 3/31/2008 8:34 AM, Monica Pisica wrote:
>>>>
>>>> Hi
>>>>
>>>>
>>>> I am afraid I am not understanding something very fundamental, and no
>>>> matter how much I look into the book "Generalized Additive Models" by
>>>> S. Wood, I still don't understand my result.
>>>>
>>>> I am trying to model presence / absence (presence = 1, absence = 0) of
>>>> a species using some lidar metrics (I have 4 of these). I am trying
>>>> different models, and when I used gam I got this very weird (for me)
>>>> result, which I thought was not possible, or at least I have no idea
>>>> how to interpret it.
>>>>
>>>> > can3.gam <- gam(can > 0 ~ s(be) + s(crr) + s(ch) + s(home),
>>>> +                 family = 'binomial')
>>>> > summary(can3.gam)
>>>>
>>>> Family: binomial
>>>> Link function: logit
>>>>
>>>> Formula:
>>>> can > 0 ~ s(be) + s(crr) + s(ch) + s(home)
>>>>
>>>> Parametric coefficients:
>>>>             Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept)    85.39     162.88   0.524      0.6
>>>>
>>>> Approximate significance of smooth terms:
>>>>           edf Est.rank Chi.sq p-value
>>>> s(be)   1.000        1  0.100   0.751
>>>> s(crr)  3.929        8  0.380   1.000
>>>> s(ch)   6.820        9  0.396   1.000
>>>> s(home) 1.000        1  0.314   0.575
>>>>
>>>> R-sq.(adj) = 1   Deviance explained = 100%
>>>> UBRE score = -0.81413   Scale est. = 1   n = 148
>>>>
>>>> Is this a perfect fit with no statistical significance, an overfit, or
>>>> what? It seems that the significance of the smooth terms is "null". Of
>>>> course, with such a model I predict presence / absence of the species
>>>> perfectly.
>>>>
>>>> Again, I hope you don't mind my asking you this. Any explanation will
>>>> be very much appreciated.
>>>
>>> Look at the data. You can get a perfect fit to a logistic regression 
>>> model fairly easily, and it looks as though you've got one. (In 
>>> fact, the huge intercept suggests that all predictions will be 1. Do 
>>> you actually have any variation in the data?)
>>>
>>> Duncan Murdoch
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.