[R] Strange behaviour of lars method

Wed Sep 19 18:00:00 CEST 2007

Hi!

When I apply the lars (least-angle regression) method to my data
(3655 features, only 355 data points; no, I did not mistype), I
observe some strange behaviour:

1) The beta values tend to grow to really high values quite fast, up
to a point where they overflow and become negative. The overflow is
not a problem, I don't need the last part of the analysis anyway, but
why do they just shoot up to high values like that...? Any explanation?

2) The Cp values... they start at about -360 and grow linearly with
the number of steps. This is totally strange, since they ought to be an
"overly optimistic estimation of the generalization error" according
to Hastie's book.

3) Lastly, I get a curve for the r^2 correlation values that grows up
to a plateau where it is 1 (until it reaches the point where the betas
overflow, then it becomes negative, but forget about that). This is
classic overfitting happening. The calculation IS right though, since
using the components and betas from one of those r^2=1 steps gives a
correlation of about 0.96 with nu-SVR too. The generalization is
pretty bad though.
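
To make point 2 a bit more concrete, here is a small sketch of the
textbook form of Mallows' Cp, Cp = RSS/sigma^2 - n + 2*df. The helper
name mallows_cp and all numbers except n are made up for illustration,
and I am not claiming this is exactly what lars computes internally.
It only shows that if the noise estimate sigma^2 is huge compared to
the RSS (whether that is what happens with my overflowing last steps
is an assumption), Cp collapses to roughly -n + 2*df, i.e. it starts
near -n and then grows linearly:

```r
# Textbook Mallows' Cp for a sequence of fits:
#   Cp = RSS / sigma2 - n + 2 * df
# mallows_cp is a hypothetical helper, not part of the lars package.
mallows_cp <- function(rss, df, n, sigma2) {
  rss / sigma2 - n + 2 * df
}

n   <- 355                                      # data points, as in my data
df  <- 0:10                                     # degrees of freedom per step
rss <- seq(1000, 500, length.out = length(df))  # made-up, decreasing RSS

cp_sane <- mallows_cp(rss, df, n, sigma2 = 1.5)   # plausible noise estimate
cp_huge <- mallows_cp(rss, df, n, sigma2 = 1e12)  # blown-up noise estimate

print(round(cp_sane, 1))  # dominated by the RSS / sigma2 term
print(round(cp_huge, 3))  # essentially -n + 2 * df
print(diff(cp_huge))      # increments of about 2 per step
```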

The funny thing: I observe qualitatively the same when I start with
only 359 of these features and run lars on them.
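
As a sanity check on that: 359 features is still more than 355 data
points, so a perfect fit on the training data is possible in principle.
Below is a minimal sketch with pure noise. It uses plain least squares
via lm.fit as a stand-in (so it is not the lars algorithm), and the
sizes and the helper r2 are made up for illustration:

```r
set.seed(1)
n <- 30; p <- 60                       # made-up sizes, more features than points

x_train <- matrix(rnorm(n * p), n, p)  # pure-noise features
y_train <- rnorm(n)                    # pure-noise response, unrelated to x
x_test  <- matrix(rnorm(n * p), n, p)
y_test  <- rnorm(n)

# Ordinary least squares with an intercept; lm.fit pivots out the
# columns it cannot estimate and reports their coefficients as NA.
fit  <- lm.fit(cbind(1, x_train), y_train)
beta <- fit$coefficients
beta[is.na(beta)] <- 0                 # treat dropped columns as zero

# Hypothetical helper: coefficient of determination.
r2 <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

r2_train <- r2(y_train, fit$fitted.values)
r2_test  <- r2(y_test, drop(cbind(1, x_test) %*% beta))

print(r2_train)  # numerically 1: the training data are interpolated
print(r2_test)   # held-out value, for comparison
```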

So, questions I have:
* Regarding points 1 and 2, does anybody have an explanation for
the described behaviour? I can't seem to find one myself...
* Has anybody tried lars on data with such a bad ratio of features to
data points before? What were your experiences?
* Why does it overfit so badly?

I have also tried the cross-validation selection (cv.lars), but it
does not give me the selected features or betas, just the r^2 and RSS
values from its runs...
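
One possible workaround, as a minimal sketch: take the CV curve from
cv.lars, pick the position with the smallest CV error, then refit on
all the data with lars and read off the coefficients at that position
via coef(). The data below are simulated stand-ins, not my real data.
The sketch assumes a lars version in which cv.lars accepts type and
mode and returns components named index and cv; in other versions
these names may differ (I believe older ones used fraction instead of
index), so please check against your installed version:

```r
library(lars)

set.seed(1)
n <- 50; p <- 100                        # made-up sizes, stand-in for my data
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, -2, 1)) + rnorm(n)  # only 3 features matter here

# Cross-validated error along the lasso path, indexed by the fraction
# |beta| / max|beta| (assumed interface, see the note above).
cv <- cv.lars(x, y, K = 10, type = "lasso", mode = "fraction",
              plot.it = FALSE)
best_s <- cv$index[which.min(cv$cv)]

# Refit on all data and extract the coefficients at the chosen fraction.
fit  <- lars(x, y, type = "lasso")
beta <- coef(fit, s = best_s, mode = "fraction")

selected <- which(beta != 0)             # indices of the selected features
print(best_s)
print(selected)
print(beta[selected])
```

The same pattern should work with mode = "step" for type = "lar",
using the step number instead of the fraction.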

Thanks for any thoughts on this!

Ciao!
    Wiebke