[R] Bug in stepAIC?

Martin C. Martin martin at martincmartin.com
Thu Oct 12 15:47:10 CEST 2006


Prof Brian Ripley wrote:
> You sent this earlier to R-devel.  Please do see the posting guide! 
> Since you (incorrectly) thought this was a bug in MASS, you should have 
> contacted the maintainer.

Thanks, but I did try emailing both you and Prof. Venables directly a 
month ago.  After not receiving a response, I emailed R-devel last week. 
After not receiving a response there, I thought perhaps the code was 
correct after all, and that I had misunderstood how to call it - a 
perfect question for R-help.

There can be a fine line between R-help and R-devel, which is even 
harder to find when you're new to R and you don't really know where the 
problem is.

> On Wed, 11 Oct 2006, Martin C. Martin wrote:
> 
>> Hi,
>>
>> First of all, thanks for the great work on R in general, and MASS in
>> particular.  It's been a life saver for me many times.
>>
>> However, I think I've discovered a bug.  It seems that, when I use
>> weights during an initial least-squares regression fit, and later try to
>> add terms using stepAIC(), it uses the weights when looking to remove
>> terms, but not when looking to add them:
>>
>> hills.lm <- lm(time ~ dist + climb, data = hills, weights = 1/dist2)
> 
> Presumably dist^2?

Yes, sorry, a problem with Thunderbird being a little too smart for its 
own good.  :)

>> small.hills.lm <- stepAIC(hills.lm)
>> stepAIC(small.hills.lm, time ~ dist + climb)
>>
>> In the first stepAIC(), it says that the AIC for the full "time ~ dist +
>> climb" is 94.41.  Yet, during the second stepAIC, it says adding climb
>> would produce an AIC of 212.1 (and an RSS of 12633.3).  Is this a bug?
> 
> Yes, but not in stepAIC.  Consider
> 
>> drop1(hills.lm)
> Single term deletions
> 
> Model:
> time ~ dist + climb
>        Df Sum of Sq    RSS    AIC
> <none>              437.64  94.41
> dist    1    164.05 601.68 103.55
> climb   1      8.66 446.29  93.10
>> add1(small.hills.lm, time ~ dist + climb)
> Single term additions
> 
> Model:
> time ~ dist
>        Df Sum of Sq     RSS     AIC
> <none>              15787.2   217.9
> climb   1    3153.8 12633.3   212.1
>> stats:::add1.default(small.hills.lm, time ~ dist + climb)
> Single term additions
> 
> Model:
> time ~ dist
>        Df    AIC
> <none>    93.097
> climb   1 94.411
> 
> so the bug is in add1.lm, part of R itself.  Other code has been altered 
> which then broke add1.lm and 'z' needs to be given class "lm".  Now 
> fixed in r-devel and r-patched.

Great; thanks!
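
For anyone checking whether their own R build has the fix, here is a
minimal sketch of a consistency check. It assumes the hills data from
MASS and the dist^2 weights discussed above; the variable names
(full.lm, small.lm, aic_add, aic_full) are only illustrative. The idea
is that the AIC add1() reports for adding climb should match the AIC of
the full weighted model.

```r
# Load the hills data that ships with MASS (assumed to be installed).
data(hills, package = "MASS")

# Full weighted fit, and the same fit with climb removed.
full.lm  <- lm(time ~ dist + climb, data = hills, weights = 1/dist^2)
small.lm <- update(full.lm, . ~ . - climb)

# AIC that add1() reports for putting climb back in.
added   <- add1(small.lm, time ~ dist + climb)
aic_add <- added["climb", "AIC"]

# AIC of the full weighted model, computed directly.
aic_full <- extractAIC(full.lm)[2]

# If add1() honours the weights, these two values agree.
print(c(add1 = aic_add, full = aic_full))
print(isTRUE(all.equal(aic_add, aic_full)))
```

On a build without the fix the two numbers should disagree, as in the
output quoted above, and calling stats:::add1.default() directly, as
Prof. Ripley did, gives the weighted values.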

Best,
Martin


