[R] variable/model selection (step/stepAIC) for biglm ?

Charles C. Berry cberry at tajo.ucsd.edu
Sat Feb 21 19:09:59 CET 2009


On Sat, 21 Feb 2009, Tal Galili wrote:

> Hello dear R mailing list members.
>
> I have recently become curious about the possibility of applying model
> selection algorithms (even as simple as AIC) to regressions on large
> datasets.


Large in the sense of many observations, one assumes.

But how large in terms of the number of variables??

If there are not too many variables, then you can form the regression 
sums of squares for all 2^p combinations of regressors from a single 
biglm() fit of all the variables, since biglm provides coef() and vcov() 
methods.
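
A minimal sketch of that idea, under stated assumptions: it uses the 
standard identity that dropping a set D of regressors from the full model 
raises the residual sum of squares by sigma^2 * b_D' V_DD^{-1} b_D, where b 
is coef() and V is vcov() of the full fit. The function all_subsets_rss() 
and its arguments are hypothetical names, not part of biglm. The demo uses 
lm() on simulated data so that it runs with base R alone; with a biglm fit 
one would presumably pass coef(fit) and vcov(fit) in the same way, with the 
residual sum of squares, residual df and n taken from the fit object.

```r
# Sketch: residual SS (and an AIC, up to a constant) for all 2^p submodels,
# computed only from the full model's coefficients and covariance matrix.
# all_subsets_rss() is an illustrative helper, not a biglm function.
all_subsets_rss <- function(beta, V, rss_full, df_resid, n) {
  sigma2 <- rss_full / df_resid                  # full-model residual variance
  terms  <- setdiff(names(beta), "(Intercept)")  # intercept stays in every model
  p      <- length(terms)
  # one row per submodel; TRUE means the regressor is kept
  keep   <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))
  colnames(keep) <- terms
  rss <- apply(keep, 1, function(k) {
    out <- terms[!k]                             # regressors being dropped
    if (length(out) == 0) return(rss_full)
    b <- beta[out]
    # increase in RSS from dropping 'out': sigma2 * b' V_oo^{-1} b
    rss_full + sigma2 * as.numeric(crossprod(b, solve(V[out, out, drop = FALSE], b)))
  })
  # Gaussian AIC up to an additive constant; +1 counts the intercept
  data.frame(keep, rss = rss,
             aic = n * log(rss / n) + 2 * (rowSums(keep) + 1),
             check.names = FALSE)
}

# Demo on simulated data, with lm() standing in for biglm()
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x3 + rnorm(n)
full <- lm(y ~ x1 + x2 + x3, data = d)
res  <- all_subsets_rss(coef(full), vcov(full), deviance(full),
                        df.residual(full), n)
print(res[which.min(res$aic), ])                 # submodel with the smallest AIC
```

The loop visits 2^p rows, so this is only practical when p is modest, which 
is the 'not too many variables' case above.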

If the number of variables is large, then you will most likely need to 
subsample the observations and use lm() and friends to reduce the number 
of variables to 'not too many', and then apply the above strategy.
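
One possible reading of the subsampling step, as a rough sketch only: draw a 
random subsample that fits comfortably in memory, screen the variables on it 
with lm() and step(), and carry the surviving terms forward to the full-data 
fit. The simulated data frame 'big', the subsample size of 1000 and the use 
of step() for the screening are all assumptions made for illustration; in 
practice the subsample would be read from wherever the large dataset lives.

```r
# Sketch: screen variables on a subsample, then keep a reduced formula
# for the full-data fit. All object names here are illustrative.
set.seed(42)
n <- 20000
p <- 12
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
big <- data.frame(X)                      # stand-in for the large dataset
big$y <- 1 + 2 * big$x1 - 1.5 * big$x2 + 0.5 * big$x3 + rnorm(n)

# 1. random subsample small enough for ordinary lm()
sub <- big[sample(nrow(big), 1000), ]

# 2. screen with backward stepwise AIC on the subsample
sub_fit  <- lm(y ~ ., data = sub)
screened <- step(sub_fit, trace = 0)

# 3. the surviving regressors define the reduced full-data model
kept <- attr(terms(screened), "term.labels")
reduced_formula <- reformulate(kept, response = "y")
print(reduced_formula)
```

The reduced formula is what one would then hand to a full-data fit (for 
example with biglm()), after which the all-subsets strategy above can be 
applied to the remaining variables.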

> I searched as best I could, but couldn't find any
> reference or wrapper for using step or stepAIC with packages such as
> biglm.


Surely any direct implementation of step() would be hopelessly long in 
execution time.


HTH,

Chuck


>
> Any ideas or directions of how to implement such a concept ?
>
>
> Best,
> Tal

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



