[R] ff usage for glm

Thomas Lumley tlumley at uw.edu
Tue Apr 3 21:51:19 CEST 2012


On Tue, Apr 3, 2012 at 1:42 AM, Benilton Carvalho
<beniltoncarvalho at gmail.com> wrote:
> Did you try the example described on the ff man page?

Also, the last error message you report happens when chunks of the
data set give design matrices that don't line up correctly (for
example, when a factor does not have the same set of levels in every
chunk). You said you added one new variable, but the formula you show
actually contains two new variables compared with the previous run
you showed.
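
A minimal base-R sketch of one way chunk design matrices can fail to
line up, assuming a hypothetical factor `g` whose level "c" happens to
be absent from the second chunk (the data and object names are made up
for illustration):

```r
## Two hypothetical chunks of one data set; the second chunk happens
## to contain no rows with g == "c".
chunk1 <- data.frame(y = c(1.2, 0.4, 2.2), g = c("a", "b", "c"),
                     stringsAsFactors = FALSE)
chunk2 <- data.frame(y = c(0.9, 1.7), g = c("a", "b"),
                     stringsAsFactors = FALSE)

## Building the factor separately in each chunk gives each chunk its
## own set of levels, so the design matrices have different columns.
chunk1$g <- factor(chunk1$g)
chunk2_bad <- chunk2
chunk2_bad$g <- factor(chunk2_bad$g)
mm1     <- model.matrix(y ~ g, chunk1)
mm2_bad <- model.matrix(y ~ g, chunk2_bad)
colnames(mm1)      # "(Intercept)" "gb" "gc"
colnames(mm2_bad)  # "(Intercept)" "gb"

## Remedy: declare the full level set once and apply it to every chunk.
## An unused level still gets its (all-zero) column.
all_levels <- c("a", "b", "c")
chunk2$g <- factor(chunk2$g, levels = all_levels)
mm2_fixed <- model.matrix(y ~ g, chunk2)
colnames(mm2_fixed)  # "(Intercept)" "gb" "gc"
```

In a chunked fit, the same idea would apply inside whatever function
supplies the chunks: re-apply the full `levels` to each factor before
returning the chunk.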

If you only have 32-bit R, then doing this from a data frame is not
going to be efficient, and you do want to use ff (or SQLite or
something).  You may also want to decrease the chunk size from the
default -- 5000 observations at a time might be too much.
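
A minimal sketch of feeding an ff data frame to bigglm() in smaller
chunks, assuming the ff and biglm packages are installed. The CSV file
is simulated here as a stand-in for the real one, and the helper
`make_ff_reader` and its `chunksize` argument are hypothetical names
chosen for illustration; they follow the data-function convention
described in ?bigglm (`data(reset = TRUE)` rewinds,
`data(reset = FALSE)` returns the next chunk or NULL). ff's chunk()
could compute the row ranges instead; plain index arithmetic is used
here only to keep the sketch short.

```r
library(ff)     # ffdf: data frames whose columns live in files on disk
library(biglm)  # bigglm(): GLMs fitted one chunk of rows at a time

## Hypothetical stand-in for the real, too-big CSV file: simulate a
## small one so the sketch runs end to end.
set.seed(1)
n <- 20000
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$x1 + 1.2 * sim$x2))
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)
rm(sim)  # from here on, only the on-disk copy is used

## Read the CSV into an ffdf without holding all rows in RAM.
big <- read.csv.ffdf(file = csv_path, header = TRUE)

## Data function in the form ?bigglm describes: data(reset = TRUE)
## rewinds to the first row; data(reset = FALSE) returns the next
## chunk as an ordinary data.frame, or NULL when no rows remain.
make_ff_reader <- function(ffd, chunksize = 1000) {
  n_rows <- nrow(ffd)
  pos <- 1
  function(reset = FALSE) {
    if (reset) {
      pos <<- 1
      return(invisible(NULL))
    }
    if (pos > n_rows) return(NULL)
    last <- min(pos + chunksize - 1, n_rows)
    chunk_df <- ffd[pos:last, ]  # row-indexing an ffdf gives a data.frame
    pos <<- last + 1
    chunk_df
  }
}

## Smaller chunks use less memory per step; bigglm passes over the
## data once per IRLS iteration, calling data(reset = TRUE) each time.
fit <- bigglm(y ~ x1 + x2, data = make_ff_reader(big, chunksize = 1000),
              family = binomial())
summary(fit)
```

When `data` is an ordinary data frame, bigglm's own `chunksize`
argument (default 5000) plays the same role; with a data function, the
function itself decides how many rows each chunk holds.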

Incidentally, putting ATTN: Thomas Lumley on a Nabble post would be
counterproductive if I read Nabble, but since I don't, it's completely
pointless.

    -thomas


> On Monday, April 2, 2012, Bond, Stephen wrote:
>
>> Thomas,
>>
>> I tried biglm and it does not work; see
>>
>>
>> http://r.789695.n4.nabble.com/unable-to-get-bigglm-working-ATTN-Thomas-Lumley-td2276524.html#a2278381
>>
>> There are other posts from people who cannot get biglm working and
>> others who get strange results.
>> Please advise if you can help.
>> I have row-based native code which works, but it is inconvenient as it
>> does not produce an R object that can be fed to anova() etc. I offered
>> it to the developer forum, but the message is still waiting for
>> moderator approval.
>> regards
>>
>> Stephen B
>>
>> -----Original Message-----
>> From: Thomas Lumley [mailto:tlumley at uw.edu]
>> Sent: Friday, March 30, 2012 7:32 PM
>> To: Bond, Stephen
>> Cc: r-help at r-project.org
>> Subject: Re: [R] ff usage for glm
>>
>> On Sat, Mar 31, 2012 at 9:05 AM, Bond, Stephen <Stephen.Bond at cibc.com>
>> wrote:
>> > Greetings useRs,
>> >
>> > Can anyone provide an example of how to use ff to feed a very large
>> > data frame to glm?
>> > The data.frame cannot be loaded in R using conventional read.csv as
>> > it is too big.
>> >
>> > glm(...,data=ff.file) ??
>> >
>>
>> I shouldn't think glm() will work on data that are too big to read into R.
>>  However, bigglm() from the biglm package should work.  You just need to
>> write a function that supplies chunks of data from ff.file as requested
>> (see the example on ?bigglm).  I haven't used ff, but it looks from the
>> documentation as though chunk() will do all the difficult parts.
>>
>>  -thomas
>>
>> --
>> Thomas Lumley
>> Professor of Biostatistics
>> University of Auckland
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>





