[R] big data

Greg Snow Greg.Snow at imail.org
Wed Sep 8 19:05:56 CEST 2010


In addition to Dirk's advice about the biglm package, you may also want to look at the RSQLite and SQLiteDF packages, which may make dealing with the large dataset faster and easier, especially for passing the chunks to bigglm.
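A rough sketch of how the pieces could fit together (untested here; the SQLite file name, the table `mytable`, and the columns `y`, `x1`, `x2` are all hypothetical). bigglm() accepts a data *function*: it calls it with reset = TRUE to restart the pass over the data and with reset = FALSE to get the next chunk, taking NULL to mean the data are exhausted.

```r
library(DBI)
library(RSQLite)
library(biglm)

con <- dbConnect(SQLite(), "mydata.sqlite")

## Build a chunk-feeding closure of the form bigglm() expects:
## reset = TRUE restarts the query, reset = FALSE returns the next
## chunk as a data frame, or NULL when no rows remain.
make_chunker <- function(con, sql, chunk_size = 100000) {
  res <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(res)) dbClearResult(res)
      res <<- dbSendQuery(con, sql)
      return(NULL)
    }
    chunk <- dbFetch(res, n = chunk_size)
    if (nrow(chunk) == 0) NULL else chunk
  }
}

## Pull only the relevant columns out of the database.
chunks <- make_chunker(con, "SELECT y, x1, x2 FROM mytable")

fit <- bigglm(y ~ x1 + x2, data = chunks, family = binomial())
summary(fit)

dbDisconnect(con)
```

Because each IRLS iteration of bigglm re-reads the data, the closure re-issues the query on every reset rather than caching rows in memory.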

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of André de Boer
> Sent: Wednesday, September 08, 2010 5:27 AM
> To: r-help at r-project.org
> Subject: [R] big data
> 
> Hello,
> 
> I searched the internet but I didn't find an answer to the following
> problem:
> I want to fit a glm on a csv file consisting of 25 columns and 4
> million rows.
> Not all the columns are relevant. My problem is reading the data into
> R, manipulating it, and then fitting the glm.
> 
> I've tried with:
> 
> dd<-scan("myfile.csv",colClasses=classes)
> dat<-as.data.frame(dd)
> 
> My question is: what is the right way to do this?
> Can someone give me a hint?
> 
> Thanks,
> Arend
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
