[R] Handling 8GB .txt file in R?

Rainer M Krug r.m.krug at gmail.com
Mon Mar 26 10:16:13 CEST 2012


On 24/03/12 09:08, iliketurtles wrote:
> Hi,
> 
> I am mediocre at R, maybe 1000 hours experience, but I received an 8GB
> dataset and I don't know what to do with it. I have to do extensive analysis
> over it for my Honours thesis. 
> 
> I can't even import it. I've tried:
> - Splitting it up using the free csv-splitter-1.1.zip that seems to be
> working for everyone else (it doesn't work for me, it just outputs 1 single
> line).
> - Splitting it with Text Splitter doesn't work because you have to load it
> into memory first.
> - Importing using bigmemory's big.matrix(), but my computer just
> freezes.
> - Importing using ff's read.table.ffdf(), however I get the error message 
> " in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 5 did not have 9 elements"
> 
> Thanks for any ideas and assistance. 

1) Check whether you really need to load the complete dataset. You might be able to load only a
subset, sample it for the analysis, or discard columns. There are many possibilities.

2) With csv files of this size, it usually pays off to convert them into a database. SQLite comes to
mind as an easy-to-use one, with SQL support for selecting the columns and rows to load. SQLite has a
tool to import a csv file into an SQLite database.
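
A rough sketch of both suggestions in R, assuming a comma-separated file with a header line. The
column names (id, price, volume), the table name "trades" and the chunk size are placeholders
invented for illustration, the tiny demo file stands in for the real 8GB one, and the DBI and
RSQLite packages are assumed to be installed:

```r
library(DBI)  # assumption: the DBI and RSQLite packages are installed

# Hypothetical stand-in for the real file: three columns plus a header line.
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, price = seq(1.5, 10.5, by = 1), volume = 101:110),
          csv_file, row.names = FALSE, quote = FALSE)

# 1) Read only what is needed: "NULL" in colClasses skips a column entirely,
#    and nrows limits how many lines are read.
first_rows <- read.csv(csv_file, nrows = 5,
                       colClasses = c("integer", "numeric", "NULL"))

# 2) Copy the file into an SQLite database chunk by chunk, so that only one
#    chunk is held in memory at any time.
db  <- dbConnect(RSQLite::SQLite(), tempfile(fileext = ".sqlite"))
con <- file(csv_file, open = "r")
col_names  <- strsplit(readLines(con, n = 1), ",")[[1]]  # consume the header
chunk_size <- 4  # placeholder; something like 1e5 for a real file
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size, col.names = col_names),
    error = function(e) NULL)  # read.csv signals an error once no lines are left
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(db, "trades", chunk, append = TRUE)
  if (nrow(chunk) < chunk_size) break  # a short chunk means end of file
}
close(con)

# Afterwards, let SQL select the rows and columns to load into R.
subset_df <- dbGetQuery(db, "SELECT id, price FROM trades WHERE price > 5")
dbDisconnect(db)
```

The chunked loop is only one way to fill the database; the import tool that ships with SQLite does
the same job outside R.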

Concerning the general format of the csv, see the other suggestions.
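
For the "line 5 did not have 9 elements" error, one way to find the offending lines without loading
the whole file might be base R's count.fields(), applied only to the first lines of the file. A
small sketch, in which the demo file, the comma separator and the number of lines inspected are
assumptions:

```r
# Hypothetical demo file in which the third data line is one field short.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8", "9,10,11"), txt_file)

# Inspect only the first lines, so the full file never has to be loaded.
head_lines <- readLines(txt_file, n = 1000)
tc <- textConnection(head_lines)
n_fields <- count.fields(tc, sep = ",")
close(tc)

# Line numbers whose field count differs from that of the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

Note that count.fields() by default skips blank lines and treats # as a comment character and
quotes as field delimiters, so those arguments may need adjusting for the real file.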

Cheers,

Rainer



> 
> Can R do this on a computer with 4 GB of memory and a dual core i5xx ?
> 
> -----
> ----
> 
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS
> --
> View this message in context: http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4500971.html
> Sent from the R help mailing list archive at Nabble.com.


-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer at krugs.de

Skype:      RMkrug


