[R] read.table.ffdf and fixed width files

Jan van der Laan rhelp at eoos.dds.nl
Wed Aug 7 21:17:46 CEST 2013


The problem is probably that read.table.ffdf uses the nrows argument to 
read the file in chunks. However, read.fwf has no nrows argument; it uses 
an argument called n instead.
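
To illustrate the difference, here is a small sketch using only base R. 
The file contents and widths are made up for the example. As far as I can 
tell, nrows is not an argument of read.fwf itself, so it ends up in ... and 
is handed on to read.table; read.fwf then still consumes the whole 
connection and only the parsed result is cut short:

```r
# Made-up fixed-width file: 5 rows, two columns of widths 3 and 2.
tmp <- tempfile(fileext = ".txt")
writeLines(c("abc01", "def02", "ghi03", "jkl04", "mno05"), tmp)

# Case 1: 'n' is read.fwf's own row limit, so only 2 lines are
# taken from the connection and the rest stays available.
con <- file(tmp, "rt")
via_n <- read.fwf(con, widths = c(3, 2), n = 2)
left_after_n <- length(readLines(con))
close(con)

# Case 2: 'nrows' travels through '...' to read.table. read.fwf reads
# every line from the connection; only the returned data frame is
# limited to 2 rows.
con <- file(tmp, "rt")
via_nrows <- read.fwf(con, widths = c(3, 2), nrows = 2)
left_after_nrows <- length(readLines(con))
close(con)

nrow(via_n); left_after_n          # rows returned, lines left to read
nrow(via_nrows); left_after_nrows
```

In the second case nothing is left on the connection for a next chunk, 
which would fit a result with only the first chunk of rows and a temporary 
file that holds all the data.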

One (untested) solution is to write a wrapper around read.fwf and pass 
this wrapper to read.table.ffdf. Something like:

# Translate the nrows argument used by read.table.ffdf into read.fwf's n
my.read.fwf <- function(file, nrows=-1, ...) {
    read.fwf(file=file, n=nrows, ...)
}

Perhaps you'll also need to wrap some additional arguments.
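
A minimal sketch of how such a wrapper behaves when it is called 
repeatedly on one open connection, which is roughly what a chunked reader 
does. This uses base R only; the name read_fwf_chunk, the file contents, 
the widths, and the chunk size are made up, and the loop is only an 
imitation of chunked reading, not the code read.table.ffdf actually runs:

```r
# Hypothetical wrapper: maps 'nrows' onto read.fwf's 'n'.
read_fwf_chunk <- function(file, nrows = -1, ...) {
  read.fwf(file = file, n = nrows, ...)
}

# Made-up fixed-width file: 5 rows, two columns of widths 3 and 2.
tmp <- tempfile(fileext = ".txt")
writeLines(c("abc01", "def02", "ghi03", "jkl04", "mno05"), tmp)

chunk_size <- 2
con <- file(tmp, "rt")
chunks <- list()
repeat {
  chunk <- read_fwf_chunk(con, nrows = chunk_size, widths = c(3, 2),
                          col.names = c("id", "value"))
  chunks[[length(chunks) + 1]] <- chunk
  # Stop at the first short chunk. A file whose row count is an exact
  # multiple of chunk_size would need extra end-of-file handling.
  if (nrow(chunk) < chunk_size) break
}
close(con)

all_rows <- do.call(rbind, chunks)
sapply(chunks, nrow)   # rows per chunk
nrow(all_rows)         # total rows read
```

Because each call only takes nrows lines from the connection, the next 
call continues where the previous one stopped.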


read.fwf is terribly slow for large fixed width files. I would advise 
using the LaF package in combination with the laf_to_ffdf function from 
the ffbase package, although judging from your other question you have 
already looked at that.
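
A rough sketch of that route, assuming the LaF and ffbase packages are 
installed (the block does nothing otherwise). The file contents, column 
types, widths, and names are made up; text columns are declared as 
"categorical" here on the assumption that they should end up as factors 
in the ffdf:

```r
# Assumes LaF and ffbase are installed; skipped otherwise.
if (requireNamespace("LaF", quietly = TRUE) &&
    requireNamespace("ffbase", quietly = TRUE)) {
  # Made-up fixed-width file: 5 rows, two columns of widths 3 and 2.
  tmp <- tempfile(fileext = ".txt")
  writeLines(c("abc01", "def02", "ghi03", "jkl04", "mno05"), tmp)

  # Describe the fixed-width layout; no data is read at this point.
  laf <- LaF::laf_open_fwf(tmp,
                           column_types  = c("categorical", "integer"),
                           column_widths = c(3, 2),
                           column_names  = c("id", "value"))

  # Read the file block by block into an ffdf.
  dat <- ffbase::laf_to_ffdf(laf)
  print(dim(dat))
}
```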

HTH,
Jan



On 08/06/2013 10:47 AM, christian.kamenik at astra.admin.ch wrote:
> Dear all
>
> I am working on Windows 7 32-bit, and the ff- package is my daily life-saver to overcome the inherent memory limitations. Recently, I tried using read.table.ffdf to import data from a fixed-width ASCII file (file size: 1'440'865'015 Bytes) with 6'079'455 lines and 32 variables using the command
> read.table.ffdf(file=my.filename, FUN="read.fwf", width=my.format, asffdf_args=list(col_args=list(pattern = my.pattern)))
>
> The command generates a temporary file, which has 1'629'328'120 Bytes, plus 32 ff files following my.pattern. The latter 32 files, however, only take up 136'000 Bytes. And the resulting R object has a dimension of 1000 x 32. To me, it seems that read.table.ffdf aborts the data import after 1000 lines, instead of importing the entire file.
>
> I tried running read.table.ffdf with different parameter settings, I was browsing the help pages and the mailing lists, but I did not find any hint on why read.table.ffdf aborts the data import. (Does it really? - The file size of the temporary file suggests that all data were read.)
>
> Any help would be highly appreciated
>
> Best Regard
>
> Christian Kamenik
> Project Manager
>
> Federal Department of the Environment, Transport, Energy and Communications DETEC
> Federal Roads Office FEDRO
> Division Road Traffic
> Road Accident Statistics
>
> Mailing Address: 3003 Bern
> Location: Weltpoststrasse 5, 3015 Bern
>
> Tel +41 31 323 14 89
> Fax +41 31 323 43 21
>
> christian.kamenik at astra.admin.ch<mailto:christian.kamenik at astra.admin.ch>
> www.astra.admin.ch<http://www.astra.admin.ch/>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


