[R] Help troubleshooting silent failure reading huge file with read.delim

jim holtman jholtman at gmail.com
Wed Oct 6 18:08:38 CEST 2010


Besides mismatched quotes, another problem I had with a file was some
illegal characters (0x1A in this case) that signaled an end of input,
so reading stopped early.
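A minimal R sketch for checking a file for both problems before trusting the result of read.delim. It assumes a tab-delimited file with a header line, and the helper names find_ctrl_z and field_counts are hypothetical (readBin, count.fields and read.delim are base R). The first helper only reports where 0x1A bytes sit; whether such a byte actually ends the read may depend on the platform and how the file is opened. The second counts fields per line with quote = "", which turns off quote handling so that an unmatched double quote cannot join several lines into one field. This assumes the data do not legitimately contain quoted fields with embedded tabs or newlines.

```r
# Sketch only: find_ctrl_z and field_counts are hypothetical helpers.

# Return the 1-based byte positions of every 0x1A (Ctrl-Z) byte in a file,
# reading it in binary chunks so a large file is never held in memory at once.
find_ctrl_z <- function(filename, chunk_size = 1e7) {
  con <- file(filename, open = "rb")
  on.exit(close(con))
  hits <- numeric(0)
  offset <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break          # end of file
    hits <- c(hits, offset + which(chunk == as.raw(0x1A)))
    offset <- offset + length(chunk)
  }
  hits
}

# Count tab-separated fields on each non-blank line with quote processing
# turned off, so a stray " cannot merge several physical lines into one record.
field_counts <- function(filename) {
  count.fields(filename, sep = "\t", quote = "", comment.char = "")
}

# Example use (filename is the path you pass to read.delim):
# z  <- find_ctrl_z(filename)   # non-empty => the file contains 0x1A bytes
# nf <- field_counts(filename)
# table(nf)                     # a clean file shows a single count (52 here)
# which(nf != nf[1])            # lines whose count differs from the header
#                               # (line numbers if there are no blank lines)
# d  <- read.delim(filename, as.is = TRUE, quote = "")
# nrow(d) should then equal length(nf) - 1 (one row per data line).
```

If read.delim with quote = "" returns the expected number of rows while the default call does not, an unmatched quote character is the likely cause.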

On Wed, Oct 6, 2010 at 3:15 AM, Earl F. Glynn <efglynn at gmail.com> wrote:
>
> I am trying to read a tab-delimited 1.25 GB file of 4,115,119 records each
> with 52 fields.
>
> I am using R 2.11.0 on a 64-bit Windows 7 machine with 8 GB memory.
>
> I have tried the two following statements with the same results:
>
> d <- read.delim(filename, as.is=TRUE)
>
> d <- read.delim(filename, as.is=TRUE, nrows=4200000)
>
> I have tried starting R with this parameter but that changed nothing:
> --max-mem-size=6GB
>
> Everything appeared to have worked fine until I studied frequency counts of
> the fields and realized data were missing.
>
>> dim(d)
> [1] 3388444      52
>
> R read 3,388,444 records and missed 726,754 records.  There were no error
> messages or exceptions.  I plotted a chart using the data and later
> discovered not all the data were represented in the chart.
>
> R didn't just read the first 3,388,444 records and quit.
>
> Here's what I believe happened (based on frequency counts of the first field
> in the data.frame from R, and independently from another source):
> * R read the first 1,866,296 records and then skipped 419,340 records.
> * Next, R read 1,325,552 records and skipped 307,414 records.
> * R read the last 196,596 records without any problems.
>
> Questions:
>
> Is there some memory-related parameter that I should adjust that might
> explain the observed details above?
>
> Shouldn't read.delim catch this failure instead of being silent about
> dropping data?
>
> Thanks for any help with this.
>
> Earl F Glynn
> Overland Park, KS
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


