[R] Can't import this 4GB DATASET

Jan van der Laan rhelp at eoos.dds.nl
Sat May 5 14:59:24 CEST 2012


Perhaps you could contact the people who supplied or created the file
and ask them what exactly the format of the file is. That is probably
the safest thing to do.

If you are sure that the lines containing only whitespace are 
meaningless, then you could alter the previous code to make a copy of 
the file containing only lines with a length equal to 97 characters (you 
can do this by changing the '!=' to '==').
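
A minimal sketch of that change, wrapped in a function so that the
connections are always closed. The function name copy_regular_lines and
the output file name used below are made up for illustration; the 5
skipped header lines and the width of 97 come from the previous code.

```r
# Sketch: copy only the lines that are exactly `width` characters long.
# copy_regular_lines is a hypothetical helper, not part of any package.
copy_regular_lines <- function(infile, outfile, width = 97, skip = 5,
                               blocksize = 1E5) {
  con <- file(infile, "rt")
  on.exit(close(con), add = TRUE)
  out <- file(outfile, "wt")
  on.exit(close(out), add = TRUE)
  # skip the header lines, as in the previous code
  if (skip > 0) readLines(con, n = skip)
  kept <- 0
  repeat {
    # read the rest in blocks; warn = FALSE hides the
    # 'incomplete final line' warning
    lines <- readLines(con, n = blocksize, warn = FALSE)
    if (length(lines) == 0) break
    regular <- lines[nchar(lines) == width]
    writeLines(regular, con = out)
    kept <- kept + length(regular)
  }
  # return the number of lines written to outfile
  kept
}
```

Calling copy_regular_lines("dataset.txt", "dataset_clean.txt") should
then write the cleaned copy and return the number of lines kept.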

Since all lines are then of equal length, I suspect you have a
fixed-width file. You could open and read this file using the LaF package
(http://cran.r-project.org/web/packages/LaF/index.html; see the manual
vignette for more information). The package ffbase
(http://cran.r-project.org/web/packages/ffbase/index.html) contains a
function to convert from LaF to ff (laf_to_ffdf). I do not know if
packages such as RSQLite or bigmemory can import fixed-width files.
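
A rough sketch of how opening such a file with LaF could look. The
column layout (three columns of widths 10, 7 and 80, adding up to 97),
the column names and the small demo file are all invented for
illustration, since I do not know the real layout; you would have to
replace them with the actual widths and types.

```r
library(LaF)

# Hypothetical demo file with two 97-character lines, standing in for
# the cleaned copy of the real data.
demo <- tempfile(fileext = ".txt")
writeLines(sprintf("%-10s%7d%-80s",
                   c("A001", "A002"), c(42L, 7L), c("first", "second")),
           demo)

# Made-up layout: the widths must add up to the line length (97).
widths <- c(10, 7, 80)
types  <- c("string", "integer", "string")

laf <- laf_open_fwf(demo, column_types = types, column_widths = widths,
                    column_names = c("id", "count", "text"))

# Read only the first rows to check that the layout makes sense.
first_rows <- laf[1:2, ]
print(first_rows)
```

If the first rows look right, the laf object could then be passed to
laf_to_ffdf from ffbase to get the complete file into ff.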

The warning message indicates that the last line does not end with a
newline character, which could indicate an incomplete file but often
doesn't mean anything. You could check the last line of the file to be
sure.
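
One way to look at the last line without loading the whole file is to
read it in blocks again and remember only the final line. This is just
a sketch; last_line is a made-up helper name.

```r
# Sketch: return the last line of a (large) text file by reading it in
# blocks. last_line is a hypothetical helper, not part of any package.
last_line <- function(infile, blocksize = 1E5) {
  con <- file(infile, "rt")
  on.exit(close(con))
  last <- character(0)
  repeat {
    lines <- readLines(con, n = blocksize, warn = FALSE)
    if (length(lines) == 0) break
    # keep only the final line of the block just read
    last <- lines[length(lines)]
  }
  last
}
```

If nchar(last_line("dataset.txt")) gives 97, the last record is
probably complete and only the final newline is missing.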

HTH,

Jan



On 05/05/2012 05:21 AM, iliketurtles wrote:
> Your code works!
>
> strangelines.txt was created, and it's a text file with just spacebars ...
> Seems like a few thousand lines of complete blanks (not 1 non-blank entry).
>
> One thing, when I ran your code there was an error message;
>
>> setwd("C:/Users/admin/Desktop/hons/Thesis")
>> con<- file("dataset.txt", "rt")
>> out<- file("strangelines.txt", "wt")
>> # skip first 5 lines
>> lines<- readLines(con, n=5)
>> # read the rest in blocks of 100.000 lines
>> while (TRUE) {
> +     lines<- readLines(con, n=1E5)
> +     if (length(lines) == 0) break;
> +     strangelines<- lines[nchar(lines) != 97]
> +     writeLines(strangelines, con=out)
> + }
> Warning message:
> In readLines(con, n = 1e+05) : incomplete final line found on 'dataset.txt'
>
>
>
>
> I'm really not sure where to go from here. This has gone way out of my
> depth.
>
> -----
> ----
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS
> --
> View this message in context: http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4610446.html
> Sent from the R help mailing list archive at Nabble.com.
