[R] Can't import this 4GB DATASET
Jan van der Laan
rhelp at eoos.dds.nl
Sat May 5 14:59:24 CEST 2012
Perhaps you could contact the people who supplied or created the file and
ask them exactly what format the file is in. That is probably the safest
thing to do.
If you are sure that the lines containing only whitespace are
meaningless, you could alter the previous code to make a copy of the
file that contains only the lines that are exactly 97 characters long
(change the '!=' to '==').
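A minimal sketch of that modified loop, wrapped in a function. The name keep_fixed_width_lines, its arguments and the output file name are hypothetical. It assumes, as the earlier code did, that the first 5 lines are a header to skip and that valid records are exactly 97 characters long. It also counts dropped lines that are not pure whitespace, as a quick check that only blank lines are being thrown away.

```r
# Hypothetical helper (not from any package): copy only the lines of exactly
# `width` characters to `outfile`, reading `block` lines at a time so the
# 4GB file never has to fit in memory.
keep_fixed_width_lines <- function(infile, outfile, width = 97,
                                   skip = 5, block = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({
    close(con)
    close(out)
  })
  # Assumption carried over from the earlier code: the first `skip`
  # lines are a header and are discarded.
  if (skip > 0) readLines(con, n = skip)
  kept <- 0
  dropped_nonblank <- 0
  repeat {
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    ok <- nchar(lines) == width
    writeLines(lines[ok], con = out)
    kept <- kept + sum(ok)
    # Dropped lines containing anything other than whitespace; if this
    # ends up non-zero, the "blank lines only" assumption does not hold.
    dropped_nonblank <- dropped_nonblank +
      sum(!ok & grepl("[^[:space:]]", lines))
  }
  c(kept = kept, dropped_nonblank = dropped_nonblank)
}

# Usage ("dataset97.txt" is a hypothetical output name):
# keep_fixed_width_lines("dataset.txt", "dataset97.txt")
```

If dropped_nonblank comes back as 0, every discarded line was whitespace only, which is what the strangelines.txt output suggested.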
Since all lines would then be of equal length, I suspect you have a
fixed-width file. You could open and read such a file using the LaF
package (http://cran.r-project.org/web/packages/LaF/index.html; see the
manual or vignette for more information). The ffbase package
(http://cran.r-project.org/web/packages/ffbase/index.html) has a function
to convert from LaF to ff (laf_to_ffdf). I do not know whether packages
such as RSQLite or bigmemory can import fixed-width files.
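Before converting the whole file, it may help to try a guessed column layout on a handful of lines. A minimal sketch using base R's read.fwf(), swapped in here for illustration (it is not LaF). The function name preview_fwf and the widths in the usage line are hypothetical placeholders; the real layout has to come from whoever supplied the file.

```r
# Hypothetical helper: read only the first `n` data lines of a fixed-width
# file with base R's utils::read.fwf() to try out a guessed layout.
preview_fwf <- function(infile, widths, skip = 5, n = 10) {
  utils::read.fwf(infile, widths = widths, skip = skip, n = n,
                  colClasses = "character",  # keep leading zeros, no type guessing
                  strip.white = TRUE)         # trim padding inside each field
}

# Usage with placeholder widths splitting the 97 characters into 3 fields:
# preview_fwf("dataset.txt", widths = c(10, 8, 79))
```

Once the widths look right, the same vector could be used as the column widths when opening the full file with LaF's laf_open_fwf() and then converted with laf_to_ffdf(); please check the LaF and ffbase manuals for the exact arguments.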
The warning message means that the last line does not end with a newline
character. That could indicate an incomplete (truncated) file, but often
it means nothing. You could check the last line of the file to be sure.
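A minimal sketch for checking the end of the file without reading all 4GB: open it in binary mode, seek() to just before the end and read the last bytes. The function names are hypothetical. R's own documentation warns that seek() can be unreliable on Windows; if the results look odd, a slower alternative is to read through the file in blocks with readLines(), as above, and keep the last line.

```r
# Hypothetical helpers: inspect the end of a large file by seeking, so only
# the last few bytes are read rather than the whole file.
ends_with_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  con <- file(path, "rb")           # binary mode: byte offsets are exact
  on.exit(close(con))
  seek(con, where = size - 1)       # position at the final byte
  last <- readBin(con, what = "raw", n = 1)
  identical(last, as.raw(0x0a))     # 0x0a is "\n" (also ends Windows "\r\n")
}

tail_text <- function(path, nbytes = 500) {
  size <- file.info(path)$size
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = max(0, size - nbytes))
  rawToChar(readBin(con, what = "raw", n = nbytes))
}

# Usage:
# ends_with_newline("dataset.txt")  # FALSE is consistent with the warning
# cat(tail_text("dataset.txt"))     # look at the last line(s) by eye
```

If the last line printed by tail_text() is a complete 97-character record, the missing final newline is most likely harmless.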
HTH,
Jan
On 05/05/2012 05:21 AM, iliketurtles wrote:
> Your code works!
>
> strangelines.txt was created, and it's a text file with just spacebars ...
> Seems like a few thousand lines of complete blanks (not 1 non-blank entry).
>
> One thing, when I ran your code there was an error message;
>
>> setwd("C:/Users/admin/Desktop/hons/Thesis")
>> con<- file("dataset.txt", "rt")
>> out<- file("strangelines.txt", "wt")
>> # skip first 5 lines
>> lines<- readLines(con, n=5)
>> # read the rest in blocks of 100.000 lines
>> while (TRUE) {
> + lines<- readLines(con, n=1E5)
> + if (length(lines) == 0) break;
> + strangelines<- lines[nchar(lines) != 97]
> + writeLines(strangelines, con=out)
> + }
> Warning message:
> In readLines(con, n = 1e+05) : incomplete final line found on 'dataset.txt'
>
> I'm really not sure where to go from here. This has gone way out of my
> depth.
>
> -----
> ----
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS
> --
> View this message in context: http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4610446.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.