[R] Can't import this 4GB DATASET
Jan van der Laan
rhelp at eoos.dds.nl
Fri May 4 21:01:20 CEST 2012
OK, so not all, but most lines have the same length (97 characters).
Perhaps you could write the lines with a different length to a separate
file so you can take a closer look at them. Modifying the previous code
(again, not tested):
con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip the first 5 lines
lines <- readLines(con, n=5)
# read the rest in blocks of 100,000 lines
while (TRUE) {
  lines <- readLines(con, n=1E5)
  # stop at end of file
  if (length(lines) == 0) break
  # keep only the lines whose length differs from the usual 97 characters
  strangelines <- lines[nchar(lines) != 97]
  writeLines(strangelines, con=out)
}
close(con)
close(out)
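
If it helps to know where in the file the odd lines sit, the same loop can
also record line numbers. Below is a minimal, self-contained sketch on a toy
file. The names infile, outfile, expected_len, block_size and n_skip are
made up for illustration, and the toy data only mimics the real dataset
(lengths 0, 1 and one longer line among otherwise equal-length lines).

```r
# Hypothetical toy input: a 5-line header followed by mostly 10-character lines
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
header  <- paste("header", 1:5)
body    <- c("aaaaaaaaaa", "", "bbbbbbbbbb", "x", "cccccccccc", strrep("y", 25))
writeLines(c(header, body), infile)

expected_len <- 10  # assumption for the toy data; 97 in the real dataset
block_size   <- 4   # tiny on purpose; 1E5 in the real dataset
n_skip       <- 5   # number of header lines to skip

con <- file(infile, "rt")
out <- file(outfile, "wt")
invisible(readLines(con, n = n_skip))  # skip the header lines
offset <- n_skip                       # number of lines consumed so far
repeat {
  lines <- readLines(con, n = block_size)
  if (length(lines) == 0) break        # end of file
  odd <- which(nchar(lines) != expected_len)
  if (length(odd) > 0) {
    # write: line number in the original file, line length, line content
    writeLines(paste(offset + odd, nchar(lines[odd]), lines[odd], sep = "\t"),
               con = out)
  }
  offset <- offset + length(lines)
}
close(con)
close(out)

strange <- readLines(outfile)
print(strange)
```

Each output row is tab-separated, so the result can be read back with
read.delim(outfile, header = FALSE) if a data frame is more convenient.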
Jan
Quoting iliketurtles <isaacm200 at gmail.com>:
> Jan, thank you.
>
>> table(line_sizes)
> line_sizes
>        0        1       97      256
>     1430     2860 46869069     1430
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS