[R] Can't import this 4GB DATASET
David Winsemius
dwinsemius at comcast.net
Fri May 4 16:46:41 CEST 2012
On May 4, 2012, at 1:34 AM, iliketurtles wrote:
> Dear Experienced R Practitioners,
>
> I have 4GB .txt data called "dataset.txt" and have attempted to use
> *ff, bigmemory, filehash and sqldf* packages to import it, but have had
> no success. The readLines output of this data is:
>
The alignment of that output makes me wonder if the file is
tab-separated. You have considered the possibility that tab is the
separator, but have you actually tried using sep = "\t" in your read
operations?
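
A minimal sketch of that check, assuming the file really is tab-delimited (which the readLines output alone does not confirm). The sample lines below are a hypothetical stand-in written to a temporary file so the snippet runs on its own; with the real data you would point the same calls at "dataset.txt" and probably add a skip= argument to get past the leading blank lines.

```r
# Hypothetical sample imitating the columns shown in the post, written with
# tab separators. This is an assumption; the real file may differ.
sample_lines <- c(
  "PERMNO\tDATE\tSHRCD\tCOMNAM\tPRC\tVOL",
  "10001\t01/09/1986\t11\tGREAT FALLS GAS CO\t-5.75000\t14160",
  "10001\t01/10/1986\t11\tGREAT FALLS GAS CO\t-5.87500\t0",
  "10001\t01/13/1986\t11\tGREAT FALLS GAS CO\t-5.87500\t2805"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Step 1: look for literal tab characters in the first lines of the file.
has_tab <- grepl("\t", readLines(tmp, n = 20), fixed = TRUE)

# Step 2: if tabs are present, read with an explicit tab separator so that
# the spaces inside COMNAM are not treated as field breaks.
dat <- read.table(tmp, header = TRUE, sep = "\t",
                  quote = "", comment.char = "",
                  stringsAsFactors = FALSE)
str(dat)
```

If has_tab comes back FALSE for the real file, the fields are more likely padded to fixed widths, and an explicit tab separator will not help.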
--
David.
> readLines("dataset.txt",n=20)
> [1] " "
> [2] "
> "
> [3] " "
> [4] " PERMNO DATE SHRCD COMNAM
> PRC VOL"
> [5] ""
> [6] " 10001 01/09/1986 11 GREAT FALLS GAS CO
> -5.75000 14160"
> [7] " 10001 01/10/1986 11 GREAT FALLS GAS CO
> -5.87500 0"
> [8] " 10001 01/13/1986 11 GREAT FALLS GAS CO
> -5.87500 2805"
> [9] " 10001 01/14/1986 11 GREAT FALLS GAS CO
> [20] " 10001 01/29/1986 11 GREAT FALLS GAS CO
> -6.06250 4600"
>
> This data goes on for a huge number of rows (not sure exactly how many).
> Each element in each row is separated by an uneven number of (what seem
> to be) spaces (maybe TAB? not sure). Further, there are some rows that
> are "incomplete", i.e. there are missing elements.
>
> Take the first 29 rows of "dataset.txt" into a separate data file, let's
> call it "dataset2.txt". read.table("dataset2.txt",skip=5) gives the
> perfect table that I want to end up with, except I want it with the 4GB
> data through bigmemory, ff or filehash.
snipped several failed attempts
NA NA NA NA NA NA NA NA NA NA NA NA NA
>
> #Even worse.
> ###/*MY ATTEMPT USING sqldf*/###
> No idea what to do here.
>
> -----
David Winsemius, MD
West Hartford, CT