[R] Can't import this 4GB DATASET

Fri May 4 16:46:41 CEST 2012

On May 4, 2012, at 1:34 AM, iliketurtles wrote:

> Dear Experienced R Practitioners,
>
> I have 4GB .txt data called "dataset.txt" and have attempted to use the
> *ff, bigmemory, filehash and sqldf* packages to import it, but have had
> no success. The readLines output of this data is:
>

The alignment of that output makes me wonder if the file is
tab-separated. You have considered the possibility that tab is the
separator, but have you actually tried using sep = "\t" in your read
operations?
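
A minimal sketch of that check, using only base R. The temporary file and
its three lines are a hypothetical stand-in modeled on the posted output;
whether the real "dataset.txt" contains tabs is exactly what grepl() is
meant to tell you, so treat the tab characters below as an assumption.

```r
# Hypothetical stand-in for the first lines of "dataset.txt".
# The tabs here are an assumption; the real file may use spaces instead.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "PERMNO\tDATE\tSHRCD\tCOMNAM\tPRC\tVOL",
  "10001\t01/09/1986\t11\tGREAT FALLS GAS CO\t-5.75000\t14160",
  "10001\t01/10/1986\t11\tGREAT FALLS GAS CO\t-5.87500\t0"
), tmp)

# Step 1: look for literal tab characters in the first few lines.
first <- readLines(tmp, n = 3)
has_tab <- grepl("\t", first, fixed = TRUE)
print(has_tab)

# Step 2: if tabs are present, read with sep = "\t" so that the embedded
# spaces in COMNAM ("GREAT FALLS GAS CO") stay inside one field.
dat <- read.table(tmp, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE, strip.white = TRUE)
str(dat)
```

On the real file you would point readLines() at "dataset.txt" and look
only at the data lines after the blank header rows. If has_tab comes back
FALSE there, the columns are probably padded with spaces and sep = "\t"
will not help.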

-- 
David.
> readLines("dataset.txt",n=20)
> [1] " "
> [2] "
> "
> [3] " "
> [4] "  PERMNO          DATE    SHRCD    COMNAM
> PRC           VOL"
> [5] ""
> [6] "   10001    01/09/1986     11      GREAT FALLS GAS CO
> -5.75000         14160"
> [7] "   10001    01/10/1986     11      GREAT FALLS GAS CO
> -5.87500             0"
> [8] "   10001    01/13/1986     11      GREAT FALLS GAS CO
> -5.87500          2805"
> [9] "   10001    01/14/1986     11      GREAT FALLS GAS CO
> [20] "   10001    01/29/1986     11      GREAT FALLS GAS CO
> -6.06250          4600"
>
> This data goes on for a huge number of rows (not sure exactly how
> many). Each element in each row is separated by an uneven number of
> (what seem to be) spaces (maybe TAB? not sure). Further, there are some
> rows that are "incomplete", i.e. there are missing elements.
>
> Take the first 29 rows of "dataset.txt" into a separate data file,
> let's call it "dataset2.txt". read.table("dataset2.txt",skip=5) gives
> the perfect table that I want to end up with, except I want it with the
> 4GB data through bigmemory, ff or filehash.

snipped several failed attempts

NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
>
> #Even worse.
> ###/*MY ATTEMPT USING sqldf*/###
> No idea what to do here.
>
> -----

David Winsemius, MD

West Hartford, CT