[R] Dealing With Extremely Large Files

zerfetzen zerfetzen at yahoo.com
Tue Sep 30 23:18:29 CEST 2008


Thank you Gabor, this is fantastic, easy to use and so powerful.  I was
instantly able to do many things with .csv files that are much too large for my
PC's memory.  This is clearly my new favorite way to read in data, I love
it!

Is it possible to use sqldf with a fixed width format that requires a file
layout?

For example, let's say you have a .dat file called madeup.dat, without a
header row.  The hypothetical file madeup.dat for discussion has 3 variables
(state, zipcode, and score), is 10 characters wide, and has 20 rows (again,
just a made-up file).

Here is my fumbling attempt at code that will read in only state and score,
and randomly select 10 obs:

library(sqldf)

# Source pulls in the development version of sqldf.
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

#Open a connection to that file.
MyConnection <- file("madeup.dat")

# Read in only state and score variables, and randomly select only 10 rows.
MyData <- sqldf("select state,score from MyConnection order by random(*)
limit 10")

# I think everything about this would work, except that sqldf currently has
# no way to know that the state variable occupies text columns 1-2, that the
# text columns for zipcode (3-7) should be ignored, and finally that score
# (text columns 8-10) should be included again.  If I have overlooked this,
# I apologize.
# Thank you.
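
One possible way to express the layout, offered only as a sketch: read each
record as a single text column and slice it with SQLite's substr(). This
assumes that a separator character absent from the file (";" here) makes each
line load as one column, that sqldf names that column V1 when header = FALSE,
and that the file.format argument behaves this way in the installed version of
sqldf. None of this is confirmed in this message. The contents written to
madeup.dat below are invented purely for illustration.

```r
library(sqldf)

# Create a made-up fixed-width file with 20 rows of 10 characters each:
# state (columns 1-2), zipcode (columns 3-7), score (columns 8-10).
set.seed(1)
rows <- sprintf("%s%05d%03d",
                sample(state.abb, 20),
                sample(10000:99999, 20),
                sample(0:999, 20))
writeLines(rows, "madeup.dat")

# Open a connection to that file.
MyConnection <- file("madeup.dat")

# Assumption: ";" never occurs in the file, so every line arrives as the
# single column V1.  substr(V1, start, length) then picks out the fields;
# zipcode (columns 3-7) is simply never selected.
MyData <- sqldf(
  "select substr(V1, 1, 2) as state,
          cast(substr(V1, 8, 3) as integer) as score
   from MyConnection
   order by random()
   limit 10",
  file.format = list(header = FALSE, sep = ";")
)

print(MyData)
```

For a file that is too large for memory, passing dbname = tempfile() to sqldf
should keep the intermediate SQLite database on disk rather than in RAM, so
that only the 10 selected rows are returned to R.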