[R] Dealing With Extremely Large Files
zerfetzen
zerfetzen at yahoo.com
Tue Sep 30 23:18:29 CEST 2008
Thank you Gabor, this is fantastic: easy to use and so powerful. I was
instantly able to do many things with .csv files that are much too large for my
PC's memory. This is clearly my new favorite way to read in data. I love
it!
Is it possible to use sqldf with a fixed width format that requires a file
layout?
For example, let's say you have a .dat file called madeup.dat, without a
header row. The hypothetical file madeup.dat for discussion has 3 variables
(state, zipcode, and score), is 10 characters wide, and has 20 rows (again,
just a made-up file).
Here is my fumbling attempt at code that will read in only state and score,
and randomly select 10 obs:
library(sqldf)
# Source pulls in the development version of sqldf.
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
# Open a connection to that file.
MyConnection <- file("madeup.dat")
# Read in only state and score variables, and randomly select only 10 rows.
MyData <- sqldf("select state, score from MyConnection order by random(*) limit 10")
# I think everything about this would work, except that sqldf does not
# currently know which text columns hold the state variable (1-2), that the
# text columns for zipcode (3-7) should be ignored, and finally that score
# (text columns 8-10) should be included again. If I have overlooked this,
# I apologize.
# Thank you.
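To make the layout concrete, here is a minimal sketch of the same selection in base R with read.fwf, where a negative width skips a field. It is only an illustration of the layout I have in mind, not a solution: read.fwf loads the whole file into memory, which is exactly what I am trying to avoid. The three sample rows and the temporary file are made up for the example.

```r
# Hypothetical layout from the description above:
#   state   = text columns 1-2
#   zipcode = text columns 3-7 (to be skipped)
#   score   = text columns 8-10
# Three made-up rows, each exactly 10 characters wide.
rows <- c("CA90210 87", "NY10001 92", "TX73301100")

# Write them to a temporary file standing in for madeup.dat.
path <- tempfile(fileext = ".dat")
writeLines(rows, path)

# widths = c(2, -5, 3): read 2 characters, skip 5, read 3.
fwf <- read.fwf(path, widths = c(2, -5, 3),
                col.names = c("state", "score"))

# Randomly select up to 10 rows (fewer if the file is shorter).
n <- min(10, nrow(fwf))
MySample <- fwf[sample(nrow(fwf), n), ]
```

What I am looking for is a way to hand sqldf the equivalent of that widths argument, so that the column selection and the random sampling both happen outside of R's memory.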
--
View this message in context: http://www.nabble.com/Dealing-With-Extremely-Large-Files-tp19695311p19750580.html
Sent from the R help mailing list archive at Nabble.com.