[R] Reading large files

Vadlamani, Satish {FLNA} SATISH.VADLAMANI at fritolay.com
Fri Feb 5 02:27:53 CET 2010


Folks:
I am trying to read in a large file. By large I mean:
Number of lines: 333,250
Size: 850 MB
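As a quick consistency check on these numbers: the field widths in the code below imply a record length of 62 + 209 × 12 = 2,570 characters. Assuming one newline byte per line (Unix line endings; CRLF would add one more byte), 333,250 such lines come to roughly 857 MB, which is in line with the reported size. A minimal sketch of that arithmetic:

```r
# Field widths copied from the post's code: 18 key fields, then
# 209 numeric buckets of 12 characters each.
key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
record_width <- sum(key_vec) + 209 * 12    # 62 + 2508 = 2570 characters

# Assumption: one newline byte per line (Unix line endings).
approx_bytes <- 333250 * (record_width + 1)
approx_bytes / 1e6                         # about 856.8 (MB)
```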

The machine is a dual-core Intel with 4 GB of RAM and nothing else running on it. I read the previous threads on read.fwf and did not find any conclusive advice on how to read such files quickly. An example record and the R code are given below. I was hoping to purchase a better machine and analyse larger datasets, but these preliminary results do not look good.
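For a sense of what the 4 GB has to hold, here is a back-of-the-envelope estimate of the finished data frame. It assumes the 209 bucket columns end up as 8-byte doubles and the 18 key columns as factors stored as 4-byte integer codes, and it ignores level labels and per-object overhead. It covers only the final object; read.fwf() needs additional working memory while it parses.

```r
# Back-of-the-envelope size of the final data frame.
# Assumptions: 209 numeric columns stored as 8-byte doubles and 18 factor
# columns stored as 4-byte integer codes; level labels and object
# overhead are ignored.
n_rows <- 333250
numeric_bytes <- n_rows * 209 * 8    # 557,194,000 bytes
factor_bytes  <- n_rows * 18 * 4     #  23,994,000 bytes
total_mb <- (numeric_bytes + factor_bytes) / 1e6
total_mb                             # about 581 (MB)
```

Calling object.size() on the result of the n = 100 test read gives an empirical per-row figure that can be used to check an estimate like this.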

Does anyone have experience working with large files (> 1 GB) in Revolution-R?
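One common way to handle files larger than comfortably fits in memory in plain R is to stream them through a connection and process a block of lines at a time. The sketch below is hypothetical: the function name sum_buckets_in_chunks and its arguments are illustrative, not from any package. It calls readLines() repeatedly on an open connection, chunk_size lines per call, and accumulates the column sums of the numeric buckets, so only one block is held in memory at once.

```r
# Hypothetical helper (not from any package): stream a fixed-width file
# through a connection, chunk_size lines at a time, and accumulate the
# column sums of the numeric buckets that follow the key fields.
sum_buckets_in_chunks <- function(path, key_width, n_buckets, bucket_width,
                                  chunk_size = 10000L) {
  # Character positions of each bucket within a record
  starts <- key_width + (seq_len(n_buckets) - 1L) * bucket_width + 1L
  ends <- starts + bucket_width - 1L
  totals <- numeric(n_buckets)
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = chunk_size)   # next block, character(0) at EOF
    if (length(lines) == 0L) break
    for (j in seq_len(n_buckets)) {
      totals[j] <- totals[j] +
        sum(as.numeric(substring(lines, starts[j], ends[j])))
    }
  }
  totals
}

# Tiny synthetic file: 2-character key followed by two 6-character buckets
tmp <- tempfile()
writeLines(sprintf("%s%6.2f%6.2f", c("A1", "B2", "C3"),
                   c(0.5, 0.25, 0.25), c(1, 2, 3)), tmp)
totals <- sum_buckets_in_chunks(tmp, key_width = 2, n_buckets = 2,
                                bucket_width = 6, chunk_size = 2L)
totals   # 1 6
```

With the layout in this post the call would look like sum_buckets_in_chunks("3wkoutstatfcst_file02.dat", key_width = 62, n_buckets = 209, bucket_width = 12). The same loop can do any per-chunk work (filtering, aggregating by key) instead of summing.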


Thanks.

Satish

Example Code
# Widths of the 18 fixed-width key fields (62 characters in total)
key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc")
key_info <- data.frame(key_vec,key_names)
# sas_time$week is defined elsewhere (not shown) and holds the 209 weekly column names
col_names <- c(key_names,sas_time$week)
# 209 weekly numeric buckets, each 12 characters wide
num_buckets <- rep(12,209)
width_vec <- c(key_vec,num_buckets)
# 18 factor key columns followed by 209 numeric columns
col_classes <- c(rep("factor",18),rep("numeric",209))
# Test run on the first 100 records only:
#threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes,n=100)
threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes)
names(threewkoutstat) <- col_names
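read.fwf() splits each line into fields and writes the pieces to a temporary file that read.table() then parses, so every record is effectively handled twice. A frequently suggested alternative is to read the raw lines with readLines() and cut the fields out with vectorised substring() calls. The sketch below assumes one record per line and known widths; read_fixed is a hypothetical helper name, and whether it is actually faster on this file would need to be measured. Note that it still holds all raw lines in memory at once, which for this file is roughly the file size on top of the parsed columns.

```r
# Hypothetical helper: read a fixed-width file with readLines() and
# vectorised substring() instead of read.fwf(). Assumes one record per
# line; the first n_factor fields become factors, the rest numeric.
read_fixed <- function(path, widths, col_names, n_factor) {
  lines <- readLines(path)
  ends <- cumsum(widths)
  starts <- ends - widths + 1
  cols <- lapply(seq_along(widths), function(j) {
    field <- substring(lines, starts[j], ends[j])
    if (j <= n_factor) factor(field) else as.numeric(field)
  })
  names(cols) <- col_names
  data.frame(cols, check.names = FALSE)
}

# Tiny synthetic example: two key fields (widths 1 and 3), two 12-wide buckets
tmp <- tempfile()
writeLines(sprintf("%s%12.2f%12.2f", c("A001", "B002"),
                   c(0.6, 0), c(1.25, 2.5)), tmp)
df <- read_fixed(tmp, widths = c(1, 3, 12, 12),
                 col_names = c("allgeo", "area1", "w1", "w2"), n_factor = 2)
str(df)
```

Applied to the post's file, the call would be read_fixed("3wkoutstatfcst_file02.dat", width_vec, col_names, n_factor = 18). For files that do not fit in memory as text, the same substring() step can be combined with chunked readLines() on a connection.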

Example record (only one record pasted below; in the file each record is a single line, and the break inside it here comes from email wrapping)
A004001003799000049250000492599990049999A001002002015002015009        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.60        0.60        0.60        0.70        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00      
  0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00        0.00
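To see how the widths in key_vec map onto the example record, the 62-character key at its start can be cut up with substring(); the 209 numeric buckets that follow are 12 characters each. A small sketch using the key string copied from the record above:

```r
# Split the 62-character key of the example record using the post's widths
key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc")
key_str <- "A004001003799000049250000492599990049999A001002002015002015009"
ends <- cumsum(key_vec)
starts <- ends - key_vec + 1
key_fields <- setNames(substring(key_str, starts, ends), key_names)
key_fields[c("dist", "whse", "bdc")]
# dist = "0037", whse = "00004925", bdc = "002015009"
```

Keeping these key fields as character or factor, as the post's colClasses does, preserves their leading zeros; reading them as numeric would drop them.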


