[R] List of lists? Data frames? (Or other data structures?)

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu May 1 14:19:32 CEST 2003


"R A F" <raf1729 at hotmail.com> writes:

> Thanks for your comments.  I'm not too familiar with these differences,
> but here's a simple experiment.  In a data file with 139,000 rows and
> 5 columns (double string double double double),
> 
> >system.time( aaa <- read.table( "file" ) )
> 20.67 0.41 21.10 0.00 0.00
> 
> >system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
> 6.07 0.01 6.09 0.00 0.00
> 
> It seems like scan is much faster -- and as the data file grows,
> read.table seems to choke.  (I actually tried this with a data file
> with over 2 million rows.)

You're not taking Brian's hint:

> >Only if you don't specify colClasses: if you do (and you would need the
> >information to use scan()) there should be no performance penalty. (Note
> >that matrices can be scan()-ed into a vector and the dimensions added, and
> >that will be faster.)
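
The parenthetical about matrices could look roughly like the sketch below. It is only an illustration: the temporary file and the 4 x 3 shape are made up, and it assumes a purely numeric file with a known number of columns.

```r
# Hypothetical all-numeric file with 3 columns (stand-in data, not the poster's file).
tmp <- tempfile()
m <- matrix(as.numeric(1:12), ncol = 3)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

# scan() returns one long numeric vector, in the order the file is read (row by row).
v <- scan(tmp, quiet = TRUE)

# Add the dimensions afterwards. R fills column by column, so set the
# number of columns as the first dimension and transpose.
dim(v) <- c(3, length(v) / 3)
m2 <- t(v)

stopifnot(identical(m2, m))
```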

Try this:

# take the column classes from the same template list you gave to scan()
cls <- sapply(list(0,"",0,0,0),class)
# older versions may need cls <- c("numeric","character",rep("numeric",3))
aaa <- read.table( "file", colClasses=cls )
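
If you want to check this without the original data, here is a self-contained sketch. The generated file, its size (n = 1000) and the column names are assumptions for illustration only. Timings will vary by machine and R version, so none are claimed here.

```r
# Build a small stand-in for "file": double, string, double, double, double.
set.seed(1)
n <- 1000
tmp <- tempfile()
dat <- data.frame(a = runif(n),
                  b = sample(letters, n, replace = TRUE),
                  c = runif(n), d = runif(n), e = runif(n),
                  stringsAsFactors = FALSE)
write.table(dat, tmp, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Column classes stated up front, so read.table() does not have to guess them.
cls <- c("numeric", "character", rep("numeric", 3))
t_guess <- system.time(x1 <- read.table(tmp))
t_cls   <- system.time(x2 <- read.table(tmp, colClasses = cls))

# The scan() call from the quoted message, for comparison.
t_scan  <- system.time(x3 <- scan(tmp, list(0, "", 0, 0, 0), quiet = TRUE))

# user, system and elapsed time for each approach
print(rbind(guess = t_guess, colClasses = t_cls, scan = t_scan)[, 1:3])

# Same values either way; only the container differs (data frame vs. list).
stopifnot(identical(unname(sapply(x2, class)), cls),
          isTRUE(all.equal(x2[[1]], x3[[1]])),
          identical(x2[[2]], x3[[2]]))
```

To see a difference in the timings you will probably need n much closer to the 139,000 rows mentioned above.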

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
