[R] List of lists? Data frames? (Or other data structures?)
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Thu May 1 14:19:32 CEST 2003
"R A F" <raf1729 at hotmail.com> writes:
> Thanks for your comments. I'm not too familiar with these differences,
> but here's a simple experiment. In a data file with 139,000 rows and
> 5 columns (double string double double double),
>
> >system.time( aaa <- read.table( "file" ) )
> 20.67 0.41 21.10 0.00 0.00
>
> >system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
> 6.07 0.01 6.09 0.00 0.00
>
> It seems like scan is much faster -- and as the data file grows,
> read.table seems to choke. (I actually tried this with a data file
> with over 2 million rows.)
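For reference, scan() with a list template returns a list holding one vector per column, not a data frame. Below is a minimal sketch on a small hypothetical file that wraps that result as a data frame. The file contents and the column names x1, s, x2, x3, x4 are assumptions for illustration only.

```r
# Hypothetical file in the same layout as above:
# double, string, double, double, double (whitespace-separated, no header)
f <- tempfile(fileext = ".txt")
writeLines(c("1.5 a 2 3 4",
             "2.5 b 5 6 7",
             "3.5 c 8 9 10"), f)

# 'what' is a template: one element per column, giving its type.
# The names of the template (hypothetical here) become the names of the result.
cols <- scan(f, what = list(x1 = 0, s = "", x2 = 0, x3 = 0, x4 = 0),
             quiet = TRUE)
str(cols)   # a named list of 5 vectors, one per column

# Turn the list of columns into a data frame; keep column 's' as character
df <- data.frame(cols, stringsAsFactors = FALSE)
```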
You're not taking Brian's hint:
> >Only if you don't specify colClasses: if you do (and you would need the
> >information to use scan()) there should be no performance penalty. (Note
> >that matrices can be scan()-ed into a vector and the dimensions added, and
> >that will be faster.)
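Brian's aside about matrices can be sketched as follows, assuming a hypothetical all-numeric file with a known number of columns. scan() reads the values row by row into one vector. Because R fills dimensions column-major, the dimensions are added as (columns, rows) and the result is transposed back to the file's layout.

```r
# Hypothetical all-numeric file: 4 rows x 3 columns, no header
f <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6", "7 8 9", "10 11 12"), f)

ncols <- 3
v <- scan(f, what = 0, quiet = TRUE)    # one long numeric vector, file order

# Add the dimensions: each file row becomes one column of v ...
dim(v) <- c(ncols, length(v) / ncols)
# ... then transpose so rows and columns match the file
m <- t(v)
m
```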
Try this:
# column classes taken from the same template scan() uses
cls <- sapply(list(0,"",0,0,0),class)
# older versions of R may need cls <- c("numeric","character",rep("numeric",3))
aaa <- read.table( "file", colClasses=cls )
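One way to check the suggestion on your own machine is sketched below. It generates a hypothetical file with the same 5-column layout, then times read.table() without colClasses, read.table() with colClasses, and scan(). The file size, column contents, and variable names are assumptions. Timings depend on the machine and the R version, so no particular numbers are implied.

```r
# Hypothetical test file in the same layout (numeric, character, numeric x 3);
# n is kept small here, increase it to get meaningful timings.
n <- 10000
f <- tempfile(fileext = ".txt")
dat <- data.frame(a = runif(n), b = sample(letters, n, replace = TRUE),
                  c = runif(n), d = runif(n), e = runif(n))
write.table(dat, f, row.names = FALSE, col.names = FALSE, quote = FALSE)

cls <- c("numeric", "character", rep("numeric", 3))

t_plain <- system.time(x1 <- read.table(f))                    # types guessed
t_cls   <- system.time(x2 <- read.table(f, colClasses = cls))  # types given
t_scan  <- system.time(x3 <- scan(f, what = list(0, "", 0, 0, 0), quiet = TRUE))

# Elapsed seconds for each approach
c(plain      = t_plain[["elapsed"]],
  colClasses = t_cls[["elapsed"]],
  scan       = t_scan[["elapsed"]])
```

Both read.table() with colClasses and scan() should return the same column contents. The difference is that read.table() wraps them in a data frame, while scan() returns a plain list.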
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help mailing list