[R] "Large" data set: performance issue
Till Baumgaertel
till.baumgaertel at epost.de
Tue Apr 2 14:51:26 CEST 2002
hi all,
I have to import CSV datasets (with variable names in the first line)
into data.frames. Each is about 12 MB (or more!) with 1823 columns and about
500 rows. The first 22 columns are in "character" mode, the rest are "numeric".
I run R 1.4.1 on a Windows 2000 system.
First I tried read.table(), which works fine for a small number of cases (say,
40). With all cases the function does not return within one hour (Celeron at
600 MHz, 256 MB).
Then I tried scan(), which is almost OK.
I scan() the first line for the variable names, then the rest. The data matrix
gets transposed and as.data.frame()'ed.
The problem is converting the last 1801 variables to "numeric" mode.
I use the following snippet:
i <- 23
while (i <= totCols) {
    datframe[, i] <- as.numeric(datframe[, i])
    i <- i + 1
}
Each step takes ~2 seconds, which adds up to about an hour in total.
I suppose I am doing something really stupid. For reading the data I use
datfull<-scan(filename,sep=",",skip=1,what="character")
which gives me a transposed matrix of my data (variables in rows).
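
For illustration, the read-and-reshape steps described above might look roughly like the sketch below. It is not the exact code from this post: the tiny temporary file, the names varnames and datmat, and the stringsAsFactors argument (which needs a newer R than 1.4.1) are assumptions made so that the example runs on its own.

```r
# Tiny stand-in file so the sketch is self-contained (hypothetical data).
filename <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2", "a,g1,1.5,2", "b,g2,3,4.25"), filename)

# First line: variable names. Remaining lines: every field as character.
varnames <- scan(filename, what = "character", sep = ",", nlines = 1, quiet = TRUE)
datfull  <- scan(filename, what = "character", sep = ",", skip = 1, quiet = TRUE)

totCols <- length(varnames)

# scan() returns the fields record by record in one long vector. Filling a
# matrix column-wise with totCols rows puts variables in rows and cases in
# columns, i.e. the transposed layout.
datmat <- matrix(datfull, nrow = totCols)

# Transpose back (cases in rows) and convert; every column is still character.
datframe <- as.data.frame(t(datmat), stringsAsFactors = FALSE)
names(datframe) <- varnames
```

At this point all columns, including the numeric ones, are character vectors, which is why a separate conversion step is needed afterwards.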
If the result weren't transposed, maybe I could give the "what" parameter a
vector with the appropriate variable types?
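
A minimal sketch of that idea, under stated assumptions: scan() accepts a list for "what", with one template element per column, and then returns one vector per column that already has the template's type. The temporary file and the names nChar, varnames and cols are made up for this example; in the data described above nChar would be 22 and the file would have 1823 columns.

```r
# Tiny stand-in file so the sketch is self-contained (hypothetical data).
filename <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2", "a,g1,1.5,2", "b,g2,3,4.25"), filename)

nChar <- 2   # number of leading character columns (22 in the real data)

# Read the header line to learn the column names and the column count.
varnames <- scan(filename, what = "character", sep = ",", nlines = 1, quiet = TRUE)
totCols <- length(varnames)

# One template per column: "" requests character, 0 requests numeric.
what <- c(rep(list(""), nChar), rep(list(0), totCols - nChar))
names(what) <- varnames

# With a list for 'what', scan() reads whole records and returns a list
# with one already-typed vector per column, so nothing is transposed.
cols <- scan(filename, what = what, sep = ",", skip = 1, quiet = TRUE)
datframe <- as.data.frame(cols, stringsAsFactors = FALSE)
```

Because the numeric columns are parsed as numbers while reading, no column-by-column as.numeric() loop should be needed afterwards. Whether this is fast enough for 1823 columns on the machine described here has not been measured.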
Sorry, but I'm really stuck and don't know how to proceed.
thanks,
Till