[R] "Large" data set: performance issue

Till Baumgaertel till.baumgaertel at epost.de
Tue Apr 2 14:51:26 CEST 2002


Hi all,

I need to import CSV datasets (with variable names in the first line)
into data.frames. Each file is about 12 MB (or more!) with 1823 columns and about
500 rows. The first 22 columns are of mode "character", the rest are "numeric".

I run R 1.4.1 on a Windows 2000 system.

First I tried read.table(), which works fine for a small number of cases (say,
40). With all cases, the function does not return within one hour (Celeron at
600 MHz, 256 MB RAM).
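One thing that might be worth trying before giving up on read.table(): declaring the column types up front so the function does not have to guess them for all 1823 columns. The sketch below uses a tiny made-up file in place of the real one, and assumes the installed R version supports the colClasses and nrows arguments of read.table()/read.csv() (they exist in current R; I have not checked how 1.4.1 behaves). The names tmp, n_char, n_num and col_types are only for illustration.

```r
# Hypothetical stand-in for the real file: 2 character columns, 3 numeric ones.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2,x3",
             "a,g1,1.5,2,3",
             "b,g2,4,5.25,6"), tmp)

n_char <- 2   # 22 in the real data
n_num  <- 3   # 1801 in the real data

# One type per column, in file order, so read.csv() does not have to guess.
col_types <- c(rep("character", n_char), rep("numeric", n_num))

# nrows is optional; giving the (approximate) row count can help with memory use.
dat <- read.csv(tmp, colClasses = col_types, nrows = 2)
str(dat)
```

For the real file the two counts would be 22 and 1801, and nrows roughly 500.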

Then I tried scan(), which is almost OK.
I scan() the first line for the variable names, then the rest. The data matrix gets
transposed and converted with as.data.frame().
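For reference, the two-pass scan() approach can be sketched roughly as follows. This is not the exact code in use, only an illustration on a small made-up file; varnames and datmat are hypothetical names. Filling the matrix with byrow = TRUE is one way to get variables in columns directly, which may make the separate transpose step unnecessary.

```r
# Hypothetical stand-in for the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2",
             "a,g1,1.5,2",
             "b,g2,4,5.25"), tmp)

# Pass 1: only the first line, which holds the variable names.
varnames <- scan(tmp, what = "character", sep = ",", nlines = 1, quiet = TRUE)
totCols  <- length(varnames)

# Pass 2: everything else, as one long character vector in row-by-row order.
datfull <- scan(tmp, what = "character", sep = ",", skip = 1, quiet = TRUE)

# byrow = TRUE fills the matrix in the same order as the file, so no t() is needed.
datmat <- matrix(datfull, ncol = totCols, byrow = TRUE,
                 dimnames = list(NULL, varnames))
```

At this point datmat is still all character; the numeric columns have to be converted separately.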

The problem is converting the last 1801 variables to mode "numeric".

I use the following snippet:
# convert columns 23..totCols of the data frame to numeric, one at a time
i <- 23
while (i <= totCols) {
	datframe[, i] <- as.numeric(datframe[, i])
	i <- i + 1
}

Each step takes ~2 secs, which adds up to about an hour in total.
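A possible way around the slow loop, assuming the cost comes from assigning into the data frame 1801 times (which I suspect but have not measured): convert the whole numeric block in one vectorised call while the data is still a character matrix, and build the data frame only once at the end. The toy matrix, firstNum and nummat below are made up for illustration.

```r
# Toy stand-in: a character matrix with 2 text columns and 3 numeric-looking ones.
datmat <- matrix(c("a", "g1", "1.5", "2",    "3",
                   "b", "g2", "4",   "5.25", "6"),
                 ncol = 5, byrow = TRUE,
                 dimnames = list(NULL, c("id", "grp", "x1", "x2", "x3")))
firstNum <- 3              # 23 in the real data
totCols  <- ncol(datmat)

# One as.numeric() call for the whole block; matrix() restores the shape
# (both use column-major order, so the layout is preserved).
nummat <- matrix(as.numeric(datmat[, firstNum:totCols]),
                 nrow = nrow(datmat),
                 dimnames = list(NULL, colnames(datmat)[firstNum:totCols]))

# Assemble the data frame once from the two finished pieces.
datframe <- data.frame(datmat[, 1:(firstNum - 1), drop = FALSE], nummat,
                       stringsAsFactors = FALSE)
str(datframe)
```

If the data is already in a data frame of character columns, datframe[firstNum:totCols] <- lapply(datframe[firstNum:totCols], as.numeric) would be a similar single-assignment alternative.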

I suppose I'm doing something really stupid. For reading the data I use
datfull<-scan(filename,sep=",",skip=1,what="character")
which gives me a transposed matrix of my data (variables in rows).

If that weren't the case, maybe I could give the "what" parameter a vector with
the appropriate variable types?
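As far as I can tell from the scan() help page, "what" can indeed describe the type of every column, but it has to be a list rather than a plain vector: one template element per field, where "" stands for character and 0 for numeric. scan() then reads the file record by record and returns one vector per column. A minimal sketch on a made-up file (col_template, n_char and n_num are hypothetical names):

```r
# Hypothetical stand-in for the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,grp,x1,x2",
             "a,g1,1.5,2",
             "b,g2,4,5.25"), tmp)

varnames <- scan(tmp, what = "character", sep = ",", nlines = 1, quiet = TRUE)
n_char <- 2                           # 22 in the real data
n_num  <- length(varnames) - n_char   # 1801 in the real data

# One template element per column: "" means character, 0 means numeric.
col_template <- c(rep(list(""), n_char), rep(list(0), n_num))
names(col_template) <- varnames

# With a list for 'what', scan() returns a named list with one vector per column.
cols <- scan(tmp, what = col_template, sep = ",", skip = 1, quiet = TRUE)

# The columns already have the right modes, so no conversion loop is needed.
datframe <- as.data.frame(cols, stringsAsFactors = FALSE)
str(datframe)
```

Read this way, the result has variables in columns and the numeric columns are numeric from the start.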

Sorry, but I'm really stuck and don't know how to go on.

Thanks,
Till

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._