[R] Data input performance

Filip Ginter ginter at cs.utu.fi
Thu Jan 24 14:57:21 CET 2002


Dear list,

I'm brand new to R (started using it a few days ago...), so sorry for a possibly 
stupid question.

Anyway, I'm using R to cluster my data. I have the dissimilarity matrix 
as a text file, with the numbers separated by spaces. At its largest it is 
a 2300x2300 matrix.

Now, it seems to me that importing the matrix into R is rather slow. For the 
peak size of 2300x2300 it takes almost two hours. The clustering itself takes 
very little time compared to importing the data. I have a PC with 256MB of 
memory and a 900MHz processor, running Linux (RH7.1). The version of 
R is "Version 1.4.0  (2001-12-19)".
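
For a sense of scale, here is a rough size estimate of the data. It is only a 
sketch: it assumes each value is stored as an 8-byte double, and the variable 
names are made up for illustration.

```r
n <- 2300

full_cells <- n * n              # entries in the full square matrix
dist_cells <- n * (n - 1) / 2    # entries below the diagonal, as kept by a "dist" object

bytes_per_double <- 8            # assumption: values are held as 8-byte doubles

full_mb <- full_cells * bytes_per_double / 1024^2
dist_mb <- dist_cells * bytes_per_double / 1024^2

cat(full_cells, "values, about", round(full_mb, 1), "MB as a full matrix\n")
cat(dist_cells, "values, about", round(dist_mb, 1), "MB as a dist object\n")
```

The raw numbers alone are well under the 256MB available, although temporary 
copies made during import can multiply that figure.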

I have tried to follow all the recommendations I found in the documentation, 
so I do something like this: (The file consists of 2300 rows, each containing 
2300 real numbers, separated by spaces. Nothing else.)

__________________________

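library(cluster)                  # provides agnes()
CC<-c("numeric")                  # one class, recycled over all columns
# Read the matrix; nrows and colClasses are given as the documentation advises
T1<-read.table("matrix",nrows=2300,colClasses=CC)
T2<-as.dist(T1)                   # convert to a "dist" object
rm(T1)                            # drop the data frame once it is converted
T3<-agnes(T2,diss=TRUE)           # agglomerative clustering on dissimilarities
# outfile holds the output file name (its definition is not shown here)
write.table(T3$merge,file=outfile,quote=FALSE)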

___________________________
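
One way to see where the time goes is to time the import on a small stand-in 
file. The sketch below is not part of the script above: the 50x50 size, the 
temporary file and the names m, path, t_table and t_scan are all illustrative 
assumptions. It also times scan(), a base R function that reads plain numbers 
without building a data frame. Whether that is faster on the real 2300x2300 
file would have to be measured.

```r
n <- 50                                          # small stand-in for the 2300 rows
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(n * 3), n)))    # symmetric dissimilarity matrix

# Write it in the same layout as the real file: n rows of n numbers, space separated
path <- tempfile()
write.table(m, path, row.names = FALSE, col.names = FALSE)

# Import as in the script above
t_table <- system.time(
  T1 <- read.table(path, nrows = n, colClasses = "numeric")
)[["elapsed"]]

# Alternative: read all numbers as one vector, then shape them row by row
t_scan <- system.time(
  M <- matrix(scan(path, what = numeric(), quiet = TRUE), nrow = n, byrow = TRUE)
)[["elapsed"]]

cat("read.table:", t_table, "s   scan:", t_scan, "s\n")

# Both routes should give the same numbers
same <- isTRUE(all.equal(as.matrix(T1), M, check.attributes = FALSE))
cat("same values:", same, "\n")

D <- as.dist(M)                                  # ready for agnes(D, diss=TRUE)
```

Raising n toward 2300 shows how each import route scales on a given machine.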

The CC vector contains "numeric" only once, as I read that the values are 
"recycled" across the columns...
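
A tiny check of that recycling behaviour, as a sketch on a made-up two-row 
file (the file contents and the names path and d are illustrative only):

```r
# Write a small space-separated file with three columns
path <- tempfile()
writeLines(c("1 2 3", "4 5 6"), path)

# A single unnamed class is recycled over every column
d <- read.table(path, colClasses = "numeric")

print(sapply(d, class))   # class of each column after import
```

If the single value is recycled as the documentation says, every column is 
reported as numeric.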

So, is there any room for improvement? Any way to make the data import 
quicker?

Thanks a lot.

Best regards,

Filip

-- 

-----------------------------------------------------------------
Filip Ginter
Ph.D. student

Email: ginter at cs.utu.fi
Phone: +358-2-2154078
Office: 4122, 4th floor
ICQ: 146959496

Turku Centre for Computer Science
Lemminkäisenkatu 14A
20520 Turku
Finland

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


