[R] Seeking a more efficient way to read in a file
Charilaos Skiadas
cskiadas at gmail.com
Thu Jan 3 02:42:21 CET 2008
On Jan 2, 2008, at 6:05 PM, Talbot Katz wrote:
> Hi.
>
> I have a matrix stored in a large, tab-delimited flat file. The
> first row contains column names. Because the matrix is symmetric,
> the file has lower triangular format, so the second row contains
> one number, the third row two numbers, etc. In general, row k+1
> contains k numbers; the matrix has 3000 rows, so the file has 3001
> rows. The file has variable length records, so each row ends with
> its last piece of data. I read in the file and produced the full
> symmetric matrix as follows:
>
>> mana01 <- scan( file = "C:/mat.dat", sep = "\t", nlines = 1, what = "character" )
> Read 3000 items
>> nco <- length( mana01 )
>> malt <- matrix( 0, nrow = nco, ncol = nco )
>> colnames( malt ) <- mana01
>> rownames( malt ) <- mana01
>> for ( i in 1:3000 ) { malt[ i, (1:i) ] <- scan( file = "C:/mat.dat", skip = i, n = i, quiet = TRUE ) }
>> mat <- malt + t( malt ) - diag( diag( malt ) )
>
> The for loop took a couple of hours to complete. I suspect there's
> a much faster way to do this. Any suggestions? Thanks!
I saw Jim's reply just after having written a solution, so here is my
take on it. The key thing, as Jim mentioned, is not to call scan once
per row, but to read the whole file in and then process it. I read the
lines, used strsplit to split each line into its individual fields
(giving a list), and then used sapply after extending each row by the
right number of zeros. Not sure which of the two is faster.
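For concreteness, the code below assumes a small test file laid out as
described above: a header row of names, then row k+1 holding the k
tab-separated numbers of row k of the lower triangle. A tiny, made-up
3-column version would look like

name1	name2	name3
1.0
0.5	1.0
0.3	0.2	1.0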
nms <- scan("~/Desktop/testing.txt", sep="\t", nlines=1, what=character(0))
x <- scan("~/Desktop/testing.txt", sep="\n", skip=1, what=character(0))  # read the rest as a vector of lines
splt <- strsplit(x, "\t")   # split each line at the tabs
nr <- length(nms)
splt <- sapply(splt, function(x) c(as.numeric(x), rep(0, nr - length(x))))  # extend each row by the right number of zeros
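Note that sapply returns its result with one column per input line, so
splt above is the transpose of the lower-triangular matrix. To get the
full symmetric matrix with the names attached, something along the
lines of Talbot's last step should work (untested sketch):

malt <- t(splt)                            # rows of the file become rows of the matrix
dimnames(malt) <- list(nms, nms)           # attach the names from the header row
mat <- malt + t(malt) - diag(diag(malt))   # fill in the upper triangle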
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College