[R] distance matrix as text file - how to import?
Hans-Joerg Bibiko
bibiko at eva.mpg.de
Wed Apr 9 09:26:12 CEST 2008
> On Tue, Apr 8, 2008 at 1:50 PM, Hans-Jörg Bibiko <bibiko at eva.mpg.de>
> wrote:
>> I was sent a text file containing a distance matrix à la:
>>
>> 1
>> 2 3
>> 4 5 6
Thanks a lot for your hints.
In the end, all the hints lead more or less to my "stony" way of doing it.
Let me summarize it.
The clean way is to initialize a matrix containing my distance matrix
and generate a dist object by using as.dist(mat).
Fine. But how to read the text data (triangular) into a matrix?
#1 approach - using 'read.table'
mat = read.table("test.txt", fill=TRUE)
The problem here is that 'read.table' determines the number of columns
from the first five lines of the file only. In a triangular file these
lines hold at most 5 values, so the result ends up with 5 columns.
Ergo I have to know the number of columns (num_cols) beforehand in
order to do this:
mat = read.table("test.txt", fill=TRUE, col.names=rep('', num_cols))
By definition the last line of "test.txt" contains the correct
number of columns.
On a UNIX/Mac you can do the following:
num_cols <- as.numeric(system("tail -n 1 'test.txt' | wc -w", intern=TRUE))
In other words, read the last line of 'test.txt' and count the number
of words if the delimiter is a space. Or one could use 'readLines' and
split the last array element to get num_cols.
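A portable variant of the same idea, as a minimal sketch: it uses 'readLines' as mentioned above instead of 'tail' and 'wc', so it should also work outside UNIX/Mac. The temporary file and its three lines are made up for illustration and stand in for "test.txt"; the column names V1, V2, ... are likewise just placeholders.

```r
# Hypothetical stand-in for "test.txt": a lower triangle, one row per line.
path <- tempfile(fileext = ".txt")
writeLines(c("1", "2 3", "4 5 6"), path)

# Count the whitespace-separated fields on the last non-empty line.
lines <- readLines(path)
lines <- lines[nzchar(trimws(lines))]
last <- trimws(lines[length(lines)])
num_cols <- length(strsplit(last, "[[:space:]]+")[[1]])

# Read the ragged file; short rows are padded with NA because fill = TRUE.
mat <- as.matrix(read.table(path, fill = TRUE,
                            col.names = paste0("V", seq_len(num_cols))))
```

If I am not mistaken, max(count.fields(path)) from the utils package would give the same num_cols without splitting the line by hand.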
#2 approach - using 'scan()'
mat = matrix(0, num_cols, num_cols)
mat[row(mat) >= col(mat)] <- scan("test.txt")
But this also leads to my problem, because R fills the selected cells
column by column:
1
2 4
3 5 6
instead of
1
2 3
4 5 6
==== one solution ============
Approach #2 has two advantages: it's faster than read.table AND I
can calculate num_cols from the number of values read. The only
problem is the order of the values. But this is solvable by reading
the data into the upper triangle and then transposing the matrix:
mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- scan("test.txt")
mat <- t(mat)
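How num_cols can be calculated is not spelled out for this case. A minimal sketch, assuming the file holds a lower triangle including the diagonal: n rows then contain n(n+1)/2 values, so n = (sqrt(1 + 8*length) - 1)/2. The inline 'text' argument stands in for "test.txt", and the name 'vals' is only illustrative.

```r
# Six values standing in for the three lines "1", "2 3", "4 5 6".
vals <- scan(text = "1\n2 3\n4 5 6", quiet = TRUE)

# Solve n * (n + 1) / 2 = length(vals) for n.
num_cols <- (sqrt(1 + 8 * length(vals)) - 1) / 2
stopifnot(num_cols == round(num_cols))  # fails if the count is not triangular

# Fill the upper triangle (column by column), then transpose so that the
# values end up row by row in the lower triangle.
mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- vals
mat <- t(mat)
```

The trick works because walking the upper triangle column by column visits the cells in the same order as walking the lower triangle row by row.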
Next. If I know that my text file really contains a distance matrix
(i.e. the diagonal has been removed) then I can do the following:
data <- scan("test.txt")
num_cols <- (1 + sqrt(1 + 8*length(data)))/2 - 1
mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- data
mat <- t(mat)
# Finally, to get a 'dist' object, pad a zero row on top and a zero
# column on the right so that the diagonal is back in place:
mat <- rbind(0, mat)
mat <- cbind(mat, 0)
dobj <- as.dist(mat)
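The steps above can be wrapped into one function. This is a sketch only: the name 'read_lower_dist' is hypothetical, the input is assumed to be the values of a diagonal-free lower triangle in row order, and instead of padding with rbind/cbind it fills the strict upper triangle of the full n x n matrix directly, which should give the same result.

```r
# Hypothetical helper: build a 'dist' object from the values of a
# diagonal-free lower triangle given row by row.
read_lower_dist <- function(vals) {
  # n objects give n * (n - 1) / 2 distances; solve for n.
  n <- (1 + sqrt(1 + 8 * length(vals))) / 2
  if (n != round(n)) stop("number of values is not triangular")
  full <- matrix(0, n, n)
  # Row-wise lower triangle == column-wise upper triangle, so fill the
  # strict upper triangle and transpose.
  full[row(full) < col(full)] <- vals
  as.dist(t(full))
}

# The inline text stands in for scan("test.txt").
d <- read_lower_dist(scan(text = "1\n2 3\n4 5 6", quiet = TRUE))
```

Here six values describe four objects, so as.matrix(d) is a symmetric 4 x 4 matrix with zeros on the diagonal.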
Again, thanks a lot!
--Hans