[R] distance matrix as text file - how to import?

Hans-Joerg Bibiko bibiko at eva.mpg.de
Wed Apr 9 09:26:12 CEST 2008


> On Tue, Apr 8, 2008 at 1:50 PM, Hans-Jörg Bibiko <bibiko at eva.mpg.de>  
> wrote:
>> I was sent a text file containing a distance matrix à la:
>>
>> 1
>> 2 3
>> 4 5 6

Thanks a lot for your hints.

In the end, all the hints lead more or less to my "stony" way of doing it.

Let me summarize it.

The clean way is to build a matrix holding the distances and turn it  
into a dist object with as.dist(mat).
Fine. But how do I read the (triangular) text data into a matrix?
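As a reminder of what as.dist() does with a full matrix, here is a minimal sketch on a small hypothetical 3x3 symmetric matrix (the values and labels are made up for illustration). Only the strictly lower triangle is kept, stored column by column.

```r
# Hypothetical symmetric distance matrix for three objects a, b, c
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

d <- as.dist(m)   # keeps only the strictly lower triangle
print(d)
```

So whatever the reading method, the goal is a square matrix whose lower triangle holds the distances.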

#1 approach - using 'read.table'

mat = read.table("test.txt", fill=TRUE)

The problem here is that 'read.table' determines the number of columns  
from the first five lines of input only. In a triangular file those  
lines hold at most 5 values, so the table gets 5 columns, which is  
wrong for every row with more than 5 values.
So I have to know the number of columns (num_cols) beforehand in  
order to do this:

mat = read.table("test.txt", fill=TRUE, col.names=rep('', num_cols))
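A self-contained sketch of the col.names trick, assuming a whitespace-separated lower triangle with the diagonal included. The file is generated into tempfile() as a stand-in for "test.txt", with 7 rows so that the five-line column guess would be too small. paste0("V", ...) is used instead of empty names purely for readability.

```r
# Build a 7-row lower-triangular file (row i holds i values)
n <- 7
vals <- seq_len(n * (n + 1) / 2)
rows <- vapply(seq_len(n), function(i) {
  idx <- (i * (i - 1) / 2 + 1):(i * (i + 1) / 2)
  paste(vals[idx], collapse = " ")
}, character(1))
f <- tempfile(fileext = ".txt")
writeLines(rows, f)

# A col.names vector of length n forces read.table to use n columns
tab <- read.table(f, fill = TRUE, col.names = paste0("V", seq_len(n)))
mat <- unname(as.matrix(tab))   # empty upper-triangle cells become NA
print(mat)
```

The NA cells in the upper triangle do no harm, since as.dist() only looks at the lower triangle.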

By definition, the last line of "test.txt" contains the full number  
of columns.
On UNIX/Mac you can do the following:

num_cols <- as.numeric(system("tail -n 1 'test.txt' | wc -w", intern=TRUE))

In other words, take the last line of 'test.txt' and count its words  
(assuming the values are separated by whitespace). Alternatively, one  
could use 'readLines' and split the last element of the returned  
character vector to get num_cols.
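The readLines() alternative can be sketched in plain R, which also avoids depending on tail and wc. count_cols() is a hypothetical helper name, and whitespace-separated values are assumed.

```r
# Hypothetical helper: number of fields on the last non-blank line
count_cols <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]   # ignore blank lines
  last  <- trimws(lines[length(lines)])
  length(strsplit(last, "[[:space:]]+")[[1]])
}

f <- tempfile(fileext = ".txt")   # stand-in for "test.txt"
writeLines(c("1", "2 3", "4 5 6"), f)
num_cols <- count_cols(f)
print(num_cols)
```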

#2 approach - using 'scan()'

mat = matrix(0, num_cols, num_cols)
mat[row(mat) >= col(mat)] <- scan("test.txt")

But this has its own problem: R fills the selected cells column by  
column, so the values end up in the wrong positions:
1
2 4
3 5 6

instead of
1
2 3
4 5 6
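A quick console check makes the column-major filling visible. Here 1:6 stands in for the values returned by scan("test.txt").

```r
mat <- matrix(0, 3, 3)
# Logical-matrix subassignment visits the selected cells in column-major
# order: down column 1, then down column 2, and so on
mat[row(mat) >= col(mat)] <- 1:6
print(mat)
```

Row 2 comes out as "2 4" rather than "2 3", which is exactly the wrong order shown above.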

==== one solution ============

Approach #2 has two advantages: it is faster than read.table AND I  
can calculate num_cols from the number of values read. The only  
problem is the order, and that is solvable by reading the data into  
the upper triangle and transposing the matrix:

mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- scan("test.txt")
mat <- t(mat)
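A self-contained sketch of this fix for a triangle that still contains the diagonal, with num_cols computed from the number of values (n rows hold n(n+1)/2 values). tempfile() stands in for "test.txt".

```r
f <- tempfile(fileext = ".txt")
writeLines(c("1", "2 3", "4 5 6"), f)

data <- scan(f, quiet = TRUE)
# Solve n * (n + 1) / 2 == length(data) for n
num_cols <- (sqrt(1 + 8 * length(data)) - 1) / 2
stopifnot(num_cols == round(num_cols))   # value count must be triangular

mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- data   # upper triangle, column by column
mat <- t(mat)                       # = lower triangle, row by row
print(mat)
```

Filling the upper triangle column by column and transposing is the same as filling the lower triangle row by row, which matches how the file is laid out.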


Next: if I know that my text file really contains a distance matrix  
(i.e. the zero diagonal has been removed), then I can do the following:

data <- scan("test.txt")
num_cols <- (1 + sqrt(1 + 8*length(data)))/2 - 1
mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- data
mat <- t(mat)
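The num_cols formula above follows from the triangular count. If L is the number of values read and m is the number of rows in the file (num_cols), row k holds k values, so:

```latex
L = \sum_{k=1}^{m} k = \frac{m(m+1)}{2}
\;\Longleftrightarrow\; m^2 + m - 2L = 0
\;\Longrightarrow\; m = \frac{-1 + \sqrt{1 + 8L}}{2} = \frac{1 + \sqrt{1 + 8L}}{2} - 1
```

Since the diagonal is missing, the number of objects is m + 1. For the example file with L = 6 values, sqrt(49) = 7 gives m = 3 rows and 4 objects.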

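Finally, to get a 'dist' object for all num_cols + 1 objects, add a  
zero row on top and a zero column on the right, which moves the values  
strictly below the diagonal: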

mat <- rbind(0, mat)
mat <- cbind(mat, 0)
dobj <- as.dist(mat)
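The steps for a file without a diagonal can be wrapped into one function. This is a minimal sketch assuming whitespace-separated values; read_lower_dist() is a hypothetical name, and tempfile() stands in for "test.txt".

```r
# Hypothetical helper: read a lower triangle WITHOUT diagonal
# (row k holds k values) and return a 'dist' object
read_lower_dist <- function(path) {
  data <- scan(path, quiet = TRUE)
  m <- (sqrt(1 + 8 * length(data)) - 1) / 2   # rows in the file
  if (m != round(m)) stop("number of values is not triangular")
  mat <- matrix(0, m, m)
  mat[row(mat) <= col(mat)] <- data   # upper triangle, column by column
  mat <- t(mat)                       # lower triangle, row by row
  mat <- rbind(0, mat)                # zero first row
  mat <- cbind(mat, 0)                # zero last column: (m+1) x (m+1)
  as.dist(mat)
}

f <- tempfile(fileext = ".txt")
writeLines(c("1", "2 3", "4 5 6"), f)
d <- read_lower_dist(f)
print(d)
```

With the three-line example the result describes 4 objects, and the file's first line becomes the distance between objects 1 and 2.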


Again, thanks a lot!

--Hans



More information about the R-help mailing list