[R] Compare two distance matrices

bady@univ-lyon1.fr bady at univ-lyon1.fr
Thu Oct 6 14:39:27 CEST 2005


Hi, hi all,



> I am trying to compare two distance matrices with R. I would like to
> create a XY plot of these matrices and do some linear regression on
> it. But, I am a bit new to R, so i have a few questions (I searched in
> the documentation with no success).
> The first problem is loading a distance matrix into R. This matrix is
> the output of a the Phylip program Protdist and lookes like this:
> I tried with the scan() function to load the files, but with no
> success. How should i load in these files? ....
>


you can separately load each matrix with two text files.

require(ade4)
mat1 <- read.table("mat1.txt")
nam1 <- mat1[,1]
mat1 <- mat1[,-1]
row.names(mat1) <- names(mat1) <- nam1
mat2 <- read.table("mat2.txt")
nam2 <- mat2[,1]
mat2 <- mat2[,-1]
row.names(mat2) <- names(mat2) <- nam2

dist1 <- mat2dist(mat1)
dist2 <- mat2dist(mat2)

# with mat1:
# n_crassa    0.000000  0.690737  0.895257  0.882576  2.365386
# c_neufor    0.690737  0.000000  0.956910  0.979988  2.103041
# a_thaliana  0.895257  0.956910  0.000000  1.003668  2.724847
# pompep      0.882576  0.979988  1.003668  0.000000  2.065202
# s_cerevis   2.365386  2.103041  2.724847  2.065202  0.000000

# and mat2:
# n_crassa    0.000000  0.739560  0.933986  0.861644  2.207467
# c_neufor    0.739560  0.000000  0.988779  0.925168  1.941141
# a_thaliana  0.933986  0.988779  0.000000  1.007803  2.415320
# pompep      0.861644  0.925168  1.007803  0.000000  2.394490
# s_cerevis   2.207467  1.941141  2.415320  2.394490  0.000000

I think that it’s the more simple solution, NOT the optimal solution. Maybe,
there is an interface to read Phylip file in Bioconductor project(?).

To transform matrix to dist, you can use the function mat2dist of the library
ade4 (see example)

To compare 2 distances matrices, there are many possibility  ! For example, if
the distances matrices are Euclidian, you can used directly principal
coordinate analyses (dudi.pco), and co-inertia analysis or procuste
analysis,etc ...

# examples:


# Data preparation
require(ade4)
mat1 <- read.table("mat1.txt")
nam1 <- mat1[,1]
mat1 <- mat1[,-1]
row.names(mat1) <- names(mat1) <- nam1
mat2 <- read.table("mat2.txt")
nam2 <- mat2[,1]
mat2 <- mat2[,-1]
row.names(mat2) <- names(mat2) <- nam2

? mat2dist
dist1 <- mat2dist(mat1)
dist2 <- mat2dist(mat2)

dist1
dist2

is.euclid(dist1)
is.euclid(dist2)
# cool the distances matrices are euclidian :)

# to compare the 2 matrices
# example 1 :  mantel test
? mantel.randtest
mt1 <- mantel.randtest(dist1,dist2,nrepet=10)
plot(mt1)

# Example 2: coefficient of vectorial correlation (Escoufier, 1973)
? RVdist.randtest
RV1 <- RVdist.randtest(dist1,dist2,nrepet=10)
plot(RV1)

# Example 3: coinertia analysis
?dudi.pco
# ?cmdscale
?coinertia
?randtest.coinertia

pco1 <- dudi.pco(dist1,nf=3,scannf=F)
pco2 <- dudi.pco(dist2,nf=3,scannf=F)
co1 <- coinertia(pco1,pco2,nf=3,scannf=F)
plot(co1)
testco1 <-randtest.coinertia(co1,nrepet=10)
plot(testco1)

# Example 4: procuste analysis
? procuste
? procuste.randtest
pco1 <- dudi.pco(dist1,nf=3,scannf=F)
pco2 <- dudi.pco(dist2,nf=3,scannf=F)
proc1 <- procuste(pco1$tab,pco2$tab)
plot(proc1)
testproc1 <-procuste.randtest(pco1$tab,pco2$tab,nrepet=10)
plot(testproc1)

....

the choice depend to your outcome.


hopes this help


P.BADY




More information about the R-help mailing list