R: Classical (Metric) Multidimensional Scaling

cmdscale {stats}

R Documentation

Classical (Metric) Multidimensional Scaling

Description

Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower 1966).

Usage

cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE,
         list. = eig || add || x.ret)

Arguments

d

a distance structure such as that returned by dist or a full symmetric matrix containing the dissimilarities.

k

the maximum dimension of the space which the data are to be represented in; must be in {1, 2, ..., n - 1}.

eig

indicates whether eigenvalues should be returned.

add

logical indicating if an additive constant c* should be computed, and added to the non-diagonal dissimilarities such that the modified dissimilarities are Euclidean.

x.ret

indicates whether the doubly centred symmetric distance matrix should be returned.

list.

logical indicating if a list should be returned or just the n x k matrix; see ‘Value’.
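As a small sketch of how these flags affect the return type, the calls below use the built-in eurodist data. The object names m, l and f are illustrative only.

```r
m <- cmdscale(eurodist)                # all flags FALSE: a plain matrix
l <- cmdscale(eurodist, eig = TRUE)    # eig = TRUE makes list. default to TRUE
f <- cmdscale(eurodist, list. = TRUE)  # a list requested without any extras

is.matrix(m)     # coordinates only
is.list(l)       # list with the components described under 'Value'
names(l)
dim(f$points)    # the coordinates are in the 'points' component
```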

Details

Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities. (It is a major part of what ecologists call ‘ordination’.)
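One rough way to see how well a configuration reproduces the input is to compare the distances between the fitted points with the original dissimilarities. The sketch below does this for eurodist in two dimensions; using the correlation as a summary is an informal choice made here, not something cmdscale reports.

```r
## Two-dimensional configuration for the built-in road distances
loc <- cmdscale(eurodist, k = 2)

## Distances between the fitted points, in the same order as the input
fitted <- dist(loc)

## Informal summary of agreement between input and fitted distances
r <- cor(as.vector(eurodist), as.vector(fitted))
r
```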

A set of Euclidean distances on n points can be represented exactly in at most n - 1 dimensions. cmdscale follows the analysis of Mardia (1978) and returns the best-fitting representation in at most k dimensions: the number of dimensions actually returned may be less than the argument k (this happens when fewer than k of the leading eigenvalues are positive).
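The exact-representation property can be checked on simulated data. In this sketch the five points are generated in three dimensions (an arbitrary choice for illustration), so asking for k = n - 1 = 4 dimensions may return fewer columns, depending on how rounding affects the near-zero eigenvalue; the warning cmdscale may issue in that case is suppressed.

```r
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5)   # 5 points in 3 dimensions
d <- dist(X)                          # Euclidean distances between them
n <- attr(d, "Size")

## Ask for the maximum allowed dimension, n - 1
fit <- suppressWarnings(cmdscale(d, k = n - 1))

ncol(fit)                    # at most n - 1; may be smaller
max(abs(dist(fit) - d))      # distances are reproduced up to rounding
```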

The representation is only determined up to location (cmdscale takes the column means of the configuration to be at the origin), rotations and reflections. The configuration returned is given in principal-component axes, so the reflection chosen may differ between R platforms (see prcomp).
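The textbook construction of classical scaling (double centring of the squared dissimilarities followed by an eigendecomposition) can be written out by hand and compared with cmdscale. This is a sketch of that construction, not the internal code; the names D2, J, B and manual are illustrative. Because each axis is only determined up to reflection, the comparison is made on absolute values.

```r
d  <- eurodist
n  <- attr(d, "Size")
D2 <- as.matrix(d)^2                   # squared dissimilarities
J  <- diag(n) - matrix(1 / n, n, n)    # centring matrix
B  <- -0.5 * J %*% D2 %*% J            # doubly centred matrix

e <- eigen(B, symmetric = TRUE)
k <- 2
## Coordinates: leading eigenvectors scaled by sqrt of their eigenvalues
manual <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

loc <- cmdscale(d, k = k)
colMeans(loc)                          # configuration is centred at the origin
max(abs(abs(manual) - abs(loc)))       # agreement up to sign of each axis
```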

When add = TRUE, a minimal additive constant c* is computed such that the dissimilarities d_{ij} + c* are Euclidean and hence can be represented in n - 1 dimensions. Whereas S (Becker et al., 1988) computes this constant using an approximation suggested by Torgerson, R uses the analytical solution of Cailliez (1983); see also Cox and Cox (2001). Note that because of numerical errors the computed eigenvalues need not all be non-negative, and even theoretically the representation could be in fewer than n - 1 dimensions.
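The effect of the additive constant can be seen on eurodist, whose road distances are not Euclidean. The sketch below compares the smallest eigenvalue with and without add = TRUE; fit0 and fit1 are illustrative names, and the exact values printed will depend on the platform.

```r
fit0 <- cmdscale(eurodist, k = 2, eig = TRUE)              # no constant
fit1 <- cmdscale(eurodist, k = 2, eig = TRUE, add = TRUE)  # Cailliez constant

min(fit0$eig)   # negative: the raw dissimilarities are not Euclidean
fit1$ac         # the additive constant c*
min(fit1$eig)   # non-negative up to numerical error
```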

Value

If list. is false (the default when eig, add and x.ret are all false), a matrix with up to k columns whose rows give the coordinates of the points chosen to represent the dissimilarities.

Otherwise, a list containing the following components.

points

a matrix with up to k columns whose rows give the coordinates of the points chosen to represent the dissimilarities.

eig

the n eigenvalues computed during the scaling process if eig is true. NB: versions of R before 2.12.1 returned only k but were documented to return n - 1.

x

the doubly centred distance matrix if x.ret is true.

ac

the additive constant c*, 0 if add = FALSE.

GOF

a numeric vector of length 2, (g_1, g_2) say, where g_i = (\sum_{j=1}^{k} \lambda_j) / (\sum_{j=1}^{n} T_i(\lambda_j)), the \lambda_j are the eigenvalues (sorted in decreasing order), T_1(v) = |v|, and T_2(v) = \max(v, 0).
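The two goodness-of-fit values can be recomputed from the returned eigenvalues using the formula above. A minimal sketch, with g1 and g2 as illustrative names:

```r
k   <- 2
fit <- cmdscale(eurodist, k = k, eig = TRUE)
lambda <- fit$eig                                 # all n eigenvalues, decreasing

g1 <- sum(lambda[1:k]) / sum(abs(lambda))         # T_1(v) = |v|
g2 <- sum(lambda[1:k]) / sum(pmax(lambda, 0))     # T_2(v) = max(v, 0)

all.equal(c(g1, g2), fit$GOF)
```

The two values differ only when some eigenvalues are negative, that is, when the dissimilarities are not Euclidean.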

References

Becker RA, Chambers JM, Wilks AR (1988). The New S Language. Chapman and Hall/CRC, London.

Cailliez F (1983). “The Analytical Solution of the Additive Constant Problem.” Psychometrika, 48(2), 305–308. doi:10.1007/BF02294026.

Cox TF, Cox MAA (2001). Multidimensional Scaling, Second edition. Chapman and Hall.

Gower JC (1966). “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis.” Biometrika, 53(3/4), 325. doi:10.2307/2333639.

Krzanowski WJ, Marriott FHC (1994). Multivariate Analysis Part I: Distributions, Ordination and Inference. Hodder Arnold. ISBN 978-0340593264. (Especially pages 108–111.)

Mardia KV (1978). “Some Properties of Classical Multi-dimensional Scaling.” Communications in Statistics - Theory and Methods, 7(13), 1233–1241. doi:10.1080/03610927808827707.

Mardia KV, Kent JT, Bibby JM (1979). Multivariate Analysis. Academic Press, London. Chapter 14.

Seber GAF (1984). Multivariate Observations. Wiley, New York. doi:10.1002/9780470316641.

Torgerson WS (1958). Theory and Methods of Scaling. Wiley, New York.

Examples

require(graphics)

loc <- cmdscale(eurodist)
x <- loc[, 1]
y <- -loc[, 2] # reflect so North is at the top
## note asp = 1, to ensure Euclidean distances are represented correctly
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE,
     main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6)

[Package stats version 4.6.0 Index]