[R] What is mclust up to? Different clusters found if x and y interchanged

Bryan Hanson hanson at depauw.edu
Tue Apr 20 01:31:03 CEST 2010


Hello All...

I gave a task to my students that involved using mclust to look for clusters
in some bivariate data of isotopes vs various mining locations.  They
discovered something I didn¹t expect; the data (called tur) is appended
below.

p <- qplot(x = dD, y = dCu65, data = tur, color = mine)
print(p) # simple bivariate plot of the data; looks fine

mod1 <- Mclust(tur[,2:3])
mod1$G
mod2 <- Mclust(tur[,3:2])
mod2$G

One can use coordProj to see the clusters found, but the basic result is
that mclust found 3 clusters in mod1, but if you interchange the x and y
columns it finds 7 clusters (mod2).  Since this is bivariate data, I find
this result pretty strange.  The only thing I can think of is that it has
something to do with how the algorithm is seeded, but that isn't too
convincing.  Since I couldn't believe my eyes, I made up some bivariate data
with known clusters (not given here) and interchanged the x and y columns
and got the same clusters.  I'm at a loss.  Hopefully someone can set my
understanding on the right path.

Thanks for any insight!  Bryan

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] datasets  tools     grid      graphics  grDevices utils     stats
methods   base     

other attached packages:
 [1] faraway_1.0.4      GGally_0.1         xtable_1.5-6       mvbutils_2.5.1
ggplot2_0.8.7     
 [6] digest_0.4.2       reshape_0.8.3      proto_0.3-8        ChemoSpec_1.43
R.utils_1.4.0     
[11] R.oo_1.7.1         R.methodsS3_1.2.0  rgl_0.91           lattice_0.18-3
mvoutlier_1.4     
[16] plyr_0.1.9         RColorBrewer_1.0-2 chemometrics_0.8   som_0.3-5
robustbase_0.5-0-1
[21] rpart_3.1-46       pls_2.1-0          pcaPP_1.8          mvtnorm_0.9-9
nnet_7.3-1        
[26] mclust_3.4.4       MASS_7.3-5         lars_0.9-7         e1071_1.5-23
class_7.3-2       

And the data:

> dput(tur)
structure(list(mine = structure(c(13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
6L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
9L, 9L, 9L, 9L, 12L, 12L, 12L, 12L, 12L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 4L, 4L, 4L, 4L, 4L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Cas",
"CH", "CR", "FM", "GU", "KG", "KM", "LV", "MC", "MZ", "N8", "OR",
"SB", "TF"), class = "factor"), dCu65 = c(6.1, 6.5, 5.7, 6.4,
6.1, 5.9, 6.5, 5.7, 6, 6.4, 6.8, 6.9, 6.7, 6.7, 5.8, 5.3, 6.7,
5.7, 6.7, 5.6, 6.4, 6.4, 6.5, 7.6, 6.7, 6.4, 6.4, 7, 7, 5.9,
5.3, 6.1, 11.8, 12.1, 12.4, 1, 1.8, 1.4, 1.2, 1.2, 1.2, 1.1,
1.2, 1.8, 1, 0.8, 0.7, 1.2, 1.5, 1, 0.2, 1.4, 0.3, 1.1, 0, -0.5,
-1.4, -0.2, 1.1, 0.9, 0.1, 1.4, -1.2, -0.1, -0.4, -0.8, 3, 3.3,
3.9, 3.9, 3.7, 9.5, 9.6, 9.6, 8.3, 10.3, 8.4, 9.4, 0.8, 0.9,
1.3, 0.2, 2.2, 0.7, 1.8, 4.5, 4.5, 4.6, 4.5, 4.6, 3.9, 4.2, 4.9,
4.9, 4.5, 16.7, 17.4, 17.5, 17.5, 17.4, 1.9, 2.4, 1.7, 1.2, 1.5,
1.1, 1.8, 3.1, 3.1, 3, 4.1, 4.2, 2.7, 4.5, 0.4, 1.2, 3, 3.5,
1.2, 0.8, 0.9, 2, -0.8, 5.8, 0.6, 1, 0.9, 4.4, 2.3, -0.1), dD = c(-66L,
-81L, -81L, -71L, -68L, -79L, -73L, -77L, -74L, -71L, -73L, -80L,
-76L, -80L, -74L, -77L, -74L, -77L, -74L, -82L, -79L, -72L, -74L,
-71L, -76L, -72L, -76L, -73L, -75L, -78L, -81L, -77L, -77L, -73L,
-79L, -104L, -98L, -98L, -109L, -106L, -99L, -107L, -100L, -98L,
-95L, -95L, -95L, -97L, -98L, -121L, -119L, -118L, -121L, -121L,
-106L, -116L, -122L, -112L, -125L, -109L, -114L, -118L, -102L,
-109L, -101L, -101L, -74L, -78L, -63L, -67L, -75L, -65L, -60L,
-62L, -67L, -59L, -62L, -68L, -61L, -58L, -69L, -57L, -59L, -67L,
-73L, -81L, -90L, -87L, -82L, -80L, -80L, -81L, -83L, -90L, -89L,
-107L, -122L, -128L, -113L, -124L, -93L, -89L, -92L, -85L, -79L,
-84L, -83L, -148L, -142L, -127L, -130L, -123L, -131L, -85L, -87L,
-90L, -81L, -65L, -81L, -119L, -121L, -94L, -90L, -117L, -95L,
-69L, -122L, -103L, -85L, -106L)), .Names = c("mine", "dCu65",
"dD"), class = "data.frame", row.names = c(NA, -130L))



More information about the R-help mailing list