[R] mda and kmeans
avanisco at univ-fcomte.fr
Wed Aug 15 11:24:35 CEST 2007
Hello,
I am using the function mda of the mda library in order to discriminate
4 groups with 8 explanatory variables. I only have 66 observations.
I tested all possible combinations of those variables and ran the
Mixture Discriminant Analysis for each.
For some iterations, I got an error message: "error in kmeans(xx,
start): initial centers are not distinct".
I understand that the function kmeans(), called by mda(), randomly
chooses the initial centers for starting the clustering procedure.
As I aim to bootstrap this function and need a lot of random selections,
I'd like to avoid the effects of replicated centers by keeping the
initial centers constant.
When debugging, it seems that mda() is linked with kmeans() by the
following condition:
    if (inherits(weights, "mda")) {
        if (is.null(weights$weights))
            weights <- predict(weights, x, type = "weights", g = fg)
        else weights <- weights$weights
    }
This condition calls mda.start() if "weights" is NULL.
kmeans() is called in mda.start() by starter(), where the arguments for
kmeans() (xx and start) are calculated.
The problem arises in the call to sample() in starter(), which randomly
samples rows of the data set.
For example, I could obtain duplicated rows such as the following:
debug: start <- xx[sample(1:nrow(xx), size = nc), ]
debug: TT <- kmeans(xx, start)
Browse[1]> start
          etm5      etm6  elevation    slope         SI     NDVI        EVI
28   0.7746975 0.4611835 -0.5566161 1.646738  4.5260250 1.519095  0.2501180
28.1 0.7746975 0.4611835 -0.5566161 1.646738  4.5260250 1.519095  0.2501180
30.1 0.4137596 0.2615745 -0.5367707 1.889310 -0.2040883 0.824643 -0.1526292
In the sample() function, it seems that sampling without replacement is
the default. But in the case above, the same row (28) appears to have
been sampled twice.
So, this is still a black box for me.
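
One possible reading of the output above, offered as a guess and not as a
diagnosis: the row names 28 and 28.1 look like what R produces when a data
frame is indexed with a repeated row, as happens in a bootstrap resample.
In that case sample() can return distinct positions that still hold
identical values. The sketch below uses only base R and made-up data (the
names dat, boot, xx and start are illustrative, not taken from mda):

```r
set.seed(1)
dat <- data.frame(a = rnorm(6), b = rnorm(6))   # made-up data

# Resample rows with replacement: rows 1 and 3 are taken twice here.
boot <- dat[c(1, 1, 2, 3, 3, 4), ]
rownames(boot)          # repeated rows get suffixed names such as "1.1"

# sample() without replacement returns distinct positions ...
idx <- sample(1:nrow(boot), size = 3)
anyDuplicated(idx)      # 0: no position is repeated

# ... but positions 1 and 2 hold the same values, so a start matrix
# built from them contains two identical centers.
xx <- as.matrix(boot)
start <- xx[c(1, 2, 3), ]
dup_start <- any(duplicated(start))

# kmeans() rejects such a start; capture the message instead of stopping.
msg <- tryCatch(kmeans(xx, start), error = function(e) conditionMessage(e))
```

If this is what happens, sample() itself behaves as documented, and the
duplicates come from the data passed to it.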
Even though, as mentioned in the help page of mda(), "the 'weights'
argument need never be accessed", do you think it's possible to avoid
this duplicated sampling?
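
A minimal sketch of one way to keep starting centers distinct, assuming
the duplicates come from repeated rows in the data: draw the centers from
the unique rows only. The helper distinct_start() is hypothetical and not
part of the mda package; it is shown here with kmeans() directly, and
using it inside mda() would presumably mean modifying the internal
starter() function.

```r
# Hypothetical helper: draw `nc` starting centers from the distinct rows
# of `xx`, so that no two centers can coincide.
distinct_start <- function(xx, nc) {
  xx <- as.matrix(xx)
  ux <- unique(xx)                       # drop repeated rows
  if (nrow(ux) < nc) stop("fewer than 'nc' distinct rows available")
  ux[sample(nrow(ux), size = nc), , drop = FALSE]
}

set.seed(42)
dat  <- data.frame(a = rnorm(20), b = rnorm(20))    # made-up data
boot <- dat[sample(nrow(dat), replace = TRUE), ]    # bootstrap resample

start <- distinct_start(boot, nc = 3)
fit   <- kmeans(as.matrix(boot), centers = start)
```

An alternative that leaves the package code untouched would be to wrap
each bootstrap fit in tryCatch() and redraw the resample when the error
occurs, at the cost of discarding some replicates.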
Thanks in advance for your ideas,
Amelie Vaniscotte
University of Franche-comté
25000 Besançon