[R] error using daisy() in library(cluster). Bug?
Martin Maechler
maechler at stat.math.ethz.ch
Thu Aug 12 17:59:21 CEST 2004
[Reverted back to R-help, after private exchange]
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Thu, 12 Aug 2004 17:12:01 +0200 writes:
>>>>> "javier" == javier garcia <- CEBAS <rn001 at cebas.csic.es>>
>>>>> on Thu, 12 Aug 2004 16:28:27 +0200 writes:
javier> Martin; Yes I know that there are variables with all
javier> five values 'NA'. I've left them as they are just
javier> because of saving a couple of lines in the script,
javier> and because I like to see that they are there,
javier> although all values are 'NA'. I don't expect they
javier> are used in the analysis, but are they the source of
javier> the problem?
MM> Yes, but only because of "stand = TRUE".
MM> One could well argue that standardizing these
MM> "all NA variables" should simply work.
MM> I'll think a bit more about it. Thank you for the
MM> example.
Ok. I've thought (and looked at the R code) a bit longer.
Also considered the fact (you mentioned) that this worked in R 1.8.0.
Hence, I'm considering the current behavior a bug.
Here is the patch (apply to cluster/R/daisy.q in the *source*
or at the appropriate place in <cluster_installed>/R/cluster ) :
--- daisy.q	2004/06/25 16:17:47	1.17
+++ daisy.q	2004/08/12 15:23:26
@@ -78,8 +78,8 @@
     if(all(type2 == "I")) {
         if(stand) {
             x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
-            sx <- colMeans(abs(x))
-            if(any(sx == 0)) {
+            sx <- colMeans(abs(x), na.rm = TRUE)# can still have NA's
+            if(0 %in% sx) {
                 warning(sQuote("x"), " has constant columns ",
                         pColl(which(sx == 0)), "; these are standardized to 0")
                 sx[sx == 0] <- 1
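
For readers who want to see why the old test fails, here is a minimal
sketch in base R only. It does not call daisy() itself; the toy matrix
and the names sx_old, sx_new, old_test and new_test are made up for
illustration and do not appear in the package. It mimics the two lines
changed by the patch on data with one all-NA column:

```r
# Toy data (hypothetical): column "b" is entirely NA, like the
# all-NA variables in the reported data frame.
x <- cbind(a = c(1, 2, 3, 4, 5), b = rep(NA_real_, 5))
x <- scale(x, center = TRUE, scale = FALSE)  # centre the columns

# Old code: the mean absolute deviation of column "b" is missing,
# so the comparison with 0 is missing too and any() returns NA.
sx_old   <- colMeans(abs(x))
old_test <- any(sx_old == 0)
# if (old_test) ... would stop here, since if() cannot handle NA.

# Patched code: na.rm = TRUE leaves no plain NA from partly missing
# columns, and %in% always returns TRUE or FALSE, never NA.
sx_new   <- colMeans(abs(x), na.rm = TRUE)
new_test <- 0 %in% sx_new

print(old_test)
print(new_test)
```

The point of the second change is that `%in%` is based on match(), which
treats missing values as ordinary non-matching entries, so the condition
passed to if() is always a proper logical value.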
Thank you for helping to find and fix this bug.
Martin Maechler, ETH Zurich, Switzerland
javier> On Thu 12 Aug 2004 15:11, MM wrote:
>>> Javier, I could well read your .RData and try your
>>> script to produce the same error from daisy().
>>>
>>> Your dataframe is of dimension 5 x 180 and has many
>>> variables that have all five values 'NA' (see below).
>>>
>>> You don't expect to use these, do you? Martin