[R] memory problem [cluster]

Martin Maechler maechler at stat.math.ethz.ch
Tue Dec 5 10:04:42 CET 2006


>>>>> "Roger" == Roger Bivand <Roger.Bivand at nhh.no>
>>>>>     on Sat, 2 Dec 2006 22:11:12 +0100 (CET) writes:

    Roger> On Sat, 2 Dec 2006, Dylan Beaudette wrote:
    >> Hi Stephano,

    Roger> Looks like you used my example verbatim 
    Roger> (http://casoilresource.lawr.ucdavis.edu/drupal/node/221)

    Roger> :)

    Roger> From exchanges on R-sig-geo, I believe the original questioner is feeding
    Roger> NAs to clara, and the error message in clara() is overrunning the buffer
    Roger> in sprintf(), so the memory problem isn't correctly identified. Using
    Roger> scripts out of context without checking whether the input data frame 
    Roger> satisfies the conditions of the functions being used is asking for trouble. 
    Roger> The error message:

    >> traceback()
    Roger> 2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
    Roger> sprintf("Observations %s have", paste(i, collapse = ","))),
    Roger> " *only* NAs --> omit for clustering")
    Roger> 1: clara(morph, k = 5, stand = F)

    Roger> is coming from lines:

    Roger> i[1]), sprintf("Observations %s have", paste(i, 
    Roger> collapse = ","))), " *only* NAs --> omit for clustering")

    Roger> in clara(). I have suggested dropping those rows from the data frame in a 
    Roger> reply on R-sig-geo, but maybe clara() could be patched to count the # of 
    Roger> completely missing rows, and if # is more than a modest number, not print 
    Roger> the obs. numbers, just the total?

Yes, thanks Roger, for the hint; I have now done that
(will be in cluster_1.11.4):

  > data(xclara)
  > xclara[sample(nrow(xclara), 50),] <- NA
  > clara(xclara, k = 3)
  Error in clara(xclara, k = 3) : 50 observations (6,95,106,191,258,294,295,321,432,601,662,702 ...)
	  have *only* NAs --> na.omit() them for clustering!
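For readers hitting this error, the following is a minimal sketch of dropping the offending rows before clustering. It assumes only the xclara example above; the names all_na and x_ok are made up for illustration. Note that na.omit() removes every row containing any NA, whereas the filter below removes only rows that are entirely NA, which is the condition clara() complains about (clara() itself allows partially missing rows).

```r
library(cluster)   # provides clara() and the xclara data set

data(xclara)
set.seed(1)                                  # only to make the sketch reproducible
xclara[sample(nrow(xclara), 50), ] <- NA     # blank out 50 whole rows, as above

# TRUE for rows in which every column is NA
all_na <- rowSums(!is.na(xclara)) == 0
sum(all_na)                                  # number of rows that will be dropped

x_ok <- xclara[!all_na, , drop = FALSE]      # keep rows with at least one value
fit  <- clara(x_ok, k = 3)                   # now runs without the all-NA error
```

The cluster assignments in fit$clustering then refer to the rows of x_ok, so keep all_na around if the results need to be matched back to the original data frame.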


Lessons to be learned (I learned them earlier, but have not
scrutinized all my code to see whether they are obeyed :-):  

- Inside stop(..) be careful not to produce another error;
  particularly not a memory-related one, since this will give
  user-error messages that are not at all helpful.

- All non-beginner R users should be trained to routinely say
  'traceback()' after they've seen an error.
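The first lesson can be illustrated with a small sketch. This is not the code that went into cluster; check_all_na() and its max_show argument are hypothetical names chosen for the example. The idea is simply to cap how many observation numbers are pasted into the message, so that the text handed to stop() stays short however many rows are affected.

```r
# Hypothetical helper, not the code in cluster: report all-NA rows
# while keeping the message short no matter how many rows are affected.
check_all_na <- function(x, max_show = 12) {
    i <- which(rowSums(!is.na(x)) == 0)      # indices of rows with only NAs
    if (length(i) == 0) return(invisible(x))
    shown <- paste(head(i, max_show), collapse = ",")
    if (length(i) > max_show) shown <- paste(shown, "...")
    stop(sprintf(ngettext(length(i),
                          "%d observation (%s) has *only* NAs",
                          "%d observations (%s) have *only* NAs"),
                 length(i), shown),
         call. = FALSE)
}

# 40 of 100 rows are entirely NA; only the first 12 indices are listed
m <- matrix(1, nrow = 100, ncol = 2)
m[1:40, ] <- NA
msg <- tryCatch(check_all_na(m), error = function(e) conditionMessage(e))
msg
```

In an interactive session, calling check_all_na(m) directly and then typing traceback() shows which call raised the error, which is the second lesson in practice.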

Regards,
Martin Maechler, ETH Zurich



