[R] Error on distance matrix

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Jan 14 17:30:42 CET 2008


On Mon, 2008-01-14 at 17:19 +0100, Martin Maechler wrote:
> >>>>> "GS" == Gavin Simpson <gavin.simpson at ucl.ac.uk>
> >>>>>     on Thu, 10 Jan 2008 14:16:36 +0000 writes:
> 
>     GS> On Thu, 2008-01-10 at 10:48 +0000, Marc Moragues wrote:
>     >> Hi,
>     >> 
>     >> I am trying to calculate a distance matrix on a binary
>     >> data frame using dist.binary() {ade4}. This is the code I
>     >> run and the error I get:
>     >> 
>     >> > sjlc.dist <- dist.binary(as.data.frame(data), method=2)
>     >> #D = (a+d) / (a+b+c+d)
>     >> Error in if (any(df < 0)) stop("non negative value expected in df") :
>     >>   missing value where TRUE/FALSE needed
>     >> 
>     >> I don't know whether the problem is the missing values in my
>     >> data. If so, how can I handle them?
> 
>     GS> Marc,
> 
>     GS> Take a look at distance in package analogue and method =
>     GS> "mixed" which implements Gower's general dissimilarity
>     GS> coefficient for mixed data. 
> 
> daisy() in recommended package 'cluster' has been doing this
> for "ever". Has there been a reason for reimplementing that?

Martin,

Yes, but only because I wanted the distances between matrix X and matrix
Y (on a common set of variables) and not the pairwise distances between
observations of matrix X. It is a side effect that if only X is provided
to distance() it returns the pairwise distances. distance() also allows
for varying weights (which I understand daisy() does not) and for
pre-specifying the ranges (which are used to standardise each variable).
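To make the X-versus-Y idea concrete, here is a minimal base-R sketch of Gower's general dissimilarity between every row of one matrix and every row of another, with optional per-variable weights and pre-specified ranges. gower_xy() is a hypothetical helper written for illustration; it is not analogue's distance() and handles numeric (including 0/1) variables only. Comparisons involving NA are dropped from both the numerator and the denominator, which is how Gower's coefficient copes with missing values.

```r
## Hypothetical helper for illustration only; NOT analogue::distance().
## Gower dissimilarity between each row of x and each row of y
## (numeric variables only), with per-variable weights and ranges.
gower_xy <- function(x, y = x, weights = NULL, ranges = NULL) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(ncol(x) == ncol(y))
  if (is.null(weights)) weights <- rep(1, ncol(x))
  stopifnot(length(weights) == ncol(x))
  ## Default: each variable's range over x and y combined
  if (is.null(ranges)) {
    ranges <- apply(rbind(x, y), 2, function(v) diff(range(v, na.rm = TRUE)))
  }
  d <- matrix(NA_real_, nrow(x), nrow(y),
              dimnames = list(rownames(x), rownames(y)))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      ## Range-scaled absolute differences; for 0/1 data with range 1
      ## this is 0 for a match and 1 for a mismatch
      s <- abs(x[i, ] - y[j, ]) / ranges
      ## Skip comparisons involving NA (and 0/0 from zero-range variables)
      ok <- !is.na(s)
      if (any(ok)) d[i, j] <- sum(weights[ok] * s[ok]) / sum(weights[ok])
    }
  }
  d
}

X <- matrix(c(1, 0,  1,
              0, NA, 1), nrow = 2, byrow = TRUE)
Y <- matrix(c(1, 1, 0), nrow = 1)
gower_xy(X, Y)                        # 2 x 1 matrix: rows of X vs row of Y
gower_xy(X, Y, weights = c(2, 1, 1))  # up-weight the first variable
as.dist(gower_xy(X))                  # pairwise case, as a "dist" object
```

Looping over row pairs keeps the arithmetic readable; a serious implementation would vectorise it.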

I must admit, however, that I had forgotten about daisy() and its
abilities in relation to the OP's email, probably because I tend not to
use it: most of the coefficients are available elsewhere (in dist() in
package stats and vegdist() in package vegan), and, as the author,
distance() sprang to mind more readily.

For the record however, if you want to calculate Gower's general
dissimilarity coefficient for pairwise distances of a single matrix X,
then I would very much recommend daisy() over distance(), as daisy() is
faster, specifically designed for this task, returns an object that
inherits (if that is the right word for an S3 class?) from class "dist"
and is likely far more tried and far better tested than my distance()
function.
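A minimal sketch of the daisy() route for data like Marc's follows, assuming the columns are 0/1 codes that should be treated as symmetric binary, so that a shared 0 counts as a match (in the spirit of the (a+d)/(a+b+c+d) coefficient in the original post). The data frame is made up; daisy(), its metric argument and its type = list(symm = ...) option are part of package cluster.

```r
library(cluster)  # recommended package, installed with R

## Made-up 0/1 data with one missing value, standing in for Marc's data
dat <- data.frame(v1 = c(1, 0, 1, 1),
                  v2 = c(0, 0, 1, NA),
                  v3 = c(1, 1, 0, 0))

## Gower's coefficient with the columns declared symmetric binary;
## the NA in v2 is left out of the comparisons involving row 4
d <- daisy(dat, metric = "gower", type = list(symm = c("v1", "v2", "v3")))
d
class(d)  # "dissimilarity" "dist", so it can go straight to hclust() etc.
```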

If I'd been thinking more clearly when I replied, I would have
mentioned daisy() rather than my distance().

G

> 
>     GS> It can deal quite happily
>     GS> with binary data and where there is missing-ness.
>     GS> Binary data are handled through a simple matching
>     GS> coefficient, 1 if variable i present in both samples, 0
>     GS> otherwise, and then summed over all variables i. You
>     GS> should probably read up on how the missing-ness is
>     GS> handled with this method and what properties the
>     GS> resulting dissimilarity has.
> 
>     GS> Note that distance() outputs full dissimilarity
>     GS> matrices. To get something to plug into functions that
>     GS> require a dist object, just use as.dist() on the output
>     GS> from distance().
> 
>     GS> HTH
> 
>     GS> G
> 
>     >> 
>     >> Thank you, Marc.
>     >> 
>     >> SCRI, Invergowrie, Dundee, DD2 5DA.  The Scottish Crop
>     >> Research Institute is a charitable company limited by
>     >> guarantee.  Registered in Scotland No: SC 29367.
>     >> Recognised by the Inland Revenue as a Scottish Charity
>     >> No: SC 006662.
>     >> 
>     >> ______________________________________________
>     >> R-help at r-project.org mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>     >> read the posting guide
>     >> http://www.R-project.org/posting-guide.html and provide
>     >> commented, minimal, self-contained, reproducible code.
>     GS> -- 
>     GS> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>     GS>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>     GS>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>     GS>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>     GS>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>     GS>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>     GS> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
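
Judging from the error message in Marc's post, dist.binary() tests its input with if (any(df < 0)) before computing anything. When the data contain NA, the comparison df < 0 yields NA in those cells, any() returns NA if no element is TRUE, and if() cannot branch on NA. The base-R sketch below reproduces that failure with made-up data, and also shows the as.dist() step mentioned above for turning a full dissimilarity matrix into a "dist" object.

```r
## Made-up data with a missing value, mimicking the OP's situation
dat <- data.frame(a = c(1, 0, NA), b = c(0, 1, 1))

chk <- any(dat < 0)  # NA: no comparison is TRUE, but one is NA
chk
## if() cannot branch on NA, which reproduces the error Marc saw
msg <- tryCatch(if (chk) "negative" else "ok",
                error = function(e) conditionMessage(e))
msg                  # "missing value where TRUE/FALSE needed"
any(dat < 0, na.rm = TRUE)  # FALSE once the NA is ignored

## A full (square, symmetric) dissimilarity matrix, such as distance()
## returns, becomes a "dist" object via as.dist(), which keeps the
## lower triangle
full <- matrix(c(0,   0.5,  1,
                 0.5, 0,    0.25,
                 1,   0.25, 0), nrow = 3, byrow = TRUE)
d <- as.dist(full)
d
```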
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%