# [R] pam() with more general dissimilarity / distance

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Apr 8 13:55:15 CEST 2022

```
I was asked in private, but reply in public,
so others can also find this answer in the future:

On Fri, Apr 8, 2022 at 1:11 PM  ..... wrote :
>  Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the  euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards
>

Yes, you can use pam() in two ways; see this part of the help page:

Arguments:

x: data matrix or data frame, or dissimilarity matrix or object,
depending on the value of the ‘diss’ argument.

In case of a matrix or data frame, each row corresponds to an
observation, and each column corresponds to a variable.  All
variables must be numeric.  Missing values (NAs) _are_
allowed-as long as every pair of observations has at least
one case not missing.

In case of a dissimilarity matrix, ‘x’ is typically the
output of daisy or dist.  Also a vector of length
n*(n-1)/2 is allowed (where n is the number of observations),
and will be interpreted in the same way as the output of the
above-mentioned functions. Missing values (NAs) are _not_
allowed.

So, you can first use   dx <- daisy(x, ...)   and choose the appropriate
metric and variable types there.
After that you can use the computed distance / dissimilarity matrix
(the `dx`) in your call to pam():

px <- pam(dx, k=., ....)

I hope this helps you.
With best regards,
Martin

--
Martin Maechler
ETH Zurich

```
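
The two-step approach described in the reply (compute dissimilarities with daisy(), then pass them to pam()) might look roughly like the sketch below. The data frame `x`, its column names, and the choice `k = 2` are made up for illustration and are not from the original message. Declaring `smoker` as asymmetric binary via `type` is one way to get Jaccard-like handling for that variable.

```r
library(cluster)

# Hypothetical mixed-type data (assumption, for illustration only):
# one numeric, one nominal and one 0/1 variable.
x <- data.frame(
  height = c(1.62, 1.75, 1.80, 1.58, 1.91, 1.69),
  colour = factor(c("red", "blue", "red", "green", "blue", "green")),
  smoker = c(0, 1, 0, 0, 1, 1)
)

# Step 1: Gower dissimilarities. 'type' marks 'smoker' as asymmetric
# binary, so joint absences (0-0 pairs) do not count as agreement.
dx <- daisy(x, metric = "gower", type = list(asymm = "smoker"))

# Step 2: cluster on the dissimilarity object. Because dx inherits from
# "dist", pam() treats it as dissimilarities without needing diss = TRUE.
px <- pam(dx, k = 2)

px$clustering  # cluster membership for each observation
px$medoids     # the observations chosen as medoids
```

Because pam() only sees the dissimilarities in this form, any per-variable choice of metric has to be made in the daisy() call (or in whatever function produces the dissimilarity object).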