[R] adding a method to the dist function

Mon May 3 14:09:55 CEST 2004

On Mon, 3 May 2004, Giampiero Salvi wrote:

> On Mon, 3 May 2004, Prof Brian Ripley wrote:
> 
> > dist() compares pairs of rows in the x matrix.  How can they have `means
> > and covariances'? -- you have a sample of size one from each of two
> > populations.
> >
> > It seems that (Gaussian) Bhattacharyya is more like mahalanobis().
> 
> I had planned to use mean vectors and covariance matrices I computed
> over N groups of data samples as input to dist, like this
> 
> mu_1_1 mu_1_2 ... mu_1_M cov_1_1_1 cov_1_1_2 ... cov_1_M_M
> mu_2_1 mu_2_2 ... mu_2_M cov_2_1_1 cov_2_1_2 ... cov_2_M_M
> ...
> mu_N_1 mu_N_2 ... mu_N_M cov_N_1_1 cov_N_1_2 ... cov_N_M_M
> 
> where N is the number of groups and M the dimension.
> 
> I agree that it would be better to use a new function (similar to
> mahalanobis), as the function dist in all the other cases uses raw
> data samples, and my interpretation of the input data might be
> confusing. The reason I thought of dist is that the Bhattacharyya
> distance is symmetric, so the result fits the dist class well.
> 
> One way to solve this, if you agree, would be to write a new function
> bhattacharyya() that returns a dist object.

So you would be computing distances for groups of rows.  That needs a 
different interface from dist().
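
As a rough illustration of what such an interface could look like (a
sketch only, not code from this thread): the hypothetical function
bhattacharyya() below assumes that each row of x holds the M means of
one group followed by its flattened M x M covariance matrix, as in the
layout quoted above, and that the groups are Gaussian. The argument
name M and the helper names are assumptions; mahalanobis(),
determinant() and as.dist() are the standard R functions.

```r
# Hypothetical sketch: Gaussian Bhattacharyya distance between N groups.
# x: N x (M + M^2) matrix, one group per row (M means, then the
#    M x M covariance matrix flattened; it is symmetric, so the
#    flattening order does not matter).
# M: dimension of the data (assumed argument, not part of dist()).
bhattacharyya <- function(x, M) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == M + M * M)
  N <- nrow(x)

  mu     <- function(i) x[i, seq_len(M)]
  sig    <- function(i) matrix(x[i, M + seq_len(M * M)], M, M)
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)

  d <- matrix(0, N, N, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(N)) {
    for (j in seq_len(i - 1)) {
      S <- (sig(i) + sig(j)) / 2            # averaged covariance
      # (1/8) * squared Mahalanobis distance between the means w.r.t. S,
      # plus (1/2) * log(det(S) / sqrt(det(S_i) * det(S_j)))
      d[i, j] <- mahalanobis(mu(i), mu(j), S) / 8 +
        0.5 * (logdet(S) - 0.5 * (logdet(sig(i)) + logdet(sig(j))))
    }
  }
  as.dist(d)                                # lower triangle -> "dist" object
}

# Two groups in M = 2 dimensions, both with identity covariance
x <- rbind(a = c(0, 0,  1, 0, 0, 1),
           b = c(2, 0,  1, 0, 0, 1))
d <- bhattacharyya(x, M = 2)
```

With equal covariances the log-determinant term vanishes, so d here is
just one eighth of the squared Mahalanobis distance between the two
means, i.e. 4 / 8 = 0.5. Returning a dist object keeps the result
usable with functions such as hclust().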

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595