[R] adding a method to the dist function
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon May 3 14:09:55 CEST 2004
On Mon, 3 May 2004, Giampiero Salvi wrote:
> On Mon, 3 May 2004, Prof Brian Ripley wrote:
> > dist() compares pairs of rows in the x matrix. How can they have `means
> > and covariances'? -- you have a sample of size one from each of two
> > populations.
> > It seems that the (Gaussian) Bhattacharyya distance is more like mahalanobis().
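For reference, here is the usual textbook closed form of the Bhattacharyya distance between two multivariate Gaussians N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2). The formula itself is not given in the thread. It helps explain the comparison with mahalanobis(). The first term is one eighth of the squared Mahalanobis distance between the two means, computed with the averaged covariance \Sigma. In R that term could be written as mahalanobis(mu1, mu2, Sigma) / 8. The second term depends only on the covariances.

```latex
D_B = \frac{1}{8} (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2)
      + \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}},
\qquad \Sigma = \frac{\Sigma_1 + \Sigma_2}{2}
```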
> I had planned to use mean vectors and covariance matrices I computed
> over N groups of data samples as input to dist(), laid out like this:
> mu_1_1 mu_1_2 ... mu_1_M cov_1_1_1 cov_1_1_2 ... cov_1_M_M
> mu_2_1 mu_2_2 ... mu_2_M cov_2_1_1 cov_2_1_2 ... cov_2_M_M
> ...
> mu_N_1 mu_N_2 ... mu_N_M cov_N_1_1 cov_N_1_2 ... cov_N_M_M
> where N is the number of groups and M the dimension.
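The following is a minimal Python sketch of reading that layout back. It assumes each row holds the M means followed by the M*M covariance entries written out row by row, so a row has M + M^2 columns. The function names unpack_row and infer_dimension are hypothetical; they are not part of R or of any package.

```python
import math


def infer_dimension(ncol):
    """Recover M from the number of columns, assumed to equal M + M^2."""
    # Solve M^2 + M - ncol = 0 for the positive root.
    m = (math.isqrt(1 + 4 * ncol) - 1) // 2
    if m + m * m != ncol:
        raise ValueError(f"{ncol} columns is not of the form M + M^2")
    return m


def unpack_row(row, m):
    """Split one packed row into (mean vector, covariance matrix).

    Assumed layout (hypothetical, following the message): m means, then
    the m*m covariance entries cov_i_1_1, cov_i_1_2, ..., cov_i_M_M.
    """
    if len(row) != m + m * m:
        raise ValueError(f"expected {m + m * m} values, got {len(row)}")
    mean = list(row[:m])
    flat = row[m:]
    # Rebuild the m x m matrix one row at a time.
    cov = [list(flat[r * m:(r + 1) * m]) for r in range(m)]
    return mean, cov
```

A covariance matrix is symmetric. It therefore does not matter whether its entries were flattened by rows or by columns; R's as.vector() on a matrix goes by columns. Both orders give back the same matrix.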
> I agree that it would be better to use a new function (similar to
> mahalanobis()), as dist() in all other cases works on raw
> data samples, and my interpretation of the input data might be
> confusing. The reason I thought of dist() is that the Bhattacharyya
> distance is symmetric, and the result fits the dist class well.
> One way to solve this, if you agree, would be to write a new function
> bhattacharyya() that returns a dist object.
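Below is a rough sketch of what such a bhattacharyya() function might compute. It is written in Python for illustration rather than R.

- **Input:** a list of mean vectors and a list of covariance matrices, assumed symmetric positive definite.
- **Output:** the pairwise distances, in the order R uses inside a dist object (the lower triangle of the distance matrix, stored by columns).
- **Method:** Cholesky factorisation is used here as one convenient way to get the log-determinants and the quadratic form. It is not necessarily what an R implementation would do.

All function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def log_det(a):
    """log det(a) = 2 * sum(log L_ii) for the Cholesky factor L of a."""
    L = cholesky(a)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


def quad_form_inv(a, d):
    """d^T a^{-1} d, computed as ||z||^2 where L z = d (forward substitution)."""
    L = cholesky(a)
    z = []
    for i in range(len(d)):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in z)


def bhattacharyya_gauss(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    m = len(mu1)
    # Averaged covariance Sigma = (Sigma_1 + Sigma_2) / 2.
    cov = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(m)] for i in range(m)]
    d = [mu1[i] - mu2[i] for i in range(m)]
    term_mean = quad_form_inv(cov, d) / 8.0
    term_cov = 0.5 * (log_det(cov) - 0.5 * (log_det(cov1) + log_det(cov2)))
    return term_mean + term_cov


def bhattacharyya(means, covs):
    """All pairwise distances, ordered like the values of an R dist object:
    d(2,1), d(3,1), ..., d(N,1), d(3,2), ..., d(N,N-1)."""
    n = len(means)
    out = []
    for j in range(n - 1):          # column of the lower triangle
        for i in range(j + 1, n):   # rows below the diagonal
            out.append(bhattacharyya_gauss(means[i], covs[i], means[j], covs[j]))
    return out
```

As a usage example, take three Gaussians with identity covariance centred at (0, 0), (1, 0) and (0, 2). Only the mean term contributes, and bhattacharyya() returns [0.125, 0.5, 0.625]: one eighth of the squared Euclidean distances between the centres. In R, a full symmetric matrix of such values could be turned into a dist object with as.dist().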
So you would be computing distances for groups of rows. That needs a
different interface from dist().
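One possible shape for such an interface is sketched below in Python. It assumes the raw observations arrive as a list of rows plus one group label per row. Each group's mean vector and covariance matrix are computed first, and those summaries would then be fed to a pairwise distance. The name group_summaries is hypothetical. The n - 1 denominator matches what R's cov() uses.

```python
def group_summaries(x, groups):
    """Per-group mean vector and covariance matrix (denominator n - 1, as in R's cov()).

    x: list of observation rows; groups: one label per row.
    Returns {label: (mean, cov)} in order of first appearance.
    """
    if len(x) != len(groups):
        raise ValueError("x and groups must have the same length")
    # Collect the rows belonging to each group, keeping first-seen order.
    rows_by_group = {}
    for row, g in zip(x, groups):
        rows_by_group.setdefault(g, []).append(row)
    out = {}
    for g, rows in rows_by_group.items():
        n, m = len(rows), len(rows[0])
        if n < 2:
            raise ValueError(f"group {g!r} needs at least 2 rows for a covariance")
        mean = [sum(r[k] for r in rows) / n for k in range(m)]
        cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
                for b in range(m)] for a in range(m)]
        out[g] = (mean, cov)
    return out
```

The Gaussian Bhattacharyya distance needs invertible covariance matrices. A group therefore needs at least M + 1 rows of non-degenerate data before its summary is usable.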
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595