# [R] Fast R implementation of Gini mean difference

Thu Apr 24 15:41:42 CEST 2003

```On Thursday 24 April 2003 05:17, you wrote:
> I have written the following function to calculate the weighted mean
> difference for univariate data (see
> http://www.xycoon.com/gini_mean_difference.htm for a related
> formula). Unsurprisingly, the function is slow (compared to sd or mad)
> for long vectors. I wonder if there's a way to make the function
> faster, short of creating an external C function. Thanks very much
>
>
> gmd <- function(x, w) { # x=data vector, w=weights vector
>    n <- length(x)
>    tmp <- 0
>    for (i in 1:n) {
>       for (j in 1:n) {
>          tmp <- tmp + w[i]*abs(x[i]-x[j])
>       }
>    }
>    retval <- 0.5*sqrt(pi)*tmp/((n-1)*sum(w))
> }

I have a few remarks on the "definition" of Gini index, and one on the most
suitable way to compute it.

(a) there no 0.5*sqrt(pi) neither in the standard definition of the index, nor
in the quoted source, at http://www.xycoon.com/gini_mean_difference.htm
(b) when the values of x vector have frequencies, then the common
definition knwon to me is not the above one, but one such that the nested line
above should be replaced by
tmp <- tmp + w[i]*w[j]*abs(x[i]-x[j])
and the final assignment should be
0.5*sqrt(pi)*tmp/((sum(w)-1)*sum(w))
In fact, the above function produces inconsident results; see this:
>  gmd(c(1,2,2,4),c(1,1,1,1))
[1] 1.329
> gmd(c(1,2,4), c(1,2,1))
[1] 1.662

while the above modifications lead again to
> gmd(c(1,2,4), c(1,2,1))
[1] 1.329

(c) from the computational viewpoint, there is a much more convenient equivalent
expression, based on the order statistics, leading to the following function:

gini.md<- function(x)
{ # x=data vector
j <-order(x)
n <-length(x)
return(4*sum((1:length(x))*x[j])/(n*(n-1))
-2*mean(x)*(n+1)/(n-1))
}

This is intendend for the basic case of unreplicated data. With some algebraic work
one should be able to obtain a similar expression for the general case ...once an
agreement has been achieved on the definition of the index.

Clearly, to convert the output from gini.md() to gmd..() one must multiply by
0.5*sqrt(pi).

Here is a short comparison of timing, following the one prepared by Dan Nordlund.

> x<-rnorm(5000)
>  n<-rpois(5000,5)
> w<-n/sum(n)

>  system.time(gmd(x, w))
[1] 241.44   0.21 250.20   0.00   0.00
>  system.time(gmd.1(x, w))
[1]   5.01   8.69 129.95   0.00   0.00
>  system.time(gmd.2(x, w))
[1]  1.21  1.55 10.19  0.00  0.00
> system.time(gini.md(x))
[1] 0 0 0 0 0

also, the new function is  requires little memory

regards