[R] Fast R implementation of Gini mean difference

Adelchi Azzalini azzalini at stat.unipd.it
Mon Apr 28 10:55:08 CEST 2003

This is to complement my previous contribution on computation of Gini mean  
difference - a discussion started by Andrew Ward. The index is "defined" as
    gini <- 0
    for (i in 1:n)
      for (j in 1:n)
        gini <- gini + freq[i]*freq[j]*abs(x[i]-x[j])
    gini <- gini/((sum(freq)-1)*sum(freq))

This is the so-called form "without repetition"; the variant "with repetition"
omits the -1 in the final line, so that the divisor becomes sum(freq)^2.
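
For reference, the definition above can be wrapped into a function covering
both variants. This is only a sketch: the name gini.def and the argument
"repetition" are made up for illustration and do not come from the thread, and
n is taken to be length(x).

```r
# Hypothetical helper (not from the original post): direct evaluation of the
# definition by a double loop over all pairs of values.
gini.def <- function(x, freq = rep(1, length(x)), repetition = FALSE) {
  k <- length(x)   # number of entries in x (the "n" of the loops above)
  N <- sum(freq)   # total number of observations
  total <- 0
  for (i in seq_len(k))
    for (j in seq_len(k))
      total <- total + freq[i] * freq[j] * abs(x[i] - x[j])
  # "without repetition" divides by N*(N-1), "with repetition" by N^2
  denom <- if (repetition) N^2 else N * (N - 1)
  total / denom
}

gini.def(c(1, 2, 4))                      # without repetition
gini.def(c(1, 2, 4), repetition = TRUE)   # with repetition
```

The cost grows with the square of length(x), which is what the faster versions
try to avoid.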

Since computation via the definition requires a double loop over all pairs and
is therefore very inefficient, alternative approaches have been put forward,
following Andrew's message.

My first version of a computationally convenient implementation was
essentially this:

gini.md0<- function(x)
 { # x=data vector
   n <-length(x)
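
The listing above is incomplete as archived. One common way to avoid the double
loop, sketched here under the assumption that the "without repetition" form is
wanted, is to sort the data and use the fact that the i-th smallest value
enters the sum of pairwise differences with weight (2*i - n - 1). The name
gini.md0.sketch is hypothetical, and this is not necessarily the code that was
originally posted.

```r
# Hypothetical sketch, not the author's original gini.md0.
# Uses: sum_{i,j} |x[i]-x[j]| = 2 * sum_i (2*i - n - 1) * x_(i),
# where x_(1) <= ... <= x_(n) are the sorted data.
gini.md0.sketch <- function(x) {
  n <- length(x)
  x <- sort(x)
  2 * sum((2 * seq_len(n) - n - 1) * x) / (n * (n - 1))
}

gini.md0.sketch(c(4, 1, 2))
```

After sorting, the remaining work is a single vectorised pass over the data.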

Since Andrew (private message) has stressed the importance in his problem
of allowing for replicated data, here is a more general version, obtained by 
elaborating on the previous one with a bit of algebra:

gini.md <- function(x, freq=rep(1,length(x)))
{# x=data vector, freq=vector of frequencies
  if(!is.vector(x)) stop("x must be a vector")
  if(length(x) != length(freq)) 
       stop("x and freq must have same length")  
  if(min(freq)<0 | sum(freq)==0 | any(freq != as.integer(freq)) ) 
             stop("freq must be counts")
     x <- x[freq>0]
     freq <- freq[freq>0]
     j <- order(x)
     x <- x[j]
     n <- as.integer(freq[j])
     n. <- sum(n)
     u <- (cumsum(n)-n)*n+ n*(n+1)/2
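
The listing above also breaks off in the archive. The quantity u computed in
its last line is, for each distinct value, the sum of the ranks that its
replicates would occupy in the expanded sorted sample. Assuming the function
then proceeds along the same lines as the unreplicated case, a sketch of the
whole computation could be as follows. The name gini.md.sketch and the variable
N (the total count, called n. above) are illustrative choices, and the final
line is my reconstruction of the algebra, not the posted code.

```r
# Hypothetical sketch, not the author's original gini.md.
gini.md.sketch <- function(x, freq = rep(1, length(x))) {
  keep <- freq > 0
  x <- x[keep]
  freq <- freq[keep]
  j <- order(x)
  x <- x[j]
  n <- freq[j]
  N <- sum(n)                                  # total number of observations
  # sum of the ranks taken by the n[k] copies of x[k] in the expanded sample
  u <- (cumsum(n) - n) * n + n * (n + 1) / 2
  # the weight (2*rank - N - 1), summed over the copies of each value
  2 * sum(x * (2 * u - (N + 1) * n)) / (N * (N - 1))
}

x <- c(4, 1, 2)
freq <- c(3, 2, 1)
gini.md.sketch(x, freq)

# brute-force check on the expanded data
y <- rep(x, freq)
sum(abs(outer(y, y, "-"))) / (length(y) * (length(y) - 1))
```

The two printed values should agree; the sketch works on length(x) terms
rather than on sum(freq) of them.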

Notice that gini.md(x,freq) gives the same result as gini.md0(rep(x,freq)), but the
latter is obviously less efficient. Both are, however, far more efficient than a
straight implementation of the "definition".


Adelchi Azzalini

Adelchi Azzalini  <azzalini at stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
