[R] Fast R implementation of Gini mean difference
Adelchi Azzalini
azzalini at stat.unipd.it
Mon Apr 28 10:55:08 CEST 2003
This is to complement my previous contribution on computation of Gini mean
difference - a discussion started by Andrew Ward. The index is "defined" as
gini <- 0
for (i in 1:n)
{
for (j in 1:n) gini <- gini + freq[i]*freq[j]*abs(x[i]-x[j])
}
gini<- gini/((sum(freq)-1)*sum(freq))
This is the so-called form "without repetition"; the variant "with repetition"
does not have -1 in the final line.
Since computaation via the definition is totally inefficient, alternative
approaches have been put forward, following Andrew's message.
My first version of a computationally convenient implementation was
essentially this:
gini.md0<- function(x)
{ # x=data vector
n <-length(x)
return(4*sum((1:length(x))*sort(x)/(n*(n-1)))
-2*mean(x)*(n+1)/(n-1))
}
Since Andrew (private message) has stressed the importance in his problem
of allowing for replicated data, here is a more general version, obtained by
elaborating on the previous one with a bit of algebra:
gini.md <- function(x, freq=rep(1,length(x)))
{# x=data vector, freq=vector of frequencies
if(!is.vector(x)) stop("x must be a vector")
if(length(x) != length(freq))
stop("x and freq must have same length")
if(min(freq)<0 | sum(freq)==0 | any(freq != as.integer(freq)) )
stop("freq must be counts")
x <- x[freq>0]
freq <- freq[freq>0]
j <- order(x)
x <- x[j]
n <- as.integer(freq[j])
n. <- sum(n)
u <- (cumsum(n)-n)*n+ n*(n+1)/2
return(4*sum(u*x)/(n.*(n.-1))
-2*weighted.mean(x,n)*(n.+1)/(n.-1))
}
Notice that gini.md(x,freq) gives the same of mini.md0(rep(x,freq)), but the latter
is obviously less efficient. Either are however far more efficient that straight
implementation of the "definition".
regards
Adelchi Azzalini
--
Adelchi Azzalini <azzalini at stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/
More information about the R-help
mailing list