[R] Measure Difference Between Two Distributions

Lorenzo Isella lorenzo.isella at gmail.com
Sat Sep 25 15:53:53 CEST 2010


On 09/25/2010 03:23 PM, Rainer M Krug wrote:

>
> Evaluate, for me, does not necessarily mean "test if they are
> significantly different", but rather to quantify the difference. If that
> is what you are looking for, you could look at the "Earth Mover's
> Distance", for which a package is available on R-Forge
> (https://r-forge.r-project.org/projects/earthmovdist/), which I co-wrote
> and have used before.
>
> Cheers,
>
> Rainer
>

Thanks, Rainer. I had a quick look at Wikipedia and at the package you 
mention, and it seems to be what I am looking for.
I just have a question about the normalization of the distance calculated 
by the algorithm.
Let us say that I have 4 distributions A, B, C, D, paired as (A,B) 
and (C,D).
The length of the data in A is equal to the length of the data in B, and 
the same applies to C and D, but length(A)!=length(C).
Now, the argument I would like to make is that A and B are more similar 
to each other than C and D are, and I would like to show a couple of 
numbers to support this.
Bottom line: provided my data lists are long enough, does this distance 
scale with the number of data points? And if it does, how should I 
normalize it so that the results for the two pairs can be compared?
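
To make the question concrete, here is a minimal sketch of the two
normalizations I have in mind. It is plain Python using only the standard
library, and it is not the earthmovdist package or its API. It assumes
one-dimensional data and equal-length samples, in which case the earth
mover's distance between two samples that each carry total mass 1 is the
mean absolute difference of the sorted values. The names emd_1d,
unnormalized_cost and sample_pair are made up for this illustration, and
the Gaussian test data are synthetic.

```python
import random


def emd_1d(xs, ys):
    """1-D earth mover's distance between two equal-length samples.

    Each sample is treated as a distribution with total mass 1
    (every point has weight 1/n), so the result is in the units of the data.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-length samples")
    n = len(xs)
    # In 1-D the optimal transport plan matches sorted values pairwise.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / n


def unnormalized_cost(xs, ys):
    """Same transport cost, but with every point carrying weight 1."""
    return emd_1d(xs, ys) * len(xs)


def sample_pair(rng, n, shift):
    """Two synthetic Gaussian samples of length n whose means differ by shift."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    return a, b


rng = random.Random(0)  # fixed seed so the run is repeatable
A, B = sample_pair(rng, 2000, 0.5)
C, D = sample_pair(rng, 8000, 0.5)  # same shift, four times as many points

print("mass-normalized:", emd_1d(A, B), emd_1d(C, D))
print("unnormalized:   ", unnormalized_cost(A, B), unnormalized_cost(C, D))
```

In this sketch the mass-normalized value depends on how far apart the two
distributions are and not on the sample length, while the unnormalized
cost grows in proportion to the number of points. What I do not know is
which of these two conventions the package follows.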
Cheers

Lorenzo


