[R] Measure Difference Between Two Distributions

Lorenzo Isella lorenzo.isella at gmail.com
Sat Sep 25 18:24:16 CEST 2010


> [...] represent the distance as the proportion of the maximum possible
> distance, i.e. scaling it to be between 0 and 1.
>
> An example:
> A and B have the same length (x), and you calculate emd(A, B), which
> is d.
> Now you have to determine the maximum distance between the two.
> Remembering the analogy of moving earth, the largest distance between
> the two distributions occurs when all of the mass in A sits in A(1),
> with every other element zero, and all of the mass in B sits in B(x),
> with every other element zero. Calculate the distance between these
> two and you get dmax.
> The last step is to divide d by dmax, i.e. to scale it to a value
> between 0 and 1.
>
> This value can then be compared with the same ratio obtained from C and
> D, which have length y.
>
> One important point to keep in mind when using the emd: if sum(A) is
> not the same as sum(B), emd(A,B) is NOT EQUAL to emd(B,A). If this
> applies to your case, you have to decide what to do, but one option is
> to standardise A and B so that their sums are the same (effectively
> comparing the SHAPES and not the actual values).

OK, I see. The standardization part is not a terrible problem, I guess.
The other bit is less clear (to me). What are A(1) and B(x)? Am I piling 
up all the elements in A and B in a single bin?
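
If I read the recipe correctly, it could be sketched roughly as follows.
This is only a minimal sketch under my own assumptions: A and B are 1-D
histograms on the same unit-spaced bins, both are rescaled to sum to 1,
and for that special case the EMD reduces to the sum of the absolute
cumulative differences. The names emd1d, emd_max and emd_scaled are
hypothetical helpers, not functions from any package.

```r
# EMD between two 1-D histograms defined on the same unit-spaced bins.
# Assumption: both inputs are rescaled to sum to 1, so only shapes are compared.
emd1d <- function(a, b) {
  stopifnot(length(a) == length(b), sum(a) > 0, sum(b) > 0)
  a <- a / sum(a)
  b <- b / sum(b)
  # In 1-D, the earth to be moved past each bin boundary is the running surplus.
  sum(abs(cumsum(a - b)))
}

# Largest possible distance for x bins:
# everything in the first bin versus everything in the last bin.
emd_max <- function(x) {
  stopifnot(x >= 2)
  a <- c(1, rep(0, x - 1))
  b <- c(rep(0, x - 1), 1)
  emd1d(a, b)
}

# Distance expressed as a proportion of the maximum, between 0 and 1.
emd_scaled <- function(a, b) emd1d(a, b) / emd_max(length(a))

# Made-up example vectors, for illustration only.
A <- c(5, 3, 2, 0)
B <- c(0, 2, 3, 5)
d    <- emd1d(A, B)
dmax <- emd_max(length(A))
d / dmax
```

In this reading, A(1) is the first bin of A and B(x) the last bin of B, so
dmax corresponds to carrying everything across the whole range of bins.
Because of the rescaling inside emd1d, the order of the two arguments does
not matter in this sketch.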
Cheers

Lorenzo


