[R] Transport and Earth Mover's Distance
Lorenzo Isella
lorenzo.isella at gmail.com
Tue Mar 7 16:32:04 CET 2017
Dear Dominic,
Thanks a lot for the quick reply.
Just a few questions to make sure I got it all right (I now understand that
transport and spatstat in particular can do much more than I need
right now).
Essentially, what I am after is the Wasserstein distance between univariate
distributions (and it would be great if I could extend it to the
case of two distributions with different bin structures).
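For reference, in one dimension the Wasserstein distance has a closed form in terms of the cumulative distribution functions $F$ and $G$ of the two distributions (standard results, stated here as background rather than as anything specific to the transport package):

```latex
W_p(F, G) = \left( \int_0^1 \left| F^{-1}(u) - G^{-1}(u) \right|^p \, du \right)^{1/p},
\qquad
W_1(F, G) = \int_{-\infty}^{\infty} \left| F(t) - G(t) \right| \, dt .
```

Because only the CDFs enter, binning is not required in principle: any representation of $F$ and $G$ (histograms, weighted points, raw samples) can be plugged in.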
1) Two distributions with the same bins (I identify each bin by its
central point).
library(transport)
n_bin <- 11 # number of bins
bin_structure <- seq(10, by=1, len=n_bin)
set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
# pp() reads each row (bin centre, count) as one point in the plane
x <- pp(cbind(bin_structure, x_counts))
y <- pp(cbind(bin_structure, y_counts))
tplan <- transport(x, y, p=1) # optimal matching between the 11 points of x and y
plot(x, y, tplan)
wasserstein_dist <- wasserstein(x, y, p=1, tplan)
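Note that a two-column matrix passed to pp() is a set of points in 2D, so the counts above act as a second coordinate rather than as masses. If the counts are meant as probability masses at the bin centres, the 1D distance can be computed directly. A minimal base-R sketch, assuming equally spaced, sorted bins (the function name w1_same_bins is hypothetical, not part of transport):

```r
# Hypothetical helper: W1 between two histograms sharing the same sorted,
# equally spaced bins, with counts treated as masses at the bin centres.
w1_same_bins <- function(counts_a, counts_b, bin_width = 1) {
  stopifnot(length(counts_a) == length(counts_b))
  # normalise counts to probability masses
  pa <- counts_a / sum(counts_a)
  pb <- counts_b / sum(counts_b)
  # W1 = integral of |F_a - F_b|; the CDF difference is constant between
  # adjacent bin centres, so the integral is a sum of |cumsum| * bin width
  # (the last cumulative difference is 0 and is dropped)
  sum(abs(head(cumsum(pa - pb), -1))) * bin_width
}

n_bin <- 11
bin_structure <- seq(10, by = 1, len = n_bin)
set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
w1_same_bins(x_counts, y_counts)
```

Moving all mass one bin to the right, e.g. w1_same_bins(c(1, 0), c(0, 1)), gives exactly 1, which is a quick sanity check.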
2) Now the two distributions do not share the same bin structure (the bins of y2 are shifted by 2):
# same counts as y, but every bin centre shifted by +2
y2 <- pp(cbind(bin_structure+2, y_counts))
tplan <- transport(x, y2, p=1)
plot(x, y2, tplan)
wasserstein_dist2 <- wasserstein(x, y2, p=1, tplan)
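When the bin structures differ, one option (sketched here in base R as an assumption, not as the transport package's own method) is to treat each histogram as masses at its own bin centres, pool the support points of both distributions, and integrate |F - G| piecewise between consecutive points. The function name w1_weighted is hypothetical:

```r
# Hypothetical helper: W1 between two discrete distributions on the real
# line, with masses wa at points xa and wb at points xb (supports may differ).
w1_weighted <- function(xa, wa, xb, wb) {
  pa <- wa / sum(wa)
  pb <- wb / sum(wb)
  # pooled, sorted support of both distributions
  s <- sort(unique(c(xa, xb)))
  # CDF of each distribution evaluated at every pooled support point
  Fa <- sapply(s, function(t) sum(pa[xa <= t]))
  Fb <- sapply(s, function(t) sum(pb[xb <= t]))
  # |Fa - Fb| is constant on [s_k, s_(k+1)), so integrate piecewise
  sum(abs(Fa - Fb)[-length(s)] * diff(s))
}

n_bin <- 11
bin_structure <- seq(10, by = 1, len = n_bin)
set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
w1_weighted(bin_structure, x_counts, bin_structure + 2, y_counts)
```

Since W1 of a distribution and its translate by c is |c|, shifting the same histogram by 2 should give exactly 2; this is a useful check that the pooled-support integration is right.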
Do 1) and 2) make sense?
>
>If you have no particular need for binning, check out the function
>pppdist in the R-package spatstat, which offers a more flexible way
>to deal with point patterns of different size.
Well, this is not clear to me, but it is possibly very important.
My raw data consists of 2 univariate samples of unequal length.
For example, suppose that
x <- rnorm(100)
and
y <- rnorm(90)
Is there a way to define the Wasserstein distance between them without
going through the binning procedure?
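In one dimension no binning is needed: the empirical CDFs of the two samples are step functions that jump only at the pooled sample points, so W1 between the two empirical distributions can be computed exactly from the sorted pooled sample, whatever the two sample sizes. A minimal base-R sketch using stats::ecdf (the function name w1_samples is hypothetical, not part of transport or spatstat):

```r
# Hypothetical helper: exact W1 between the empirical distributions of two
# raw samples a and b, which may have different lengths.
w1_samples <- function(a, b) {
  s <- sort(c(a, b))   # pooled sample; ties give zero-width intervals
  Fa <- ecdf(a)        # empirical CDFs from base R 'stats'
  Fb <- ecdf(b)
  # |Fa - Fb| is constant on each interval [s_k, s_(k+1)),
  # so the integral of |Fa - Fb| is a finite sum
  sum(abs(Fa(s) - Fb(s))[-length(s)] * diff(s))
}

set.seed(1234)
x <- rnorm(100)
y <- rnorm(90)
w1_samples(x, y)
```

As a check, w1_samples(c(0, 1), c(2, 3)) returns 2 (a pure shift), and w1_samples(0, c(0, 1)) returns 0.5, showing that unequal sample sizes are handled directly.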
Many thanks!
Lorenzo