[R] Working with tables with missing levels

Andre Nathan andre at digirati.com.br
Mon Jul 27 21:21:53 CEST 2009


Hello

I'm trying to write a function to calculate the relative entropy between
two distributions. The data I have is in table format, for example:

> t1 <- prop.table(table(c(0,0,2,4,4)))
> t2 <- prop.table(table(c(0,2,2,2,3)))
> t1

  0   2   4 
0.4 0.2 0.4 
> t2

  0   2   3 
0.2 0.6 0.2

The relative entropy is given by

  H[P||Q] = sum(p * log2(p/q))

with the conventions that 0*log2(0/q) = 0 and p*log2(p/0) = Inf for p > 0.
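
To make the definition concrete, here is a minimal sketch of the sum with
both conventions applied. It assumes that p and q are plain numeric vectors
already aligned on the same levels (here 0 to 4); the name rel_entropy is
only illustrative and is not an existing R function.

```r
# Relative entropy H[P||Q] in bits, for two aligned probability vectors.
rel_entropy <- function(p, q) {
  stopifnot(length(p) == length(q))
  keep <- p > 0                        # terms with p == 0 contribute 0
  if (any(q[keep] == 0)) return(Inf)   # p > 0 while q == 0 gives Inf
  sum(p[keep] * log2(p[keep] / q[keep]))
}

# t1 and t2 from above, written out over the levels 0, 1, 2, 3, 4
p <- c(0.4, 0.0, 0.2, 0.0, 0.4)
q <- c(0.2, 0.0, 0.6, 0.2, 0.0)

rel_entropy(p, q)   # Inf, because level 4 has p > 0 and q == 0
rel_entropy(p, p)   # 0, a distribution compared with itself
```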

I'm not sure what the best way to achieve that is. Is there a way to
test whether a table has a value for a given level, so that I can detect
that, for example, t1 is missing levels 1 and 3 and t2 is missing levels
1 and 4? (Is "level" the correct terminology here?) Simply trying to
access t1[["1"]], for example, gives a "subscript out of bounds" error.
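
One possible way to do the test, as a sketch: the names of a
one-dimensional table are its level labels as character strings, so
membership can be checked with %in% without indexing into the table. The
helper has_level is a hypothetical name, and the candidate range 0:4 is an
assumption about which levels are of interest.

```r
t1 <- prop.table(table(c(0, 0, 2, 4, 4)))

# TRUE if the table has an entry for the given level
has_level <- function(tab, level) as.character(level) %in% names(tab)

has_level(t1, 1)   # FALSE
has_level(t1, 2)   # TRUE

# Levels in the assumed range 0:4 that t1 does not contain
missing_levels <- setdiff(as.character(0:4), names(t1))
missing_levels     # "1" "3"
```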

Another option would be to "expand" the tables, so that, for example, t1
becomes

  0   1   2   3   4 
0.4 0.0 0.2 0.0 0.4

Is there a way to do that?
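
A sketch of one way the expansion might be done, assuming the full set of
levels (0:4 here) is known in advance. If the raw data is available, it can
be converted to a factor with explicit levels before tabulating. If only the
table is available, its values can be copied by name into a zero-filled
vector; expand_table is a hypothetical helper name.

```r
lv <- 0:4   # assumed common set of levels

# From the raw data: unused factor levels are tabulated as zero counts
x1 <- c(0, 0, 2, 4, 4)
t1_full <- prop.table(table(factor(x1, levels = lv)))

# From an existing table: fill a zero vector by level name.
# Assumes every name in tab is one of the given levels.
expand_table <- function(tab, levels) {
  out <- setNames(numeric(length(levels)), as.character(levels))
  out[names(tab)] <- as.vector(tab)
  out
}

t2 <- prop.table(table(c(0, 2, 2, 2, 3)))
t2_full <- expand_table(t2, lv)
```

Both results cover the levels 0 to 4 in the same order, so they can be
compared element by element.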

Thanks,
Andre



