[R] deviance vs entropy

RemoteAPL remoteapl at obninsk.com
Fri Feb 16 01:24:46 CET 2001


Warren,

Thank you for your answer. It gave me food for thought. Let me ask a few
more questions.

> I'm not quite sure what you have in mind, but I'm inferring from your
> comments that by "deviance" you mean:
>
>    -SUM p_i log (p_i/q_i)  (or -2 SUM p_i log (p_i/q_i))

I am sorry for my imprecise language. I meant in particular the deviance
that is calculated when we select the split of a node while building a
classification tree. As far as I know it should be:

 -2 SUM n_i log (p_i)

where n_i is the number of points of class c_i at this node and
p_i = n_i/N, where N is the total number of cases at this node. It is
probably related in some way to what you wrote above. Could you say more
about p_i and q_i in your formula?
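
To make the node formula concrete, here is a minimal sketch in Python
(not the code of any R tree package; the function names and the class
counts are hypothetical). It computes -2 SUM n_i log(p_i) and checks that
it equals 2*N times the entropy of the class proportions, both in natural
log units:

```python
import math

def node_deviance(counts):
    """Node deviance: -2 * sum(n_i * log(p_i)), with p_i = n_i / N."""
    total = sum(counts)
    # Empty classes are skipped: the term n_i * log(p_i) is taken as 0.
    return -2.0 * sum(n * math.log(n / total) for n in counts if n > 0)

def entropy(counts):
    """Entropy (in nats) of the class proportions at the node."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

counts = [30, 10]  # hypothetical class counts at one node
print(node_deviance(counts))              # deviance of the node
print(2 * sum(counts) * entropy(counts))  # same value: 2 * N * H(p)
```

Under these assumptions a pure node (all cases in one class) has deviance
zero, and the deviance grows as the classes become more evenly mixed.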

>    D(p_i||q_i) = - SUM p_i log p_i + SUM p_i log q_i = H(p) - H(p:q)
>
> where H(p) is entropy of p, and H(p:q) is the cross entropy.  If q is the
> uniform distribution, then the cross entropy reduces to:

I will probably understand this and the statements that follow once I
understand the first formula.

> I'm guessing that in the things you've read, when they are talking about
> deviance, q can (and generally is) something other than the uniform
> distribution.  For example, p is often the empirical distribution of a
> data sample, and q is the distribution corresponding to some induced
> model.  Then D(p||q) is a measure of how far the model is from the
> observed data.
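
The quoted idea can be sketched as follows, assuming an empirical
distribution p and a model distribution q over the same classes. The
function name and the numbers are hypothetical. Note that the sketch uses
the conventional non-negative form SUM p_i log(p_i/q_i), whereas the
formula quoted earlier carries a leading minus sign:

```python
import math

def kl_divergence(p, q):
    """SUM p_i * log(p_i / q_i), in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.75, 0.25]  # hypothetical empirical class proportions
q = [0.50, 0.50]  # hypothetical distribution implied by a model

print(kl_divergence(p, q))  # positive: the model differs from the data
print(kl_divergence(p, p))  # zero: the model reproduces the data exactly
```

The value is zero only when q matches p, and it is not symmetric in its
two arguments.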

That sounds interesting. Could you please restate this in terms of
classification trees? I mean, what are the "induced model" and the
"corresponding distribution" when we are speaking of CART?

> entropy (entropy - cross_entropy, or KL-divergence).  Statisticians are
> interested in deviance
What does "KL" stand for?

> because (with the factor of 2) it is asymptotically chi-square for many
> modeling families.  In
That is probably the most important argument in its favour.

> information theoretic terms it's nice to think of the deviance as the
> number of bits extra that it would take to transmit the data for a
> system assuming the distribution q, relative to a system that had
> assumed p, which is the best system for transmitting that particular
> data set.
Very interesting! I must think this over some more.
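
As an aid to thinking about the "extra bits" reading, here is a small
sketch with hypothetical numbers. It compares the average code length per
symbol under a code built for q (the cross entropy) with the length under
a code built for p (the entropy), both in bits:

```python
import math

def entropy_bits(p):
    """Average bits per symbol with a code matched to p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average bits per symbol when the data follow p but the code assumes q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]  # hypothetical true (empirical) distribution
q = [1/3, 1/3, 1/3]    # hypothetical assumed distribution (uniform)

extra = cross_entropy_bits(p, q) - entropy_bits(p)
print(entropy_bits(p))           # bits per symbol with the matched code
print(cross_entropy_bits(p, q))  # bits per symbol with the mismatched code
print(extra)                     # extra bits per symbol paid for assuming q
```

The difference is never negative, so assuming the wrong distribution can
only cost bits, never save them.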

> Then again, maybe I've misunderstood you completely.  Please set me
> straight if I have.
I see that you sit quite straight. I am afraid that I lie horizontally:-)

Regards,
Alexander.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


