# [R] deviance vs entropy

RemoteAPL remoteapl at obninsk.com
Fri Feb 16 00:53:14 CET 2001

----- Original Message -----
From: "Thomas Lumley" <tlumley at u.washington.edu>
To: "RemoteAPL" <remoteapl at obninsk.com>
Cc: <r-help at hypatia.math.ethz.ch>
Sent: Thursday, February 15, 2001 7:41 PM
Subject: Re: [R] deviance vs entropy

> On Thu, 15 Feb 2001, RemoteAPL wrote:
>
> > Hello,
> >
> > The question looks simple. It's probably even stupid. But I spent
> > several hours on it and haven't found an answer.
> >
> > Well, it is clear to me why entropy is used when splitting a node of
> > a classification tree. The sense is clear, because entropy is a good
> > old measure of how uniform a distribution is. And we want, for sure,
> > the class distribution in a node to be as pure as possible, ideally
> > representing one class only.
> >
> > Where does deviance come from at all? I look at the formula and see
> > that the only difference from entropy is the use of the *number* of
> > points in each class, instead of the *probability*, as the multiplier
> > of log(Pik). So it looks like deviance and entropy differ by a factor
> > of 1/N (or 2/N), where N is the total number of cases. Then WHY say
> > "deviance"? Is there any historical reason?
> > Or most likely I do not understand something very basic. Please, help.
>
>
> Entropy is, as you say, a measure of non-uniformity. Deviance (which is
> based on the loglikelihood function) is a measure of evidence.  A given
> level of improvement in classification is much stronger evidence for a
> split if it is based on a large number of points.  For example, with 2
> points you can always find a split that gives perfect classification. With
> 2000 points it is very impressive to be able to get perfect classification
> with one split.
>
> -thomas
>
> Thomas Lumley Asst. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
>
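The factor-of-N relationship the question describes can be checked numerically. The sketch below is in Python rather than R, purely to illustrate the formulas; the helper names `node_entropy` and `node_deviance` are my own, and the deviance formula used is the usual multinomial node deviance D = -2 * sum(n_k * log(p_k)), which equals 2N times the entropy H = -sum(p_k * log(p_k)).

```python
import math

def node_entropy(counts):
    """Entropy of a node's class distribution: H = -sum(p_k * log(p_k))."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def node_deviance(counts):
    """Node deviance: D = -2 * sum(n_k * log(p_k)), p_k = n_k / N."""
    n = sum(counts)
    return -2 * sum(c * math.log(c / n) for c in counts if c > 0)

counts = [30, 70]                 # class counts in one node, N = 100
H = node_entropy(counts)
D = node_deviance(counts)
# For a single node the two agree up to the constant factor 2N:
# D == 2 * N * H
```

So within one node the two criteria are interchangeable; the difference only shows up when nodes of different sizes are compared, which is exactly the point made in the reply above.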

Thomas,

Thanks a lot, I've got your explanation now. My problem was that I was
thinking about a single node, where it doesn't matter whether you calculate
deviance or entropy. But when you compare the parent and child nodes it
does matter indeed.
Entropy shows the same improvement if you perfectly split a mixture of 2
classes, whether it has 10 or 1000 cases. And deviance says that in the
second case you may have more cake (or beer :-) in the evening.
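That 10-versus-1000 comparison can be made concrete. Again a Python sketch rather than R, with hypothetical helper names of my own; it computes the entropy gain (weighted by child size) and the deviance reduction for a perfect split of a 50/50 mixture at two sample sizes.

```python
import math

def node_entropy(counts):
    """Entropy of a node: H = -sum(p_k * log(p_k)), natural log."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def node_deviance(counts):
    """Node deviance: D = -2 * sum(n_k * log(p_k))."""
    n = sum(counts)
    return -2 * sum(c * math.log(c / n) for c in counts if c > 0)

def split_gains(parent, children):
    """Weighted entropy gain and deviance reduction for one split."""
    n = sum(parent)
    ent_gain = node_entropy(parent) - sum(
        (sum(ch) / n) * node_entropy(ch) for ch in children)
    dev_gain = node_deviance(parent) - sum(
        node_deviance(ch) for ch in children)
    return ent_gain, dev_gain

# Perfect split of a 50/50 mixture, small and large samples:
small = split_gains([5, 5], [[5, 0], [0, 5]])          # N = 10
large = split_gains([500, 500], [[500, 0], [0, 500]])  # N = 1000
# entropy gain is log(2) in both cases; deviance gain scales with N
```

The entropy gain is log(2) either way, while the deviance reduction is 100 times larger for the 1000-case split, which is the "more beer" effect.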
Also, now I understand that while constructing the tree we have fewer and
fewer points at each split. And deviance helps to see that the next split
adds complexity but little classification quality.

Well, now I read in a paper: "Other approaches for choosing future splits
involve entropy measures and the Gini index". Why is entropy used anyway?
Could you point me to some paper comparing all these measures?