[R] obtaining the discriminant line from lda

Fri Apr 28 23:44:01 CEST 2000

On Fri, 28 Apr 2000, Clayton Springer wrote:

> Dear R folks,
> 
> Thanks to all your help before I have loaded a 1-D toy data set into
> R and did LDA on it. The toy data has Class=0 if value>0.
> 
> > XY <-- read.table ("test.xy",header=T )
> > XY              
>      X.Class       value
> 1          0  60.4897262
> 2          0  32.9554489
> 3         -1 -53.6459189
> 4          0  44.4450579
> .
> .
> .
> 998       -1 -43.4183157
> 999        0   7.9865092
> 1000      -1  -8.2279180
> > XY.lda <- lda(X.Class ~ value,XY)
> > XY.lda
> Call:
> lda.formula(X.Class ~ value, data = XY)
> 
> Prior probabilities of groups:
>    -1     0 
> 0.521 0.479 
> 
> Group means:
>        value
> -1 -48.66322
> 0   49.91819
> 
> Coefficients of linear discriminants:
>             LD1
> value 0.0357248
> > XY.lda$svd
> [1] 55.63543
> > XY.lda$class
> NULL
> > XY.lda$posterior
> NULL
> 
> Question #1: How do I obtain the line that lda thinks divides the
> two groups?  (Here it is between 1 and 2.)

Use the prediction equation, and solve for the value at which the
posterior probabilities of the two groups are equal.
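
For one predictor and two groups this can be solved in closed form. With
group means m1 and m2, priors p1 and p2, and pooled within-group variance
s2, the posteriors are equal at x = (m1 + m2)/2 + s2 * log(p1/p2) / (m2 - m1).
Plugging in the output quoted above (s2 = 1/0.0357248^2, about 783.5) gives
roughly 1.3, which agrees with "between 1 and 2". A minimal sketch follows;
the simulated data frame is a hypothetical stand-in for test.xy, and reading
the pooled variance off the scaling component assumes the one-variable case.

```r
library(MASS)  # provides lda()

set.seed(1)
# Hypothetical stand-in for test.xy: class 0 when value > 0, class -1 otherwise
value <- runif(1000, -100, 100)
XY <- data.frame(X.Class = ifelse(value > 0, 0, -1), value = value)

fit <- lda(X.Class ~ value, data = XY)

mu     <- fit$means[, "value"]      # group means, ordered as fit$lev ("-1", "0")
prior  <- fit$prior                 # class prevalences unless prior= was given
sigma2 <- 1 / fit$scaling[1, 1]^2   # pooled within-group variance (1-D only)

# Closed-form point where the two posterior probabilities are equal
boundary <- unname(mean(mu) + sigma2 * log(prior[1] / prior[2]) / (mu[2] - mu[1]))

# Numerical check: find where predict() gives posterior 0.5 for the first group
f <- function(x) predict(fit, data.frame(value = x))$posterior[, 1] - 0.5
boundary_num <- uniroot(f, c(-50, 50))$root

print(c(closed_form = boundary, numerical = boundary_num))
```

The two numbers should agree up to the tolerance of uniroot().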

> Next I load in a test set for prediction:
> 
> > Predict0
>    value
> 1    -10
> 2     -9
> 3     -8
> 4     -7
> 5     -6
> 6     -5
> 7     -4
> 8     -3
> 9     -2
> 10    -1
> 11     0
> 12     1
> 13     2
> 14     3
> 15     4
> 16     5
> 17     6
> 18     7
> 19     8
> 20     9
> 21    10
> 
> > Predict0.lda <- predict(XY.lda,Predict0)
> > Predict0.lda$class
>  [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0  0  0  0  0  0  0  0  0 
> 
> For those who don't want to count this shows that the dividing
> line is somewhere between 1 & 2, even though my toy data set
> can be perfectly divided at 0.  I had not expected (Fischer's) LDA
> to behave this way.

lda is not Fisher's (no c) LDF; it is Rao's LDA. In particular, it takes
the class prevalences into account unless you set the prior argument.  lda
is not a perceptron, nor logistic discrimination.
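
The effect of prior can be seen in a small sketch. With prior = c(0.5, 0.5)
the log-prior term drops out and the cut-point moves to the midpoint of the
two group means (about 0.63 for the output quoted above, which is still not
0). The simulated data are a hypothetical stand-in for test.xy.

```r
library(MASS)  # provides lda()

set.seed(2)
# Hypothetical stand-in for test.xy: class 0 when value > 0, class -1 otherwise
value <- runif(1000, -100, 100)
XY <- data.frame(X.Class = ifelse(value > 0, 0, -1), value = value)

# Equal priors instead of the observed class proportions
fit_eq <- lda(X.Class ~ value, data = XY, prior = c(0.5, 0.5))

# With equal priors the cut-point is the midpoint of the group means
midpoint <- mean(fit_eq$means[, "value"])

# Points one unit either side of the midpoint fall in different classes
grid <- data.frame(value = midpoint + c(-1, 1))
pred_eq <- as.character(predict(fit_eq, grid)$class)
print(midpoint)
print(pred_eq)
```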

> Question #2:  Are there parameter adjustments and/or other LDA methods
> where I can get the expected dividing surface at 0. (presumably
> a classification tree would choose the line I desire, but I want
> a lda method that does this.) 

No, a tree will not (it will not use linear combinations).  Why do you
think the dividing surface should be at zero?  Your training set is
asymmetric.  I think you are looking for logistic discrimination not lda,
and are confusing performance on the training set with performance on
future examples: lda `knows' the populations are normally distributed.
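
For comparison, a minimal sketch of logistic discrimination on the same kind
of toy data, using glm() from base R. The simulated data are hypothetical.
Because the classes are perfectly separated, the maximum-likelihood estimate
does not strictly exist: glm() stops with warnings, and in a run like this
the fitted cut-point should land in the gap between the two classes, near
zero.

```r
set.seed(3)
# Hypothetical toy data: y = 1 where the post has Class = 0, y = 0 for Class = -1
value <- runif(1000, -100, 100)
y <- as.integer(value > 0)

# Perfect separation makes glm() warn about fitted probabilities of 0 or 1;
# the warnings are suppressed here so the sketch runs quietly.
fit_glm <- suppressWarnings(glm(y ~ value, family = binomial))

b <- coef(fit_glm)
boundary_glm <- unname(-b[1] / b[2])   # value at which the fitted probability is 0.5

# The empty interval around zero that separates the two classes
gap <- c(max(value[value <= 0]), min(value[value > 0]))
print(boundary_glm)
print(gap)
```

The fit depends only on where the classes meet, not on an assumed normal
distribution within each group, which is why it behaves differently from lda.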

I think you need to understand better the theory behind lda: see the book
for which it is supporting software.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
