[R] lda (MASS)

Thu Apr 21 12:01:12 CEST 2005

hi!

this is a question about lda (MASS) in R on a particular dataset.
I'm not a specialist in any of this, but:
First, with the well-known "iris" dataset, I tried using lda to discriminate 
versicolor from the other two classes and I got approx. 70% accuracy
testing on the training set. In iris, versicolor stands "between" the 2 others, so
one can expect lda not to perform well since it cannot cluster the negative
instances (setosa+virginica) together (is this correct?). (KNN = 96% in xval.)
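
For concreteness, the iris experiment can be sketched roughly as follows.
This is only an illustration, not the exact code I ran: the names `dat`,
`grp` and `train_acc` are made up here, and the accuracy is measured on the
training data itself.

```r
library(MASS)

# Binary target: versicolor vs. the two other species pooled together
dat <- iris[, 1:4]
dat$grp <- factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))

# Fit LDA on all 150 instances
fit <- lda(grp ~ ., data = dat)

# Predict on the same data (resubstitution, so an optimistic estimate)
pred <- predict(fit, dat)$class

# Confusion table and training accuracy
print(table(predicted = pred, actual = dat$grp))
train_acc <- mean(pred == dat$grp)
print(train_acc)
```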

Now, I use my "real" dataset (900 instances, 21 attributes), whose 2 classes
can be separated with an accuracy of no more than 80% (10-fold xval) with KNN,
SVM, C4.5 and the like. 
So I was very surprised to see that lda also gets an accuracy of 80% on it,
because lda is very simple (finding the best line -- for a 2-class 
problem -- and using projections on that line for classification).

So my question is: how does lda (in MASS) use the projections to make
the decision? Usually the decision for a test instance is made
using the means and variances of the 2 classes, but there are other
possibilities (especially in higher dimensions).
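
One way to look at this empirically is to inspect what predict() returns for
an lda fit. As far as I can tell from ?predict.lda, the result is a list with
the components `x` (scores on the discriminant axis), `posterior` and `class`.
The sketch below uses the iris two-class setup as a stand-in for my data; the
names `dat`, `grp`, `map_class` and `agree` are made up for illustration.

```r
library(MASS)

# Same two-class setup as on iris: versicolor vs. the rest
dat <- iris[, 1:4]
dat$grp <- factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))

fit  <- lda(grp ~ ., data = dat)
pred <- predict(fit, dat)

# Components of the prediction
print(head(pred$x))          # projections on the discriminant axis (LD1)
print(head(pred$posterior))  # posterior probability of each class
print(head(pred$class))      # predicted class

# Check whether the predicted class is simply the class with the
# largest posterior probability
map_class <- colnames(pred$posterior)[apply(pred$posterior, 1, which.max)]
agree <- all(map_class == as.character(pred$class))
print(agree)

# Class priors used by the fit (estimated from the data unless 'prior' is given)
print(fit$prior)
```

If `agree` comes out TRUE, the decision is driven by the posteriors (and hence
by the priors too), not just by the distance to the projected class means.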

Thanks for any ideas; the doc is a bit sparse on this particular matter,
and so is Venables & Ripley's book.

Samuel

PS: and does anybody know how to use the CV option of lda to do xval?
I can't get it.
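
For reference, this is the kind of call I mean, again on the iris two-class
setup. From the help page, my understanding is that CV = TRUE does
leave-one-out cross-validation (not 10-fold) and that the return value is then
a list with `class` and `posterior` rather than a fitted lda object, but I may
be misreading it. The names `dat`, `grp`, `cv` and `loo_acc` are illustrative.

```r
library(MASS)

dat <- iris[, 1:4]
dat$grp <- factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))

# With CV = TRUE, lda() returns the cross-validated predictions directly
cv <- lda(grp ~ ., data = dat, CV = TRUE)

# Cross-validated confusion table and accuracy
print(table(predicted = cv$class, actual = dat$grp))
loo_acc <- mean(cv$class == dat$grp)
print(loo_acc)
```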