[R] Questions about use of multinomial for discrimination.

Ricardo Antunes rna at st-andrews.ac.uk
Sat Nov 28 00:24:56 CET 2009


Dear All,

I am looking at discriminating among several individuals based on a few 
variable sets (I think some variables do not make sense unless they are 
entered together, so I "force" them into the models together, hence 
"sets"). I have done so with linear discriminant analysis (LDA) using 
"MASS::lda", with acceptable results. However, one of my collaborators 
suggested I use multinomial regression instead. I think his suggestion 
is mainly concerned with the choice of which variables (sets) best 
describe the data. I have used a stepwise approach (using 
klaR::stepclass) using the proportion of correct classifications to 
choose among the sets of variables. However, it has been suggested that 
I use a method that yields an AIC instead, which will "penalize" the use 
of more variables. I have never done multinomial regression, and am 
uncertain about some details. I am looking into using R for this, and 
the function multinom (package nnet, which accompanies MASS) in particular.
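
A minimal sketch of how such an AIC comparison among variable sets might 
look with nnet::multinom. The iris data stand in for the real data 
(Species plays the role of individual identity), and the two variable 
sets and the names variable_sets, candidates and aic_table are 
assumptions made purely for illustration.

```r
library(nnet)  # multinom() is provided by nnet

# Hypothetical variable sets: variables within a set always enter together.
variable_sets <- list(
  sepal = c("Sepal.Length", "Sepal.Width"),
  petal = c("Petal.Length", "Petal.Width")
)

# Candidate models: each set alone, then both sets combined.
candidates <- list(
  sepal = variable_sets$sepal,
  petal = variable_sets$petal,
  both  = unlist(variable_sets, use.names = FALSE)
)

# Fit one multinomial model per candidate and collect its AIC.
fits <- lapply(candidates, function(vars) {
  multinom(reformulate(vars, response = "Species"), data = iris, trace = FALSE)
})
aic_table <- sapply(fits, AIC)

# Lower AIC is better; extra parameters are penalized.
print(sort(aic_table))
```

With only a handful of sets, fitting every combination is feasible; with 
many sets a stepwise search over sets would be needed instead.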

In my previous analysis with LDA I have measured the proportion of 
correct classifications using a jackknife procedure (i.e. leaving each 
datum out of the LDA at a time, and using the obtained discriminant 
functions to classify it). I am thinking about doing the same with the 
multinomial regression. I would appreciate hearing of any reason why 
this might not be appropriate.
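
The leave-one-out procedure described above could be sketched for 
multinom roughly as follows. This is an illustration under assumptions, 
not a definitive implementation: iris stands in for the real data, and 
loo_accuracy is a hypothetical helper name.

```r
library(nnet)

# Leave-one-out proportion of correct classifications:
# refit without row i, then classify row i with that fit.
loo_accuracy <- function(formula, data, response) {
  predicted <- character(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit <- multinom(formula, data = data[-i, ], trace = FALSE)
    pred <- predict(fit, newdata = data[i, , drop = FALSE], type = "class")
    predicted[i] <- as.character(pred)
  }
  mean(predicted == as.character(data[[response]]))
}

acc <- loo_accuracy(Species ~ Petal.Length + Petal.Width, iris, "Species")
print(acc)
```

Each datum requires a full refit, so the cost grows with the number of 
observations.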

Also, with the LDA I have looked at how much better the discriminant 
functions are compared with random assignment of individual identity. To 
do this I randomly shuffle the categories prior to running the LDA, then 
run the LDA, and measure the proportion of correct classifications using 
the above described jackknife procedure. I run this for many iterations 
and compare the distribution of proportion of correct classifications 
obtained from random assignment, with the one I obtained initially.
Again, I thought about repeating this with multinom. Is this 
unnecessary because another way of looking at this is already included 
in the multinom function?
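
The shuffling comparison described above might be sketched for multinom 
as below. Everything here is an assumption for illustration: a 50-row 
subset of iris stands in for the real data, loo_accuracy is a 
hypothetical helper, and n_perm is kept very small so the example runs 
quickly (a real analysis would use many more iterations).

```r
library(nnet)

set.seed(1)  # make the shuffles reproducible

# Leave-one-out proportion of correct classifications (hypothetical helper).
loo_accuracy <- function(formula, data, response) {
  predicted <- character(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit <- multinom(formula, data = data[-i, ], trace = FALSE)
    pred <- predict(fit, newdata = data[i, , drop = FALSE], type = "class")
    predicted[i] <- as.character(pred)
  }
  mean(predicted == as.character(data[[response]]))
}

# Every third row of iris, only to keep the run time short.
dat <- iris[seq(1, nrow(iris), by = 3), ]
model_formula <- Species ~ Petal.Length + Petal.Width

# Accuracy with the true labels.
observed <- loo_accuracy(model_formula, dat, "Species")

# Null distribution: shuffle the labels, then repeat the same procedure.
n_perm <- 19
null_acc <- replicate(n_perm, {
  shuffled <- dat
  shuffled$Species <- sample(shuffled$Species)
  loo_accuracy(model_formula, shuffled, "Species")
})

# Permutation p-value; the observed value counts as one draw from the null.
p_value <- (1 + sum(null_acc >= observed)) / (n_perm + 1)
print(c(observed = observed, null_mean = mean(null_acc), p_value = p_value))
```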

Perhaps this is more of a general statistics question than one about 
the use of R, but I would appreciate any helpful comments.

Thank you in advance.

Ricardo Antunes



