[R] Questions about use of multinomial for discrimination.
Ricardo Antunes
rna at st-andrews.ac.uk
Sat Nov 28 00:24:56 CET 2009
Dear All,
I am looking at discriminating among several individuals based on a few
variable sets (I think some variables do not make sense unless they are
entered together, so I "force" them into the models together, hence
"sets"). I have done so with linear discriminant analysis (LDA) using
"MASS::lda", with acceptable results. However, one of my collaborators
suggested I use multinomial regression instead. I think his suggestion
is mainly concerned with the choice of which variables (sets) best
describe the data. I have used a stepwise approach (via
klaR::stepclass), using the proportion of correct classifications to
choose among the sets of variables. However, it has been suggested that I
use a method that yields an AIC instead, which will "penalize" the use
of more variables. I have never done multinomial regression, and am
uncertain about some details. I am looking into using R for this, and
the function multinom from nnet (the package that accompanies MASS) in
particular.
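
A minimal sketch of the AIC-based comparison of variable sets described above. It uses the built-in iris data as a stand-in (Species plays the role of individual identity), and the three variable sets and the helper fit_set are hypothetical names chosen only for illustration. It assumes the nnet package is installed.

```r
library(nnet)  # multinom() is provided by nnet

# Hypothetical variable sets; each set is entered into the model as a whole.
candidate_sets <- list(
  sepal = c("Sepal.Length", "Sepal.Width"),
  petal = c("Petal.Length", "Petal.Width"),
  both  = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)

# Fit one multinomial model for a given set of predictor names.
fit_set <- function(vars, data) {
  f <- reformulate(vars, response = "Species")
  multinom(f, data = data, trace = FALSE)
}

fits <- lapply(candidate_sets, fit_set, data = iris)

# AIC penalizes each extra coefficient; lower values are preferred.
aic_table <- sapply(fits, AIC)
print(sort(aic_table))
```

Because whole sets are compared rather than single variables, the sets that "must go together" stay together in every candidate model.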
In my previous analysis with LDA I have measured the proportion of
correct classifications using a jackknife procedure (i.e. leaving one
datum out of the LDA at a time, and using the resulting discriminant
functions to classify it). I am thinking about doing the same with the
multinomial regression. I would appreciate any thoughts on whether this
might be a bad idea for some reason.
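
The leave-one-out procedure described above could be sketched roughly as follows for both methods. This again uses iris as stand-in data; loo_multinom is a hypothetical helper, not a function from any package. For LDA, MASS::lda offers leave-one-out directly through its CV argument; for multinom the loop is written out by hand.

```r
library(MASS)  # lda()
library(nnet)  # multinom()

# LDA: CV = TRUE returns leave-one-out class predictions in $class.
lda_cv  <- lda(Species ~ ., data = iris, CV = TRUE)
lda_acc <- mean(lda_cv$class == iris$Species)

# Multinomial regression: refit without row i, then classify row i.
loo_multinom <- function(formula, data, response) {
  pred <- character(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit <- multinom(formula, data = data[-i, ], trace = FALSE)
    pred[i] <- as.character(
      predict(fit, newdata = data[i, , drop = FALSE], type = "class")
    )
  }
  mean(pred == as.character(data[[response]]))
}

mn_acc <- loo_multinom(Species ~ Sepal.Length + Sepal.Width, iris, "Species")

print(c(lda = lda_acc, multinom = mn_acc))
```

The multinom loop refits the model once per observation, so it can be slow on larger data sets.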
Also, with the LDA I have looked at how much better the discriminant
functions are compared with random assignment of individual identity. To
do this I randomly shuffle the categories prior to running the LDA, then
run the LDA, and measure the proportion of correct classifications using
the above described jackknife procedure. I run this for many iterations
and compare the distribution of proportion of correct classifications
obtained from random assignment, with the one I obtained initially.
Again, I thought about repeating this with multinom. Is this
unnecessary because another way of looking at this is already included
in the multinom function?
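
The label-shuffling comparison described above might look roughly like this for the LDA case. It is only a sketch on the iris stand-in data; loo_accuracy, n_perm and the choice of 200 shuffles are assumptions made for illustration.

```r
library(MASS)

set.seed(1)  # make the shuffles reproducible

# Leave-one-out proportion of correct classifications for given labels.
loo_accuracy <- function(labels, predictors) {
  cv <- lda(predictors, grouping = labels, CV = TRUE)
  mean(cv$class == labels)
}

X <- as.matrix(iris[, 1:4])
observed <- loo_accuracy(iris$Species, X)

# Null distribution: shuffle the identities, then repeat the jackknife.
n_perm   <- 200
null_acc <- replicate(n_perm, loo_accuracy(sample(iris$Species), X))

# Proportion of shuffles doing at least as well as the real labels
# (the +1 terms count the observed arrangement itself).
p_value <- (sum(null_acc >= observed) + 1) / (n_perm + 1)

print(c(observed = observed, null_mean = mean(null_acc), p = p_value))
```

The same loop could wrap a multinom-based accuracy function in place of loo_accuracy, at the cost of many more model fits.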
Perhaps this is more of a general statistics question than one about
the use of R, but I would appreciate any helpful comments.
Thank you in advance.
Ricardo Antunes