[R] lda in R vs S

Marc R. Feldesman feldesmanm at pdx.edu
Thu May 6 23:12:22 CEST 1999

At 09:24 PM 5/6/1999 +0100, Prof Brian D Ripley wrote:

>> I'm running a discriminant analysis in R (0.64.1) to compare it with SPlus
>That's not released until tomorrow!  I guess you have the pre-release,
>prerw0641, which is actually of 0.64.0.

Yes.  Actually the pre-release of 0.64.1

>> 4.5R2.  The following command line works fine in SPlus but gives an error
>> in R.  I've only used R for a little while so I'm not certain here what R
>> (or lda) is complaining about.  The dependent variable (sarich.na[,3]) is
>> an alpha categorical variable, if that makes a difference.  I'm using
>What's that? The response ought to be a factor, according to the docs:

SAS & SPSS speak.  Alpha categorical variable = factor.

> formula: A formula of the form `groups ~ x1 + x2 + ...{}'
>          That is, the response is the grouping factor and
>          the right hand side specifies the (non-factor)
>          discriminators.
>> version VR5.3 (file name VR5.3pl037.zip).
>> lda.out<-lda(sarich.na[,3]~., data=sarich.na[,4:32])
>> Error in model.frame(formula, rownames, variables, varnames, extras,
>> extranames,  : invalid variable type
>> Is this an lda issue or an R issue?
>It is an R issue. Only logical, integer and real variables are allowed
>in R model frames, for as the code says

I haven't delved deeply into R internals yet.  I just started experimenting
with it while learning SPlus in parallel.  So, even though sarich.na[,3]
*is* a factor (with alpha levels), are you saying that R won't allow this
at present?

>    /* Sanity checks to ensure that the the answer can become */
>    /* a data frame.  Be deeply suspicious here! */

Deeply suspicious of what?

>But that is not the `right' way to do this in either. Use either

Either?  Are you saying that the formulation above isn't correct in
*either* R or SPlus?  It works fine in SPlus (and sarich.na[,3] is coded as
a factor with levels "AINU", "BUSHMAN", etc...).  But, SPlus also allows
sarich.na[,3] to be on the left side even if it isn't an explicit factor.
Even if it is coded only as a character variable, SPlus allows it, lda
calculates the results, and gives the correct answers.  Presumably if this
isn't the "correct" approach, SPlus or lda is coercing the character
variable to a factor.  This also works in aov and other functions that take
a formula.
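
A minimal sketch of how one might check, in a current R, whether a grouping
column is stored as a factor or as a character vector, and coerce it
explicitly instead of relying on a modelling function to do so.  The data
frame `toy` and its column `x1` are hypothetical stand-ins, since sarich.na
is not shown in this thread; R 0.64.x may behave differently.

```r
# Hypothetical stand-in for sarich.na (the real data set is not shown here).
toy <- data.frame(
  populati = c("AINU", "BUSHMAN", "AINU", "BUSHMAN"),
  x1       = c(1.2, 3.4, 1.1, 3.6),
  stringsAsFactors = FALSE   # keep the group labels as plain character
)

# Inspect how the grouping column is actually stored.
was_factor <- is.factor(toy$populati)
print(class(toy[, 1]))

# Coerce explicitly so that later model fitting does not depend on
# any implicit character-to-factor conversion.
toy$populati <- factor(toy$populati)
print(is.factor(toy$populati))
print(levels(toy$populati))
```

If is.factor() returns FALSE for the real column, an explicit factor() call
before modelling removes the dependence on implicit coercion.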

>lda.out<-lda(sarich.na[,4:32], sarich.na[,3])

This works fine in SPlus, but not in R, at least not with this data set.

>lda.out<-lda(somename ~ ., data=sarich.na[,3:32])
>where somename is the name of column 3, and that had better be a factor.

This also works in SPlus, but not in R.

However, *this* works:

lda.out<-lda(as.factor(populati)~., data=sarich.na[,4:32])

This puzzles me.  The variable "populati" *is* a factor already.  Why would
I have to coerce a factor to a factor to get this to run?  Following the
logic above, the next variant ought to work too, but it doesn't.

lda.out<-lda(sarich.na[,4:32], as.factor(sarich.na[,3]))

This emits an error message telling me I can't have negative length
subscripts, which leaves me without a clue at the moment.
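
For comparison, a hedged sketch of the two calling conventions quoted above
(a formula with the grouping column inside `data`, and a matrix of
discriminators plus a grouping factor), written for a current R with the
MASS package, which provides lda.  It uses the built-in iris data as a
stand-in for sarich.na, so the column names are assumptions, and the
behaviour of R 0.64.x with VR5.3 may differ.

```r
library(MASS)  # provides lda()

# Stand-in data: iris replaces sarich.na, Species replaces the group column.
dat <- iris
dat$Species <- as.character(dat$Species)  # mimic a character-coded group
dat$Species <- factor(dat$Species)        # make it an explicit factor

# Formula interface: the response is named and lives inside `data`,
# so `.` expands to all the remaining columns.
fit1 <- lda(Species ~ ., data = dat)

# Matrix/data-frame interface: discriminators first, grouping factor second.
fit2 <- lda(dat[, 1:4], grouping = dat$Species)

# Resubstitution agreement between predicted and actual groups.
pred <- predict(fit1)$class
print(mean(pred == dat$Species))
```

Keeping the response inside the data frame passed as `data` avoids mixing
an indexed expression on the left-hand side with a differently subsetted
data frame on the right.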

Dr. Marc R. Feldesman
email:  feldesmanm at pdx.edu
email:  feldesman at ibm.net
fax:    503-725-3905

"Math is hard.  Let's go to the mall"  Barbie

Powered by:  Monstrochoerus - the 300 MHz Pentium II
