[R] lda in R vs S

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu May 6 23:39:09 CEST 1999

On Thu, 6 May 1999, Marc R. Feldesman wrote:

> At 09:24 PM 5/6/1999 +0100, Prof Brian D Ripley wrote:
> >> I'm running a discriminant analysis in R (0.64.1) to compare it with SPlus
> >
> >That's not released until tomorrow!  I guess you have the pre-release,
> >prerw0641, which is actually of 0.64.0.
> Yes.  Actually the pre-release of 0.64.1
> >> 4.5R2.  The following command line works fine in SPlus but gives an error
> >> in R.  I've only used R for a little while so I'm not certain here what R
> >> (or lda) is complaining about.  The dependent variable (sarich.na[,3]) is
> >> an alpha categorical variable, if that makes a difference.  I'm using
> >
> >What's that? The response ought to be a factor, according to the docs:
> SAS & SPSS speak.  Alpha categorical variable = factor.
> > formula: A formula of the form `groups ~ x1 + x2 + ...'
> >          That is, the response is the grouping factor and
> >          the right hand side specifies the (non-factor)
> >          discriminators.
> >
> >> version VR5.3 (file name VR5.3pl037.zip).
> >> 
> >> lda.out<-lda(sarich.na[,3]~., data=sarich.na[,4:32])
> >> Error in model.frame(formula, rownames, variables, varnames, extras,
> >> extranames,  : invalid variable type
> >> 
> >> Is this an lda issue or an R issue?
> >
> >It is an R issue. Only logical, integer and real variables are allowed
> >in R model frames, for as the code says
> I haven't delved deeply into R internals yet.  I just started experimenting
> with it as I was learning SPlus in parallel.  So at the present time, even
> though sarich.na[,3] *is* a factor but with alpha levels, are you saying
> that R won't allow this?  

It will allow factors: they get coerced to integers. I think from the
evidence later that sarich.na[,3] is not a factor, even if it looks like one.
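
A minimal sketch of how to check this, using base R only. The sarich.na
data are not available here, so the built-in iris data stand in for them;
the names `species`, `grp` and `codes` are illustrative assumptions, not
part of the original problem.

```r
data(iris)

# A character copy of the grouping variable: it prints much like the
# factor does, but it is not one.
species <- as.character(iris$Species)

is.factor(iris$Species)   # TRUE
is.factor(species)        # FALSE
is.character(species)     # TRUE

# Explicit conversion gives a factor whose levels are the distinct strings.
grp <- factor(species)
levels(grp)

# The integer codes a factor is stored as (one code per level).
codes <- as.integer(grp)
```

Running is.factor() on the response column is a quick way to tell a real
factor from a character vector that merely looks like one.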

> >
> >    /* Sanity checks to ensure that the answer can become */
> >    /* a data frame.  Be deeply suspicious here! */
> >
> Deeply suspicious of what?

Of things that look like factors? (I don't know, I didn't write this.)

> >But that is not the `right' way to do this in either. Use either
> Either?  Are you saying that the formulation above isn't correct in
> *either* R or SPlus?  It works fine in SPlus (and sarich.na[,3] is coded as
> a factor with levels "AINU", "BUSHMAN", etc...).  But, SPlus also allows
> sarich.na[,3] to be on the left side even if it isn't an explicit factor.

I am saying that it is legal in S-PLUS but poor style, and likely to cause
methods (e.g. for prediction) to fail. In neither dialect is it what the
designers intended.

> Even if it is coded only as a character variable, SPlus allows it, lda
> calculates the results, and gives the correct answers.  Presumably if this
> isn't the "correct" approach, SPlus or lda is coercing the character
> variable to a factor.  This also works in aov and other functions that take
> a formula.

Yes, S-PLUS coerces character vars in model frames to factor, and I
believe R does not allow them. Here is a simple experiment.


> data(iris)
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
> species <- as.character(iris$Species)

> lda(species ~ . - Species, data=iris)
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  : invalid variable type

> lda(Species ~ ., data=iris)
> lda(as.matrix(iris[, 1:4]), species)

both work fine. It looks like R is having problems with data frames here
that I will have to look into. In R a data frame is not a matrix, and
much less coercion gets done.
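
A minimal sketch of one way to avoid the error, assuming the MASS package
(which provides lda) is installed: convert the character variable to a
factor explicitly and keep it as a named column of the data frame, so that
the formula refers only to columns of `data`. The names `iris2`, `group`,
`fit` and `pred` are illustrative assumptions. More recent versions of R
may do this coercion themselves, so the explicit factor() call is shown
mainly to make the intent clear.

```r
library(MASS)  # provides lda(); assumed to be installed

data(iris)

# Build a data frame whose grouping column starts life as character,
# then convert it to a factor before fitting.
iris2 <- iris[, 1:4]
iris2$group <- factor(as.character(iris$Species))

# The response is a named column of `data`, not an expression such as
# iris[, 5] evaluated outside it.
fit <- lda(group ~ ., data = iris2)

# Prediction on new rows uses the same column names as the fit.
pred <- predict(fit, newdata = iris2[c(1, 51, 101), ])
pred$class
```

Keeping the response inside the data frame means methods such as predict()
can find every variable by name.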


Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
