[R] Discriminant function analysis

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Feb 7 15:36:58 CET 2008



On Thu, 2008-02-07 at 13:21 +0000, Tyler Smith wrote:
> On 2008-02-07, Birgit Lemcke <birgit.lemcke at systbot.uzh.ch> wrote:
> >
> > On 06.02.2008 at 21:00, Tyler Smith wrote:
> >>
> >>> My dataset contains variables of the classes factor and numeric. Is
> >>> there another function that is able to handle this?
> >>
> >> The numeric variables are fine. The factor variables may have to be
> >> recoded into dummy binary variables, I'm not sure if lda() will deal
> >> with them properly otherwise.
> >
> > But aren't binary variables also factors? Or is there another
> > variable class besides factor and numeric?
> > Do I have to set the class of the binaries to numeric?
> >
> 
> There is no binary class in R, so you would have to use a numeric
> field. For example:

I think Birgit (from previous emails to the list) has been treating
binary data as factors when producing Gower's dissimilarity.

In R, binary data can be represented in various ways:

bin  <- factor(sample(0:1, 20, replace = TRUE))  # factor with levels "0" and "1"
bin2 <- as.numeric(as.character(bin))            # same values as bin, stored as numeric
bin3 <- sample(0:1, 20, replace = TRUE)          # integer vector
bin4 <- sample(c(0, 1), 20, replace = TRUE)      # numeric (double) vector

dat <- data.frame(bin, bin2, bin3, bin4)
sapply(dat, class)                               # class of each column

The /numeric/ representation can be of class "numeric" (bin2, bin4) or
"integer" (bin3); bin itself is of class "factor".
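
As a quick check (a sketch only; the object names int01, dbl01 and fac01
are made up for illustration), both storage types count as numeric as
far as R is concerned, whereas the factor does not:

```r
int01 <- c(0L, 1L, 1L, 0L)   # integer storage, as produced by sample(0:1, ...)
dbl01 <- c(0, 1, 1, 0)       # double storage, as produced by sample(c(0, 1), ...)
fac01 <- factor(dbl01)       # the same data stored as a factor

class(int01)                 # "integer"
class(dbl01)                 # "numeric"
is.numeric(int01)            # TRUE
is.numeric(dbl01)            # TRUE
is.numeric(fac01)            # FALSE
all(int01 == dbl01)          # TRUE: the values are the same either way
```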

But I'm not sure this matters much. If you use the formula interface to
lda(), factors get expanded to the dummy variables Tyler is talking
about. But of course, a factor with two levels 0/1 doesn't need much
manipulation as you only need a single dummy variable to represent its
two states:

model.matrix(gl(4,5) ~ bin + bin2 + bin3 + bin4, data = dat)

Note that bin is converted to the single column bin1. So you can either
do the conversion beforehand (as I did to get bin2) or just supply bin
directly in the formula to lda(), and model.matrix() will take care of
it for you.
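
To illustrate the point, here is a minimal sketch comparing the two
routes. It assumes the MASS package (which provides lda()) is installed,
and the data are simulated: the names toy, grp, x, fit_factor and
fit_numeric are hypothetical and not from Birgit's data set.

```r
library(MASS)  # provides lda()

set.seed(42)
n   <- 60
grp <- gl(3, 20, labels = c("a", "b", "c"))           # made-up grouping factor
toy <- data.frame(
  grp = grp,
  x   = rnorm(n, mean = as.integer(grp)),              # continuous predictor
  bin = factor(rbinom(n, 1, prob = c(0.2, 0.5, 0.8)[as.integer(grp)]))
)
toy$bin2 <- as.numeric(as.character(toy$bin))          # same data as numeric 0/1

fit_factor  <- lda(grp ~ x + bin,  data = toy)         # factor expanded to column bin1
fit_numeric <- lda(grp ~ x + bin2, data = toy)         # numeric column used as is

# Only the coefficient labels differ (bin1 vs bin2), so compare unnamed values
all.equal(unname(fit_factor$scaling), unname(fit_numeric$scaling))
```

The comparison is expected to return TRUE, since both calls end up
with the same 0/1 column in the design matrix.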

You might want to standardise your explanatory variables to zero mean
and unit variance before running lda(), so that all variables carry the
same weight, if you have a mixture of numeric (continuous) and binary
variables.
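
A minimal sketch of that standardisation with scale(), again on
simulated data with hypothetical names (dat2, len, std) and assuming
MASS is installed. As far as I know the predicted classes should not
change, because linear discriminant analysis is unaffected by linear
rescaling of the predictors; what standardising changes is the scale of
the coefficients, which makes them easier to compare across variables.

```r
library(MASS)  # provides lda()

set.seed(1)
n    <- 90
grp  <- gl(3, 30)                                        # made-up groups
dat2 <- data.frame(
  grp = grp,
  len = rnorm(n, mean = 100 + 10 * as.integer(grp), sd = 15),     # continuous
  bin = rbinom(n, 1, prob = c(0.2, 0.5, 0.8)[as.integer(grp)])    # numeric 0/1
)

# Standardise the predictors to zero mean and unit variance
std <- dat2
std[c("len", "bin")] <- scale(dat2[c("len", "bin")])
sapply(std[c("len", "bin")], sd)                         # both 1

fit_raw <- lda(grp ~ len + bin, data = dat2)
fit_std <- lda(grp ~ len + bin, data = std)

fit_raw$scaling                                          # coefficients on raw scales
fit_std$scaling                                          # coefficients per standard deviation

# Compare the classes predicted from the raw and the standardised fits
identical(predict(fit_raw, newdata = dat2)$class,
          predict(fit_std, newdata = std)$class)
```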

G

> | sample | factor_1 |
> |--------+----------|
> | A      | red      |
> | B      | green    |
> | C      | blue     |
> 
> becomes:
> 
> | sample | dummy_1 | dummy_2 |
> |--------+---------+---------|
> | A      |       1 |       0 |
> | B      |       0 |       1 |
> | C      |       0 |       0 |
> 
> R can deal with dummy_1 and dummy_2 as numeric vectors. The details
> should be explained in a good reference on multivariate statistics
> (I'm looking at Legendre and Legendre (1998) section 1.5.7 and 11.5).
> 
> HTH,
> 
> Tyler
> 
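
For what it's worth, Tyler's table above can be reproduced with
model.matrix(). This is only a sketch: the data frame name samples is
made up, and "blue" is placed first in the levels so that it becomes the
baseline (all-zero) category, as in his table.

```r
samples <- data.frame(
  sample   = c("A", "B", "C"),
  factor_1 = factor(c("red", "green", "blue"),
                    levels = c("blue", "red", "green"))  # first level is the baseline
)

mm      <- model.matrix(~ factor_1, data = samples)      # intercept plus two dummies
dummies <- mm[, -1, drop = FALSE]                        # drop the intercept column
dummies
```

The two remaining columns (factor_1red and factor_1green) correspond to
dummy_1 and dummy_2.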
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


