[R] [correction] Animal Morphology: Deriving Classification Equation with

Sun May 24 23:20:09 CEST 2009

Ted,

I just ran everything using the log of all variables. Much better analysis
and it doesn't violate the assumptions.

I'm still in the dark concerning the classification equation, other than the
fact that it will now contain log functions.
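
For reference, here is a minimal sketch of how the classification functions
quoted further down could be computed on the log scale. It assumes the usual
reading of that formula (coefficient vector Cj = W^-1 Mj, constant
cj0 = -1/2 Cj'Mj, equal priors), which the quoted page does not spell out in
full. The data are simulated, since the real measurements are not reproduced
in this thread; every mean and standard deviation in the code is invented
for illustration.

```r
# SIMULATED log-scale data; the means and sds below are made up.
set.seed(1)
n <- 50
g <- factor(rep(c("F", "M"), each = n))
X <- cbind(
  lWG = rnorm(2 * n, mean = rep(c(4.10, 4.12), each = n), sd = 0.026),
  lWT = rnorm(2 * n, mean = rep(c(2.35, 2.45), each = n), sd = 0.05),
  lID = rnorm(2 * n, mean = rep(c(1.00, 1.60), each = n), sd = 0.26)
)

# Group means: one row per group, one column per predictor
M <- rowsum(X, g) / as.vector(table(g))

# Pooled within-group covariance matrix W
W <- Reduce(`+`, lapply(levels(g), function(k) {
  Xk <- X[g == k, , drop = FALSE]
  (nrow(Xk) - 1) * cov(Xk)
})) / (nrow(X) - nlevels(g))

# Classification function coefficients (one column per group)
C  <- solve(W, t(M))               # Cj  = W^-1 Mj
c0 <- -0.5 * colSums(C * t(M))     # cj0 = -1/2 Cj' Mj

# Score every case for every group and assign the group with the larger score
S    <- sweep(X %*% C, 2, c0, "+")
pred <- factor(levels(g)[apply(S, 1, which.max)], levels = levels(g))

rbind(constant = c0, C)            # the two classification equations
table(observed = g, predicted = pred)
```

Each column of the printed matrix would then read as
score = constant + c1*log(WG) + c2*log(WT) + c3*log(ID), and a case goes to
the group with the larger score.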

Thank you for your help,

Chase

Ted.Harding-2 wrote:
> 
> [Apologies -- I made an error (see at [***] near the end)]
> 
> On 24-May-09 19:07:46, Ted Harding wrote:
>> [Your data and output listings removed. For comments, see at end]
>> 
>> On 24-May-09 13:01:26, cdm wrote:
>>> Fellow R Users:
>>> I'm not extremely familiar with lda or R programming, but a recent
>>> editorial review of a manuscript submission has prompted a crash
>>> course. I am on this forum hoping I could solicit some much needed
>>> advice for deriving a classification equation.
>>> 
>>> I have used three basic measurements in lda to predict two groups:
>>> male and female. I have a working model, low Wilks' lambda, graphs,
>>> coefficients, eigenvalues, etc. (see below). I adjusted the sample
>>> analysis for Fisher's or Anderson's Iris data provided in the MASS
>>> library for my own data.
>>> 
>>> My final step is simply to form the classification equation.
>>> The classification equation simply uses standardized coefficients
>>> to classify each group, in this case male or female. A more thorough
>>> explanation is provided:
>>> 
>>> "For cases with an equal sample size for each group the classification
>>> function coefficient (Cj) is expressed by the following equation:
>>> 
>>> Cj = cj0+ cj1x1+ cj2x2+...+ cjpxp
>>> 
>>> where Cj is the score for the jth group, j = 1 ... k, cj0 is the
>>> constant for the jth group, and x = raw scores of each predictor.
>>> If W = within-group variance-covariance matrix, and M = column matrix
>>> of means for group j, then the constant cj0 = (-1/2)CjMj" (Julia
>>> Barfield, John Poulsen, and Aaron French 
>>> http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
>>> 
>>> I am unable to navigate this last step based on the R output I have.
>>> I only have the linear discriminant coefficients for each predictor
>>> that would be needed to complete this equation.
>>> 
>>> Please, if anybody is familiar or able to help, please let me know.
>>> There is a spot in the acknowledgments for you.
>>> 
>>> All the best,
>>> Chase Mendenhall
>> 
>> The first thing I did was to plot your data. This indicates in the
>> first place that a perfect discrimination can be obtained on the
>> basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
>> to WG, WT, ID, SEX):
>> 
>>   D0<-read.csv("horsesLDA.csv")
>>   # names(D0) # "WRMA_WG"  "WRMA_WT"  "WRMA_ID"  "WRMA_SEX"
>>   WG<-D0$WRMA_WG; WT<-D0$WRMA_WT;
>>   ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX
>> 
>>   ix.M<-(SEX=="M"); ix.F<-(SEX=="F")
>> 
>>   ## Plot WT vs ID (M & F)
>>   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>>   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WT[ix.F],pch="+",col="red")
>>   lines(ID,15.5-1.0*(ID))
>> 
>> and that there is a lot of possible variation in the discriminating
>> line WT = 15.5-1.0*(ID)
>> 
>> Also, it is apparent that the covariance between WT and ID for Females
>> is different from the covariance between WT and ID for Males. Hence
>> the assumption (of common covariance matrix in the two groups) for
>> standard LDA (which you have been applying) does not hold.
>> 
>> Given that the sexes can be perfectly discriminated within the data
>> on the basis of the linear discriminator (WT + ID) (and others),
>> the variable WG is in effect a close approximation to noise.
>> 
>> However, to the extent that there was a common covariance matrix
>> to the two groups (in all three variables WG, WT, ID), and this
>> was well estimated from the data, then inclusion of the third
>> variable WG could yield a slightly improved discriminator in that
>> the probability of misclassification (a rare event for such data)
>> could be minimised. But it would not make much difference!
>> 
>> However, since that assumption does not hold, this analysis would
>> not be valid.
>> 
>> If you plot WT vs WG, a common covariance is more plausible; but
>> there is considerable overlap for these two variables:
>> 
>>   plot(WG,WT)
>>   points(WG[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(WG[ix.F],WT[ix.F],pch="+",col="red")
>> 
>> If you plot WG vs ID, there is perhaps not much overlap, but a
>> considerable difference in covariance between the two groups:
>> 
>>   plot(ID,WG)
>>   points(ID[ix.M],WG[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WG[ix.F],pch="+",col="red")
>> 
>> This looks better on a log scale, however:
>> 
>>   lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
>>   ## Plot log(WG) vs log(ID) (M & F)
>>   plot(lID,lWG)
>>   points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
>>   points(lID[ix.F],lWG[ix.F],pch="+",col="red")
>> 
>> and common covariance still looks good for WG vs WT:
>> 
>>   ## Plot log(WT) vs log(WG) (M & F)
>>   plot(lWG,lWT)
>>   points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
>>   points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
>> 
>> but there is no improvement for WT vs ID:
>> 
>>   ## Plot log(WT) vs log(ID) (M & F)
>>   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>>   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>>   points(ID[ix.F],WT[ix.F],pch="+",col="red")
> 
> [***]
> The above is incorrect! Apologies. I plotted the raw WT and ID
> instead of their logs. In fact, if you do plot the logs:
> 
>   ## Plot log(WT) vs log(ID) (M & F)
>   plot(lID,lWT)
>   points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
>   points(lID[ix.F],lWT[ix.F],pch="+",col="red")
> 
> you now get what looks like much closer agreement between the
> covariances cov(lID,lWT) than before. Hence, I would now suggest
> that you do your linear discrimination on the logarithms of the
> variables (since you also get agreement for the other pairs on
> the log scale).
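
A minimal sketch of what a linear discrimination on the logarithms might look
like with lda() from MASS. The data frame below is simulated (its means and
spreads are invented), so only the shape of the call carries over to the real
data; the column names WG, WT, ID and SEX follow the abbreviations used in
this thread.

```r
library(MASS)  # provides lda()

# SIMULATED measurements; the distribution parameters are made up.
set.seed(2)
n <- 40
sim <- data.frame(
  SEX = factor(rep(c("F", "M"), each = n)),
  WG  = exp(rnorm(2 * n, rep(c(4.10, 4.12), each = n), 0.026)),
  WT  = exp(rnorm(2 * n, rep(c(2.35, 2.45), each = n), 0.05)),
  ID  = exp(rnorm(2 * n, rep(c(1.00, 1.60), each = n), 0.26))
)

# Fit the linear discriminant on the log-transformed predictors
fit <- lda(SEX ~ log(WG) + log(WT) + log(ID), data = sim,
           prior = c(0.5, 0.5))

fit$scaling                        # coefficients of the linear discriminant

# Resubstitution predictions (optimistic; cross-validate for a fairer estimate)
pred <- predict(fit, newdata = sim)
table(observed = sim$SEX, predicted = pred$class)
```

Passing CV = TRUE to lda() would return leave-one-out predictions instead,
which is usually a fairer guide to misclassification than resubstitution.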
> 
> In fact:
> 
> [Raw]:
>   [Male]:
>   cov(cbind(WG,WT,ID)[ix.M,])
>   #            WG         WT          ID
>   # WG  2.2552465 0.11074710 -0.02202080
>   # WT  0.1107471 0.33853450  0.06601287
>   # ID -0.0220208 0.06601287  0.31979368
> 
>   [Female]:
>   cov(cbind(WG,WT,ID)[ix.F,])
>   #           WG        WT        ID
>   # WG 2.4716912 0.1577307 0.6670657
>   # WT 0.1577307 0.3183928 0.2973335
>   # ID 0.6670657 0.2973335 2.8326520
> 
> [log]:
>   [Male]:
>   cov(cbind(lWG,lWT,lID)[ix.M,])
>   #               lWG          lWT           lID
>   # lWG  0.0006584465 0.0001813315 -0.0002133576
>   # lWT  0.0001813315 0.0030368382  0.0030442356
>   # lID -0.0002133576 0.0030442356  0.0693965979
> 
>   [Female]:
>   cov(cbind(lWG,lWT,lID)[ix.F,])
>   #              lWG          lWT         lID
>   # lWG  0.0007244826 0.0002171885  0.001951343
>   # lWT  0.0002171885 0.0019640076  0.003305884
>   # lID  0.0019513428 0.0033058841  0.068406840
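
One quick way to read the four listings above is to compare the variances
(the diagonals) between the sexes. The sketch below only copies those
diagonal entries from the printed matrices and forms Female/Male ratios; it
is an informal check, not a test of equal covariance.

```r
# Variances copied from the diagonals of the listings above (order WG, WT, ID)
var_raw_M <- c(WG = 2.2552465,    WT = 0.33853450,   ID = 0.31979368)
var_raw_F <- c(WG = 2.4716912,    WT = 0.3183928,    ID = 2.8326520)
var_log_M <- c(WG = 0.0006584465, WT = 0.0030368382, ID = 0.0693965979)
var_log_F <- c(WG = 0.0007244826, WT = 0.0019640076, ID = 0.068406840)

# Female / Male variance ratios; values near 1 mean similar spread
ratio_raw <- var_raw_F / var_raw_M
ratio_log <- var_log_F / var_log_M

round(rbind(raw = ratio_raw, log = ratio_log), 2)
```

On these figures the Female/Male variance ratio for ID drops from roughly 9
on the raw scale to about 1 on the log scale, while the WT ratio moves
somewhat away from 1.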
> 
> 
>> So there is no simple road to applying a routine LDA to your data.
>> 
>> To take account of different covariances between the two groups,
>> you would normally be looking at a quadratic discriminator. However,
>> as indicated above, the fact that a linear discriminator using
>> the variables ID & WT alone works so well would leave considerable
>> imprecision in conclusions to be drawn from its results.
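
For comparison, a quadratic discriminator can be sketched with qda() from
MASS, which estimates a separate covariance matrix for each group. The data
here are simulated with deliberately unequal spread in ID between the sexes;
all of the numbers are invented for illustration and are not the real
measurements.

```r
library(MASS)  # provides qda()

# SIMULATED raw-scale data with unequal ID variance between groups (made up)
set.seed(3)
n <- 40
sim <- data.frame(
  SEX = factor(rep(c("F", "M"), each = n)),
  WT  = c(rnorm(n, 10.5, 0.56), rnorm(n, 11.5, 0.58)),
  ID  = c(rnorm(n, 4.0, 1.70),  rnorm(n, 6.0, 0.57))
)

# Quadratic discriminant analysis: one covariance matrix per group
fit_q  <- qda(SEX ~ WT + ID, data = sim)
pred_q <- predict(fit_q, newdata = sim)

table(observed = sim$SEX, predicted = pred_q$class)
```

Unlike lda(), the resulting rule is not a single linear equation in the
predictors, so it does not reduce to one set of classification coefficients.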
>> 
>> Sorry this is not the straightforward answer you were hoping for
>> (which I confess I have not sought); it is simply a reaction to
>> what your data say.
>> 
>> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 24-May-09                                       Time: 21:49:50
> ------------------------------ XFMail ------------------------------
> 
