[R] help with lda function from MASS package

David Winsemius dwinsemius at comcast.net
Tue Sep 29 19:34:33 CEST 2009


If you want to see how Venables and Ripley get their lda distances,  
then this is a quick path to the (uncommented) source:

1) > methods(lda)
[1] lda.data.frame* lda.default*    lda.formula*    lda.matrix*
2) since you call involved a formula I looked first at:
 > getAnywhere(lda.formula)
A single object matching ‘lda.formula’ was found
<snipped output because after setup the data was dispatched to  
lda.default
3) > getAnywhere(lda.default)
A single object matching ‘lda.default’ was found
<snipped output.
<scroll down to the section where they do their scaling and learn from  
the masters.

The source code in the original package may have advantages in that  
the comments would be visible. There was an R News, if memory serves,  
by Uwe Ligges, if memory serves, a few years ago on getting access to  
the source that is good reading.
-- 
David


2)On Sep 29, 2009, at 12:39 PM, Pete Shepard wrote:

> Thanks David,
>
> Yes, I am talking about the MASS package.Thank you for pointing out  
> that
> these scale the same. My question is, how do I get from the V1 data:
>
>       V1
> 1 164.4283
> 2 166.2492
> 3 170.5232
> 4 156.5622
> 5 127.7540
> 6 136.7704
> 7 136.3436
>
> to the other set of data:
>
>
> + 1 -2.3769280
> + 2 -2.7049437
> + 3 -3.4748309
> + 4 -0.9599825
> + 5  4.2293774
> + 6  2.6052193
> + 7  2.6820884
>
> On Mon, Sep 28, 2009 at 3:41 PM, David Winsemius <dwinsemius at comcast.net 
> >wrote:
>
>> Your results are the same (after scaling and sign reversal) out to  
>> the 4th
>> decimal place as those from lda (which by the way is almost  
>> certainly from
>> the MASS package and not from an impossible to find "lda package".)
>>
>>> read.table(textConnection(txt))
>>       V1
>> 1 164.4283
>> 2 166.2492
>> 3 170.5232
>> 4 156.5622
>> 5 127.7540
>> 6 136.7704
>> 7 136.3436
>>> est <-read.table(textConnection(txt))
>>> scale(est)
>>            V1
>> [1,]  0.7656185
>> [2,]  0.8712707
>> [3,]  1.1192567
>> [4,]  0.3092117
>> [5,] -1.3622976
>> [6,] -0.8391481
>> [7,] -0.8639119
>> attr(,"scaled:center")
>>    V1
>> 151.233
>> attr(,"scaled:scale")
>>     V1
>> 17.23484
>>
>>> LD1est <- read.table(textConnection(" LD1
>> + 1 -2.3769280
>> + 2 -2.7049437
>> + 3 -3.4748309
>> + 4 -0.9599825
>> + 5  4.2293774
>> + 6  2.6052193
>> + 7  2.6820884"), header=T)
>>
>>
>>> scale(LD1est)
>>        LD1
>> 1 -0.7656170
>> 2 -0.8712721
>> 3 -1.1192555
>> 4 -0.3092138
>> 5  1.3622976
>> 6  0.8391505
>> 7  0.8639103
>> attr(,"scaled:center")
>>         LD1
>> -3.172066e-17
>> attr(,"scaled:scale")
>>    LD1
>> 3.104591
>>
>>
>> On Sep 28, 2009, at 5:54 PM, Pete Shepard wrote:
>>
>> I am having a problem understanding the lda package. I have a dataset
>>> here:
>>>
>>>  [,1] [,2] [,3]
>>> [1,] 2.95 6.63    0
>>> [2,] 2.53 7.79    0
>>> [3,] 3.57 5.65    0
>>> [4,] 3.16 5.47    0
>>> [5,] 2.58 4.46    1
>>> [6,] 2.16 6.22    1
>>> [7,] 3.27 3.52    1
>>>
>>> If I do the following;
>>>
>>> "names(d)<-c("y","x1","x2")
>>> d$x1 = d$x1 * 100
>>> d$x2 = d$x2 * 100
>>> g<-lda( y ~ x1 + x2, data=d)
>>> v2 <- predict(g, d)",
>>>
>>> I get;
>>>      LD1
>>> 1 -2.3769280
>>> 2 -2.7049437
>>> 3 -3.4748309
>>> 4 -0.9599825
>>> 5  4.2293774
>>> 6  2.6052193
>>> 7  2.6820884
>>>
>>> However, If I do it manually,
>>>
>>> "rawdata<-matrix(scan("tab1_1.
>>>
>>>>
>>>> dat"),ncol=3,byrow=T)
>>>> group <- rawdata[,1]
>>>> X <- 100 * rawdata[,2:3]
>>>> Apf <- X[group==1,]
>>>> Af <- X[group==0,]
>>>> xbar1 <- apply(Af, 2, mean)
>>>> S1 <- var(Af)
>>>> N1 <- dim(Af)[1]
>>>> xbar2 <- apply(Apf, 2, mean)
>>>> S2 <- var(Apf)
>>>> N2 <- dim(Apf)[1]
>>>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>>>> Sinv=solve(S)
>>>> d<-xbar1-xbar2
>>>> b <- Sinv %*% d
>>>> v <- X %*% b",
>>>>
>>>> I get;
>>>>
>>>>      [,1]
>>>> [1,] 164.4283
>>>> [2,] 166.2492
>>>> [3,] 170.5232
>>>> [4,] 156.5622
>>>> [5,] 127.7540
>>>> [6,] 136.7704
>>>> [7,] 136.3436
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> I am having a problem understanding the lda package. I have a  
>>>> dataset
>>>> here:
>>>>
>>>>  [,1] [,2] [,3]
>>>> [1,] 2.95 6.63    0
>>>> [2,] 2.53 7.79    0
>>>> [3,] 3.57 5.65    0
>>>> [4,] 3.16 5.47    0
>>>> [5,] 2.58 4.46    1
>>>> [6,] 2.16 6.22    1
>>>> [7,] 3.27 3.52    1
>>>>
>>>> If I do the following;
>>>>
>>>> "names(d)<-c("y","x1","x2")
>>>> d$x1 = d$x1 * 100
>>>> d$x2 = d$x2 * 100
>>>> g<-lda( y ~ x1 + x2, data=d)
>>>> v2 <- predict(g, d)",
>>>>
>>>> I get;
>>>>      LD1
>>>> 1 -2.3769280
>>>> 2 -2.7049437
>>>> 3 -3.4748309
>>>> 4 -0.9599825
>>>> 5  4.2293774
>>>> 6  2.6052193
>>>> 7  2.6820884
>>>>
>>>> However, If I do it manually,
>>>>
>>>> "rawdata<-matrix(scan("tab1_1.dat"),ncol=3,byrow=T)
>>>> group <- rawdata[,1]
>>>> X <- 100 * rawdata[,2:3]
>>>> Apf <- X[group==1,]
>>>> Af <- X[group==0,]
>>>> xbar1 <- apply(Af, 2, mean)
>>>> S1 <- var(Af)
>>>> N1 <- dim(Af)[1]
>>>> xbar2 <- apply(Apf, 2, mean)
>>>> S2 <- var(Apf)
>>>> N2 <- dim(Apf)[1]
>>>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>>>> Sinv=solve(S)
>>>> d<-xbar1-xbar2
>>>> b <- Sinv %*% d
>>>> v <- X %*% b",
>>>>
>>>> I get;
>>>>
>>>>      [,1]
>>>> [1,] 164.4283
>>>> [2,] 166.2492
>>>> [3,] 170.5232
>>>> [4,] 156.5622
>>>> [5,] 127.7540
>>>> [6,] 136.7704
>>>> [7,] 136.3436
>>>>
>>>>
>>>> It seems there is an extra step that I am missing? The predict  
>>>> step that
>>>> adds a constant to the second set of values? Can anyone clear  
>>>> this up for
>>>> me?
>>>>
>>>
>>>
>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list