[R] help with lda function
David Winsemius
dwinsemius at comcast.net
Tue Sep 29 00:41:50 CEST 2009
Your results are the same (after scaling and sign reversal) out to the
4th decimal place as those from lda (which by the way is almost
certainly from the MASS package and not from an impossible to find
"lda package".)
> read.table(textConnection(txt))
V1
1 164.4283
2 166.2492
3 170.5232
4 156.5622
5 127.7540
6 136.7704
7 136.3436
> est <-read.table(textConnection(txt))
> scale(est)
V1
[1,] 0.7656185
[2,] 0.8712707
[3,] 1.1192567
[4,] 0.3092117
[5,] -1.3622976
[6,] -0.8391481
[7,] -0.8639119
attr(,"scaled:center")
V1
151.233
attr(,"scaled:scale")
V1
17.23484
> LD1est <- read.table(textConnection(" LD1
+ 1 -2.3769280
+ 2 -2.7049437
+ 3 -3.4748309
+ 4 -0.9599825
+ 5 4.2293774
+ 6 2.6052193
+ 7 2.6820884"), header=T)
> scale(LD1est)
LD1
1 -0.7656170
2 -0.8712721
3 -1.1192555
4 -0.3092138
5 1.3622976
6 0.8391505
7 0.8639103
attr(,"scaled:center")
LD1
-3.172066e-17
attr(,"scaled:scale")
LD1
3.104591
On Sep 28, 2009, at 5:54 PM, Pete Shepard wrote:
> I am having a problem understanding the lda package. I have a
> dataset here:
>
> [,1] [,2] [,3]
> [1,] 2.95 6.63 0
> [2,] 2.53 7.79 0
> [3,] 3.57 5.65 0
> [4,] 3.16 5.47 0
> [5,] 2.58 4.46 1
> [6,] 2.16 6.22 1
> [7,] 3.27 3.52 1
>
> If I do the following;
>
> "names(d)<-c("y","x1","x2")
> d$x1 = d$x1 * 100
> d$x2 = d$x2 * 100
> g<-lda( y ~ x1 + x2, data=d)
> v2 <- predict(g, d)",
>
> I get;
> LD1
> 1 -2.3769280
> 2 -2.7049437
> 3 -3.4748309
> 4 -0.9599825
> 5 4.2293774
> 6 2.6052193
> 7 2.6820884
>
> However, If I do it manually,
>
> "rawdata<-matrix(scan("tab1_1.
>>
>> dat"),ncol=3,byrow=T)
>> group <- rawdata[,1]
>> X <- 100 * rawdata[,2:3]
>> Apf <- X[group==1,]
>> Af <- X[group==0,]
>> xbar1 <- apply(Af, 2, mean)
>> S1 <- var(Af)
>> N1 <- dim(Af)[1]
>> xbar2 <- apply(Apf, 2, mean)
>> S2 <- var(Apf)
>> N2 <- dim(Apf)[1]
>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>> Sinv=solve(S)
>> d<-xbar1-xbar2
>> b <- Sinv %*% d
>> v <- X %*% b",
>>
>> I get;
>>
>> [,1]
>> [1,] 164.4283
>> [2,] 166.2492
>> [3,] 170.5232
>> [4,] 156.5622
>> [5,] 127.7540
>> [6,] 136.7704
>> [7,] 136.3436
>>
>
>
>
>
>
>
>>
>> I am having a problem understanding the lda package. I have a
>> dataset here:
>>
>> [,1] [,2] [,3]
>> [1,] 2.95 6.63 0
>> [2,] 2.53 7.79 0
>> [3,] 3.57 5.65 0
>> [4,] 3.16 5.47 0
>> [5,] 2.58 4.46 1
>> [6,] 2.16 6.22 1
>> [7,] 3.27 3.52 1
>>
>> If I do the following;
>>
>> "names(d)<-c("y","x1","x2")
>> d$x1 = d$x1 * 100
>> d$x2 = d$x2 * 100
>> g<-lda( y ~ x1 + x2, data=d)
>> v2 <- predict(g, d)",
>>
>> I get;
>> LD1
>> 1 -2.3769280
>> 2 -2.7049437
>> 3 -3.4748309
>> 4 -0.9599825
>> 5 4.2293774
>> 6 2.6052193
>> 7 2.6820884
>>
>> However, If I do it manually,
>>
>> "rawdata<-matrix(scan("tab1_1.dat"),ncol=3,byrow=T)
>> group <- rawdata[,1]
>> X <- 100 * rawdata[,2:3]
>> Apf <- X[group==1,]
>> Af <- X[group==0,]
>> xbar1 <- apply(Af, 2, mean)
>> S1 <- var(Af)
>> N1 <- dim(Af)[1]
>> xbar2 <- apply(Apf, 2, mean)
>> S2 <- var(Apf)
>> N2 <- dim(Apf)[1]
>> S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
>> Sinv=solve(S)
>> d<-xbar1-xbar2
>> b <- Sinv %*% d
>> v <- X %*% b",
>>
>> I get;
>>
>> [,1]
>> [1,] 164.4283
>> [2,] 166.2492
>> [3,] 170.5232
>> [4,] 156.5622
>> [5,] 127.7540
>> [6,] 136.7704
>> [7,] 136.3436
>>
>>
>> It seems there is an extra step that I am missing? The predict step
>> that
>> adds a constant to the second set of values? Can anyone clear this
>> up for
>> me?
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
More information about the R-help
mailing list