[R] Lda and Qda

pedrosmarques at portugalmail.pt
Fri Dec 28 00:14:26 CET 2007



Hi all,

I'm working with some data: 54 variables and a column of classes, with each observation belonging to one of seven possible classes:

> var.can3<-lda(x=dados[,c(1:28,30:54)],grouping=dados[,55],CV=TRUE)
Warning message:
In lda.default(x, grouping, ...) : variables are collinear
> summary(var.can3)
          Length Class  Mode   
class      30000 factor numeric   ### why?? I don't understand it
posterior 210000 -none- numeric
call           4 -none- call    ## what's this?


> var.can<-lda(dados[,c(1:28,30:54)],dados[,55]) # because variable 29 is constant
Warning message:
In lda.default(x, grouping, ...) : variables are collinear
> summary(var.can)
        Length Class  Mode     
prior     7    -none- numeric  
counts    7    -none- numeric  
means   371    -none- numeric  
scaling 318    -none- numeric  
lev       7    -none- character
svd       6    -none- numeric  
N         1    -none- numeric  
call      3    -none- call     
> (normalizar<-function(matriz){ n<-dim(matriz)[1]; m<-dim(matriz)[2]; normas<-sqrt(colSums(matriz*matriz)); matriz.normalizada<-matriz/t(matrix(rep(normas,n),m,n));return(matriz.normalizada)})
function(matriz){ n<-dim(matriz)[1]; m<-dim(matriz)[2]; normas<-sqrt(colSums(matriz*matriz)); matriz.normalizada<-matriz/t(matrix(rep(normas,n),m,n));return(matriz.normalizada)}
> var.canonicas<-as.matrix(dados[,c(1:28,30:54)])%*%(normalizar(var.can$scaling))
> summary(var.canonicas)
      LD1               LD2              LD3               LD4        
 Min.   :-21.942   Min.   :-6.820   Min.   :-10.138   Min.   :-6.584  
 1st Qu.:-20.014   1st Qu.:-5.480   1st Qu.: -8.280   1st Qu.: 0.872  
 Median :-19.495   Median :-5.007   Median : -7.800   Median : 1.083  
 Mean   :-18.827   Mean   :-4.760   Mean   : -7.803   Mean   : 1.134  
 3rd Qu.:-18.975   3rd Qu.:-4.456   3rd Qu.: -7.278   3rd Qu.: 1.311  
 Max.   : -7.886   Max.   : 3.116   Max.   : -1.619   Max.   : 5.556  
      LD5               LD6         
 Min.   :-11.083   Min.   :-4.4972  
 1st Qu.: -1.237   1st Qu.:-1.6497  
 Median : -1.100   Median :-1.0909  
 Mean   : -1.100   Mean   :-0.9808  
 3rd Qu.: -0.957   3rd Qu.:-0.4598  
 Max.   :  4.712   Max.   : 7.5356  
> 
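The `normalizar` function above divides each column of `var.can$scaling` by its Euclidean norm. A shorter equivalent, sketched here with base R's `sweep()`, should return the same matrix. The helper name `normalize_cols` and the toy matrix `S` are hypothetical stand-ins, not objects from the session above.

```r
# Helper from the session above: divides each column by its Euclidean norm
normalizar <- function(matriz) {
  n <- dim(matriz)[1]
  m <- dim(matriz)[2]
  normas <- sqrt(colSums(matriz * matriz))
  matriz / t(matrix(rep(normas, n), m, n))
}

# Hypothetical equivalent: sweep() divides column j by the j-th column norm
normalize_cols <- function(M) {
  sweep(M, 2, sqrt(colSums(M^2)), "/")
}

# Hypothetical 3 x 2 matrix standing in for var.can$scaling
S <- matrix(c(3, 4, 0, 1, 2, 2), nrow = 3)
all.equal(normalizar(S), normalize_cols(S))  # TRUE
colSums(normalize_cols(S)^2)                 # both entries approximately 1
```

Rescaling the columns only changes the length of each discriminant direction, not its orientation, so the projected scores differ from the unnormalized ones only by a constant factor per column.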


I don't know whether I need to specify a training set and a test set. I also don't know how to get the error rate or the classifier itself. And shouldn't the length of `class` in var.can3 be 7, since I only have 7 different classes?
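With `CV = TRUE`, `lda()` from MASS performs leave-one-out cross-validation and, instead of a fitted model, returns one predicted class per observation (`class`) and one row of posterior probabilities per observation (`posterior`); `call` simply records the command that created the object. The lengths 30000 and 210000 = 30000 × 7 in the summary above are consistent with the data having 30000 rows. The sketch below illustrates this on R's built-in `iris` data rather than the data in this post; the object names (`fit.cv`, `fit`, `train`, etc.) are illustrative only.

```r
library(MASS)

# Leave-one-out CV: one prediction per observation, not one per class
fit.cv <- lda(Species ~ ., data = iris, CV = TRUE)
length(fit.cv$class)   # 150, one per observation
dim(fit.cv$posterior)  # 150 rows, one column per class (3)

# Confusion matrix and leave-one-out misclassification rate
conf <- table(observed = iris$Species, predicted = fit.cv$class)
conf
loo.error <- mean(fit.cv$class != iris$Species)
loo.error

# Alternative: explicit training/test split with a regular (non-CV) fit
set.seed(1)
train <- sample(nrow(iris), 100)
fit <- lda(Species ~ ., data = iris, subset = train)
pred <- predict(fit, iris[-train, ])$class  # the fitted classifier applied to held-out rows
test.error <- mean(pred != iris$Species[-train])
test.error
```

Either route gives an error estimate: the `CV = TRUE` call avoids choosing a split, while the explicit split keeps a fitted object (`fit`) that `predict()` can apply to new observations.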

Best regards,

Pedro Marques


