[R] conflicting results on NA in a qda predicted object:

Thu Dec 20 12:25:32 CET 2001

Using unclass I'm still very confused:

> unique(mod23S.qda.pred$class) 
 [1] 12 17 8  10 4  9  5  13 14 19 20 15 6  3  7  1  23 11 18 21 16 2  22  NA
Levels:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 

> unique(unclass(mod23S.qda.pred$class))
 [1]  12  17   8  10   4   9   5  13  14  19  20  15   6   3   7   1  23 11  18
[20]  21  16   2  22 262

I think the NA corresponds to the code 262, since there should be
only 23 classes.
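
A minimal sketch of how that suspicion could be checked without relying on
how the factor prints: compare the integer codes with the number of levels.
The toy factor `f` and the helper `out_of_range_codes` are made up for
illustration; they are not part of MASS or of the objects shown here.

```r
# Hypothetical helper: positions whose integer code has no matching level.
out_of_range_codes <- function(f) {
  codes <- as.integer(f)  # underlying integer codes of the factor
  which(!is.na(codes) & (codes < 1L | codes > nlevels(f)))
}

# Deliberately malformed toy factor: code 5 is present, but only 2 levels exist.
f <- structure(c(1L, 2L, 5L), levels = c("a", "b"), class = "factor")

is.na(f)               # tests the codes, so the stray code is not a true NA
out_of_range_codes(f)  # position of the code that has no level
```

On the real object the call would be
out_of_range_codes(mod23S.qda.pred$class).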

The data used in predict.qda seem correct (only cols X2 to X5
are used; col X1 holds the individual labels):

> summary(liss.seg.medias)
       X1              X2               X3               X4        
 Min.   :    1   Min.   : 57.80   Min.   : 17.00   Min.   : 34.94  
 1st Qu.: 6594   1st Qu.: 78.50   1st Qu.: 26.50   1st Qu.: 83.50  
 Median :13188   Median : 89.72   Median : 33.40   Median : 91.43  
 Mean   :13188   Mean   : 95.18   Mean   : 37.01   Mean   : 92.47  
 3rd Qu.:19782   3rd Qu.:106.47   3rd Qu.: 44.50   3rd Qu.:100.50  
 Max.   :26375   Max.   :245.29   Max.   :125.25   Max.   :156.82  
       X5       
 Min.   : 65.0  
 1st Qu.:108.4  
 Median :128.4  
 Mean   :134.2  
 3rd Qu.:155.7  
 Max.   :254.3 

Also, the qda object seems correct:

> str(mod23.qda)
List of 8
 $ prior  : Named num [1:23] 0.0842 0.0485 0.0357 0.0332 0.0357 ...
  ..- attr(*, "names")= chr [1:23] "1" "2" "3" "4" ...
 $ counts : Named int [1:23] 33 19 14 13 14 41 33 8 11 14 ...
  ..- attr(*, "names")= chr [1:23] "1" "2" "3" "4" ...
 $ means  : num [1:23, 1:4] 71.4 68.9 72.9 81.5 92.6 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:23] "1" "2" "3" "4" ...
  .. ..$ : chr [1:4] "lissb2" "lissb3" "lissb4" "lissb5"
 $ scaling: num [1:4, 1:4, 1:23] 0.463 0.000 0.000 0.000 1.149 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:4] "lissb2" "lissb3" "lissb4" "lissb5"
  .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. ..$ : chr [1:23] "1" "2" "3" "4" ...
 $ ldet   : num [1:23]  4.38  4.28  7.03  5.77 10.48 ...
 $ lev    : chr [1:23] "1" "2" "3" "4" ...
 $ N      : int 392
 $ call   : language qda.matrix(x = mod23[, 2:5], grouping = mod23[, 6])
 - attr(*, "class")= chr "qda"

Finally, I can identify the individual, but I don't think
it is an unusual one:

> b <- unclass(mod23S.qda.pred$class)
> b[b==262]
[1] 262
> liss.seg.medias[b==262,1]
[1] 11385
> liss.seg.medias[liss.seg.medias[,1]==11385,]
[1] 11385.0000    70.7619    22.8095    78.0476    90.6667

11385 is actually similar to its
neighbors:

> liss.seg.medias[liss.seg.medias[,1]==11384,]
[1] 11384.0000    74.8462    24.8462    89.3077    97.0000
> liss.seg.medias[liss.seg.medias[,1]==11386,]
[1] 11386.0000    71.2857    22.4286    88.8571    95.9286

Why does predict.qda assign a non-existent class (262 or NA)
to individual 11385?
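
One diagnostic that might narrow this down is to look at the posterior
probabilities for that row, since the object returned by predict() for a qda
fit also carries a $posterior matrix. The sketch below uses iris as stand-in
data (an assumption, not the data above) and only shows where to look; it
does not establish the cause.

```r
library(MASS)  # provides qda() and its predict method

# Stand-in data: iris instead of the segment means discussed in this message.
fit  <- qda(iris[, 1:4], grouping = iris$Species)
pred <- predict(fit, iris[, 1:4])

# Rows whose posterior contains NA/NaN, or does not sum to 1, deserve a look.
row_sums <- rowSums(pred$posterior)
suspect  <- which(!is.finite(row_sums) | abs(row_sums - 1) > 1e-8)

suspect                                  # indices of suspicious rows, if any
pred$posterior[suspect, , drop = FALSE]  # their posterior probabilities
```

For the object in question, the equivalent inspection would be
mod23S.qda.pred$posterior[b == 262, ].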

Thanks for the help and sorry for the length
of the message.

Agus

On Thu, 20 Dec 2001, Prof Brian Ripley wrote:

> This is a factor.  You have to be careful with NAs in factors (and 1.4.0
> is different there as it happens).
> 
> Nevertheless, there is no way to reproduce this from what you have given.
> Check that the class really is "factor", and then unclass it to see what
> the codes actually are.  One or more of them should be NA from what you
> have given.
> 
> 
> On Thu, 20 Dec 2001, Agustin Lobo wrote:
> 
> >
> > Dear list,
> >
> > (I've not upgraded to R1.4 yet)
> >
> > I have the following $class component in a predict.qda object:
> > > unique(mod23S.qda.pred$class)
> >  [1] 12 17 8  10 4  9  5  13 14 19 20 15 6  3  7  1  23 11 18 21 16 2  22 NA
> > Levels:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
> >
> > Nevertheless, when I try to identify the individual(s) with NA, I get:
> > > any(is.na(mod23S.qda.pred$class))
> > [1] FALSE
> >
> > and
> >
> > > mod23S.qda.pred$class[is.na(mod23S.qda.pred$class)]
> > factor(0)
> > Levels:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
> >
> > So, actually, is there a NA value in mod23S.qda.pred$class or not?
> >
> > (screening by eye is impossible:
> > length(mod23S.qda.pred$class) is 26375 )
> >
> > Agus
> >
> > Dr. Agustin Lobo
> > Instituto de Ciencias de la Tierra (CSIC)
> > Lluis Sole Sabaris s/n
> > 08028 Barcelona SPAIN
> > tel 34 93409 5410
> > fax 34 93411 0012
> > alobo at ija.csic.es
> >
> >
> >
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> 
