[R] subscripting in data frames with NA

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Tue Jun 24 13:11:37 CEST 2008


Agustin Lobo wrote:
> Dear list:
>
> Given
> > str(b3)
> 'data.frame':    159 obs. of  6 variables:
>  $ index_pollution : num   8.228 10.513  0.549  0.915 10.416 ...
>  $ position_descrip: chr  "2" "2" "2" NA ...
>  $ position_geo    : chr  "3" "0" "3" "3" ...
>  $ institution     : Factor w/ 3 levels "digesa","mem",..: 3 3 3 3 3 3 3 3 3 3 ...
>  $ p_desc_no3      : chr  "2" "2" "2" NA ...
>  $ p_geo_no3       : chr  "3" "0" "3" "3" ...
>
> I try to subscript but get:
>
> > b3[b3[,3]=="3",5] <-NA
> Error in `[<-.data.frame`(`*tmp*`, b3[, 3] == "3", 5, value = NA) :
>   missing values are not allowed in subscripted assignments of data frames
Notice that it is not the NA on the right that is the problem, but those
in the subscript: b3[,3]=="3" is NA wherever b3[,3] is NA. So try

b3[b3[,3]=="3" |  is.na(b3[,3]), 5] <- NA

(or b3[,3]=="3" & !is.na(b3[,3]) if you would rather leave the rows with
a missing b3[,3] untouched)
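
To make the difference between the two variants concrete, here is a minimal
sketch on a made-up four-row data frame. The values are invented for
illustration and only reuse the column names from the str() output above;
which() is offered as one more way of dropping the NAs from the subscript,
not as the only correct one.

```r
# Toy stand-in for b3: values are invented, only the column names are reused.
b3 <- data.frame(
  position_geo = c("3", "0", NA, "3"),
  p_desc_no3   = c("2", "2", "1", "0"),
  stringsAsFactors = FALSE
)

# The comparison is NA wherever position_geo is NA.
sel <- b3$position_geo == "3"

# Using the raw logical subscript on a copy reproduces the error.
attempt <- tryCatch({
  tmp <- b3
  tmp[sel, "p_desc_no3"] <- NA
  "ok"
}, error = function(e) conditionMessage(e))

# Variant 1: treat NA in the subscript as "do not touch this row".
# which() keeps only the positions that are TRUE; sel & !is.na(sel) is equivalent.
skip_na <- b3
skip_na[which(sel), "p_desc_no3"] <- NA

# Variant 2: treat NA in the subscript as "set this row to NA as well".
with_na <- b3
with_na[sel | is.na(sel), "p_desc_no3"] <- NA

print(attempt)
print(skip_na)
print(with_na)
```

Which variant is right depends on what a missing position_geo is supposed to
mean for p_desc_no3; the code cannot decide that for you.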
> Why? What's  the correct way of doing this operation?
I forget the exact reason, but as far as I remember, we allowed it at
some point, but found that the behaviour was inconsistent between
different modes of subassignment.

> Actually, I previously tried with:
> > str(b2)
> 'data.frame':    159 obs. of  6 variables:
>  $ index_pollution : num   8.228 10.513  0.549  0.915 10.416 ...
>  $ position_descrip: Factor w/ 3 levels "0","1","2": 3 3 3 NA NA NA 3 3 3 3 ...
>  $ position_geo    : Factor w/ 4 levels "0","1","2","3": 4 1 4 4 3 NA 3 3 3 4 ...
>  $ institution     : Factor w/ 3 levels "digesa","mem",..: 3 3 3 3 3 3 3 3 3 3 ...
>  $ p_desc_no3      : Factor w/ 3 levels "0","1","2": 3 3 3 NA NA NA 3 3 3 3 ...
>  $ p_geo_no3       : Factor w/ 4 levels "0","1","2","3": 4 1 4 4 3 NA 3 3 3 4 ...
>
> > table(b2$p_desc_no3)
>
>  0  1  2
> 42 44 66
>
> and
>
> > levels(b2$p_desc_no3)[levels(b2$position_geo)=="3"] <- NA
>
> which does not result into error but leaves b2$p_desc_no3 unchanged:
>
I don't think this makes sense at all. levels(b2$position_geo)=="3" is
TRUE only in the fourth position, so this tries to change the 4th level
of a three-level factor???
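
A small sketch of why the levels() assignment leaves the factor as it was,
using two invented factors with the same level sets as in the str() output.
The element-wise assignment at the end is only a guess at what was intended
(set p_desc_no3 to NA in the rows where position_geo is "3"); if something
else was meant, that part does not apply.

```r
# Invented data; only the level sets match the str() output above.
position_geo <- factor(c("3", "0", "3", "2", NA), levels = c("0", "1", "2", "3"))
p_desc_no3   <- factor(c("2", "2", "0", "1", "1"), levels = c("0", "1", "2"))

# The index is computed over the four LEVELS of position_geo, not over its
# elements, so it picks out position 4.
idx <- levels(position_geo) == "3"
print(idx)

# p_desc_no3 has only three levels, so "level 4" becomes an extra NA level,
# which is then dropped again: the factor ends up unchanged.
unchanged <- p_desc_no3
levels(unchanged)[idx] <- NA
print(table(unchanged))

# Assumed intent: blank out the ELEMENTS of p_desc_no3 in the rows where
# position_geo is "3". which() skips the rows where position_geo is NA.
fixed <- p_desc_no3
fixed[which(position_geo == "3")] <- NA
print(table(fixed, useNA = "ifany"))
```

The point is that levels() works on the set of labels, while the condition
here is about individual rows, so the subscript has to go on the factor
itself.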
> > table(b2$p_desc_no3)
>
>  0  1  2
> 42 44 66
>
>
> what am i doing wrong?
>
> Thanks
>
> Agus
>
>
>


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


