[R] NA in logical vector = data frame row numbers scrambled

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Apr 14 13:00:59 CEST 2003


On Mon, 14 Apr 2003, Petr Pikal wrote:

> Dear all.
> 
> Re: how to estimate the parameters of a multimodal distribution
> Thanks to Prof. Ripley for pointing me to the mclust package, although I am not sure I
> can apply it to my problem.
> 
> I have another question. 
> 
> I need to change some of my values in data frame to NA.
> 
> I use something like  
> df[df$v1 < 5, 5:10] <- NA 
> 
> which is OK if there are no NA values in v1.
> 
> Here are some of my attempts:
> > test
>    index cislo     time den hod min  zatizdp   plyndp  skalice
> 5      5     1 37693.79  13  19   0 106.6707 533.0288 5.932448
> 6      6     1 37693.80  13  19  15 106.2308 533.8799 6.008640
> 7      7     1 37693.81  13  19  30 106.3643 534.5321 5.960807
> 8      8     1 37693.82  13  19  45 106.9483 533.9640 5.962759
> 9      9     1 37693.83  13  20   0 106.9289 533.9978 5.939210
> 10    10     1 37693.84  13  20  15 107.1585 518.3881 5.980370
> 
> > test[test$min==0,7:9]<-NA
> 
> > test
>    index cislo     time den hod min  zatizdp   plyndp  skalice
> 5      5     1 37693.79  13  19   0       NA       NA       NA
> 6      6     1 37693.80  13  19  15 106.2308 533.8799 6.008640
> 7      7     1 37693.81  13  19  30 106.3643 534.5321 5.960807
> 8      8     1 37693.82  13  19  45 106.9483 533.9640 5.962759
> 9      9     1 37693.83  13  20   0       NA       NA       NA
> 10    10     1 37693.84  13  20  15 107.1585 518.3881 5.980370
> 
> but further on
> 
> > test[test$plyndp<520,7:9]<-NA
> Error in if (all(i >= 0) && (nn <- max(i)) > nrows) { : 
>         missing value where logical needed
> 
> The problem is the logical vector containing NA:
> 
> > test$plyndp<520
> [1]    NA FALSE FALSE FALSE    NA  TRUE
> 
> and the resulting scrambled row numbering:

No, that's not `scrambled', and those are row names and not row numbers.  
You asked for a missing value in two rows, and that is what you got.
You don't know if those are rows 5 and 9 or not, so the name has correctly
been changed.  However, when doing replacement, we could probably assume 
that only the TRUE values should be replaced, but then it is unclear whether the
values corresponding to the NA indices on the RHS should be used or not.
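To see this behaviour in a current R session, here is a minimal sketch. The data frame is a hand-made reconstruction of Petr's `test` from the printout above (values copied from the rounded display, so they are approximate). It shows that each NA in the logical index yields a row of NAs when extracting, and that current R versions refuse the corresponding replacement with an error, whose wording differs from the 2003 message quoted above.

```r
# Reconstruction of 'test' from the printout (approximate, rounded values).
test <- data.frame(
  index   = 5:10,
  cislo   = 1,
  time    = c(37693.79, 37693.80, 37693.81, 37693.82, 37693.83, 37693.84),
  den     = 13,
  hod     = c(19, 19, 19, 19, 20, 20),
  min     = c(0, 15, 30, 45, 0, 15),
  zatizdp = c(106.6707, 106.2308, 106.3643, 106.9483, 106.9289, 107.1585),
  plyndp  = c(533.0288, 533.8799, 534.5321, 533.9640, 533.9978, 518.3881),
  skalice = c(5.932448, 6.008640, 5.960807, 5.962759, 5.939210, 5.980370),
  row.names = 5:10
)
test[test$min == 0, 7:9] <- NA   # rows "5" and "9" become NA

idx <- test$plyndp < 520          # NA FALSE FALSE FALSE NA TRUE
sub <- test[idx, 7:9]             # each NA index gives a row of NAs
print(sub)

# Replacement with the same NA-containing index is refused.
res <- tryCatch({ test[idx, 7:9] <- NA; "ok" },
                error = function(e) e)
print(inherits(res, "error"))
```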

> > test[test$plyndp<520,7:9]
>      zatizdp   plyndp skalice
> NA        NA       NA      NA
> NA1       NA       NA      NA
> X10 107.1585 518.3881 5.98037
> 
> Is there a simpler or more direct way to achieve this?

test[!is.na(test$plyndp) & test$plyndp<520,7:9] <- NA
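A minimal sketch of that fix, run on the same reconstructed `test` (approximate values from the printout). The `which()` variant is a common alternative idiom, not something from the reply: `which()` returns the positions of the TRUE entries only, so the NAs simply drop out and both forms should give the same result.

```r
# Reconstruction of 'test' from the printout (approximate, rounded values).
test <- data.frame(
  index   = 5:10,
  cislo   = 1,
  time    = c(37693.79, 37693.80, 37693.81, 37693.82, 37693.83, 37693.84),
  den     = 13,
  hod     = c(19, 19, 19, 19, 20, 20),
  min     = c(0, 15, 30, 45, 0, 15),
  zatizdp = c(106.6707, 106.2308, 106.3643, 106.9483, 106.9289, 107.1585),
  plyndp  = c(533.0288, 533.8799, 534.5321, 533.9640, 533.9978, 518.3881),
  skalice = c(5.932448, 6.008640, 5.960807, 5.962759, 5.939210, 5.980370),
  row.names = 5:10
)
test[test$min == 0, 7:9] <- NA

# Fix from the reply: FALSE & NA is FALSE, so the index has no NAs.
test2 <- test
test2[!is.na(test2$plyndp) & test2$plyndp < 520, 7:9] <- NA

# Alternative idiom (not from the reply): which() keeps only TRUE positions.
test3 <- test
test3[which(test3$plyndp < 520), 7:9] <- NA

print(test2)
print(all.equal(test2, test3))
```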

or (R >= 1.7.0)

is.na(test)[, 7:9] <- test$plyndp<520

(The last does not work in S-PLUS, btw, as it does skip the NA values.)
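A sketch of the `is.na<-` form on the reconstructed `test`, assuming a current R release keeps the behaviour described for R 1.7.0. The assignment takes the logical matrix `is.na(test)`, overwrites its columns 7 to 9 with the condition, and `is.na<-` then sets the TRUE cells to NA while ignoring the NA cells.

```r
# Reconstruction of 'test' from the printout (approximate, rounded values).
test <- data.frame(
  index   = 5:10,
  cislo   = 1,
  time    = c(37693.79, 37693.80, 37693.81, 37693.82, 37693.83, 37693.84),
  den     = 13,
  hod     = c(19, 19, 19, 19, 20, 20),
  min     = c(0, 15, 30, 45, 0, 15),
  zatizdp = c(106.6707, 106.2308, 106.3643, 106.9483, 106.9289, 107.1585),
  plyndp  = c(533.0288, 533.8799, 534.5321, 533.9640, 533.9978, 518.3881),
  skalice = c(5.932448, 6.008640, 5.960807, 5.962759, 5.939210, 5.980370),
  row.names = 5:10
)
test[test$min == 0, 7:9] <- NA

# The condition (NA FALSE FALSE FALSE NA TRUE) is recycled over columns 7:9;
# only TRUE cells are set to NA, NA cells in the mask are skipped.
is.na(test)[, 7:9] <- test$plyndp < 520
print(test)
```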

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


