[R] subsetting and NAs

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 20 20:29:58 CET 2006


On 3/20/06, P Ehlers <ehlers at math.ucalgary.ca> wrote:
>
>
> Eric Archer wrote:
> > R-help,
> >
> > I'm getting some unexpected behavior with subsetting a data frame
> > (aircraft flight data) that I can't sort out.
> > Here is a simplified version of my data frame and problem:
> >
> >  > flight
> >       FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> > 1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
> > 2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
> > 3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
> > 4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
> > 5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
> > 2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> > 2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> > 2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> > 2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> > 2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> > 29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
> > 29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
> > 29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
> > 29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
> > 29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
> > 29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006
> >
> > I then try to extract the error years:
> >
> >  > errors <- flight[flight$year > 2006,]
> >  > errors
> >      FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> > NA         NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.1       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.2       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.3       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.4       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > 2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> > 2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> > 2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> > 2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> > 2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> >
> > Would someone please explain to me why the new data frame has all
> > columns (and row names) replaced with NA where year was NA and how to
> > avoid this behavior?
> > Thanks in advance.
> >
> > I am using R v2.2.1 on Windows XP.
> >
> > Cheers,
> > eric
>
>  [snip]
>
> flight$year > 2006 will return TRUE/FALSE, not row numbers. Try this:
>
> errors <- subset(flight, subset = year > 2006)
>
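A minimal sketch of why subset() avoids the problem. It uses a small hypothetical data frame, flight_demo, in place of the real flight data. subset() keeps only the rows where the condition is TRUE and treats NA as FALSE.

```r
# Hypothetical toy data: two flights with a missing year, two with real years
flight_demo <- data.frame(
  FlightID = c(4497, 4498, 7083, 35208),
  year     = c(NA, NA, 2009, 2006)
)

# The condition is evaluated inside the data frame; rows where it is NA are dropped
errors <- subset(flight_demo, subset = year > 2006)
print(errors)  # only the 2009 flight (FlightID 7083) remains
```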

Another solution is:

flight[which(flight$year > 2006),]
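A sketch of what which() does here, again assuming the hypothetical toy data frame flight_demo rather than the real flight data. which() returns the positions of the TRUE entries only, so the NA entries never reach the row index.

```r
# Hypothetical toy data standing in for the flight data frame
flight_demo <- data.frame(
  FlightID = c(4497, 4498, 7083, 35208),
  year     = c(NA, NA, 2009, 2006)
)

cond <- flight_demo$year > 2006  # NA NA TRUE FALSE
idx <- which(cond)               # positions of the TRUE entries only; NAs are ignored
errors <- flight_demo[idx, ]
print(errors)                    # one row: FlightID 7083, year 2009
```

If no entry is TRUE, which() returns integer(0) and the result is simply a data frame with zero rows.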

Also note that the problem is not the TRUE and FALSE values. The problem is
that, in addition to the TRUE and FALSE entries, the logical index contains
NA entries (wherever year is NA), and indexing rows with NA returns a row of NAs.
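A minimal sketch reproducing the behaviour from the original question on a hypothetical toy data frame, flight_demo. It also shows one guard that is not from this thread: converting NA to FALSE before indexing.

```r
# Hypothetical toy data: the first two rows have a missing year
flight_demo <- data.frame(
  FlightID = c(4497, 4498, 7083, 35208),
  TailNo   = c("6009K", "6009K", "9206N", "91630"),
  year     = c(NA, NA, 2009, 2006)
)

cond <- flight_demo$year > 2006  # NA NA TRUE FALSE
# Each NA in a logical row index selects an unknown row, so R returns a row
# filled with NAs for it, as in the output shown in the question
bad <- flight_demo[cond, ]
print(bad)

# One guard: treat NA as FALSE before indexing
good <- flight_demo[!is.na(cond) & cond, ]
print(good)
```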

For example, if flight2 has no NA entries, the original code works fine:

flight2 <- na.omit(flight)
flight2[flight2$year > 2006,] # ok
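One caveat worth checking before using na.omit() this way: it drops every row that has an NA in any column, not only in year. The sketch below uses a hypothetical toy data frame in which one 2009 flight has a missing cost. That flight disappears after na.omit(), but it is kept if only rows with a missing year are removed.

```r
# Hypothetical toy data: row 1 lacks a year, row 3 lacks a cost
flight_demo <- data.frame(
  FlightID   = c(4497, 7083, 7084),
  FlightCost = c(330.0, 103.5, NA),
  year       = c(NA, 2009, 2009)
)

flight2 <- na.omit(flight_demo)         # drops rows 1 AND 3 (NA in any column)
res2 <- flight2[flight2$year > 2006, ]  # only FlightID 7083 survives

# Dropping only rows whose year is NA keeps the 2009 flight with a missing cost
flight3 <- flight_demo[!is.na(flight_demo$year), ]
res3 <- flight3[flight3$year > 2006, ]  # FlightIDs 7083 and 7084
```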
