[R] subsetting and NAs
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Mar 20 20:29:58 CET 2006
On 3/20/06, P Ehlers <ehlers at math.ucalgary.ca> wrote:
>
>
> Eric Archer wrote:
> > R-help,
> >
> > I'm getting some unexpected behavior with subsetting a data frame
> > (aircraft flight data) that I can't sort out.
> > Here is a simplified version of my data frame and problem:
> >
> > > flight
> >       FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> > 1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
> > 2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
> > 3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
> > 4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
> > 5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
> > 2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> > 2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> > 2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> > 2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> > 2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> > 29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
> > 29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
> > 29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
> > 29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
> > 29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
> > 29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006
> >
> > I then try to extract the erroneous years:
> >
> > > errors <- flight[flight$year > 2006,]
> > > errors
> >      FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> > NA         NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.1       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.2       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.3       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > NA.4       NA   <NA>       <NA>        NA         NA       <NA>   NA
> > 2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> > 2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> > 2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> > 2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> > 2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> >
> > Would someone please explain why the new data frame has all
> > columns (and row names) set to NA wherever year was NA, and how to
> > avoid this behavior?
> > Thanks in advance.
> >
> > I am using R v2.2.1 on Windows XP.
> >
> > Cheers,
> > eric
>
> [snip]
>
> flight$year > 2006 will return TRUE/FALSE, not row numbers. Try this:
>
> errors <- subset(flight, subset = year > 2006)
>
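A minimal reproduction of the problem, for readers without the original data. The toy data frame below is hypothetical (only FlightID and year are kept); it sketches how an NA in the logical row index produces an all-NA row, while subset() drops rows whose condition evaluates to NA.

```r
# Hypothetical toy data mirroring 'flight': two rows with a missing year
flight <- data.frame(
  FlightID = c(4497, 4498, 7083, 7084, 35208),
  year     = c(NA, NA, 2009, 2009, 2006)
)

cond <- flight$year > 2006   # cond is NA NA TRUE TRUE FALSE

# Plain logical indexing: each NA in 'cond' yields a row of all NAs
bad <- flight[cond, ]
print(bad)

# subset() treats an NA condition as FALSE, so those rows are dropped
good <- subset(flight, subset = year > 2006)
print(good)
```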
Another solution is:
flight[which(flight$year > 2006),]
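A small sketch of why which() helps, again on a hypothetical toy data frame: which() returns the positions of the TRUE elements only, so the NA entries never reach the row index.

```r
# Hypothetical toy data: 'year' contains a missing value, as in the post
flight <- data.frame(
  FlightID = c(4497, 7083, 35208, 7084),
  year     = c(NA, 2009, 2006, 2009)
)

cond <- flight$year > 2006   # NA TRUE FALSE TRUE
idx  <- which(cond)          # positions of TRUE only; NAs are skipped
print(idx)                   # 2 4

errors <- flight[idx, ]
print(errors)
```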
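Also note that the problem is not the TRUE and FALSE values. The problem
is that, in addition to the TRUE and FALSE entries, there are NA entries,
and indexing the rows of a data frame with NA returns a row of all NAs
(with an NA row name). For example, if flight2 has no NA entries, the
original code works fine: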
flight2 <- na.omit(flight)
flight2[flight2$year > 2006,] # ok
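One caveat with na.omit(): it drops rows with an NA in any column, not just year. The sketch below uses a hypothetical toy data frame in which a 2009 flight is missing its FlightDate. Guarding only the year column with !is.na() (an alternative not suggested in the original thread) keeps that row.

```r
# Hypothetical toy data: row 3 has a known year (2009) but no FlightDate
flight <- data.frame(
  FlightID   = c(4497, 7083, 7084, 35208),
  FlightDate = c(NA, "4/8/2009", NA, "1/21/2006"),
  year       = c(NA, 2009, 2009, 2006),
  stringsAsFactors = FALSE
)

# na.omit() removes rows 1 and 3, so flight 7084 (year 2009) is lost
flight2 <- na.omit(flight)
via_na_omit <- flight2[flight2$year > 2006, ]

# Masking NAs in 'year' only: FALSE & NA is FALSE, so NA rows drop out
via_mask <- flight[!is.na(flight$year) & flight$year > 2006, ]
print(via_mask)
```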