[R] removing NA from a data frame
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Jun 22 12:11:13 CEST 2012
On 22/06/2012 09:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.
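It won't if NA is a level of the factor, which is what you seem to have
here. table() omits NAs by default, for example

> table(as.factor(c(1,2,NA)))

1 2
1 1

so an NA column in your table() output means NA is being treated as a
level, not as a missing value.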
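
A minimal sketch of the distinction, using toy data rather than ex10s. The names f_level, f_missing and d are made up for illustration, and it is an assumption that the level in question is the character string "NA":

```r
# Toy factors: in one, "NA" is an ordinary level (a character string);
# in the other, the third element is genuinely missing.
f_level   <- factor(c("0", "1", "NA"))
f_missing <- factor(c("0", "1", NA))

levels(f_level)          # includes "NA" as a level
levels(f_missing)        # only "0" and "1"

sum(is.na(f_level))      # 0: nothing for na.omit() to find
sum(is.na(f_missing))    # 1

# na.omit() only drops rows that is.na() flags, so no rows go here.
d <- data.frame(id = 1:3, dg = f_level)
nrow(na.omit(d))         # 3
```

Running levels() and sum(is.na()) on the real column should show which of the two cases applies.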
> Dataset:
>
>> str ( ex10s )
>
> 'data.frame': 2189576 obs. of 5 variables:
> $ LOPNR : int 58 58 58 58 64 64 64 64 64 64 ...
> $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
> $ X_DATE : int 20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
> $ SOURCE : int 2 2 2 2 2 2 2 2 2 1 ...
> $ dg : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...
>
> The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels)
>
>> table ( ex10s$dg )
>
>       0       1       2       3       4       5      NA
>    2851  271501   63112   98425  335593 1257299  160795
>
> So, I remove the rows with NAs, to a new dataframe ex10ss:
>
>> ex10ss<-na.omit(ex10s)
>
> Check all the NAs have been removed:
>
>> table(ex10ss$dg)
>
>       0       1       2       3       4       5      NA
>    2851  271501   63112   98425  335593 1257299  160795
>
>> dim(ex10s)
> [1] 2189576 5
>> dim(ex10ss)
> [1] 2189576 5
>
> Nothing seems to have changed. I want all the rows with NA in removed.
>
> I am clearly doing something wrong.
>
> The only alternative I could find is pretty similar:
> use <- complete.cases ( ex10s )
> ex10ss<-ex10s[use,]
> which leads to the same result.
>
>
> Stuart
>
>
> Dr Stuart John Leask DM FRCPsych MB Mchir
> Clinical Senior Lecturer and Honorary Consultant Psychiatrist
> Institute of Mental Health, Innovation Park
> Triumph Road, Nottingham, Notts. NG7 2TU. UK
> Tel. +44 115 82 30419 stuart.leask at nottingham.ac.uk
> Google 'Dr Stuart Leask'
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
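
One possible way to get the rows removed, as a sketch on a hypothetical stand-in for ex10s (the data frame ex and its values are invented, and it assumes the unwanted level is the string "NA"): turn that level into real missing values, then let na.omit() and droplevels() do the rest.

```r
# Hypothetical stand-in for ex10s: dg has "NA" as a level (a string).
ex <- data.frame(LOPNR = c(58L, 58L, 64L, 64L),
                 dg    = factor(c("5", "NA", "4", "NA")))

# Mark the entries whose level is "NA" as genuinely missing.
is.na(ex$dg) <- ex$dg == "NA"

# na.omit() now has rows to remove; droplevels() then discards the
# unused "NA" level so it no longer appears in table().
ex_clean <- droplevels(na.omit(ex))

dim(ex_clean)
table(ex_clean$dg)
```

In this toy case, subsetting directly with ex[ex$dg != "NA", ] followed by droplevels() would keep the same rows.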
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595