[R] how to replace <NA> values

jwd jwd at surewest.net
Tue Jan 21 02:24:07 CET 2014


On Sun, 19 Jan 2014 11:39:43 -0800 (PST)
kingsly <ecokingsly at yahoo.co.in> wrote:

> Dear R community
>  
> I have a large data set that contains some empty cells. Because of that,
> though I may be wrong, <NA> values are produced. Now I want to replace both
> empty and <NA> values with zero.
> Elder1 <- data.frame(
>   ID=c("ID1","ID2","ID3","ID6","ID8"),
>   age=c(38,35,"",NA,NA))
> Output I am expecting
>  
> ID   age
> ID1  38
> ID2  35
> ID3  0
> ID6  0
> ID8  0
>  
> Thank you in advance for your help.
> 
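The age variable is being stored as a factor because of the "".
Mixing a string with numbers makes c() return a character vector, and
data.frame() converts character vectors to factors when
stringsAsFactors is TRUE. If you replace the "" with NA, the column
becomes numeric: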

Before replacement:

str(Elder1)
'data.frame':   5 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
 $ age: Factor w/ 3 levels "","35","38": 3 2 1 NA NA

Notice that the "" is treated as a factor level.

After:

str(Elder1)
'data.frame':   5 obs. of  2 variables:
 $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
 $ age: num  38 35 NA NA NA
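
If the data are already loaded and you cannot simply retype the "", the
same result can be reached by converting the existing column. This is
only a sketch: stringsAsFactors = TRUE is set explicitly to reproduce
the factor column shown above (it is no longer the default since R
4.0.0), and age_num is a name made up for illustration.

```r
# Rebuild the example so that age is a factor, as in the str() output above.
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, "", NA, NA),
  stringsAsFactors = TRUE
)

# Go through character first: as.numeric() applied directly to a factor
# returns the internal level codes (3, 2, 1, ...), not the ages.
age_num <- as.numeric(as.character(Elder1$age))

# The empty string and the original NAs are now all NA in a numeric vector.
str(age_num)
```

Assigning age_num back to Elder1$age gives the "After" structure.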

So the question is: what do you want to do with that column? An NA
value tells you honestly that the information is missing. Replacing it
with a zero can be misleading and can bias basic estimates such as the
mean, because each zero is counted as a real observation of age 0.
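
To make the bias concrete, here is a small illustration using the five
ages from the example. The vector names age and age0 are just for this
sketch.

```r
age <- c(38, 35, NA, NA, NA)

# Mean of the two ages that were actually recorded.
mean(age, na.rm = TRUE)   # 36.5

# Replace every NA with zero, as requested.
age0 <- age
age0[is.na(age0)] <- 0

# The three zeros are now averaged in as if they were real ages.
mean(age0)                # 14.6
```

The same replacement works on the data frame column itself, but the
second mean shows what it costs.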

After you know how you want to treat the data in that field, you may
have a better idea of how to handle the missing data.
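
If zeros really are what you need for display, one possible compromise
is to record which values were missing before overwriting them. This is
only one way to do it, not a recommendation for every analysis; the
column name age_missing is hypothetical, and stringsAsFactors = FALSE
is set explicitly so the code behaves the same on any R version.

```r
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, "", NA, NA),
  stringsAsFactors = FALSE
)

# age is character here; the empty string and the NAs all become NA.
Elder1$age <- as.numeric(Elder1$age)

# Remember which ages were missing, then fill them with zero.
Elder1$age_missing <- is.na(Elder1$age)
Elder1$age[Elder1$age_missing] <- 0

print(Elder1)
```

Later summaries can then use Elder1$age[!Elder1$age_missing] so the
filled-in zeros are left out.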

JWD



