[R] how to replace <NA> values
jwd
jwd at surewest.net
Tue Jan 21 02:24:07 CET 2014
On Sun, 19 Jan 2014 11:39:43 -0800 (PST)
kingsly <ecokingsly at yahoo.co.in> wrote:
> Dear R community
>
> I have a large data set that contains some empty cells. Because of that
> (I may be wrong), <NA> values are produced. Now I want to replace both
> the empty and <NA> values with zero.
> Elder1 <- data.frame(
> ID=c("ID1","ID2","ID3","ID6","ID8"),
> age=c(38,35,"",NA,NA))
> Output I am expecting
>
> ID age
> ID1 38
> ID2 35
> ID3 0
> ID6 0
> ID8 0
>
> Thank you in advance for your help.
>
The age variable ends up as a factor because of the "": mixing it with
numbers makes c() return a character vector, which data.frame() then
converts to a factor. If you replace the "" with NA, the column becomes
numeric:
Before replacement:
str(Elder1)
'data.frame': 5 obs. of 2 variables:
$ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
$ age: Factor w/ 3 levels "","35","38": 3 2 1 NA NA
Notice that the "" is treated as a factor level.
After:
str(Elder1)
'data.frame': 5 obs. of 2 variables:
$ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 4 5
$ age: num 38 35 NA NA NA
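One way to get from the "before" structure to the "after" structure is sketched here. It is only an illustration, not necessarily how the output above was produced. The helper name age_chr is made up for the example, and stringsAsFactors = TRUE is set explicitly on the assumption that the factor in the first str() output came from that (older) default.

```r
# Rebuild the data frame from the question. stringsAsFactors = TRUE is an
# assumption: it reproduces the factor column shown in the "before" output.
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, "", NA, NA),
  stringsAsFactors = TRUE
)

# Go through character first: as.numeric() applied directly to a factor
# returns the internal level codes, not the ages.
age_chr <- as.character(Elder1$age)   # age_chr is a hypothetical helper name

# Turn the empty string into a real missing value. %in% is used instead
# of == so that the existing NAs compare as FALSE rather than NA.
age_chr[age_chr %in% ""] <- NA

Elder1$age <- as.numeric(age_chr)

str(Elder1)
```

If the data are read from a file rather than typed in, the na.strings argument of read.csv() may do the same job at import time.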
So the question is: what do you want to do with that column? An "NA"
value tells you honestly that the information is missing. Replacing it
with a zero can be misleading and can bias some basic parameter
estimates.
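A small sketch of the bias, using the five age values from the example. The vector names age and age_zero are just for illustration.

```r
# The age column once "" has been turned into NA
age <- c(38, 35, NA, NA, NA)

# With NA kept, R makes the missingness visible
mean(age)                 # NA
mean(age, na.rm = TRUE)   # 36.5, based on the two observed ages only

# With zeros substituted, the three missing people count as age 0
age_zero <- age
age_zero[is.na(age_zero)] <- 0
mean(age_zero)            # 14.6
```

The zero-filled mean describes nobody in the data, which is the kind of distortion meant above.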
After you know how you want to treat the data in that field, you may
have a better idea of how to handle the missing data.
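As a rough sketch of two possible treatments, assuming the column is already numeric: either keep the NAs and drop incomplete rows for a given analysis, or substitute zeros as originally asked while keeping a record of which values were missing. The names complete and age_missing are made up for this example.

```r
Elder1 <- data.frame(
  ID  = c("ID1", "ID2", "ID3", "ID6", "ID8"),
  age = c(38, 35, NA, NA, NA)
)

# Option 1: leave the NAs in place and work on the complete rows only
complete <- Elder1[!is.na(Elder1$age), ]

# Option 2: replace NA with 0, but first flag which ages were missing
# so the substitution can be undone or accounted for later
Elder1$age_missing <- is.na(Elder1$age)
Elder1$age[is.na(Elder1$age)] <- 0

Elder1
```

Which option is appropriate depends on what the age column is used for afterwards.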
JWD