[R] Replace NAs in dataframe: what am I doing wrong
jim holtman
jholtman at gmail.com
Sun Aug 12 00:44:18 CEST 2007
The problem is that the first column is probably a factor and you are
trying to assign a value that is not already a 'level' in the factor.
One way is to read the data with as.is=TRUE to keep it as character,
replace the NAs and then convert back to factors if you want to:
> x <- read.csv(textConnection("A,B
+ a,3
+ b,4
+ .,.
+ c,5"), na.strings='.', as.is=TRUE) # keep as character
> # replace NAs
> x[is.na(x[,1]), 1] <- "Missing Value"
> # convert back to factors if you want to
> x[[1]] <- factor(x[[1]])
> str(x)
'data.frame': 4 obs. of 2 variables:
$ A: Factor w/ 4 levels "a","b","c","Missing Value": 1 2 4 3
$ B: int 3 4 NA 5
>
>
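If the column has already been read in as a factor, another option is to add the new level before assigning it, so that the replacement value is a valid level. What follows is only a sketch under a few assumptions: the inline data are the same toy example as above, `stringsAsFactors = TRUE` is set explicitly to force the factor conversion (since R 4.0.0 read.csv keeps strings as character by default), and `f` is just an illustrative variable name.

```r
# Same toy data as above, but deliberately read as a factor
# (stringsAsFactors = TRUE is an assumption made to reproduce the problem).
x <- read.csv(text = "A,B
a,3
b,4
.,.
c,5", na.strings = ".", stringsAsFactors = TRUE)

f <- x$A

# Register the new level first; assigning a value that is not a level
# is what triggers the "invalid factor level" warning.
levels(f) <- c(levels(f), "Missing Value")

# Now the assignment is legal and the NA is actually replaced.
f[is.na(f)] <- "Missing Value"

x$A <- f
str(x)
```

This keeps the existing level order and appends the new level at the end, whereas going through character and calling factor() again re-sorts the levels.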
On 8/11/07, Sébastien <pomchip at free.fr> wrote:
> Dear R-users,
>
> My script imports a dataset from a csv file in which missing values are
> represented by ".". The data are read into a data frame using the
> read.table function with na.strings = ".". Then I want to replace the
> NAs in the first column of the data frame with "Missing value". I am using
> the following code to do so:
>
> mydata<-data.frame(read.table(myFile,sep=",",header=TRUE,na.strings="."))
> # myFile is the full path of the source file
>
> mydata[,1][is.na(mydata[,1])]<-"Missing value"
>
> This code works perfectly well if the first column contains only
> missing values, i.e. ".". As soon as it contains both ordinary levels and
> missing values, things start to go wrong: I get the following warning
> message and the replacement is not done.
>
> Warning message:
> invalid factor level, NAs generated in: `[<-.factor`(`*tmp*`,
> is.na(mydata[, 1]), value = "Missing value")
>
> Is there an error in my code, or is this a bug (I doubt it)?
>
> Thanks in advance.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?