[R] Replacing NAs in one variable with values of another variable

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue Aug 23 20:25:47 CEST 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ista Zahn
> Sent: Tuesday, August 23, 2011 11:06 AM
> To: StellathePug
> Cc: r-help at r-project.org
> Subject: Re: [R] Replacing NAs in one variable with values of another variable
> 
> Hi,
> 
> On Tue, Aug 23, 2011 at 12:29 PM, StellathePug
> <ritacarreira at hotmail.com> wrote:
> > Hello everyone,
> > I am trying to figure out a way of replacing missing observations in one of
> > the variables of a data frame by values of another variable. For example,
> > assume my data is X
> >
> > X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, "NA", "NA","NA","NA","NA",
> >                    6, 4, 3,"NA", "NA", "NA", 5, 4, 1, 3), ncol=2))
> > names(X)<-c("X1","X2")
> >
> > I want to change X1 so that instead of the missing values it uses the values
> > in X2 (regardless of whether these are missing).
> 
> Note that you don't have any missing values in X, as "NA" != NA
> 
> > So my X1, should become
> > X$X1 <- c(9, 6, 1, 3, 9, "NA", 5, 4, 1, 3).
> >
> > I have searched online for a while and looked at the manuals and the best
> > (unsuccessful) attempt I have come up with is
> >
> > X$X1[X$X1=="NA"] <- X$X2
> >
> > and that produces the following X1
> >
> > X$X1<-c(9, 6, 1, 3, 9, 6, "NA", 3, "NA", "NA")
> >
> > and generates the following warning:
> >
> > Warning messages:
> > 1: In `[<-.factor`(`*tmp*`, X$X1 == "NA", value = c(5L, 3L, 2L, 6L, :
> >  invalid factor level, NAs generated
> > 2: In x[...] <- m :
> >  number of items to replace is not a multiple of replacement length
> >
> > I think that my error is that it is ignoring the non-missing values of X1
> > and the dimensions don't match. But what I want my code to do is to look at
> > the rows of X1, see if it's a missing value; if it is, replace it with the
> > value that is in the row of X2; if it's not missing, leave it as is.
> 
> Here are two solutions, one that is a correction to your first
> attempt, and another using ifelse:
> 
> X$X1[X$X1=="NA"] <- X$X2[X$X1=="NA"]
> 
> X$X1 <- ifelse(X$X1 == "NA", X$X2, X$X1)
> 
> 
> Best,
> Ista
> 

Rita,

In addition to Ista's advice, I have a question.  Did you really want your columns X1 and X2 to be factors?  Your use of "NA" to represent missing has caused the columns to become factors.  If you actually wanted a numeric matrix or data.frame, then remove the quotes from around the NA.  Then you need to use is.na() to test for missing.
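
To make the difference between the string "NA" and a real missing value concrete, here is a small sketch. The vector names x_chr and x_num are just made up for illustration. Note also that in R 4.0.0 and later the default is stringsAsFactors = FALSE, so there the columns would come out as character rather than factor; the underlying problem (text instead of numbers) is the same.

```r
# The string "NA" is ordinary text, not a missing value.
is.na("NA")   # FALSE
is.na(NA)     # TRUE

# Mixing numbers with "NA" strings in c() coerces everything to character.
x_chr <- c(9, 6, "NA", 3)
class(x_chr)  # "character"

# To repair an existing column, convert via character first, so that a
# factor's labels (not its internal integer codes) are what get converted.
x_num <- as.numeric(as.character(x_chr))
x_num         # 9 6 NA 3
```

If the data already exist with "NA" strings in them, a conversion like the last step may be easier than rebuilding the data frame.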

X <-as.data.frame(matrix(c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA,
                   6, 4, 3, NA, NA, NA, 5, 4, 1, 3), ncol=2))
names(X)<-c("X1","X2")

X$X1 <- ifelse(is.na(X$X1), X$X2, X$X1)
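
For what it's worth, the same replacement can also be written as a subset assignment, which only touches the rows where X1 is missing. This is just a sketch on the same example values; the helper name miss is my own, and I build the data frame directly with data.frame() rather than from a matrix.

```r
# Same example data, built directly as numeric columns.
X <- data.frame(X1 = c(9, 6, 1, 3, 9, NA, NA, NA, NA, NA),
                X2 = c(6, 4, 3, NA, NA, NA, 5, 4, 1, 3))

# Logical index of the rows where X1 is missing.
miss <- is.na(X$X1)

# Replace only those rows with the corresponding rows of X2.
# Row 6 stays NA because X2 is also missing there.
X$X1[miss] <- X$X2[miss]

X$X1  # 9 6 1 3 9 NA 5 4 1 3
```

Using the same index on both sides keeps the two lengths equal, which avoids the "not a multiple of replacement length" warning from the original attempt.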



Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



