[R] Replace NAs in one column with data from another column
David Winsemius
dwinsemius at comcast.net
Wed Sep 8 21:02:05 CEST 2010
On Sep 8, 2010, at 2:24 PM, Joshua Wiley wrote:
> Hi Jakob,
>
> You can use is.na() to create an index of which rows in column 3 are
> missing data, and then select these from column 1. Here is a simple
> example:
>
> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
> dat$new <- dat$V3
> my.na <- is.na(dat$V3)
> dat$new[my.na] <- dat$V1[my.na]
>
> dat
>
> This should be quite fast. I broke the steps up to be explicit, but
> you can readily simplify them.
I was about to post something similar, except that I was going to
avoid the "$" operator, thinking (incorrectly, as it turned out) that
it would be faster. I have also included the Holtman/Rizopoulos
suggestion of ifelse(), and was surprised to find that ifelse() is the
winning strategy, if only narrowly ahead of the "$" version:
dat[4] <- dat[3]; idx <- is.na(dat[, 3])
dat[idx, 4] <- dat[idx, 1]
> library(rbenchmark)
> benchmark(
+   meth.ifelse   = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
+   meth.dlr.sign = {dat$new <- dat$V3
+                    my.na <- is.na(dat$V3)
+                    dat$new[my.na] <- dat$V1[my.na]},
+   meth.index    = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
+                    dat[idx, 4] <- dat[idx, 1]},
+   meth.forloop  = {for (i in 1:nrow(dat)) {
+                      if (is.na(dat[i, 3]) == TRUE) {
+                        dat[i, 4] <- dat[i, 1]
+                      } else {
+                        dat[i, 4] <- dat[i, 3]
+                      }
+                    }},
+   replications = 5000,
+   columns = c("test", "replications", "elapsed", "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   0.502  1.081897     0.501
4  meth.forloop         5000   6.419 13.834052     6.409
1   meth.ifelse         5000   0.464  1.000000     0.463
3    meth.index         5000   2.908  6.267241     2.904
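
The timings above appear to come from the small example data frame, where
per-call overhead dominates. As a rough check at something closer to the
size in the original question, a sketch along these lines compares the two
fastest approaches with system.time(). The data frame `big`, the share of
NAs, and the seed are assumptions made up for illustration; only the row
count is taken from Jakob's post. Timings depend on the machine, so none
are shown.

```r
# Hypothetical test data: same columns as the example, sized like the
# data frame in the original question.
set.seed(1)
n <- 169221
big <- data.frame(V1 = seq_len(n), V3 = as.numeric(seq_len(n)))
big$V3[sample(n, n %/% 10)] <- NA   # assumption: roughly 10% missing

# Approach 1: ifelse()
t.ifelse <- system.time(
  new.ifelse <- ifelse(is.na(big$V3), big$V1, big$V3)
)

# Approach 2: logical index with "$"
t.index <- system.time({
  new.index <- big$V3
  my.na <- is.na(big$V3)
  new.index[my.na] <- big$V1[my.na]
})

# Both approaches should agree and leave no NAs behind.
stopifnot(isTRUE(all.equal(new.ifelse, new.index)), !anyNA(new.index))

print(t.ifelse)
print(t.index)
```

If the ranking differs from the table above, that would suggest the small
benchmark was mostly measuring fixed overhead rather than the replacement
itself.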
--
David.
>
> HTH,
>
> Josh
>
> On Wed, Sep 8, 2010 at 11:17 AM, Jakob Hedegaard
> <Jakob.Hedegaard at agrsci.dk> wrote:
>> Hi list,
>>
>> I have a data frame (m) with 169221 rows and 10 columns and would
>> like to make a new column containing the content of column 3 but
>> replace the NAs in column 3 with the data in column 1 (from the
>> same row as the NA in column 3). Column 1 has data in all rows.
>>
>> My first attempt was:
>>
>> for (i in 1:169221){
>> if (is.na(m[i,3])==TRUE){
>> m[i,11] <- as.character(m[i,1])}
>> else{
>> m[i,11] <- as.character(m[i,3])}
>> }
>>
>> It works, but it takes too long.
>> I would appreciate alternative solutions.
>>
>> Best regards, Jakob
>
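
The as.character() calls in the original loop suggest that the columns may
be factors (an assumption; the post does not say). In that case ifelse()
would return the factors' integer codes rather than their labels, so
converting to character first is the safer route. A minimal sketch, with a
made-up data frame `m2` standing in for the real `m`:

```r
# Hypothetical data: factor columns (an assumption about the original data).
m2 <- data.frame(V1 = factor(c("a", "b", "c", "d")),
                 V2 = 1:4,
                 V3 = factor(c("x", NA, "y", NA)))

# Convert to character before combining, so that labels rather than
# integer codes end up in the new column.
v1 <- as.character(m2[[1]])
v3 <- as.character(m2[[3]])
m2$new <- ifelse(is.na(v3), v1, v3)

m2$new
# [1] "x" "b" "y" "d"
```

The conversion is done once per column instead of once per row, which is
where the original loop spends much of its time.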
--
David Winsemius, MD
West Hartford, CT