[R] Replace missing value within group with non-missing value

David Winsemius dwinsemius at comcast.net
Sat Apr 6 18:46:00 CEST 2013


On Apr 6, 2013, at 9:26 AM, David Winsemius wrote:

> 
> On Apr 6, 2013, at 9:16 AM, Leask, Graham wrote:
> 
>> Hi Rui,
>> 
>> Data as follows
>> 
>> structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
>> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
>> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 
>> 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 
>> 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 
>> 8, 8, 8, 8, 9, 9), choice = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
>> 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
>> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0), br = c(1, 
>> 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 
>> 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 
>> 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA, 487, NA, NA, 
>> 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489, NA, NA, 
>> NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA, NA, 
>> NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names = c("dn", 
>> "obs", "choice", "br", "mth"), row.names = c("1", "2", "3", "4", 
>> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
>> "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
>> "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
>> "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
>> "49", "50"), class = "data.frame")
>> 
> 
> Looks like a job for na.locf in the zoo package:
> 
> require(zoo)
> # will fail if first value is NA so either this ...
> dat$mth[-(1:5)] <- na.locf(dat$mth[-(1:5)])
> # ... or this:
> dat$mth <- na.locf(dat$mth, na.rm=FALSE)
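
For anyone who wants to see what the na.rm=FALSE variant does without loading zoo, here is a rough base-R sketch. The helper name locf_sketch is made up for illustration and is not part of zoo; it only handles a plain vector. Leading NAs stay in place, so the result has the same length as the input. With the default na.rm=TRUE those leading NAs are dropped, which is why the unqualified assignment back into dat$mth would not fit.

```r
# Base-R sketch of last-observation-carried-forward, keeping leading NAs
# (roughly what na.locf(x, na.rm = FALSE) returns for a plain vector).
# locf_sketch is a hypothetical helper name, not a zoo function.
locf_sketch <- function(x) {
  seen <- cumsum(!is.na(x))        # count of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 (NA) covers the leading gap
}

x <- c(NA, NA, 487, NA, 488, NA)
filled <- locf_sketch(x)
```

Note that carrying forward ignores group boundaries: rows at the start of one group pick up the value from the previous group.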

If, on the other hand, you wanted cases to be handled within individual values of "obs", then you could do this for the categories of obs that have a value to use as the replacement (omitting the last two rows, whose category has no non-missing mth):

dat$mth[-(49:50)] <- ave(dat$mth[-(49:50)] , dat$obs,
                           FUN=function(m) {m[is.na(m)] <- m[!is.na(m)]; m } )

If there were more than one non-missing value in a category, you would need to pick one, for example the first or the last.
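
A sketch of that idea, assuming you want the first non-missing value in each category. The toy data and the helper name fill_first are invented for illustration. It also returns all-NA categories unchanged, so there is no need to leave out rows by hand:

```r
# Toy data (hypothetical): obs 2 has two non-missing values and
# obs 3 has none, the two cases a plain one-to-one replacement cannot handle.
dat <- data.frame(obs = c(1, 1, 1, 2, 2, 2, 3, 3),
                  mth = c(NA, 487, NA, 488, NA, 489, NA, NA))

# Fill each group's NAs with the first non-missing value in that group.
fill_first <- function(m) {
  ok <- !is.na(m)
  if (any(ok)) m[!ok] <- m[ok][1]   # rev(m[ok])[1] would take the last instead
  m                                 # all-NA groups come back unchanged
}

dat$mth <- ave(dat$mth, dat$obs, FUN = fill_first)
```

Existing non-missing values are left alone; only the NAs within each category are overwritten.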

> 
> -- 
> David.
> 
>> Best wishes
>> 
>> 
>> Graham
>> 
>> -----Original Message-----
>> From: Rui Barradas [mailto:ruipbarradas at sapo.pt] 
>> Sent: 06 April 2013 16:32
>> To: Leask, Graham
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Replace missing value within group with non-missing value
>> 
>> Hello,
>> 
>> Can't you post a data example? If your dataset is named 'dat' use
>> 
>> dput(head(dat, 50))  # paste the output of this in a post
>> 
>> 
>> Rui Barradas
>> 
>> Em 06-04-2013 15:34, Leask, Graham escreveu:
>>> Hi Rui,
>>> 
>>> Thank you for your suggestion which is very much appreciated. Unfortunately running this code produces the following error.
>>> 
>>> error in '$<-.data.frame' ('*tmp*', "mth", value = NA_real_) :
>>>    replacement has 1 rows, data has 0
>>> 
>>> I'm sure there must be an elegant solution to this problem?
>>> 
>>> Best wishes
>>> 
>>> 
>>> 
>>> Graham
>>> 
>>> On 6 Apr 2013, at 12:15, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> That's not a very good way of posting your data; preferably paste the output of dput() in a post (see ?dput).
>>>> Something along the lines of the following might do what you want.
>>>> It seems that the groups are established by the 'dn' and 'obs' numbers.
>>>> If so, try
>>>> 
>>>> 
>>>> # Make up some data
>>>> dat <- data.frame(dn = 4, obs = rep(1:5, each = 6), mth = NA)
>>>> dat$mth[6] <- 487
>>>> dat$mth[9] <- 488
>>>> dat$mth[18] <- 488
>>>> dat$mth[21] <- 489
>>>> dat$mth[30] <- 489
>>>> 
>>>> 
>>>> sp <- split(dat, list(dat$dn, dat$obs))
>>>> names(sp) <- NULL
>>>> tmp <- lapply(sp, function(x){
>>>>       idx <- which(!is.na(x$mth))[1]
>>>>       x$mth <- x$mth[idx]
>>>>       x
>>>>   })
>>>> do.call(rbind, tmp)
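
A note on the error reported for this code ("replacement has 1 rows, data has 0"): a likely cause, though I have not seen the full data, is that split() on list(dn, obs) creates a group for every combination of the two, including combinations that never occur. Those groups have zero rows, and assigning a single NA into them fails. A sketch of the same approach with drop = TRUE, on made-up data with two doctors:

```r
# Hypothetical data: two doctors whose observation numbers do not overlap,
# so some dn/obs combinations have no rows at all.
dat <- data.frame(dn  = c(4, 4, 4, 5, 5, 5),
                  obs = c(1, 1, 1, 2, 2, 2),
                  mth = c(NA, 487, NA, NA, NA, 488))

# Without drop = TRUE, split() also returns the empty 4.2 and 5.1 groups.
n_empty <- sum(sapply(split(dat, list(dat$dn, dat$obs)), nrow) == 0)

# drop = TRUE keeps only the combinations that occur in the data.
sp <- split(dat, list(dat$dn, dat$obs), drop = TRUE)
names(sp) <- NULL
tmp <- lapply(sp, function(x) {
  idx <- which(!is.na(x$mth))[1]   # first non-missing month in the group
  x$mth <- x$mth[idx]              # recycle it over the whole group
  x
})
filled <- do.call(rbind, tmp)
```

Here n_empty counts the zero-row groups that the original call would have passed to the function.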
>>>> 
>>>> 
>>>> Hope this helps,
>>>> 
>>>> Rui Barradas
>>>> 
>>>> 
>>>> Em 06-04-2013 11:33, Leask, Graham escreveu:
>>>>> Dear List members
>>>>> 
>>>>> I have a large dataset organised in choice groups; see the sample below.
>>>>> 
>>>>>     +-------------------------------------------------------------------------------------------------+
>>>>>     | dn   obs   choice      acid   br                 date       cdate   situat~n   mth   year   set |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>>  1. |  4     1        0     LOSEC    1                    .           .                .      .     1 |
>>>>>  2. |  4     1        0    NEXIUM    2                    .           .                .      .     1 |
>>>>>  3. |  4     1        0    PARIET    3                    .           .                .      .     1 |
>>>>>  4. |  4     1        0   PROTIUM    4                    .           .                .      .     1 |
>>>>>  5. |  4     1        0    ZANTAC    5                    .           .                .      .     1 |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>>  6. |  4     1        1     ZOTON    6   23aug2000 01:00:00   23aug2000         NS   487   2000     1 |
>>>>>  7. |  4     2        0     LOSEC    1                    .           .                .      .     2 |
>>>>>  8. |  4     2        0    NEXIUM    2                    .           .                .      .     2 |
>>>>>  9. |  4     2        1    PARIET    3   25sep2000 01:00:00   25sep2000          L   488   2000     2 |
>>>>> 10. |  4     2        0   PROTIUM    4                    .           .                .      .     2 |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>> 11. |  4     2        0    ZANTAC    5                    .           .                .      .     2 |
>>>>> 12. |  4     2        0     ZOTON    6                    .           .                .      .     2 |
>>>>> 13. |  4     3        0     LOSEC    1                    .           .                .      .     3 |
>>>>> 14. |  4     3        0    NEXIUM    2                    .           .                .      .     3 |
>>>>> 15. |  4     3        0    PARIET    3                    .           .                .      .     3 |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>> 16. |  4     3        0   PROTIUM    4                    .           .                .      .     3 |
>>>>> 17. |  4     3        0    ZANTAC    5                    .           .                .      .     3 |
>>>>> 18. |  4     3        1     ZOTON    6   20sep2000 00:00:00   20sep2000          R   488   2000     3 |
>>>>> 19. |  4     4        0     LOSEC    1                    .           .                .      .     4 |
>>>>> 20. |  4     4        0    NEXIUM    2                    .           .                .      .     4 |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>> 21. |  4     4        1    PARIET    3   27oct2000 00:00:00   27oct2000         NL   489   2000     4 |
>>>>> 22. |  4     4        0   PROTIUM    4                    .           .                .      .     4 |
>>>>> 23. |  4     4        0    ZANTAC    5                    .           .                .      .     4 |
>>>>> 24. |  4     4        0     ZOTON    6                    .           .                .      .     4 |
>>>>> 25. |  4     5        0     LOSEC    1                    .           .                .      .     5 |
>>>>>     |-------------------------------------------------------------------------------------------------|
>>>>> 26. |  4     5        0    NEXIUM    2                    .           .                .      .     5 |
>>>>> 27. |  4     5        0    PARIET    3                    .           .                .      .     5 |
>>>>> 28. |  4     5        0   PROTIUM    4                    .           .                .      .     5 |
>>>>> 29. |  4     5        0    ZANTAC    5                    .           .                .      .     5 |
>>>>> 30. |  4     5        1     ZOTON    6   23oct2000 03:00:00   23oct2000         NS   489   2000     5 |
>>>>> 
>>>>> I wish to fill in the missing values in each choice set, delineated by dn (Doctor), obs (Observation number) and choices (1 to 6).
>>>>> For each choice set, one choice is chosen, and its row contains the full time information for that choice set; i.e. in set 1, choice 6 was chosen and shows the month 487. The other 5 choices show mth as missing. I want to fill these with the correct mth.
>>>>> 
>>>>> I am sure there must be an elegant way to do this in R?
>>>>> 
>>>>> 
>>>>> Best wishes
>>>>> 
>>>>> 
>>>>> 
>>>>> Graham
>>>>> 
>>>>> 
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>> 
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA


