[R] Replace missing value within group with non-missing value
Leask, Graham
g.leask at aston.ac.uk
Sun Apr 7 10:22:03 CEST 2013
Hi Bill,
Thank you for your suggestion.
I shall try running the code and test as you suggest.
Is there a straightforward way to routinely test the structure of a complex survey data set such as this?
For example, with a multinomial choice model such as this, the data are only correct if each
observation set of, say, 6 choices has exactly 1 choice selected and 1 non-missing month. This can,
however, be difficult to check when dealing with very large datasets.
Presumably, if more than one choice in a set is positive, this will show up as the model failing to converge
due to singularity, but it should have been detected at the data cleaning stage.
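A check of this kind can be sketched in base R by counting, for each (dn, obs) group, how many rows are chosen and how many have a non-missing mth. The toy data and the names grp, n_chosen, n_mth and bad are illustrative assumptions, not part of the original data set; the third group is deliberately broken so that something gets flagged.

```r
# Toy data shaped like 'dat' in this thread (dn, obs, choice, mth).
# Group 3 is deliberately wrong: two choices selected, two months present.
dat <- data.frame(
  dn     = 4,
  obs    = rep(1:3, each = 3),
  choice = c(0, 0, 1,   0, 1, 0,   1, 1, 0),
  mth    = c(NA, NA, 487,   NA, 488, NA,   489, 489, NA)
)

# One grouping factor per (dn, obs) combination that actually occurs.
grp <- interaction(dat$dn, dat$obs, drop = TRUE)

# Per-group counts; both should equal 1 in a well-formed choice set.
n_chosen <- tapply(dat$choice == 1, grp, sum, na.rm = TRUE)
n_mth    <- tapply(!is.na(dat$mth), grp, sum)

# Labels of the groups that violate either assumption.
bad <- levels(grp)[as.vector(n_chosen != 1 | n_mth != 1)]
bad
```

Because the counts are computed once per group with tapply, the same check should scale to large data sets; an empty 'bad' means every group passed.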
Best wishes
Graham
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: 06 April 2013 22:49
To: Rui Barradas; Leask, Graham
Cc: r-help at r-project.org
Subject: RE: [R] Replace missing value within group with non-missing value
> Anyway, try replacing the lapply instruction with this.
>
> tmp <- lapply(sp, function(x){
> idx <- which(!is.na(x$mth))[1]
> if(length(idx) > 0)
> x$mth <- x$mth[idx]
> x
> })
Note that
which(anyLogicalVector)[1]
always has length 1, because of the subscript [1], so the 'if' statement may as well be omitted.
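A small illustration of this point, with made-up logical vectors (idx_none and idx_some are hypothetical names): subscripting an empty result with [1] yields NA rather than a zero-length vector, so a guard would have to test for NA, not for length.

```r
idx_none <- which(c(FALSE, FALSE))[1]        # no TRUE anywhere
idx_some <- which(c(FALSE, TRUE, TRUE))[1]   # first TRUE is at position 2

length(idx_none)   # still 1, so 'length(idx) > 0' is always TRUE
is.na(idx_none)    # TRUE: this is the condition that marks "nothing found"
idx_some
```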
There are 3 cases the above code does not detect or deal with.
(a) nrow(x)==0
(b) all(is.na(x$mth))
(c) length(which(!is.na(x$mth))) > 1
Case (a) causes the function to stop in the way you saw:
> f <- function(x) { # the function passed to lapply
+ idx <- which(!is.na(x$mth))[1]
+ if (length(idx) > 0)
+ x$mth <- x$mth[idx]
+ x
+ }
> f(data.frame(mth=integer()))
Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_integer_) :
replacement has 1 rows, data has 0
but (b) and (c) may indicate some errors in your data and cause some surprises down the line.
> f(data.frame(mth=c(NA,NA)))
mth
1 NA
2 NA
> f(data.frame(mth=c(NA,2,3)))
mth
1 2
2 2
3 2
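One plausible source of the empty groups in case (a), assuming the full data set contains several dn values whose obs numbers do not all overlap (the 50-row sample has only dn = 4, so this is a guess about the real data): split() on a list of factors creates a group for every combination of levels, including combinations that never occur. The toy data and the names sp_all and sp_drop are hypothetical.

```r
# Two doctors with non-overlapping observation numbers.
dat <- data.frame(dn  = c(4, 4, 5, 5),
                  obs = c(1, 1, 2, 2),
                  mth = c(NA, 487, 488, NA))

# Default: all dn x obs combinations, so 4.2 and 5.1 are zero-row groups.
sp_all <- split(dat, list(dat$dn, dat$obs))

# drop = TRUE keeps only the combinations that occur in the data.
sp_drop <- split(dat, list(dat$dn, dat$obs), drop = TRUE)

sapply(sp_all, nrow)
sapply(sp_drop, nrow)
```

If this is what happens in the real data, passing drop = TRUE to split() should avoid case (a) without touching the function given to lapply.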
You could have your code check whether there is exactly one non-missing value for mth in each non-empty group and warn if that assumption is not true for some group (but still return some reasonable result). The following does that:
f2 <- function (x) {
    idx <- !is.na(x$mth) # logical vector with length nrow(x)
    nNotNA <- sum(idx)
    if (nNotNA > 1) {
        warning("more than one non-missing mth value in group, using the first")
        idx[cumsum(idx) > 1] <- FALSE
    }
    else if (nrow(x) > 0 && nNotNA == 0) {
        warning("no non-missing values in group, all mth values will be NA")
        idx[1] <- TRUE
    }
    x$mth <- x$mth[idx]
    x
}
The warning messages do not say where in 'sp' the problem arose. You could change your lapply call so that the group number is in the warning:
lapply(seq_along(sp), function(i) {
    x <- sp[[i]]
    ... same code as in f2, but add the group number, i, to the end of warnings ...
    warning("more than one ... in group number ", i)
    ...
})
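For reference, one way the outline above might be filled in is sketched here. The function name fill_mth, the toy data, and the use of withCallingHandlers to collect the warnings in 'msgs' are illustrative assumptions rather than part of the original suggestion; the body follows the same logic as f2.

```r
# Same logic as f2, with the group number i appended to each warning.
fill_mth <- function(x, i) {
  idx <- !is.na(x$mth)            # logical vector with length nrow(x)
  nNotNA <- sum(idx)
  if (nNotNA > 1) {
    warning("more than one non-missing mth value in group number ", i,
            ", using the first", call. = FALSE)
    idx[cumsum(idx) > 1] <- FALSE # keep only the first non-missing value
  } else if (nrow(x) > 0 && nNotNA == 0) {
    warning("no non-missing mth values in group number ", i,
            ", all mth values will be NA", call. = FALSE)
    idx[1] <- TRUE                # picks an NA, which is then recycled
  }
  x$mth <- x$mth[idx]
  x
}

# Toy data: group 1 is fine, group 2 has no month, group 3 has two.
dat <- data.frame(dn = 4, obs = rep(1:3, each = 2),
                  mth = c(NA, 487, NA, NA, 488, 489))
sp <- split(dat, list(dat$dn, dat$obs), drop = TRUE)

# Collect warning messages instead of letting them scroll past.
msgs <- character()
tmp <- withCallingHandlers(
  lapply(seq_along(sp), function(i) fill_mth(sp[[i]], i)),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res <- do.call(rbind, tmp)
res
msgs
```

Collecting the messages in a vector is only a convenience for large data sets; plain warning() calls inside lapply would report the same group numbers.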
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Rui Barradas
> Sent: Saturday, April 06, 2013 10:24 AM
> To: Leask, Graham
> Cc: r-help at r-project.org
> Subject: Re: [R] Replace missing value within group with non-missing
> value
>
> Hello,
>
> I've just run my code with your data and found no error. Anyway, try
> replacing the lapply instruction with this.
>
>
> tmp <- lapply(sp, function(x){
> idx <- which(!is.na(x$mth))[1]
> if(length(idx) > 0)
> x$mth <- x$mth[idx]
> x
> })
>
>
> Rui Barradas
>
> Em 06-04-2013 18:12, Leask, Graham escreveu:
> > Hi Arun,
> >
> > How odd. Directly pasting the code from your email precisely repeats the error.
> > See below. Any thoughts on the cause of this anomaly?
> >
> >> dput(head(dat,50))
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2,
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice =
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3,
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2,
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA,
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489,
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA,
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names =
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3",
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
> > "49", "50"), class = "data.frame")
> >> sp <- split(dat, list(dat$dn, dat$obs))
> >> names(sp) <- NULL
> >> tmp <- lapply(sp, function(x){
> > + idx <- which(!is.na(x$mth))[1]
> > + x$mth <- x$mth[idx]
> > + x
> > + })
> > Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_real_) :
> > replacement has 1 rows, data has 0
> >> head(do.call(rbind, tmp),7)
> > Error in do.call(rbind, tmp) : object 'tmp' not found
> >
> > Best wishes
> >
> >
> > Graham
> >
> > -----Original Message-----
> > From: arun [mailto:smartpink111 at yahoo.com]
> > Sent: 06 April 2013 17:25
> > To: Leask, Graham
> > Cc: Rui Barradas
> > Subject: Re: [R] Replace missing value within group with non-missing
> > value
> >
> > Hello,
> > By running Rui's code, I am getting this:
> > sp <- split(dat, list(dat$dn, dat$obs))
> > names(sp) <- NULL
> > tmp <- lapply(sp, function(x){
> > idx <- which(!is.na(x$mth))[1]
> > x$mth <- x$mth[idx]
> > x
> > })
> > head(do.call(rbind, tmp),7)
> > dn obs choice br mth
> > 1 4 1 0 1 487
> > 2 4 1 0 2 487
> > 3 4 1 0 3 487
> > 4 4 1 0 4 487
> > 5 4 1 0 5 487
> > 6 4 1 1 6 487
> > 7 4 2 0 1 488
> >
> > Couldn't reproduce the error you cited.
> > A.K.
> >
> >
> >
> >
> > ----- Original Message -----
> > From: "Leask, Graham" <g.leask at aston.ac.uk>
> > To: Rui Barradas <ruipbarradas at sapo.pt>
> > Cc: "r-help at r-project.org" <r-help at r-project.org>
> > Sent: Saturday, April 6, 2013 12:16 PM
> > Subject: Re: [R] Replace missing value within group with non-missing
> > value
> >
> > Hi Rui,
> >
> > Data as follows
> >
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2,
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice =
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3,
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2,
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA,
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489,
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA,
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names =
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3",
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
> > "49", "50"), class = "data.frame")
> >
> > Best wishes
> >
> >
> > Graham
> >
> > -----Original Message-----
> > From: Rui Barradas [mailto:ruipbarradas at sapo.pt]
> > Sent: 06 April 2013 16:32
> > To: Leask, Graham
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Replace missing value within group with non-missing
> > value
> >
> > Hello,
> >
> > Can't you post a data example? If your dataset is named 'dat' use
> >
> > dput(head(dat, 50)) # paste the output of this in a post
> >
> >
> > Rui Barradas
> >
> > Em 06-04-2013 15:34, Leask, Graham escreveu:
> >> Hi Rui,
> >>
> >> Thank you for your suggestion which is very much appreciated.
> >> Unfortunately running this code produces the following error.
> >>
> >> error in '$<-.data.frame' ('*tmp*', "mth", value = NA_real_) :
> >> replacement has 1 rows, data has 0
> >>
> >> I'm sure there must be an elegant solution to this problem?
> >>
> >> Best wishes
> >>
> >>
> >>
> >> Graham
> >>
> >> On 6 Apr 2013, at 12:15, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:
> >>
> >>> Hello,
> >>>
> >>> That's not a very good way of posting your data, preferably paste
> >>> the output of ?dput in a post.
> >>> Something along the lines of the following might do what you want.
> >>> It seems that the groups are established by 'dn' and 'obs' numbers.
> >>> If so, try
> >>>
> >>>
> >>> # Make up some data
> >>> dat <- data.frame(dn = 4, obs = rep(1:5, each = 6), mth = NA)
> >>> dat$mth[6] <- 487
> >>> dat$mth[9] <- 488
> >>> dat$mth[18] <- 488
> >>> dat$mth[21] <- 489
> >>> dat$mth[30] <- 489
> >>>
> >>>
> >>> sp <- split(dat, list(dat$dn, dat$obs))
> >>> names(sp) <- NULL
> >>> tmp <- lapply(sp, function(x){
> >>> idx <- which(!is.na(x$mth))[1]
> >>> x$mth <- x$mth[idx]
> >>> x
> >>> })
> >>> do.call(rbind, tmp)
> >>>
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>>
> >>>
> >>> Em 06-04-2013 11:33, Leask, Graham escreveu:
> >>>> Dear List members
> >>>>
> >>>> I have a large dataset organised in choice groups see sample
> >>>> below
> >>>>
> >>>>
> >>>>      +-------------------------------------------------------------------------------------------------+
> >>>>      | dn   obs   choice   acid      br   date                 cdate       situat~n   mth   year   set |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>   1. |  4     1        0   LOSEC      1   .                    .                        .      .     1 |
> >>>>   2. |  4     1        0   NEXIUM     2   .                    .                        .      .     1 |
> >>>>   3. |  4     1        0   PARIET     3   .                    .                        .      .     1 |
> >>>>   4. |  4     1        0   PROTIUM    4   .                    .                        .      .     1 |
> >>>>   5. |  4     1        0   ZANTAC     5   .                    .                        .      .     1 |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>   6. |  4     1        1   ZOTON      6   23aug2000 01:00:00   23aug2000   NS         487   2000     1 |
> >>>>   7. |  4     2        0   LOSEC      1   .                    .                        .      .     2 |
> >>>>   8. |  4     2        0   NEXIUM     2   .                    .                        .      .     2 |
> >>>>   9. |  4     2        1   PARIET     3   25sep2000 01:00:00   25sep2000   L          488   2000     2 |
> >>>>  10. |  4     2        0   PROTIUM    4   .                    .                        .      .     2 |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>  11. |  4     2        0   ZANTAC     5   .                    .                        .      .     2 |
> >>>>  12. |  4     2        0   ZOTON      6   .                    .                        .      .     2 |
> >>>>  13. |  4     3        0   LOSEC      1   .                    .                        .      .     3 |
> >>>>  14. |  4     3        0   NEXIUM     2   .                    .                        .      .     3 |
> >>>>  15. |  4     3        0   PARIET     3   .                    .                        .      .     3 |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>  16. |  4     3        0   PROTIUM    4   .                    .                        .      .     3 |
> >>>>  17. |  4     3        0   ZANTAC     5   .                    .                        .      .     3 |
> >>>>  18. |  4     3        1   ZOTON      6   20sep2000 00:00:00   20sep2000   R          488   2000     3 |
> >>>>  19. |  4     4        0   LOSEC      1   .                    .                        .      .     4 |
> >>>>  20. |  4     4        0   NEXIUM     2   .                    .                        .      .     4 |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>  21. |  4     4        1   PARIET     3   27oct2000 00:00:00   27oct2000   NL         489   2000     4 |
> >>>>  22. |  4     4        0   PROTIUM    4   .                    .                        .      .     4 |
> >>>>  23. |  4     4        0   ZANTAC     5   .                    .                        .      .     4 |
> >>>>  24. |  4     4        0   ZOTON      6   .                    .                        .      .     4 |
> >>>>  25. |  4     5        0   LOSEC      1   .                    .                        .      .     5 |
> >>>>      |-------------------------------------------------------------------------------------------------|
> >>>>  26. |  4     5        0   NEXIUM     2   .                    .                        .      .     5 |
> >>>>  27. |  4     5        0   PARIET     3   .                    .                        .      .     5 |
> >>>>  28. |  4     5        0   PROTIUM    4   .                    .                        .      .     5 |
> >>>>  29. |  4     5        0   ZANTAC     5   .                    .                        .      .     5 |
> >>>>  30. |  4     5        1   ZOTON      6   23oct2000 03:00:00   23oct2000   NS         489   2000     5 |
> >>>>      +-------------------------------------------------------------------------------------------------+
> >>>>
> >>>> I wish to fill in the missing values in each choice set, delineated
> >>>> by dn (Doctor), obs (Observation number) and choices (1 to 6).
> >>>> For each choice set one choice is chosen, and its row contains the full
> >>>> time information for that choice set, i.e. in set 1 choice 6 was chosen
> >>>> and shows the month 487. The other 5 choices show mth as missing. I want
> >>>> to fill these with the correct mth.
> >>>>
> >>>> I am sure there must be an elegant way to do this in R?
> >>>>
> >>>>
> >>>> Best wishes
> >>>>
> >>>>
> >>>>
> >>>> Graham
> >>>>
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >