[R] replacing missing values in a dataframe with reference values.
Jim Regetz
regetz at nceas.ucsb.edu
Fri Aug 22 02:29:16 CEST 2008
Gary Collins wrote:
> I'd be most grateful for any thoughts on the following. I'm sure there
> is an easy and quick way to do this, but I'm having a mental block this
> evening. Essentially, I'm trying to replace missing data in my dataset
> with reference values based on age and sex.
>
> So an example dataset is
> set.seed(1)
> X = data.frame(age=rnorm(10, 50, 10), sex=rbinom(10, 1, 0.5),
> A=rnorm(10), B=rnorm(10))
>
> X$agegroup = cut(X$age, breaks=c(20, 30, 40, 50, 60, 70, 80), labels =
> c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79"))
>
> # make some missing
> X$A[c(1, 5, 6)]=NA
> X$B[c(2, 3, 8)]=NA
>
> and my reference dataset is
> refval = data.frame(agegroup = rep(c("20-29", "30-39", "40-49", "50-59",
> "60-69", "70-79"), 2), sex = c(rep(0,6), rep(1,6)),A = c(seq(0, 1,
> length=6), seq(4,5,length=6)), B = c(seq(1, 2, length=6),
> seq(3,4,length=6)))
Especially if you're certain every combination of agegroup and sex in X
has a matching entry in the refval data frame, I would just create a
concatenated key from your two reference variables, set it as the
rownames of refval, and then do the lookups by row name:
rownames(refval) <- paste(refval$sex, refval$agegroup, sep=".")
X.key <- paste(X$sex, X$agegroup, sep=".")
X$A[is.na(X$A)] <- refval[X.key[is.na(X$A)], "A"]
X$B[is.na(X$B)] <- refval[X.key[is.na(X$B)], "B"]
etc.
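
With 5 or 6 variables to fill, the same lookup can be wrapped in a loop
over column names. What follows is only a sketch under a few assumptions:
fill_from_ref() and its arguments are hypothetical names rather than
anything from base R, it swaps the rownames lookup for match() on the
same concatenated key, and it stops with an error (instead of quietly
leaving an NA) when a row that needs filling has no reference entry.

```r
# Hypothetical helper: fill NAs in the columns `vars` of `x` with the value
# from the row of `ref` that has the same combination of the `by` columns.
fill_from_ref <- function(x, ref, by, vars) {
  # One concatenated key per row, e.g. "0.40-49"
  make.key <- function(d) {
    do.call(paste, c(unname(lapply(d[by], as.character)), sep = "."))
  }
  x.key   <- make.key(x)
  ref.key <- make.key(ref)
  # Each key must identify exactly one reference row
  stopifnot(anyDuplicated(ref.key) == 0)
  # Row of `ref` matching each row of `x` (NA where there is no match)
  idx <- match(x.key, ref.key)
  for (v in vars) {
    miss <- is.na(x[[v]])
    if (any(is.na(idx[miss])))
      stop("no reference row for some missing values in column ", v)
    x[[v]][miss] <- ref[[v]][idx[miss]]
  }
  x
}

# Example data, rebuilt as in the original post
set.seed(1)
labs <- c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79")
X <- data.frame(age = rnorm(10, 50, 10), sex = rbinom(10, 1, 0.5),
                A = rnorm(10), B = rnorm(10))
X$agegroup <- cut(X$age, breaks = seq(20, 80, by = 10), labels = labs)
X$A[c(1, 5, 6)] <- NA
X$B[c(2, 3, 8)] <- NA
refval <- data.frame(agegroup = rep(labs, 2), sex = rep(c(0, 1), each = 6),
                     A = c(seq(0, 1, length.out = 6), seq(4, 5, length.out = 6)),
                     B = c(seq(1, 2, length.out = 6), seq(3, 4, length.out = 6)))

filled <- fill_from_ref(X, refval, by = c("sex", "agegroup"),
                        vars = c("A", "B"))
```

The keys and the match() call are computed once for the whole data
frame, so each additional variable costs only one vectorized assignment.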
HTH,
Jim
> My ugly "solution" is
>
> for(i in 1:nrow(X)){
> if(is.na(X$A[i])){
> X$A[i] = refval$A[refval$sex == X$sex[i] & refval$agegroup ==
> X$agegroup[i]]
> }
> if(is.na(X$B[i])){
> X$B[i] = refval$B[refval$sex == X$sex[i] & refval$agegroup ==
> X$agegroup[i]]
> }
> }
>
> This is not only ugly but also slow: my actual dataset has over
> 1 million rows, with 5 or 6 variables in which to check for and
> replace missing values.
>
> Thanks in advance for any help.
>
> Gary
> ------------------------------------
> Dr Gary S Collins
> Medical Statistician
> Centre for Statistics in Medicine
> Wolfson College Annexe
> University of Oxford
> Linton Road
> Oxford, OX2 6UD
>
> Tel: +44 (0)1865 284418
> Fax: +44 (0)1865 284424
> www.csm-oxford.org.uk
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>