[R] how to replace NA with a specific score that is dependent on another indicator variable
David Winsemius
dwinsemius at comcast.net
Wed Sep 1 16:19:05 CEST 2010
On Sep 1, 2010, at 9:55 AM, David Winsemius wrote:
>
> On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:
>
>> Hi everyone,
>>
>> I’m looking for a clever bit of code to replace NAs with a specific score
>> depending on an indicator variable.
>>
>> I can see how to do it using lots of if statements, but I’m sure there must
>> be a neater, better way of doing it.
>>
>> Any ideas at all will be much appreciated; I’m dreading coding up all those
>> if statements!!!!!
>>
>> My problem is as follows:
>>
>> I have a data set with lots of missing data:
>>
>> EG Raw Data Set
>>
>> Category variable1 variable2 variable3
>> 1        5         NA        NA
>> 1        NA        3         4
>> 2        NA        7         NA
>
> This does not do its work by category (since I got tired of fixing
> mangled htmlized datasets), but it seems to me that a tapply "wrap"
> could do either of these operations within categories:
Why not try out Hadley's plyr package?
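require(plyr)
# For each category, replace the NAs in every column except the first
# (the category itself) with that column's within-category mean
ddply(egraw2, .(category), .fun = function(df) {
  sapply(df[-1], function(x) {
    mnx <- mean(x, na.rm = TRUE)
    sapply(x, function(z) if (is.na(z)) mnx else z)
  })
})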
Tested on:
egraw2 <- data.frame(category = rep(1:4, 4),
                     var1 = sample(c(1:3, NA, NA), 16, replace = TRUE),
                     var2 = sample(c(5:10, NA, NA), 16, replace = TRUE),
                     var3 = sample(c(15:20, NA, NA), 16, replace = TRUE))
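The quoted reply above suggests that a tapply-style "wrap" could do the same imputation within categories without plyr. A minimal base-R sketch of that idea, assuming the question's three-row example data, uses ave() (a swapped-in technique, not the code posted in this thread) to apply a fill-with-the-mean helper separately to each Category. The helper name fill_mean is hypothetical.

```r
# The question's example data (three rows, two categories)
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Hypothetical helper: replace the NAs in x by the mean of its non-missing values
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() splits each column by Category, applies fill_mean to each piece,
# and returns the results in the original row order
vars <- setdiff(names(egraw), "Category")
egraw[vars] <- lapply(egraw[vars], function(col) {
  ave(col, egraw$Category, FUN = fill_mean)
})
egraw
```

With this tiny data set, category 2 has no observed value of variable1 or variable3, so those cells come back as NaN (the mean of zero values) rather than a number.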
--
David.
>
>
> > egraw
> Category variable1 variable2 variable3
> 1 1 5 NA NA
> 2 1 NA 3 4
> 3 2 NA 7 NA
>
> > lapply(egraw, function(x) {
>     mnx <- mean(x, na.rm = TRUE)
>     sapply(x, function(z) if (is.na(z)) mnx else z)
>   })
> $Category
> [1] 1 1 2
>
> $variable1
> [1] 5 5 5
>
> $variable2
> [1] 5 3 7
>
> $variable3
> [1] 4 4 4
>
> > sapply(egraw, function(x) {
>     mnx <- mean(x, na.rm = TRUE)
>     sapply(x, function(z) if (is.na(z)) mnx else z)
>   })
> Category variable1 variable2 variable3
> [1,] 1 5 5 4
> [2,] 1 5 3 4
> [3,] 2 5 7 4
>
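The lapply()/sapply() calls above fill each NA with its whole column's mean, ignoring Category, and sapply() turns the result into a matrix. A minimal vectorized sketch of the same whole-column imputation, assuming the question's example data, uses base R's replace() and assigns into egraw[] so the result stays a data frame:

```r
# The question's example data
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Whole-column (not per-category) mean imputation, one column at a time;
# assigning into egraw[] keeps the data frame structure
egraw[] <- lapply(egraw, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
egraw
```

The values should match the sapply() output shown above (variable1 all 5, variable2 5 3 7, variable3 all 4).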
>>
>> etc
>>
>> Now I want to replace the NAs with the average for each category, so if
>> these averages were:
>>
>> EG Averages
>>
>> Category variable1 variable2 variable3
>> 1        4.5       3.2       2.5
>> 2        3.5       7.4       5.9
>>
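The averages table above is given as an illustration. One way to compute such per-category means from the data is base R's aggregate(); this is a minimal sketch assuming the question's three-row example data, and the result name cat_means is hypothetical.

```r
# The question's example data
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Per-category mean of every other column.
# na.action = na.pass keeps rows that contain NAs (the formula method's
# default, na.omit, would drop every row that has any NA);
# na.rm = TRUE is passed through to mean().
cat_means <- aggregate(. ~ Category, data = egraw, FUN = mean,
                       na.rm = TRUE, na.action = na.pass)
cat_means
```

For these three rows the category 2 means of variable1 and variable3 are NaN, because that category has no observed values in those columns.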
>> So I’d like my data set to look like the following once I’ve replaced the
>> NAs with the appropriate category average:
>>
>> EG Imputed Data Set
>>
>> Category variable1 variable2 variable3
>> 1        5         3.2       2.5
>> 1        4.5       3         4
>> 2        3.5       7         5.9
>>
>> etc
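When the category averages are already known (as in the "EG Averages" table), each NA can be filled by looking up its row's category. A minimal base-R sketch, assuming the question's example data and averages; the object names avgs and imputed are hypothetical:

```r
# The question's raw data
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Category averages as given in the question
avgs <- data.frame(Category  = c(1, 2),
                   variable1 = c(4.5, 3.5),
                   variable2 = c(3.2, 7.4),
                   variable3 = c(2.5, 5.9))

# For each row of egraw, the row of avgs that holds its category's averages
idx <- match(egraw$Category, avgs$Category)

imputed <- egraw
for (v in setdiff(names(egraw), "Category")) {
  miss <- is.na(imputed[[v]])
  # Replace each NA by the average of its own category
  imputed[[v]][miss] <- avgs[[v]][idx][miss]
}
imputed
```

This should reproduce the "EG Imputed Data Set" table above. match() does the category lookup once, so no if statements are needed however many categories there are.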
>>
>> Any ideas would be very much appreciated!!!!!
>
> You might add reading the Posting Guide and setting up your mail reader
> to post in plain text to your TODO list.
>>
>> thank you
>>
>> Chris Howden
>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.