[R] how to replace NA with a specific score that is dependent on another indicator variable
David Winsemius
dwinsemius at comcast.net
Wed Sep 1 16:30:12 CEST 2010
On Sep 1, 2010, at 10:19 AM, David Winsemius wrote:
>
> On Sep 1, 2010, at 9:55 AM, David Winsemius wrote:
>
>>
>> On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:
>>
>>> Hi everyone,
>>>
>>> I’m looking for a clever bit of code to replace NAs with a specific score
>>> depending on an indicator variable.
>>>
>>> I can see how to do it using lots of if statements, but I’m sure there must
>>> be a neater, better way of doing it.
>>>
>>> Any ideas at all will be much appreciated; I’m dreading coding up all those
>>> if statements!!!!!
>>>
>>> My problem is as follows:
>>>
>>> I have a data set with lots of missing data:
>>>
>>> EG Raw Data Set
>>>
>>> Category variable1 variable2 variable3
>>>        1         5        NA        NA
>>>        1        NA         3         4
>>>        2        NA         7        NA
>>
>> This does not work by category (since I got tired of fixing mangled
>> HTML-ized datasets), but it seems to me that a tapply "wrap" could do
>> either of these operations within categories:
>
> Why not try out Hadley's plyr package?
>
> require(plyr)
> ddply(egraw2, .(category), .fun=function(df) {
> sapply(df[-1],
#Take out the [-1]
> function(x) {mnx <- mean(x, na.rm=TRUE);
> sapply(x, function(z) if (is.na(z))
> {mnx}else{z})
> }
> ) } )
>
> Tested on
> egraw2 <- data.frame(category=rep(1:4, 4),
> var1=sample(c(1:3, NA,NA), 16, replace =TRUE),
> var2=sample(c(5:10, NA,NA), 16, replace =TRUE),
> var3=sample(c(15:20, NA,NA), 16, replace =TRUE) )
It did not throw an error, and only after I sorted that data frame and the
first ddply result did I see that some sort of misregistration had
occurred. Better:
res <- ddply(egraw2, .(category), .fun = function(df) {
  # Within each category, fill every column's NAs with that column's mean
  sapply(df, function(x) {
    mnx <- mean(x, na.rm = TRUE)
    sapply(x, function(z) if (is.na(z)) mnx else z)
  })
})
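The inner sapply(x, function(z) if (is.na(z)) {mnx} else {z}) in the code above, and in the lapply()/sapply() transcript quoted further down, tests one element at a time. A vectorized assignment does the same job. The sketch below is an editorial addition, not code from the thread: fill_with_mean is a hypothetical helper name, and the three-row egraw data frame is typed in from the question. Applied to whole columns (not per category), it should reproduce the results in that transcript.

```r
# Hypothetical helper: replace the NAs in a numeric vector with the mean of
# its non-missing values. is.na() flags every missing position at once, so
# no per-element if/else is needed.
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# The three example rows from the question
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Whole-column means, like the quoted sapply() transcript (NOT per category)
overall <- as.data.frame(lapply(egraw, fill_with_mean))
print(overall)
```

Because fill_with_mean takes and returns a single column, it could presumably also serve as the per-column function in the plyr call, e.g. ddply(egraw2, .(category), colwise(fill_with_mean)); that variant is untested here.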
>
> --
> David.
>>
>>
>> > egraw
>> Category variable1 variable2 variable3
>> 1 1 5 NA NA
>> 2 1 NA 3 4
>> 3 2 NA 7 NA
>>
>> > lapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
>> sapply(x, function(z) if (is.na(z))
>> {mnx}else{z})
>> }
>> )
>> $Category
>> [1] 1 1 2
>>
>> $variable1
>> [1] 5 5 5
>>
>> $variable2
>> [1] 5 3 7
>>
>> $variable3
>> [1] 4 4 4
>>
>> > sapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
>> sapply(x, function(z) if (is.na(z))
>> {mnx}else{z})
>> }
>> )
>> Category variable1 variable2 variable3
>> [1,] 1 5 5 4
>> [2,] 1 5 3 4
>> [3,] 2 5 7 4
>>
>>>
>>> etc
>>>
>>> Now I want to replace the NAs with the average for each category, so if
>>> these averages were:
>>>
>>> EG Averages
>>>
>>> Category variable1 variable2 variable3
>>>        1       4.5       3.2       2.5
>>>        2       3.5       7.4       5.9
>>>
>>> So I’d like my data set to look like the following once I’ve replaced the
>>> NAs with the appropriate category average:
>>>
>>> EG Imputed Data Set
>>>
>>> Category variable1 variable2 variable3
>>>        1         5       3.2       2.5
>>>        1       4.5         3         4
>>>        2       3.5         7       5.9
>>>
>>> etc
>>>
>>> Any ideas would be very much appreciated!!!!!
>>
>> You might add reading the Posting Guide and setting up your mail reader
>> to post in plain text to your TODO list.
>>>
>>> thank you
>>>
>>> Chris Howden
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
David Winsemius, MD
West Hartford, CT
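For the per-category imputation the question actually asks for, base R's ave() is one option that avoids plyr. ddply() returns rows grouped by category, whereas ave() returns a vector aligned with the original rows, so the result can be compared with the input without re-sorting. This is a minimal sketch under stated assumptions: every column other than the grouping column is numeric, and impute_group_mean is a hypothetical helper name, not something from the thread.

```r
# Hypothetical helper: replace NAs in every non-grouping column with the mean
# of that column within the same group. ASSUMES all such columns are numeric.
impute_group_mean <- function(df, group) {
  value_cols <- setdiff(names(df), group)
  for (col in value_cols) {
    # ave() returns one value per original row (the group mean), in row order
    grp_mean <- ave(df[[col]], df[[group]],
                    FUN = function(v) mean(v, na.rm = TRUE))
    missing <- is.na(df[[col]])
    df[[col]][missing] <- grp_mean[missing]
  }
  df
}

# The three example rows from the question
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

imputed <- impute_group_mean(egraw, "Category")
print(imputed)
```

On these three rows, category 1 gets 5, 3 and 4 filled in for variable1, variable2 and variable3. Category 2 has no observed values for variable1 or variable3, so those cells become NaN, because mean(NA, na.rm = TRUE) is NaN. The averages quoted in the question (4.5, 3.2, and so on) are illustrative and are not computed from these three rows.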