Advice on recoding a variable depending on another which contains NAs

David Winsemius dwinsemius at comcast.net
Sun Nov 20 00:55:21 CET 2011

On Nov 19, 2011, at 6:31 PM, Anthony Staines wrote:

> Dear colleagues,
> I would be very grateful for your help with the following. I have
> banged my head off this question several times in the past, and
> repeatedly over the last week. I have looked in the usual places and
> found no obvious solution. I fear that this just means I didn't
> recognize it, but I'd be very grateful for your help.
> I am scoring 8000 psychometric tests - the SCQ, if you have heard of
> it. On this test the scoring rules depend on one variable, SCQ1 - if
> this is answered yes, the final score is a function of 39 variables,
> and if no, of 31 variables.
>
> I've calculated both of these scores (SCQScore1 and SCQScore2) for
> all the children in my study, and I wish to create a final score,
> which is SCQScore1 when SCQ1 is 1, and SCQScore2 when SCQ1 is 2.
> There are also missing values for SCQ1, and I have chosen, for the
> moment, to set the final score to SCQScore1 for these. [[This is a

This would seem to be an obvious task for ifelse()

d$SCQScore <- NA
d$SCQScore <- ifelse( d$SCQ1 == 1, d$SCQScore1, d$SCQScore2)

(And don't use 99 for missing. Use NA. It will protect you better than
"99".)

I suppose you could enforce the two level testing with:

d$SCQScore <- ifelse( d$SCQ1 == 1, d$SCQScore1,
               ifelse(d$SCQ1 == 2, d$SCQScore2, NA))
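
Note that ifelse() returns NA wherever its test is NA, so neither form above assigns SCQScore1 to the children with a missing SCQ1, which is what the original question asks for. A minimal sketch of one way to get that, assuming a toy data frame d whose values are invented purely for illustration:

```r
# Toy data frame standing in for the poster's `d`; the values are made up
d <- data.frame(SCQ1      = c(1, 2, NA, 1, 2),
                SCQScore1 = c(20, 21, 22, 23, 24),
                SCQScore2 = c(10, 11, 12, 13, 14))

# Plain ifelse(): a missing SCQ1 gives a missing result
plain <- ifelse(d$SCQ1 == 1, d$SCQScore1, d$SCQScore2)
plain
# 20 11 NA 23 14

# Treat a missing SCQ1 like SCQ1 == 1; any other unexpected value stays NA
d$SCQScore <- ifelse(is.na(d$SCQ1) | d$SCQ1 == 1, d$SCQScore1,
               ifelse(d$SCQ1 == 2, d$SCQScore2, NA))
d$SCQScore
# 20 11 22 23 14
```

This works because TRUE | NA evaluates to TRUE, so the is.na() check settles the rows where the comparison itself is NA. d$SCQ1 is never modified, so its missing values are kept for later analysis.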

> d$SCQScore <- 99
> 	##Distinct value for any other values I've missed
> d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
> 	## Talks using phrases/sentences, so sum SCQ2:SCQ40
> d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
> 	## Can't do this, so sum SCQ8:SCQ40
> d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1[is.na(d$SCQ1)]
> 	## SCQ1 is missing
> This fails on line 2
> (d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1])
> with the error message
> "NAs are not allowed in subscripted assignments",
> presumably because SCQ1 does indeed contain missing values.
> This can be fixed, got around, or otherwise bypassed, by creating a
> new variable SCQ1, with no missing values, as shown :-
>
> SCQ1 <- d$SCQ1
> SCQ1[is.na(SCQ1)] <- 3
> d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
> 	## Talks using phrases/sentences so sum SCQ2:SCQ40
> d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
> 	## Can't do this, so sum SCQ8:SCQ40
> d$SCQScore[SCQ1 == 3] <- d$SCQScore1[SCQ1 == 3]
> 	## We don't know if he/she can talk, so guess - sum S2:S40
> This type of thing is a common problem in my little world. Is there
> a better/less klutzy/smarter way of solving it than creating a new
> variable each time? Please bear in mind that it is critical, for
> later analysis, to keep the missing values in SCQ1.
> Best wishes,
> Anthony Staines
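
If you would rather keep the subscripted-assignment style, the error can be avoided without making a recoded copy of SCQ1. R refuses a logical index containing NA in an assignment when the replacement has more than one element. Wrapping the comparison in which() turns it into integer positions and drops the NAs. A minimal sketch, again on an invented toy data frame d, with the helper names talks, cannot and unknown chosen only for this example:

```r
# Toy data frame standing in for the poster's `d`; the values are made up
d <- data.frame(SCQ1      = c(1, 2, NA, 1, 2),
                SCQScore1 = c(20, 21, 22, 23, 24),
                SCQScore2 = c(10, 11, 12, 13, 14))

d$SCQScore <- NA_real_               # start from NA rather than 99

talks   <- which(d$SCQ1 == 1)        # which() drops positions where the test is NA
cannot  <- which(d$SCQ1 == 2)
unknown <- which(is.na(d$SCQ1))

d$SCQScore[talks]   <- d$SCQScore1[talks]
d$SCQScore[cannot]  <- d$SCQScore2[cannot]
d$SCQScore[unknown] <- d$SCQScore1[unknown]   # SCQ1 missing: use SCQScore1
d$SCQScore
# 20 11 22 23 14
```

An alternative with the same effect is to index with d$SCQ1 %in% 1, since %in% returns FALSE rather than NA for a missing value.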
