[R] For-loop dummy variables?

Adrian Dusa dusa.adrian at gmail.com
Tue Oct 19 09:02:34 CEST 2010


gravityflyer <gravityflyer <at> yahoo.com> writes:
> 
> Hi everyone, 
> 
> I've got a dataset with 12,000 observations. One of the variables
> (cleary$D1) is for an individual's country, coded 1 - 15. I'd like to create
> a dummy variable for the Baltic states which are coded 4,6, and 7. In other
> words, as a dummy variable Baltic states would be coded 1, else 0.  I've
> attempted the following for loop:
> 
> dummy <- matrix(NA, nrow=nrow(cleary), ncol=1)
> for (i in 1:length(cleary$D1)){
> 	if (cleary$D1 == 4){dummy[i] = 1}
> 	else {dummy[i] = 0}
> 	}
> 
> Unfortunately it generates the following error:
> 
> 1: In if (cleary$D1 == 4) { ... :
>   the condition has length > 1 and only the first element will be used
> 
> Another options I've tried is the following:
> 
> binary <- vector(length=length(cleary$D1))
> for (i in 1:length(cleary$D1)) {
> 	if (cleary$D1 == 4 | cleary$D1 == 6 | cleary$D1 == 7 ) {binary[i] = 1}
>  	else {binary[i] = 0}
> }
> 
> Unfortunately it simply responds with "syntax error".
> 
> Any thoughts would be greatly appreciated!
> 

Be aware that R is a vectorised programming language, so your for loop is
completely unnecessary.

This is what I'd do:

dummy <- rep(0, nrow(cleary))
dummy[cleary$D1 %in% c(4,6,7)] <- 1

This is your dummy variable.
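
As a small self-contained illustration, here is the same idea on a made-up
data frame. The toy values in `cleary` and the names `dummy2` and `dummy3`
are assumptions for the sketch, not part of the original data. It also shows
two equivalent one-liners.

```r
# Hypothetical stand-in for the real data: six observations of D1
cleary <- data.frame(D1 = c(4, 1, 6, 7, 15, 2))

# The two-step version: start with zeros, then set the Baltic rows to 1
dummy <- rep(0, nrow(cleary))
dummy[cleary$D1 %in% c(4, 6, 7)] <- 1

# Equivalent one-liners: %in% returns a logical vector,
# which can be converted to 0/1 directly
dummy2 <- as.integer(cleary$D1 %in% c(4, 6, 7))
dummy3 <- ifelse(cleary$D1 %in% c(4, 6, 7), 1, 0)

dummy
```

All three vectors hold the same 0/1 values; which form to use is mostly a
matter of taste.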
Below is your working (though VERY inefficient) version of the for loop:

binary <- vector(length=length(cleary$D1))
for (i in 1:length(cleary$D1)) {
    if (cleary$D1[i] == 4 | cleary$D1[i] == 6 | cleary$D1[i] == 7 ) {
        binary[i] = 1
    } else {
        binary[i] = 0
    }
}
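
To see why the [i] index matters, the sketch below (using a made-up vector
`D1` in place of cleary$D1) compares the length of the two conditions. if()
expects a single TRUE or FALSE; a condition of length > 1 gives the warning
quoted above, and as far as I know it is an outright error from R 4.2.0 on.

```r
# Hypothetical toy vector standing in for cleary$D1
D1 <- c(4, 1, 6, 7, 15)

length(D1 == 4)      # one TRUE/FALSE per observation
length(D1[1] == 4)   # a single TRUE/FALSE, which is what if() expects

# Same loop as above, with %in% on a single element replacing
# the three == comparisons
binary <- numeric(length(D1))   # numeric() starts out as all zeros
for (i in seq_along(D1)) {
    if (D1[i] %in% c(4, 6, 7)) {
        binary[i] <- 1
    }
}
binary
```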

Now try to figure out:
- what is the difference between your for() loop and mine?
- which code is simpler (and better), the vectorised or the for() loop?
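
One way to check both points yourself is to run the two versions on
simulated data of the same size. This is only a rough sketch: the simulated
`D1` and the helper names `loop_dummy` and `vec_dummy` are made up for the
comparison, and the timings will differ from machine to machine.

```r
# Simulated country codes 1-15 for 12,000 observations (not the real data)
set.seed(1)
D1 <- sample(1:15, 12000, replace = TRUE)

# Loop version, wrapped in a function so it can be timed
loop_dummy <- function(x) {
    out <- numeric(length(x))
    for (i in seq_along(x)) {
        if (x[i] == 4 | x[i] == 6 | x[i] == 7) out[i] <- 1
    }
    out
}

# Vectorised version
vec_dummy <- function(x) as.numeric(x %in% c(4, 6, 7))

# Both should give exactly the same vector
identical(loop_dummy(D1), vec_dummy(D1))

# Repeat each 100 times so the difference is measurable
system.time(for (k in 1:100) loop_dummy(D1))
system.time(for (k in 1:100) vec_dummy(D1))
```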

I hope it helps,
Adrian


