[R] Need a more efficient way to implement this type of logic in R
Alexander Engelhardt
alex at chaotic-neutral.de
Wed Apr 6 23:04:12 CEST 2011
On 06.04.2011 22:02, Walter Anderson wrote:
> I have cobbled together the following logic. It works but is very slow.
> I'm sure that there must be a better r-specific way to implement this
> kind of thing, but have been unable to find/understand one. Any help
> would be appreciated.
>
> hh.sub <- households[c("HOUSEID","HHFAMINC")]
> for (indx in 1:length(hh.sub$HOUSEID)) {
> if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
> (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
> (hh.sub$HHFAMINC[indx] == '05'))
> hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
> if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
> (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
> (hh.sub$HHFAMINC[indx] == '10'))
> hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
> if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
> (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
> (hh.sub$HHFAMINC[indx] == '15'))
> hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
> if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
> hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
> if ((hh.sub$HHFAMINC[indx] == '18'))
> hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
> if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
> (hh.sub$HHFAMINC[indx] == '-9'))
> hh.sub$CS_FAMINC[indx] = 0
> }
Hi,
the for loop is entirely unnecessary. As a first step, you can drop the
index and work on the whole column at once. A plain if() won't do that,
because it only evaluates a single condition, so use a logical index:
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c('01', '02', '03', '04', '05')] <- 1 # Less than $25,000
This very basic concept is called "vectorization" in R. You should read
about it, it rocks.
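To make the vectorization idea concrete, here is a minimal sketch on made-up data. The data frame below is only a stand-in for the real households table, and the column is initialised to NA first (my assumption, not something from the original code) so every row has a defined value. Since if() is not vectorized, the sketch uses logical indexing with %in%:

```r
# Toy stand-in for the poster's data; the values are made up for illustration.
hh.sub <- data.frame(
  HOUSEID  = 1:8,
  HHFAMINC = c("01", "07", "13", "16", "18", "-7", "05", "-9"),
  stringsAsFactors = FALSE
)

# Start with a column of NA so every row has a defined value.
hh.sub$CS_FAMINC <- NA_real_

# Each assignment updates all matching rows at once; no loop needed.
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("01", "02", "03", "04", "05")] <- 1  # Less than $25,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("06", "07", "08", "09", "10")] <- 2  # $25,000 to $50,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("11", "12", "13", "14", "15")] <- 3  # $50,000 to $75,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("16", "17")] <- 4                    # $75,000 to $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC == "18"] <- 5                               # More than $100,000
hh.sub$CS_FAMINC[hh.sub$HHFAMINC %in% c("-7", "-8", "-9")] <- 0              # no answer

hh.sub
```

Each line touches every matching row in one go, so the run time no longer grows with a per-row loop over the data frame.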
In this case, though, you don't even need to do that.
If you convert the variable HHFAMINC to a number like this:
hh.sub$HHFAMINC <- as.numeric(as.character(hh.sub$HHFAMINC))
(the as.character() only matters if the column is a factor), then you can
apply the cut() function to create a factor variable:
hh.sub$myawesomefactor <- cut(hh.sub$HHFAMINC, breaks=c(0, 5.5, 10.5, 15.5,
17.5, Inf))
or something like that should do the trick. The negative codes fall
outside the breaks and come out as NA. You will then have to rename the
factor levels, either with levels() or by passing labels= to cut().
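A minimal sketch of the cut() approach, in case it helps. The codes are made up, and the break points and label text are my reading of the comments in the original loop, so check them against the real codebook. Naming the classes via the labels argument avoids a separate renaming step:

```r
# Made-up codes as character strings, as in the original data.
codes <- c("01", "05", "06", "10", "15", "17", "18", "-7", "-9")

# as.character() first, in case the column is a factor rather than character.
inc <- as.numeric(as.character(codes))

# Intervals are (0, 5.5], (5.5, 10.5], ..., (17.5, Inf]; anything outside
# the breaks (here the negative codes) becomes NA.
income_class <- cut(
  inc,
  breaks = c(0, 5.5, 10.5, 15.5, 17.5, Inf),
  labels = c("Less than $25,000", "$25,000 to $50,000",
             "$50,000 to $75,000", "$75,000 to $100,000",
             "More than $100,000")
)

levels(income_class)                  # the five class labels
table(income_class, useNA = "ifany")  # counts per class, plus the NAs
```

If the labels need changing afterwards, assigning to levels(income_class) does that.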
Also, this might be my OCD speaking, but I would use NA instead of 0 for
non-available values.
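A small made-up illustration of why NA tends to be safer than 0 here: a 0 gets treated as a real value in summaries, while NA has to be handled explicitly.

```r
# Made-up class codes where 0 stands for "no answer".
cs <- c(1, 2, 0, 3, 0, 5)

mean(cs)                           # the zeros are counted as real values

cs_na <- replace(cs, cs == 0, NA)  # recode 0 to NA
mean(cs_na, na.rm = TRUE)          # uses only the four real answers
sum(is.na(cs_na))                  # how many answers are missing
```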
Have fun,
Alex