[R] recoding large number of categories (select in SAS)

james.holtman at convergys.com
Wed Jan 19 15:30:57 CET 2005





Here is a way of doing it by setting up a matrix of values to test against.
This is easier than writing out all the 'select' statements.

> x.trans <- matrix(c(  # translation matrix; first column is min, second is max,
+     149, 150, 150,      # and third is the value to be returned
+     186, 187, 187,
+     438, 438, 438,
+     430, 430, 430,
+     808, 826, 808,
+     830, 832, 808,
+     997, 998, 792,
+     792, 796, 792), ncol=3, byrow=TRUE)
> colnames(x.trans) <- c('min', 'max', 'value')
>
> x.default <- 9999   # default/nomatch value
>
> x.test <- c(150, 149, 148, 438, 997, 791, 795, 810, 820, 834)   # test data
> #
> # this function tests each value and, if it lies between min and max, returns the 3rd column
> #
> newValues <- sapply(x.test, function(x){
+     .value <- x.trans[(x >= x.trans[,'min']) & (x <= x.trans[,'max']), 'value']
+     if (length(.value) == 0) .value <- x.default    # on no match, take default
+     .value[1]   # return first value if multiple matches
+ })
> newValues
 [1]  150  150 9999  438  792 9999  792  808  808 9999
>
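A minimal sketch of a vectorized variant of the same matrix-lookup idea, assuming the same x.trans layout (columns min, max, value). The helper name recode_ranges() is hypothetical and not part of base R. Instead of calling a function once per data value, as sapply() does above, it loops once per row of the translation matrix and updates all matching values at once; walking the rows from last to first keeps the "first match wins" behaviour of .value[1].

```r
# Translation table from the post: min, max, value (one row per range)
x.trans <- matrix(c(149, 150, 150,
                    186, 187, 187,
                    438, 438, 438,
                    430, 430, 430,
                    808, 826, 808,
                    830, 832, 808,
                    997, 998, 792,
                    792, 796, 792),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("min", "max", "value")))

# Hypothetical helper (not in base R): loops over the rows of the
# translation table rather than over the data values.
recode_ranges <- function(x, trans, default = NA) {
  out <- rep(default, length(x))   # start with the no-match value
  # Go from the last row to the first so that, if ranges overlap,
  # the earliest matching row is the one left in 'out'.
  for (i in rev(seq_len(nrow(trans)))) {
    hit <- which(x >= trans[i, "min"] & x <= trans[i, "max"])
    out[hit] <- trans[i, "value"]
  }
  out
}

x.test <- c(150, 149, 148, 438, 997, 791, 795, 810, 820, 834)
recode_ranges(x.test, x.trans, default = 9999)
# same result as newValues: 150 150 9999 438 792 9999 792 808 808 9999
```

With only a handful of records the difference is negligible, but with many thousands of records and a few hundred rules, looping over the rules rather than the records is usually the cheaper choice.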
__________________________________________________________
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
james.holtman at convergys.com
+1 (513) 723-2929


Denis Chabot <chabotd at globetrotter.net>
Sent by: r-help-bounces at stat.math.ethz.ch
01/19/2005 08:56 AM

To:       r-help at stat.math.ethz.ch
Subject:  [R] recoding large number of categories (select in SAS)




Hi,

I have data on stomach contents. Possible prey species number in the
hundreds, so a list of prey codes has been in use in many labs doing
this kind of work.

When it comes time to analyze these data, one often wants to regroup
prey into broader categories, especially rare prey.

In SAS you can nest a large number of "if-else" statements or, more
cleanly, use "select" like this:
select;
   when (149 <= prey <= 150)                          preyGr = 150;
   when (186 <= prey <= 187)                          preyGr = 187;
   when (prey = 438)                                  preyGr = 438;
   when (prey = 430)                                  preyGr = 430;
   when (prey = 436)                                  preyGr = 436;
   when (prey = 431)                                  preyGr = 431;
   when (prey = 451)                                  preyGr = 451;
   when (prey = 461)                                  preyGr = 461;
   when (prey = 478)                                  preyGr = 478;
   when (prey = 572)                                  preyGr = 572;
   when (692 <= prey <= 695)                          preyGr = 692;
   when (808 <= prey <= 826, 830 <= prey <= 832)      preyGr = 808;
   when (997 <= prey <= 998, 792 <= prey <= 796)      preyGr = 792;
   when (882 <= prey <= 909)                          preyGr = 882;
   when (prey in (999, 125, 994))                     preyGr = 9994;
   otherwise                                          preyGr = 1;
end; *select;
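For comparison, here is one way the full select block above might be carried over to R, offered as a sketch rather than a definitive translation. Each when clause becomes a row of a table (single codes have min equal to max, and each comma-separated range gets its own row), and base R's findInterval() finds, for every prey code, the last range whose min is not above it; the code is then checked against that range's max. This assumes the ranges do not overlap, which holds for this table. The names prey.trans and recode_by_interval() are hypothetical. Codes that match no range get the "otherwise" value, 1.

```r
# Translation table transcribed from the SAS select: one row per range
prey.trans <- data.frame(
  min   = c(149, 186, 438, 430, 436, 431, 451, 461, 478, 572,
            692, 808, 830, 997, 792, 882, 999, 125, 994),
  max   = c(150, 187, 438, 430, 436, 431, 451, 461, 478, 572,
            695, 826, 832, 998, 796, 909, 999, 125, 994),
  value = c(150, 187, 438, 430, 436, 431, 451, 461, 478, 572,
            692, 808, 808, 792, 792, 882, 9994, 9994, 9994)
)

# Hypothetical helper; assumes the [min, max] ranges do not overlap.
recode_by_interval <- function(x, trans, default) {
  trans <- trans[order(trans$min), ]   # findInterval() needs sorted breaks
  i <- findInterval(x, trans$min)      # last row with min <= x (0 if none)
  out <- rep(default, length(x))       # the "otherwise" value
  ok <- !is.na(i) & i > 0              # a candidate range exists
  ok[ok] <- x[ok] <= trans$max[i[ok]]  # ...and x is within its max
  out[ok] <- trans$value[i[ok]]
  out
}

recode_by_interval(c(150, 125, 694, 996), prey.trans, default = 1)
# gives 150, 9994, 692 and 1
```

Because findInterval() uses a binary search over the sorted breaks, the cost grows only slowly as the number of rules increases.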

The number of transformations is usually much larger than this short
example.

What is the best way of doing this in R?

Sincerely,

Denis Chabot

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



