[R] max / pmax

Tue May 30 21:24:16 CEST 2006

Here's an example of how I think you can do what you want.  As written, 
the function highest.use() returns NA when several drugs tie for the 
maximum; change its definition if you would rather get a random 
selection among multiple maxima.

 > drug.names <- c("marijuana", "crack", "cocaine", "heroin")
 > drugs <- factor(drug.names, levels=drug.names)
 > drugs
[1] marijuana crack     cocaine   heroin
Levels: marijuana crack cocaine heroin
 > as.numeric(drugs)
[1] 1 2 3 4
 > N <- 20
 > set.seed(1)
 > primary.drug <- sample(drugs, N, replace=TRUE)
 > primary.drug[sample(1:20, 10)] <- NA
 > primary.drug
  [1] <NA>      crack     <NA>      <NA>      <NA>      <NA>      heroin
  [8] cocaine   cocaine   marijuana <NA>      <NA>      cocaine   crack
[15] heroin    <NA>      cocaine   heroin    <NA>      <NA>
Levels: marijuana crack cocaine heroin
 > # usage frequencies
 > marijuana <- sample(1:3, N, replace=TRUE)
 > crack <- sample(1:3, N, replace=TRUE)
 > cocaine <- sample(1:3, N, replace=TRUE)
 > heroin <- sample(1:3, N, replace=TRUE)
 > cbind(marijuana, crack, cocaine, heroin)
       marijuana crack cocaine heroin
  [1,]         2     2       2      1
  [2,]         2     3       3      1
  [3,]         2     2       2      2
  [4,]         1     1       2      3
  [5,]         3     1       2      3
  [6,]         3     1       3      3
  [7,]         3     1       3      2
  [8,]         1     2       2      2
  [9,]         3     2       3      3
[10,]         2     2       3      2
[11,]         3     3       2      2
[12,]         2     1       3      2
[13,]         3     2       2      1
[14,]         2     1       1      3
[15,]         2     2       3      2
[16,]         3     1       1      1
[17,]         1     2       3      1
[18,]         2     3       1      2
[19,]         3     1       1      3
[20,]         3     3       1      2
 > highest.use <- function(x) {
 +     y <- which(x == max(x, na.rm=TRUE))
 +     if (length(y) == 1) return(y) else return(NA)
 + }
 > apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use)
  [1] NA NA NA  4 NA NA NA NA NA  3 NA  3  1  4  3  1  3  2 NA NA
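
For the random selection among tied maxima mentioned at the top, one 
possible variant is sketched below.  The name highest.use.random and the 
small matrix "use" are made up for illustration and are not part of the 
session above.

```r
# Sketch: like highest.use(), but a tie is broken at random instead of
# being turned into NA.  (highest.use.random is a hypothetical name.)
highest.use.random <- function(x) {
    if (all(is.na(x))) return(NA_integer_)   # nothing to compare
    y <- which(x == max(x, na.rm = TRUE))
    # Index into y rather than calling sample(y, 1): sample() treats a
    # single number n as the range 1:n.
    y[sample.int(length(y), 1)]
}

# Hypothetical usage frequencies for three subjects
use <- cbind(marijuana = c(2, 1, 3),
             crack     = c(2, 1, 1),
             cocaine   = c(2, 2, 1),
             heroin    = c(1, 3, 3))

set.seed(42)
picked <- apply(use, 1, highest.use.random)
colnames(use)[picked]   # subject 2 is always "heroin"; 1 and 3 are ties
```

Subjects 1 and 3 have tied maxima, so their result depends on the seed; 
subject 2 has a single maximum and always comes out the same.
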
 > impute.primary.drug <- drugs[ifelse(is.na(primary.drug),
 +     apply(cbind(marijuana, crack, cocaine, heroin), 1, highest.use),
 +     as.numeric(primary.drug))]
 > data.frame(primary.drug, impute.primary.drug)
    primary.drug impute.primary.drug
1          <NA>                <NA>
2         crack               crack
3          <NA>                <NA>
4          <NA>              heroin
5          <NA>                <NA>
6          <NA>                <NA>
7        heroin              heroin
8       cocaine             cocaine
9       cocaine             cocaine
10    marijuana           marijuana
11         <NA>                <NA>
12         <NA>             cocaine
13      cocaine             cocaine
14        crack               crack
15       heroin              heroin
16         <NA>           marijuana
17      cocaine             cocaine
18       heroin              heroin
19         <NA>                <NA>
20         <NA>                <NA>
 >
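
On the max versus pmax question in the message quoted below: as far as I 
can tell neither one gives the drug name.  The small sketch here uses 
made-up vectors (marijuana.days and so on are hypothetical names, not 
the data above) to show what each call returns, and one way to get from 
the position of the row maximum to a column name.

```r
# Hypothetical days of use for three subjects
marijuana.days <- c(5, 0, 2)
crack.days     <- c(1, 7, 2)
cocaine.days   <- c(0, 3, 9)

# max() pools all arguments and returns a single number
overall <- max(marijuana.days, crack.days, cocaine.days)

# pmax() works element-wise: one maximum per subject, but still a value
per.subject <- pmax(marijuana.days, crack.days, cocaine.days)

# To get the name, find the position of each row maximum and use it to
# index the column names.  which.max() returns the first maximum on a tie.
use <- cbind(marijuana = marijuana.days,
             crack     = crack.days,
             cocaine   = cocaine.days)
top <- colnames(use)[apply(use, 1, which.max)]
top
```

Because which.max() silently takes the first of several tied maxima, a 
function such as highest.use() is still needed when ties should become 
NA.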

Brian Perron wrote:
> Hello R users,
> 
> I am relatively new to R and cannot seem to crack a coding problem.  I 
> am working with substance abuse data, and I have a variable called 
> "primary.drug" which is considered the drug of choice for each 
> subject.   I have just a few missing values on that variable.  Instead 
> of using a multiple imputation method like chained equations, I would 
> prefer to derive these values from other survey responses.  
> Specifically, I have a frequency of use (in days) for each of the major 
> drugs, so I would like the missing values to be replaced by that drug 
> with the highest level of use.  I am starting with the "ifelse" and 
> "max" statements, but I know it is wrong:
> 
> impute.primary.drug <-   ifelse(is.na(primary.drug), max(marijuana, 
> crack, cocaine, heroin), primary.drug)
> 
> Here are the problems.  First, the max statement (should it be "pmax"?), 
> returns the highest numeric quantity rather than the variable itself.  
> In other words, I want to test which drug has the highest value, but 
> return the variable name rather than the observed value.   Second, if 
> ties are observed, how can I specify the value to be NA?  Or, how can I 
> specify one of the values to be randomly selected?   
> 
> Thanks in advance for your assistance.
> 
> Regards,
> Brian
> 