[R] index question

Richard.Cotton at hsl.gov.uk
Fri Dec 28 14:23:25 CET 2007


>  From a dataframe there are 27 variables of interest, with the 
> prefix of "pre".
> 
>  [7] "Decision"  "MHCDate"   "pre01"     "pre01111"  "pre012"    "pre013"
> [13] "pre02"     "pre02111"  "pre02114"  "pre0211"   "pre0212"   "pre029"
> [19] "pre03a"    "pre0311"   "pre0312"   "pre03"     "pre04"     "pre05"
> [25] "pre06"     "pre07"     "pre08"     "pre09"     "pre10"     "pre11"
> [31] "pre12"     "pre13"     "pre14"     "pre15"     "pre16"
> 
> I want to combine these variables into new variables, using the 
> following criteria :
> 
> (1) create a single variable PRE, when any of the 27 'pre' variables 
> have a value >= '1'
> (2) create a variable HOM, when any of the pre01, pre01111, pre012, 
> pre013 variables have a value >= '1'
> (3) create a variable ASS, when any of the pre02, pre02111, pre02114, 
> pre0211, pre0212, pre029  variables have a value   >= '1'
> (4) create a variable SEX, when any of the pre03a, pre0311, pre0312, 
> pre03 variables have a value   >= '1'
> (5) create a variable VIO, when any of the pre01 to pre06 variables 
> have a value   >= '1'
> (6) create a variable SERASS. If pre02111 or pre02114 >= '1', assign a 
> value of 1; if there is a value of 1 or greater for pre0211, assign a 
> value of 2; if there is a value of 1 or greater for pre0212, assign a 
> value of 3; if there is a value of 1 or greater for pre029, assign a 
> value of 4; everything else = 0. If a case has multiple values, 
> pre02111 prevails over pre02114, pre02114 prevails over pre0211, 
> pre0211 prevails over pre0212, and pre0212 prevails over pre029.
> 
> 
> I believe I can generate new variables (1) - (5) using code such 
> as:  ASS <- (reoffend$pre02 | reoffend$pre02111 | reoffend$pre02114 | 
> reoffend$pre0211 | reoffend$pre0212 | reoffend$pre029 >= '1')
> 
> 
> I have three questions:
> 
> 1. If this is correct, what is the most efficient way to generate (1) 
> without having to type all the variable names. The following does not 
> work: PRE <- reoffend [,9:35], >= '1'

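Try something like this (data frame simplified):
# Simplified stand-in for the real data frame
df <- data.frame(pre1=c(0,1,1,2),
                 pre2=c(0,0,1,0),
                 foo=c(0,0,1,3))
# Positions of the columns whose names start with "pre"
precols <- grep("^pre", names(df))
# TRUE for each row where at least one "pre" column is >= 1
PRE <- apply(df[,precols,drop=FALSE] >= 1, 1, any)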
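The same idea extends to criteria (2) to (5). Note that in an expression
like a | b | c >= '1' the comparison applies only to the last column,
because >= binds more tightly than |, so it is safer to compare every
column explicitly. A minimal sketch, assuming the columns are numeric with
no missing values; the helper name any_ge1 and the toy data are made up
for illustration, and only a few of the real columns are shown:

```r
# Toy data standing in for the real 'reoffend' data frame (values invented)
reoffend <- data.frame(pre01   = c(0, 1, 0, 0),
                       pre012  = c(0, 0, 0, 0),
                       pre02   = c(0, 0, 2, 0),
                       pre0211 = c(0, 0, 1, 0),
                       pre03a  = c(0, 0, 0, 0),
                       pre07   = c(0, 0, 0, 3))

# Hypothetical helper: TRUE for rows where any of the named columns is >= 1
any_ge1 <- function(data, cols) {
  unname(rowSums(data[, cols, drop = FALSE] >= 1) > 0)
}

# All columns whose names start with "pre"
PRE <- any_ge1(reoffend, grep("^pre", names(reoffend), value = TRUE))
# Named subsets; extend each vector with the remaining columns of the group
HOM <- any_ge1(reoffend, c("pre01", "pre012"))
ASS <- any_ge1(reoffend, c("pre02", "pre0211"))
SEX <- any_ge1(reoffend, "pre03a")
```

Passing column names rather than positions means the code keeps working
if the column order of the data frame changes.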


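> 2. I am unsure as to how to generate Example 6.

Assign the values in reverse order of priority, so that later assignments
overwrite earlier ones (column names as in your names() listing):
SERASS <- rep(0, nrow(df))
SERASS[df$pre029>=1] <- 4
SERASS[df$pre0212>=1] <- 3
SERASS[df$pre0211>=1] <- 2
SERASS[df$pre02111>=1 | df$pre02114>=1] <- 1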
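A quick check of the precedence rule on invented data may help. This is
only a sketch: it assumes the columns are named as in the names() listing
(pre02111, pre02114, pre0211, pre0212, pre029) and contain no missing
values.

```r
# Invented rows; column names follow the names() listing in the question
df <- data.frame(pre02111 = c(0, 0, 0, 0, 1, 0),
                 pre02114 = c(0, 0, 0, 1, 0, 0),
                 pre0211  = c(0, 0, 1, 1, 0, 0),
                 pre0212  = c(0, 1, 1, 0, 0, 0),
                 pre029   = c(1, 1, 0, 0, 1, 0))

SERASS <- rep(0, nrow(df))                 # default: everything else = 0
SERASS[df$pre029  >= 1] <- 4               # lowest priority, assigned first
SERASS[df$pre0212 >= 1] <- 3
SERASS[df$pre0211 >= 1] <- 2
SERASS[df$pre02111 >= 1 | df$pre02114 >= 1] <- 1  # highest priority, last

SERASS
```

Row 2, for example, has both pre0212 and pre029 set; it is first given 4
and then overwritten with 3, which is the behaviour asked for.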


> 3. I wanted to exclude cases with a reoffend$Decision of value of 3, 
> using the code below. However, I received a message saying there were 
> NAs produced, however, the raw variable did not have NAs.
> 
>  > MHT.decision <- reoffend[reoffend$Decision >= '2',]
>  > table(MHT.decision)
> Error in vector("integer", length) : vector size cannot be NA
> In addition: Warning messages:
> 1: NAs produced by integer overflow in: pd * (as.integer(cat) - 1L)
> 2: NAs produced by integer overflow in: pd * nl
> 
>  > table(reoffend$Decision)
>     1    2    3
> 1136  445   66

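I doubt that you want quotes around the '2' when defining MHT.decision;
with quotes the values are compared as character strings, not as numbers.
Note also that Decision >= 2 keeps the cases with a value of 3, so to
exclude them you would want a condition such as Decision != 3. The error
itself comes from calling table() on the whole data frame, which tries to
cross-tabulate every column at once; use table(MHT.decision$Decision)
instead.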
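A small illustration of the difference between comparing with '2' and
with 2, followed by one way of dropping the cases with a Decision of 3.
The data are invented, and the value 10 is included only to make the
effect of string comparison visible:

```r
# Invented stand-in for the real data; only Decision matters here
reoffend <- data.frame(Decision = c(1, 2, 3, 10, 2),
                       pre01    = c(0, 1, 0, 2, 0))

# With quotes the numbers are converted to text, and "10" sorts before "2"
as_text   <- reoffend$Decision >= '2'
# Without quotes the comparison is numeric
as_number <- reoffend$Decision >= 2

# Drop cases with Decision equal to 3, then tabulate that single column
MHT.decision <- reoffend[reoffend$Decision != 3, ]
counts <- table(MHT.decision$Decision)
counts
```

With only the values 1, 2 and 3 in Decision, both comparisons happen to
agree, but the numeric form is the one that says what is meant.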

Regards,
Richie.

Mathematical Sciences Unit
HSL

