[R] search across a row for strings
David Winsemius
dwinsemius at comcast.net
Mon Jun 15 22:34:25 CEST 2015
On Jun 15, 2015, at 1:12 PM, Federman, Douglas wrote:
> I'm trying to do the following: search each patient's list of diagnoses for a specific code, then create a new column based upon the presence of that code.
> Simplified data follows:
>
> con <- textConnection("
> ID DX1 DX2 DX3
> 1 4109 4280 7102
> 2 734 311 490
> 3 4011 42822 4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE, colClasses="character")
> #
> # I would like to add a column such that searching for 410 would give the result below. The search string would always be at the start of a word and doesn't need regex.
> #
> # ID DX1 DX2 DX3 htn
> # 1 4109 4280 7102 1
> # 2 734 311 490 0
> # 3 4011 42822 4101 1
> #
> # The following works but is slow and returns NA if the search string is not found:
>
> for (i in 1:nrow(df)) {
> df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
> }
Is this any better?
> df$htn <- apply(df[-1], 1, function(r) max( substr(r, 1,3) == "410" ))
> df
  ID  DX1   DX2  DX3 htn
1  1 4109  4280 7102   1
2  2  734   311  490   0
3  3 4011 42822 4101   1
You can add na.rm=TRUE to the max call if any diagnosis might be missing. `max` coerces logicals to integer, which is why the new column holds 1/0 rather than TRUE/FALSE.
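For what it's worth, the same prefix test can also be run on all the diagnosis columns at once, without apply. This is only a sketch: it assumes the diagnosis columns are named DX1 to DX3 as in the example data, and the names dx_cols and hits are made up for illustration. substr() keeps the matrix dimensions, so the comparison yields a logical matrix, and rowSums(..., na.rm = TRUE) counts the matches per row while treating a missing diagnosis as "no match".

```r
# Rebuild the example data from the original post (all columns as character).
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, colClasses = "character")

# Hypothetical helper name: the columns that hold diagnosis codes.
dx_cols <- c("DX1", "DX2", "DX3")

# Logical matrix: TRUE where a code starts with "410" (NA where the code is NA).
hits <- substr(as.matrix(df[dx_cols]), 1, 3) == "410"

# A row is flagged if it has at least one match; NAs are ignored.
df$htn <- as.integer(rowSums(hits, na.rm = TRUE) > 0)
df
```

With the example data this gives htn values of 1, 0 and 1, the same as the apply version.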
--
David Winsemius
Alameda, CA, USA