[R] search across a row for strings
Sarah Goslee
sarah.goslee at gmail.com
Mon Jun 15 22:29:37 CEST 2015
This is faster than your version, and doesn't return NA:
df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))
> df
  ID  DX1   DX2  DX3   htn
1  1 4109  4280 7102  TRUE
2  2  734   311  490 FALSE
3  3 4011 42822 4101  TRUE
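
Since the original request was for a 0/1 column and a literal prefix rather than a regex, a minimal sketch along the same lines might look like the following. It assumes R 3.3.0 or later for startsWith(); on older versions, substr(x, 1, 3) == "410" should behave the same way for this data. The name `hit` is only illustrative.

```r
# Rebuild the example data from the original question (all columns as character)
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, strip.white = TRUE, colClasses = "character")

# TRUE/FALSE per row: does any diagnosis start with the literal prefix "410"?
# startsWith() does a plain string comparison, so no regex is involved.
hit <- apply(df[, 2:4], 1, function(x) any(startsWith(x, "410")))

# Convert the logical vector to 1/0, as in the requested output
df$htn <- as.integer(hit)
df
```

Keeping the column logical (as in the line above) is usually just as convenient; as.integer() is only there to match the 1/0 layout asked for.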
> system.time({
+ for(j in 1:10000) {
+ for (i in 1:nrow(df)) {
+ df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
+ }
+ }
+ })
user system elapsed
6.648 0.008 6.657
There were 50 or more warnings (use warnings() to see the first 50)
>
> system.time({
+ for(j in 1:10000) {
+ df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))
+ }
+ })
user system elapsed
1.826 0.000 1.826
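
Note that apply() still calls the function once per row. A possible alternative, offered as a sketch and not timed here, is to work column by column instead: run one vectorized grepl() per diagnosis column and then combine the results with a logical OR. The variable name `hits` is an assumption for illustration.

```r
# Rebuild the example data from the original question (all columns as character)
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, strip.white = TRUE, colClasses = "character")

# One vectorized grepl() call per diagnosis column; each list element is a
# logical vector with one entry per patient
hits <- lapply(df[, 2:4], function(col) grepl("^410", col))

# Element-wise OR across the columns: TRUE if any diagnosis matched
df$htn <- Reduce(`|`, hits)
df
```

With only three diagnosis columns this makes three grepl() calls regardless of the number of rows, which may matter more on a large data set than on this toy example.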
On Mon, Jun 15, 2015 at 4:12 PM, Federman, Douglas
<Douglas.Federman at utoledo.edu> wrote:
> I'm trying to do the following: search each patient's list of diagnoses for a specific code, then create a new column based upon the presence of the specific code.
> Simplified data follows:
>
> con <- textConnection("
> ID DX1 DX2 DX3
> 1 4109 4280 7102
> 2 734 311 490
> 3 4011 42822 4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE, colClasses="character")
> #
> # I would like to add a column such that searching for 410 would give the result below. The search string would always be at the start of a word and doesn't need regex.
> #
> # ID  DX1   DX2  DX3 htn
> #  1 4109  4280 7102   1
> #  2  734   311  490   0
> #  3 4011 42822 4101   1
> #
> # The following works but is slow and returns NA if the search string is not found:
>
> for (i in 1:nrow(df)) {
> df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
> }
>
> Thanks in advance. I never fail to learn new things from this list.
>
--
Sarah Goslee
http://www.functionaldiversity.org