[R] search across a row for strings

Sarah Goslee sarah.goslee at gmail.com
Mon Jun 15 22:29:37 CEST 2015


This is faster than your version, and doesn't return NA:

df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))

> df
  ID  DX1   DX2  DX3   htn
1  1 4109  4280 7102  TRUE
2  2  734   311  490 FALSE
3  3 4011 42822 4101  TRUE
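
If the 1/0 coding shown in the original question is preferred over TRUE/FALSE, the logical result can be converted with as.integer(). A minimal sketch, assuming the example data is rebuilt with read.table(text = ...) so that it runs on its own (the intermediate name htn is only for illustration):

```r
# Example data from the original question, all columns read as character
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, colClasses = "character")

# TRUE if any diagnosis column in the row starts with 410
htn <- apply(df[, 2:4], 1, function(x) any(grepl("^410", x)))

# Store as 1/0 instead of TRUE/FALSE
df$htn <- as.integer(htn)
```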

> system.time({
+  for(j in 1:10000) {
+   for (i in 1:nrow(df)) {
+     df[i,"htn"] <- any(sapply('410', function(x)  which( grepl(x, df[i, 2:4], fixed = TRUE) )))
+   }
+  }
+ })
   user  system elapsed
  6.648   0.008   6.657
There were 50 or more warnings (use warnings() to see the first 50)
>
>
>
> system.time({
+  for(j in 1:10000) {
+   df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))
+  }
+ })
   user  system elapsed
  1.826   0.000   1.826
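
Since the search string is always at the start of the code and no regex is needed, another option is to skip apply() and compare the leading characters of all cells at once. This is only a sketch and is not from the original thread; I have not timed it against the versions above. The names code, dx and hits are made up for the example:

```r
# Example data from the original question, all columns read as character
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, colClasses = "character")

code <- "410"                 # prefix to search for
dx <- as.matrix(df[, 2:4])    # character matrix of diagnosis codes

# Compare the first nchar(code) characters of every cell with the code
hits <- substr(dx, 1, nchar(code)) == code
dim(hits) <- dim(dx)          # make sure the row/column layout is kept

# A row is flagged if at least one of its cells matched
df$htn <- rowSums(hits) > 0
```

Wrapping the last line in as.integer() would give the 1/0 coding from the question.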


On Mon, Jun 15, 2015 at 4:12 PM, Federman, Douglas
<Douglas.Federman at utoledo.edu> wrote:
> I'm trying to do the following: search each patient's list of diagnoses for a specific code, then create a new column based upon the presence of the specific code.
> Simplified data follows:
>
> con <- textConnection("
> ID      DX1     DX2     DX3
> 1       4109    4280    7102
> 2       734     311     490
> 3       4011    42822   4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE, colClasses="character")
> #
> # I would like to add a column such that the result of searching for 410 would give the table below. The search string would always be at the start of a word and doesn't need regex.
> #
> # ID    DX1     DX2     DX3     htn
> # 1     4109    4280    7102    1
> # 2     734     311     490     0
> # 3     4011    42822   4101    1
> #
> # The following  works but is slow and returns NA if the search string is not found:
>
> for (i in 1:nrow(df)) {
>     df[i,"htn"] <- any(sapply('410', function(x)  which( grepl(x, df[i, 2:4], fixed = TRUE) )))
> }
>
> Thanks in advance.  I never fail to learn new things from this list.
>
-- 
Sarah Goslee
http://www.functionaldiversity.org


