[R] How can I get this function to work?

Fri Jun 1 21:13:24 CEST 2012

Hi Paul,

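I think you're making it far too complicated. With some minor tweaking
to your function (having it take a single text string and return the
reduced string, rather than taking and returning a data frame), I can
easily process the entire data frame you originally presented.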

nearTerms <- function(rawtext, target, before, after){
  # split the text into words and find the positions matching the target
  Text <- unlist(strsplit(rawtext, " "))
  Target <- grep(target, Text)

  if (length(Target) == 0) {
    # target not found: return an empty string
    mydf <- ""
  } else {
    Length <- length(Text)
    Keep <- rep(NA, Length)
    # window around each match, clamped to the ends of the text
    Lower <- ifelse(Target - before > 0, Target - before, 1)
    Upper <- ifelse(Target + after < Length, Target + after, Length)

    # mark every word position that falls inside at least one window
    for(i in 1:length(Keep)){
      for(j in 1:length(Lower)){
        Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
      }
    }

    mydf <- paste(Text[!is.na(Keep)], collapse=" ")
  }

  names(mydf) <- NULL

  mydf
}

testData <-
structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ",
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(c(9L, 10L, 11L,
12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
" 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
" 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
), class = "factor"), raw = c(" ordered kras testing on 10102010 results not yet available if patient has a mutation will start erbitux ",
" received kras results on 10202010 test results indicate tumor is wild type ua protein positve erpr positive her2neu positve ",
" will conduct kras mutation testing prior to initiation of therapy with erbitux ",
" still need to order kras mutation testing ",
" ordered kras testing waiting for results ",
" kras test results pending note that patient was negative for lynch mutation ",
" kras results still pending note that patient was negative for lynch mutation ",
" kras mutated will not prescribe erbitux due to mutation ",
" kras mutated therefore did not prescribe erbitux ",
" kras wild ",
" tumor is negative for mutation ",
" tumor is wild type patient is eligible to receive eribtux ",
" if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available ",
" kras results are in patient has the mutation ",
" ordered kras mutation testing on 02152011 results came back negative several lines of material patient kras mutation test is negative will start erbitux ",
" patient is kras negative started erbitux on 03012011 ")), .Names = c("profile_key",
"encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame")

reduced <- sapply(testData$raw, nearTerms, target = "kras",
                  before = 6, after = 6)
testData <- cbind(testData, reduced=reduced)
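
If the nested loop ever feels slow on long notes, the same windows can
be built without it. The following is only a sketch under assumptions,
not something the approach above requires: the name nearTermsVec is
made up for illustration, and the small notes vector is just two of
the strings from your test data. It uses pmax()/pmin() to clamp the
windows and Map() with seq() to expand them, and vapply() with
USE.NAMES = FALSE so the result is a plain unnamed character vector.

```r
# Hypothetical loop-free variant of nearTerms(); the name nearTermsVec
# is an assumption made for this sketch.
nearTermsVec <- function(rawtext, target, before, after) {
  # split into words and locate the words matching the target
  Text <- unlist(strsplit(rawtext, " "))
  Target <- grep(target, Text)
  if (length(Target) == 0) return("")

  # clamp each window to the bounds of the text
  Lower <- pmax(Target - before, 1)
  Upper <- pmin(Target + after, length(Text))

  # union of all window positions, kept in original word order
  Keep <- sort(unique(unlist(Map(seq, Lower, Upper))))
  paste(Text[Keep], collapse = " ")
}

# two of the strings from the test data, with a narrow window
notes <- c(" still need to order kras mutation testing ",
           " tumor is negative for mutation ")
out <- vapply(notes, nearTermsVec, character(1),
              target = "kras", before = 1, after = 1, USE.NAMES = FALSE)
print(out)
# [1] "order kras mutation" ""
```

It would be applied to testData$raw the same way as nearTerms is
above, with before = 6 and after = 6.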

Sarah

On Fri, Jun 1, 2012 at 2:51 PM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> Hello Bert and Sarah,
>
> Thank you for your replies. Helped me understand how people might perceive my question and why they might not respond.
>
> Spent some time learning about R's debugging tools this morning. Began to realize why my function didn't work. My second argument was the name of a variable. What I didn't realize is that R would immediately expect this to be a previously defined object. I had thought that passing the name of the variable to the body of the function would generate a correct line of code, and that this was all that was required to get the function to work.
>
> Below is a function that does work, at least when applied to a single row of data. I had previously been reading about the Split-Apply-Combine  strategy in a paper about the plyr package. The paper advocates coming up with a function that works for a subset of one's data and then using plyr to split up the data and apply the function to each of the subsets. Was under the impression that this last part would be easy. Seems not to be the case though.
>
> So on to the next part.
>
> Thanks again for your feedback.
>
> Paul
>
>
> #### Test row ####
>
> testRow <-
> structure(list(profile_key = structure(6L, .Label = c("001-001 ",
> "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
> ), class = "factor"), encounter_date = structure(4L, .Label = c(" 2009-03-01 ",
> " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-04-01 ",
> " 2010-10-15 ", " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ",
> " 2011-04-01 ", " 2011-10-10 ", " 2011-10-24 ", " 2012-09-15 ",
> " 2012-10-05 ", " 2012-10-17 "), class = "factor"), raw = " if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available "), .Names = c("profile_key",
> "encounter_date", "raw"), row.names = 13L, class = "data.frame")
>
> testRow
>
> #### Function for selecting words within specified range of a target term ####
>
> nearTerms <- function(df, rawtext, target, before, after, reduced){
>   Text <- unlist(strsplit(df[,rawtext], " "))
>   Target <- grep(target, Text)
>
>   if (length(Target) == 0) {df <- transform(df, outtext = "")} else{
>
>   Length <- length(Text)
>   Keep <- rep(NA, Length)
>   Lower <- ifelse(Target - before > 0, Target - before, 1)
>   Upper <- ifelse(Target + after < Length, Target + after, Length)
>
>   for(i in 1:length(Keep)){
>   for(j in 1:length(Lower)){
>      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
>   }}
>
>   df <- transform(df, outtext = paste(Text[!is.na(Keep)], collapse=" "))
>
>   }
>
>   names(df)[names(df) == "outtext"] <- reduced
>   df <- df
> }
>
> testRow <- nearTerms(df = testRow, rawtext = "raw", target = "kras", before = 6, after = 6, reduced = "reduced")
> testRow

-- 
Sarah Goslee
http://www.functionaldiversity.org