[R] How can I get this function to work?

Bert Gunter gunter.berton at gene.com
Thu May 31 19:54:47 CEST 2012


Well, good luck finding someone to wade through your code --
"small,reproducible" examples are requested for a reason -- but I will
offer that I have no idea what you mean with your remark about
anonymous functions, as the code you posted has none.

-- Bert

On Thu, May 31, 2012 at 10:38 AM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> Hello All,
>
> Can anyone tell help me understand why the function below doesn't work and how I can fix it? Below are some sample data, some code that works on individual rows of the data, and my attempt to translate that code into a function. My hope is to get the function working and then to apply it to the larger data frame using ddply() from the plyr package or possibly some other approach.
>
> As yet, I don't have much experience writing anonymous functions. I imagine I'm doing something that is obviously wrong, but I don't know what it is.
>
> Thanks,
>
> Paul
>
> #### Read in test data ####
>
> testData <-
> structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L,
> 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ",
> "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
> ), class = "factor"), encounter_date = structure(c(9L, 10L, 11L,
> 12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ",
> " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
> " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
> " 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
> ), class = "factor"), raw = c(" ordered kras testing on 10102010 results not yet available if patient has a mutation will start erbitux ",
> " received kras results on 10202010 test results indicate tumor is wild type ua protein positve erpr positive her2neu positve ",
> " will conduct kras mutation testing prior to initiation of therapy with erbitux ",
> " still need to order kras mutation testing ", " ordered kras testing waiting for results ",
> " kras test results pending note that patient was negative for lynch mutation ",
> " kras results still pending note that patient was negative for lynch mutation ",
> " kras mutated will not prescribe erbitux due to mutation ",
> " kras mutated therefore did not prescribe erbitux ", " kras wild ",
> " tumor is negative for mutation ", " tumor is wild type patient is eligible to receive eribtux ",
> " if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available ",
> " kras results are in patient has the mutation ", " ordered kras mutation testing on 02152011 results came back negative several lines of material patient kras mutation test is negative will start erbitux ",
> " patient is kras negative started erbitux on 03012011 ")), .Names = c("profile_key",
> "encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame")
>
> #### Convert text record to lowercase ####
>
> testData$raw <- tolower(testData$raw)
>
> #### Remove punctuation and any multiple spaces ####
>
> testData$raw <- gsub("[[:punct:]]", "", testData$raw)
> testData$raw <- gsub(" +", " ", testData$raw)
>
> #### Select test row ####
>
> testRow <- testData[13,]
> testRow
>
> #### Select terms +/- a specified number of words from "kras" ####
>
> Text <- unlist(strsplit(testRow$raw, " "))
> Target <- grep("kras", Text)
>
> if (length(Target) == 0) {testRow$reduced <- ""} else{
>
> Length <- length(Text)
> Keep <- rep(NA, Length)
> Lower <- ifelse(Target - 6 > 0, Target - 6, 1)
> Upper <- ifelse(Target + 6 < Length, Target + 6, Length)
>
> for(i in 1:length(Keep)){
> for(j in 1:length(Lower)){
>        Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
> }}
>
> testRow$reduced <- paste(Text[!is.na(Keep)], collapse=" ")
>
> }
>
> testRow
>
> length(Text)
> length(Text[!is.na(Keep)])
>
> #### Function for selecting words within specified range of a target term ####
>
> nearTerms <- function(df, text, target, before, after, outvar){
>
>   Text <- with(df, strsplit(text, " "))
>   Target <- grep(target, Text)
>
>   if (length(Target) == 0) {df$reduced <- ""} else{
>
>   Length <- length(Text)
>   Keep <- rep(NA, Length)
>   Lower <- ifelse(Target - before > 0, Target - before, 1)
>   Upper <- ifelse(Target + after < Length, Target + after, Length)
>
>   for(i in 1:length(Keep)){
>   for(j in 1:length(Lower)){
>      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
>   }}
>
>   df <- transform(df, outvar = paste(Text[!is.na(Keep)], collapse=" "))
>
>   }
>
> }
>
> nearTerms(testRow, raw, "kras", 6, 6)
>
> nearTerms(df = testRow, text = raw, target = "kras", before = 6, after = 6)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



More information about the R-help mailing list