[R] Regex Question: return digits after particular letters
David Winsemius
dwinsemius at comcast.net
Thu Jun 2 21:33:02 CEST 2011
On Jun 2, 2011, at 2:54 PM, Ben Ganzfried wrote:
> Hi,
>
> First of all, I would like to introduce myself as I will probably have many
> questions over the next few weeks and want to thank you guys in advance for
> your help. I'm a cancer researcher and I need to learn R to complete a few
> projects. I have an introductory background in Python.
>
> My questions at the moment are based on the following sample input file:
> *Sample_Input_File*
> characteristics_ch1.3 Stage: T1N0 Stage: T2N1 Stage: T0N0 Stage: T1N0 Stage: T0N3
>
I haven't quite figured out what your structure really is; for that
you should learn to post the output of dput() applied to the R object.
But see if this helps:
> stg <- c('Stage: T1N0', 'Stage: T2N1', 'Stage: T0N0',
+          'Stage: T1N0', 'Stage: T0N3')
> Tstg <- sub(".*T(\\d)N.", "\\1", stg)
> Tstg
#[1] "1" "2" "0" "1" "0"
> Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)
> Nstg
#[1] "0" "1" "0" "0" "3"
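On the dput() point above, here is a minimal sketch of what posting that output involves. It reuses the stg vector as a stand-in for your real column, which I have not seen, so treat the data as an assumption. dput() prints R code that rebuilds the object, so anyone can paste it into a session and work with the same structure:

```r
# Stand-in data; in practice this would be the column from your own object,
# e.g. dput(head(uncurated$characteristics_ch1.3))
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
         "Stage: T1N0", "Stage: T0N3")

# Prints an R expression that recreates the object
dput(stg)

# Round trip: evaluating the text printed by dput() gives back the same object
txt <- paste(capture.output(dput(stg)), collapse = "\n")
roundtrip <- eval(parse(text = txt))
identical(roundtrip, stg)
```

The last line should return TRUE, which is the point: the posted text carries the full structure, not just how it prints.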
> "characteristics_ch1.3" is a column header in the input excel file.
>
> "T's" represent stage and "N's" represent degree of disease spreading.
>
> I want to create output that looks like this:
> *Sample_Output_File*
> T N
> 1 0
> 2 1
> 0 0
> 1 0
> 0 3
>
> As it currently stands, my code is the following:
>
> # rm(list=ls())
####----
AND PLEASE DON'T POST THAT CODE WITHOUT A COMMENT.
I noticed it this time, but it is very aggravating to accidentally
wipe out hours of work while trying to offer help.
> source("../../functions.R")
>
> uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv",
>                       as.is=TRUE, row.names=1)
>
> ##initial creation of curated dataframe
> curated <- initialCuratedDF(rownames(uncurated),
>                             template.filename="Sample_Template_File.csv")
>
> ##--------------------
> ##start the mappings
> ##--------------------
>
>
> ##title -> alt_sample_name
> curated$alt_sample_name <- uncurated$title
>
> #T
> tmp <- uncurated$characteristics_ch1.3
> tmp <- *??????*
> curated$T <- tmp
So here tmp is what I called Tstg above, i.e. the result of the first sub() call.
>
> #N
> tmp <- uncurated$characteristics_ch1.3
> tmp <- *??????*
> curated$N <- tmp
And here tmp is what I called Nstg, the result of the second sub() call.
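Put together, the two placeholders could be filled in roughly as follows. This is only a sketch: the uncurated and curated data frames below are hypothetical stand-ins, since your CSV file and your initialCuratedDF() function are not available to me, and the as.integer() conversion is my addition in case you want numbers rather than character strings.

```r
# Hypothetical stand-ins for the real objects; the actual `uncurated` comes
# from read.csv() and `curated` from initialCuratedDF() in your script.
uncurated <- data.frame(
  characteristics_ch1.3 = c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
                            "Stage: T1N0", "Stage: T0N3"),
  stringsAsFactors = FALSE
)
curated <- data.frame(T = rep(NA_integer_, nrow(uncurated)),
                      N = rep(NA_integer_, nrow(uncurated)))

# T: keep only the digit immediately after "T"
tmp <- uncurated$characteristics_ch1.3
tmp <- sub(".*T(\\d)N.", "\\1", tmp)
curated$T <- as.integer(tmp)

# N: keep only the digit immediately after "N"
tmp <- uncurated$characteristics_ch1.3
tmp <- sub(".*T\\dN(\\d)", "\\1", tmp)
curated$N <- as.integer(tmp)

curated
```

With the five sample values this gives T = 1, 2, 0, 1, 0 and N = 0, 1, 0, 0, 3, matching the desired output table.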
> write.table(curated, row.names=FALSE,
> file="../curated/Sample_Output_File_curated_pdata.txt",sep="\t")
>
> My question is the following:
>
> What code gets me the desired output (replacing the *??????*'s above)? I
> want to: a) Find the integer value one element to the right of "T"; and b)
> find the integer value one element to the right of "N". I've read the
> regular expression tutorial for R, but could only figure out how to grab an
> integer value if it is the only integer value in the row (ie more than one
> integer value makes this basic regular expression unsuccessful).
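Just surround the digit you want with the rest of the pattern, wrap that
digit in parentheses to capture it, and refer to the captured group in the
replacement string as "\\1" (the () , "\\n" mechanism, where n is the
group number).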
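To illustrate that mechanism, here is a small sketch with a single pattern holding two capture groups. It is a variant of the calls shown earlier, not something your data requires, and the name pat is just for illustration:

```r
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
         "Stage: T1N0", "Stage: T0N3")

# One pattern, two capture groups:
#   group 1 is the digit after "T", group 2 is the digit after "N".
# The leading and trailing .* make the pattern cover the whole string,
# so the replacement leaves nothing but the chosen group.
pat <- ".*T(\\d)N(\\d).*"

Tstg <- sub(pat, "\\1", stg)   # keep only group 1
Nstg <- sub(pat, "\\2", stg)   # keep only group 2

# Note: sub() returns a non-matching string unchanged
unmatched <- sub(pat, "\\1", "Stage: unknown")
```

Because sub() hands back non-matching input untouched, it is worth checking the result for values that are not single digits before converting to numbers.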
>
> Thank you very much for any help you can provide.
>
> Sincerely,
>
> Ben Ganzfried
>
David Winsemius, MD
West Hartford, CT