[R] Data parsing question: adding characters within a string of characters
Frede Aakmann Tøgersen
frtog at vestas.com
Thu Jan 2 12:19:08 CET 2014
Hi Joshua
This is one way to do it. I'm not sure whether it is efficient enough for your needs; that depends on the size of your data.
string1 <- "ATCGCCCGTA[AGA]TAACCG"
string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
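foo <- function(genes){
  # wrap each element of x in its own brackets: c("A","G","A") -> "[A][G][A]"
  mypaste <- function(x) paste("[", paste(x, collapse = "]["), "]", sep = "")
  # split at the brackets; with balanced, non-nested [ ] (and no other
  # punctuation) the pieces that were inside brackets land at positions 2, 4, ...
  tmp <- strsplit(genes, "[[:punct:]]")[[1]]
  ndx <- seq_along(tmp) %% 2 == 0
  # replace each bracketed piece by its individually bracketed characters
  # (indexing by position, not match(), so repeated groups are all handled)
  tmp[ndx] <- vapply(strsplit(tmp[ndx], ""), mypaste, character(1))
  result <- paste(tmp, collapse = "")
  return(result)
}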
> foo(string2)
[1] "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
> foo(string1)
[1] "ATCGCCCGTA[A][G][A]TAACCG"
>
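If there are many sequences to process, a single regular-expression substitution over the whole character vector avoids splitting each string separately. The sketch below is an alternative technique, not the function above: it inserts "][" at every position between two letters that sit inside the same bracket group. It assumes that groups contain only uppercase letters and that brackets are balanced and never nested; explode_brackets is a hypothetical name.

```r
# Hypothetical helper; assumes uppercase letters and balanced, non-nested [ ].
explode_brackets <- function(x) {
  # Match the empty position between two letters when the run of letters to
  # its right ends at "]", i.e. both letters are inside the same group.
  # Inserting "][" there splits the group into single bracketed letters.
  gsub("(?<=[A-Z])(?=[A-Z]+\\])", "][", x, perl = TRUE)
}

genes <- c("ATCGCCCGTA[AGA]TAACCG", "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA")
explode_brackets(genes)
# returns:
# "ATCGCCCGTA[A][G][A]TAACCG"
# "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
```

perl = TRUE is needed for the lookbehind (?<=...) and lookahead (?=...); both are zero-width, so no characters are consumed and only the "][" separators are added.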
Yours sincerely / Med venlig hilsen
Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling
Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com
Company reg. name: Vestas Wind Systems A/S
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Joshua Banta
> Sent: 2. januar 2014 04:56
> To: R Help
> Subject: [R] Data parsing question: adding characters within a string of characters
>
> Dear Listserve,
>
> I have a data-parsing question for you. I recognize this is more in the domain
> of Perl/Python, but I don't know those languages! On the other hand, I am
> pretty good overall with R, so I'd rather get the job done within the R
> "ecosphere."
>
> Here is what I want to do. Consider the following data:
>
> string <- "ATCGCCCGTA[AGA]TAACCG"
>
> I want to alter string so that it looks like this:
>
> ATCGCCCGTA[A][G][A]TAACCG
>
> In other words, I want code that scans a character string, finds each
> bracketed group of characters, splits the group into individually bracketed
> characters, and puts that sequence back into the string in place of the
> original group. The number of characters inside a pair of brackets will vary,
> but the transformation is always the same.
>
> So, for example, another string may look like this:
>
> string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
>
> I want to alter string2 so that it looks like this:
>
> "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
>
> Thank you all in advance and have a great 2014!
>
> -----------------------------------
> Josh Banta, Ph.D
> Assistant Professor
> Department of Biology
> The University of Texas at Tyler
> Tyler, TX 75799
> Tel: (903) 565-5655
> http://plantevolutionaryecology.org
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
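The four steps described in the question (find each bracketed group, split it into characters, bracket each character, put the result back) map onto base R's gregexpr(), regmatches() and its replacement form regmatches<-. A minimal sketch, again assuming balanced, non-nested brackets; split_bracket_groups is a hypothetical name:

```r
# Hypothetical helper; assumes balanced, non-nested [ ] pairs.
split_bracket_groups <- function(x) {
  # 1. Find every bracketed group, e.g. "[AGA]", in each string of x.
  m <- gregexpr("\\[[^\\]]*\\]", x, perl = TRUE)
  groups <- regmatches(x, m)
  # 2-3. Drop the outer brackets, split into single characters and wrap
  #      each character in its own brackets: "[AGA]" -> "[A][G][A]".
  exploded <- lapply(groups, function(g) {
    vapply(g, function(one) {
      chars <- strsplit(substring(one, 2, nchar(one) - 1), "")[[1]]
      paste0("[", chars, "]", collapse = "")
    }, character(1), USE.NAMES = FALSE)
  })
  # 4. Put the rewritten groups back in place of the original ones.
  regmatches(x, m) <- exploded
  x
}

split_bracket_groups(c("ATCGCCCGTA[AGA]TAACCG", "ACGT"))
# returns "ATCGCCCGTA[A][G][A]TAACCG" "ACGT"
```

Strings without any brackets pass through unchanged, and the function works on a whole character vector at once.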