[R] Paste every two columns together

John Posner john.posner at MJBIOSTAT.COM
Thu Jan 29 05:54:48 CET 2015


Kate, here's a solution that uses regular expressions, rather than vector manipulation:

> mystr = "ID1 A A T G C T G C G T C G T A"
> gsub(" ([ACGT]) ([ACGT])", " \\1\\2", mystr)
[1] "ID1 AA TG CT GC GT CG TA"
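Because gsub() is vectorized over its input, the same pattern can be applied to every sample at once, provided each row is first stored as one space-separated string. The sketch below assumes that layout. The vector `rows` is a hypothetical stand-in for Kate's data. With an odd number of letters, the last letter would simply be left unpaired.

```r
# Hypothetical input: one space-separated string per sample
rows <- c("ID1 A A T G C T G C G T C G T A",
          "ID2 G C T G C C T G C T G T T T")

# Join each pair of single letters; gsub() works element-wise over 'rows'
paired <- gsub(" ([ACGT]) ([ACGT])", " \\1\\2", rows)
print(paired)
```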

-John


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Chel Hee Lee
> Sent: Wednesday, January 28, 2015 11:07 PM
> To: Bert Gunter
> Cc: r-help
> Subject: Re: [R] Paste every two columns together
> 
> Hi Bert! Yes, you are VERY correct!!!  Why am I making this simple thing so
> complicated???  ;) Thank you so much for your nice lesson!
> 
> Chel Hee Lee
> 
> On 01/28/2015 09:59 PM, Bert Gunter wrote:
> > eek!
> >
> > Chel Hee, anything that complicated should engender fear and trembling.
> >
> > Much simpler and more efficient (if I understand correctly):
> >
> > i <- seq.int(1L, length(ID1), by = 2L)
> > paste0(ID1[i], ID1[i + 1])
> >
> > That gives a vector of paired letters. If you want a single character
> > string, just collapse with a " " (space):
> >
> > paste0(ID1[i], ID1[i + 1], collapse = " ")
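Bert's index trick also works for the whole data set if the alleles are held as a character matrix with one row per ID. The sketch below assumes that layout. The matrix `alleles` is a hypothetical reconstruction of Kate's data. The code pastes every odd column to the column after it, for all rows at once.

```r
# Hypothetical allele matrix: one row per ID, one allele per column
alleles <- rbind(
  ID1 = c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A"),
  ID2 = c("G", "C", "T", "G", "C", "C", "T", "G", "C", "T", "G", "T", "T", "T")
)

# Odd-numbered columns: 1, 3, 5, ...
i <- seq.int(1L, ncol(alleles), by = 2L)

# paste0() on two matrices returns a plain vector in column-major order,
# so reshape it back to one row per ID
paired <- matrix(paste0(alleles[, i], alleles[, i + 1L]),
                 nrow = nrow(alleles),
                 dimnames = list(rownames(alleles), NULL))
print(paired)

# One space-separated string per ID, as in the collapse = " " variant
as_strings <- apply(paired, 1, paste, collapse = " ")
print(as_strings)
```

paste0() drops the matrix dimensions, which is why the result is reshaped with matrix(). Like the single-row version, this sketch assumes an even number of allele columns.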
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > (650) 467-7374
> >
> > "Data is not information. Information is not knowledge. And knowledge
> > is certainly not wisdom."
> > Clifford Stoll
> >
> > On Wed, Jan 28, 2015 at 7:41 PM, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
> >> I am using just the first row of your data (i.e. ID1).
> >>
> >>> ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")
> >>> do.call(c,lapply(tapply(ID1, gl(7,2), c), paste, collapse=""))
> >>     1    2    3    4    5    6    7
> >> "AA" "TG" "CT" "GC" "GT" "CG" "TA"
> >>>
> >>
> >> Is this what you are looking for?  I hope this helps.
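The gl(7, 2) call above fixes the number of pairs at 7, so it only fits rows of exactly 14 letters. The sketch below shows one way to derive the grouping from the data instead. The expression ceiling(seq_along(ID1) / 2) gives 1, 1, 2, 2, 3, 3, and so on, for any length. This is a swapped-in alternative to gl(), not part of the original reply. With an odd count, the last letter forms a group of one.

```r
ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G", "T", "A")

# Group index 1, 1, 2, 2, 3, 3, ... computed from the data's own length,
# instead of the fixed gl(7, 2) that only fits 14 elements
grp <- ceiling(seq_along(ID1) / 2)

# tapply() pastes each group's letters together; as.vector() drops the
# array dimension and names that tapply() attaches
pairs <- as.vector(tapply(ID1, grp, paste, collapse = ""))
print(pairs)
```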
> >>
> >> Chel Hee Lee
> >>
> >>
> >> On 01/28/2015 05:55 PM, Kate Ignatius wrote:
> >>>
> >>> I have genetic data as follows (simple example, actual data is much
> >>> larger):
> >>>
> >>> comb =
> >>>
> >>> ID1 A A T G C T G C G T C G T A
> >>>
> >>> ID2 G C T G C C T G C T G T T T
> >>>
> >>> And I wish to get an output like this:
> >>>
> >>> ID1 AA TG CT GC GT CG TA
> >>>
> >>> ID2 GC TG CC TG CT GT TT
> >>>
> >>> That is, paste every two columns together.
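To make the example reproducible, the sketch below reads the two rows above into a data frame. It assumes whitespace-separated text with the ID in the first field. The colClasses = "character" argument matters here. By default, read.table() would convert a column made up only of T values (here V4, the third allele) to logical TRUE.

```r
# Assumed input layout: ID followed by one allele per field
txt <- "ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T"

# Keep every column as character so "T" is not read as logical TRUE
comb <- read.table(text = txt, colClasses = "character")
str(comb)
```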
> >>>
> >>> I have this code, but I get the error:
> >>>
> >>> Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1
> >>>
> >>> conc <- function(x) {
> >>>     s <- seq(2, nchar(x), 2)
> >>>     paste0(x[s], x[s+1])
> >>> }
> >>>
> >>> combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)
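The error comes from how conc() is called, not from paste0(). When lapply() is applied to a data frame, it passes conc() one whole column at a time. As a result, nchar(x) returns one count per element of that column, and seq() rejects a 'to' of length greater than one.

The sketch below is a minimal row-wise version. It assumes comb holds the ID in its first column, followed by one single-letter column per allele; the construction of comb is hypothetical. The function counts elements with length() instead of nchar(). It drops the ID column with comb[, -1] instead of starting the sequence at 2, and it walks the rows with apply().

```r
# Hypothetical 'comb': ID column plus 14 single-letter allele columns
alleles <- matrix(c("A","A","T","G","C","T","G","C","G","T","C","G","T","A",
                    "G","C","T","G","C","C","T","G","C","T","G","T","T","T"),
                  nrow = 2, byrow = TRUE)
comb <- data.frame(ID = c("ID1", "ID2"), alleles, stringsAsFactors = FALSE)

# x is now one row of alleles, so count elements with length(), not nchar();
# assumes an even number of alleles per row
conc <- function(x) {
  s <- seq(1L, length(x), by = 2L)
  paste0(x[s], x[s + 1L])
}

# apply() walks the rows (MARGIN = 1) and returns one column per row,
# so t() puts the IDs back on the rows
paired <- t(apply(comb[, -1], 1, conc))
combn <- data.frame(ID = comb$ID, paired, stringsAsFactors = FALSE)
print(combn)
```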
> >>>
> >>> Thanks in advance!
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>