[R] how to load only lines that start with a particular symbol

jim holtman jholtman at gmail.com
Tue Sep 15 23:04:44 CEST 2009


Read in the data with 'readLines' and then use 'grep' to keep only the lines that start with ">":

> x
[1] ">gene A;....." "AAAAACCCC"     "TTTTTGGGG"     "CCCTTTTTT"
[5] ">gene B;...."  "CCCCCAAAA"     "GGGGGTTTT"
> x <- x[grep("^>", x)]
> x
[1] ">gene A;....." ">gene B;...."
>
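Note that 'readLines' still pulls the whole file into memory before 'grep' filters it. If that is a concern for a large FASTA file, one possible workaround is to read from an open connection in chunks and keep only the matching lines from each chunk. The sketch below is untested against real data and makes several assumptions: 'read_matching_lines' and its 'chunk_size' argument are names invented for this illustration, and the temporary file merely stands in for your actual FASTA path.

```r
# Hypothetical example file; tempfile() stands in for a real FASTA path.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
             ">gene B;....", "CCCCCAAAA", "GGGGGTTTT"), fasta_path)

# Hypothetical helper (not from the original post): collect the lines that
# match `pattern` while reading `path` at most `chunk_size` lines at a time.
read_matching_lines <- function(path, pattern = "^>", chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))              # close the connection even on error
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break # nothing left to read
    kept <- c(kept, grep(pattern, chunk, value = TRUE))
  }
  kept
}

# A tiny chunk size here, only to exercise the loop on the small example.
headers <- read_matching_lines(fasta_path, chunk_size = 2L)

# The whole-file approach from above, for comparison.
whole <- grep("^>", readLines(fasta_path), value = TRUE)

print(headers)
```

For files that fit in memory, the one-line 'grep(..., value = TRUE)' form is simpler and gives the same result; the chunked version only limits how many non-header lines are held at once.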


On Tue, Sep 15, 2009 at 4:59 PM, J Chen <jiaxuan.chen at mdc-berlin.de> wrote:
>
> Dear all,
>
> I have DNA sequence data which are fasta-formatted as
>
>>gene A;.....
> AAAAACCCC
> TTTTTGGGG
> CCCTTTTTT
>>gene B;....
> CCCCCAAAA
> GGGGGTTTT
>
> I want to load only the lines that start with ">" where the annotation
> information for the gene is contained. In principle, I can remove the
> sequences before loading or after loading all the lines. I just wonder if
> there's a way to load only lines with a particular pattern. The skip
> argument in read.table() doesn't work for my purpose.
>
> Thanks in advance,
> Jimmy



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



