[R] how to load only lines that start with a particular symbol
jim holtman
jholtman at gmail.com
Tue Sep 15 23:04:44 CEST 2009
Read in the data with 'readLines' and then use 'grep' to keep only the lines that start with ">":
> x
[1] ">gene A;....." "AAAAACCCC"     "TTTTTGGGG"     "CCCTTTTTT"
[5] ">gene B;...."  "CCCCCAAAA"     "GGGGGTTTT"
> x <- x[grep("^>", x)]
> x
[1] ">gene A;....." ">gene B;...."
>
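If the file is too large to hold in memory at once, the same idea can be applied in chunks through an open connection. The following is only a rough sketch: the helper name read_matching_lines, its chunk_size argument, and the temporary demo file are made up for illustration and are not part of base R.

```r
# Hypothetical helper (not a base R function): keep only the lines that
# match `pattern`, reading the file in chunks of `chunk_size` lines.
read_matching_lines <- function(path, pattern = "^>", chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))                # close the connection on exit
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break   # end of file reached
    kept <- c(kept, chunk[grepl(pattern, chunk)])
  }
  kept
}

# Demo on a temporary file holding the example data from the question
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
             ">gene B;....", "CCCCCAAAA", "GGGGGTTTT"), tmp)
headers <- read_matching_lines(tmp, chunk_size = 3L)
print(headers)
unlink(tmp)
```

For files that fit in memory, grep("^>", readLines(path), value = TRUE) gives the same result in one line.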
On Tue, Sep 15, 2009 at 4:59 PM, J Chen <jiaxuan.chen at mdc-berlin.de> wrote:
>
> Dear all,
>
> I have DNA sequence data which are fasta-formatted as
>
>>gene A;.....
> AAAAACCCC
> TTTTTGGGG
> CCCTTTTTT
>>gene B;....
> CCCCCAAAA
> GGGGGTTTT
>
> I want to load only the lines that start with ">" where the annotation
> information for the gene is contained. In principle, I can remove the
> sequences before loading or after loading all the lines. I just wonder if
> there's a way to load only lines with a particular pattern. The skip
> argument in read.table() doesn't work for my purpose.
>
> Thanks in advance,
> Jimmy
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?