[R] how to load only lines that start with a particular symbol
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Sep 16 02:50:54 CEST 2009
In the Windows cmd shell, ^ escapes the next character, so the > is
passed to findstr literally instead of being treated as a redirect.
Try this (assuming the data you posted is in genetest.dat in the
current directory):
> readLines(pipe("findstr/b ^> genetest.dat"))
[1] ">gene A;....." ">gene B;...."
On UNIX, replace the findstr command inside pipe() with the
corresponding grep command, making sure you escape or quote the >
appropriately for the shell you use.
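
For the UNIX case, here is a minimal sketch of what that might look like.
It assumes grep is on the PATH and that the shell treats single quotes as
literal quoting (true of sh and bash). The function names
read_headers_grep and read_headers_r are hypothetical, chosen only for
illustration, and the sample file is recreated in a temporary directory so
the code runs on its own. The second function is a fallback that uses no
external command: it reads the file in chunks and keeps only the matching
lines, so the whole file is never held in memory at once.

```r
# Recreate the posted sample data in a temporary file (illustrative only).
fasta <- c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
           ">gene B;....", "CCCCCAAAA", "GGGGGTTTT")
path <- file.path(tempdir(), "genetest.dat")
writeLines(fasta, path)

# UNIX-like systems: single quotes keep the shell from reading > as a redirect.
# Assumes grep is available; not called below so the sketch also runs on Windows.
read_headers_grep <- function(path) {
  con <- pipe(paste("grep '^>'", shQuote(path)))
  on.exit(close(con))
  readLines(con)
}

# Fallback without an external command: read in chunks and keep
# only the lines that begin with ">".
read_headers_r <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  out <- character(0)
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    out <- c(out, lines[grepl("^>", lines)])
  }
  out
}

headers <- read_headers_r(path)
print(headers)
# [1] ">gene A;....." ">gene B;...."
```

On a UNIX-like system, read_headers_grep(path) should return the same two
lines. The grep route lets the external tool do the filtering before R
sees anything, while the chunked route trades some speed for portability.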
On Tue, Sep 15, 2009 at 4:59 PM, J Chen <jiaxuan.chen at mdc-berlin.de> wrote:
>
> Dear all,
>
> I have DNA sequence data which are fasta-formatted as
>
>>gene A;.....
> AAAAACCCC
> TTTTTGGGG
> CCCTTTTTT
>>gene B;....
> CCCCCAAAA
> GGGGGTTTT
>
> I want to load only the lines that start with ">" where the annotation
> information for the gene is contained. In principle, I can remove the
> sequences before loading or after loading all the lines. I just wonder if
> there's a way to load only lines with a particular pattern. The skip
> argument in read.table() doesn't work for my purpose.
>
> Thanks in advance,
> Jimmy