[R] how to load only lines that start with a particular symbol
Doran, Harold
HDoran at air.org
Tue Sep 15 23:48:34 CEST 2009
Along those lines, Python makes this kind of line filtering easy. Sample
code would be:
# Ask for the input (FASTA) file and the file to write the matches to
filename = input("Please enter the name of the original file: ")
new_file = input("Enter the name of the file to output: ")

# Copy only the lines that start with '>' into the output file.
# Each line already ends with a newline, so write it unchanged.
with open(filename, 'r') as infile, open(new_file, 'w') as outfile:
    for line in infile:
        if line.startswith('>'):
            outfile.write(line)
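
If the annotation lines are wanted inside a program rather than in a
second file, the same filter can be wrapped in a function. This is only
a sketch: the names header_lines and prefix are made up for
illustration, and it assumes a plain-text file laid out like the fasta
sample in the original question.

```python
import os
import tempfile


def header_lines(path, prefix=">"):
    """Yield the lines of the file at `path` that start with `prefix`.

    Hypothetical helper for illustration. The file is read one line at
    a time, so non-matching (sequence) lines are never accumulated.
    """
    with open(path, "r") as infile:
        for line in infile:
            if line.startswith(prefix):
                # Strip the trailing newline; leave the rest untouched
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Build a small sample file shaped like the one in the question
    sample = ">gene A\nAAAAACCCC\nTTTTTGGGG\n>gene B\nCCCCCAAAA\n"
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(sample)

    print(list(header_lines(tmp.name)))  # prints ['>gene A', '>gene B']
    os.remove(tmp.name)
```

Because the function is a generator, it holds one line at a time, which
is the same idea as filtering with grep before R ever sees the file.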
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Tuesday, September 15, 2009 5:45 PM
> To: J Chen; r-help at r-project.org
> Subject: Re: [R] how to load only lines that start with a
> particular symbol
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of J Chen
> > Sent: Tuesday, September 15, 2009 2:00 PM
> > To: r-help at r-project.org
> > Subject: [R] how to load only lines that start with a particular
> > symbol
> >
> >
> > Dear all,
> >
> > I have DNA sequence data which are fasta-formatted as
> >
> > >gene A;.....
> > AAAAACCCC
> > TTTTTGGGG
> > CCCTTTTTT
> > >gene B;....
> > CCCCCAAAA
> > GGGGGTTTT
> >
> > I want to load only the lines that start with ">" where the annotation
> > information for the gene is contained. In principle, I can remove the
> > sequences before loading or after loading all the lines. I just wonder
> > if there's a way to load only lines with a particular pattern. The
> > skip argument in read.table() doesn't work for my purpose.
>
> You could use pipe() to call an external program like grep or
> perl to filter the lines of interest from the file so R's
> input routine only has to allocate space for those. E.g.,
> the following makes a sample file and the readLines(pipe(...))
> call reads only the lines starting with ">> " from it. (It
> assumes you don't have grep in PATH and gives where it is
> installed on my Windows machine.)
>
> > tfile <- tempfile()
> > cat(file=tfile, sep="\n", c(">> Date", ">> Author",
> +     "columnA columnB", "1 2", "3 4"))
> > readLines(tfile)
> [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
> [5] "3 4"
> > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
> [1] ">> Date"   ">> Author"
>
> perl can do more complicated processing and filtering than grep.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> >
> > Thanks in advance,
> > Jimmy
> >
> > --
> > View this message in context:
> > http://www.nabble.com/how-to-load-only-lines-that-start-with-a
> > -particular-symbol-tp25461693p25461693.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>