[R] how to load only lines that start with a particular symbol

Tue Sep 15 23:48:34 CEST 2009

Along those lines, Python makes this kind of line filtering easy. Sample
code would be:

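# Ask for the input file and for the file to write the matching lines to
# (Python 3; under Python 2, use raw_input() instead of input())
in_name = input("Please enter the name of the original file: ")
out_name = input("Enter the name of the file to output: ")

# Copy only the lines that start with '>' into the new file
with open(in_name, 'r') as infile, open(out_name, 'w') as outfile:
    for line in infile:
        if line.startswith('>'):
            # line already ends with a newline, so write it unchanged
            outfile.write(line)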
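
If you would rather have the header lines in memory than in a second file, the same idea can be wrapped in a small function. This is only a sketch in Python 3: the names iter_prefixed_lines and demo are made up for illustration, and the sample text is just the example from the original question, written to a temporary file.

```python
import os
import tempfile


def iter_prefixed_lines(path, prefix=">"):
    """Yield the lines of `path` that start with `prefix`, newline removed."""
    with open(path, "r") as handle:
        for line in handle:  # the file is read one line at a time
            if line.startswith(prefix):
                yield line.rstrip("\n")


def demo():
    """Write the sample records from the question to a temp file and filter them."""
    sample = (
        ">gene A;.....\n"
        "AAAAACCCC\n"
        "TTTTTGGGG\n"
        "CCCTTTTTT\n"
        ">gene B;....\n"
        "CCCCCAAAA\n"
        "GGGGGTTTT\n"
    )
    fd, path = tempfile.mkstemp(suffix=".fasta")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(sample)
        # list() consumes the generator before the file is removed
        return list(iter_prefixed_lines(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    for header in demo():
        print(header)
```

Because the function is a generator, the sequence lines are never all held in memory at once; only the matching header lines are kept by the caller.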

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Tuesday, September 15, 2009 5:45 PM
> To: J Chen; r-help at r-project.org
> Subject: Re: [R] how to load only lines that start with a 
> particular symbol
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of J Chen
> > Sent: Tuesday, September 15, 2009 2:00 PM
> > To: r-help at r-project.org
> > Subject: [R] how to load only lines that start with a particular 
> > symbol
> > 
> > 
> > Dear all,
> > 
> > I have DNA sequence data which are fasta-formatted as
> > 
> > >gene A;.....
> > AAAAACCCC
> > TTTTTGGGG
> > CCCTTTTTT
> > >gene B;....
> > CCCCCAAAA
> > GGGGGTTTT
> > 
> > I want to load only the lines that start with ">" where the
> > annotation information for the gene is contained. In principle, I can
> > remove the sequences before loading or after loading all the lines. I
> > just wonder if there's a way to load only lines with a particular
> > pattern. The skip argument in read.table() doesn't work for my purpose.
> 
> You could use pipe() to call an external program like grep or
> perl to filter the lines of interest from the file, so that R's
> input routine only has to allocate space for those. E.g.,
> the following makes a sample file, and the readLines(pipe(...))
> call reads only the lines starting with ">> " from it. (It
> assumes you don't have grep in PATH, so it gives the full path
> to where grep is installed on my Windows machine.)
> 
>   > tfile <- tempfile()
>   > cat(file=tfile, sep="\n", c(">> Date", ">> Author",
>   +     "columnA columnB", "1 2", "3 4"))
>   > readLines(tfile)
>   [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
>   [5] "3 4"
>   > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
>   [1] ">> Date"   ">> Author"
> 
> perl can do more complicated processing and filtering than grep.
> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com  
> 
> > 
> > Thanks in advance,
> > Jimmy
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> 