[Bioc-devel] Append/combine option for filterFastq and similar?
mtmorgan at fredhutch.org
Tue Apr 21 23:31:51 CEST 2015
On 04/21/2015 02:18 PM, Ryan C. Thompson wrote:
> Often when sequence data is delivered to me, I receive each sample in several
> input files. Generally I want to get them into a single file ASAP, and the
> filterFastq step would be a convenient place to do it. Is there any possibility
> to add some way to append to an output file, or maybe automatically combine the
> outputs of any files with the same destination file? For example:
> filterFastq(files=c("input1.fastq", "input2.fastq"),
> destinations="output.fastq", ...)
Actually, I think the implementation almost does what you want, though
'destinations' needs to be replicated to be as long as the inputs:
    > tmp = tempfile()
    > filterFastq(c(fl, fl), c(tmp, tmp), filter=fun)
     "/tmp/RtmpxGSJ7G/file265712b558a0" "/tmp/RtmpxGSJ7G/file265712b558a0"
                       Reads KeptReads Nucl KeptNucl
    s_1_sequence.txt     256       255 9216     9180
    s_1_sequence.txt.1   256       255 9216     9180
    > length(readLines(tmp)) / 4
I'll make this a little more convenient (no need to replicate 'destination') and
document the behavior.
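
Until then, the replication can be done by hand. The following is only a
sketch of one way to do it, not what the package will do:
recycle_destinations() is a hypothetical helper (it is not part of
ShortRead), the file names are placeholders, and keep_all is a stand-in
filter that keeps every read. The filterFastq() call is guarded so that
nothing runs unless ShortRead is installed, the inputs exist, and the
destination does not.

```r
# Hypothetical helper (not part of ShortRead): recycle 'destinations'
# so that there is exactly one destination per input file.
recycle_destinations <- function(files, destinations) {
    stopifnot(length(files) > 0L, length(destinations) > 0L,
              length(files) %% length(destinations) == 0L)
    rep_len(destinations, length(files))
}

files <- c("input1.fastq", "input2.fastq")    # placeholder input names
dest <- recycle_destinations(files, "output.fastq")

# Attempt the real call only when the package and the inputs are available,
# and avoid touching a destination that is already there.
if (requireNamespace("ShortRead", quietly = TRUE) &&
    all(file.exists(files)) && !any(file.exists(dest))) {
    keep_all <- function(x) x                 # placeholder filter: keep all reads
    ShortRead::filterFastq(files, dest, filter = keep_all)
}
```

With two inputs and one destination, dest holds "output.fastq" twice, which
is the same shape as the c(tmp, tmp) call above.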
> could process both input files and write their combined output to the one
> specified output file. Normal recycling rules would apply for the "destinations"
> argument, and input files would be grouped by destination file and each
> processed sequentially into that destination file. (This design is kind of
> magic, but it avoids the annoying pattern of having to process files one-by-one
> in a loop with append=FALSE for the first file and append=TRUE for the rest.)
> (Also appending to a compressed fastq might not work?)
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793