[Bioc-devel] Append/combine option for filterFastq and similar?

Martin Morgan mtmorgan at fredhutch.org
Tue Apr 21 23:31:51 CEST 2015


On 04/21/2015 02:18 PM, Ryan C. Thompson wrote:
> Hello,
>
> Often when sequence data is delivered to me, I receive each sample in several
> input files. Generally I want to get them into a single file ASAP, and the
> filterFastq step would be a convenient place to do it. Is there any possibility
> to add some way to append to an output file, or maybe automatically combine the
> outputs of any files with the same destination file? For example:
>
> filterFastq(files=c("input1.fastq", "input2.fastq"),
> destinations="output.fastq", ...)

Actually, I think the implementation almost does what you want, though 
'destinations' needs to be replicated to be as long as the inputs:

 > tmp = tempfile()
 > filterFastq(c(fl, fl), c(tmp, tmp), filter=fun)
[1] "/tmp/RtmpxGSJ7G/file265712b558a0" "/tmp/RtmpxGSJ7G/file265712b558a0"
attr(,"filter")
                    Reads KeptReads Nucl KeptNucl
s_1_sequence.txt     256       255 9216     9180
s_1_sequence.txt.1   256       255 9216     9180
 > length(readLines(tmp)) / 4
[1] 510
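
The count above relies on each FASTQ record occupying exactly four lines. As a 
rough base-R sketch of the same sanity check (no ShortRead involved; the helper 
'count_fastq_records' and the toy reads are made up for illustration, and it 
assumes unwrapped four-line records), appending two inputs to one destination 
should give the sum of their record counts:

```r
# Hypothetical helper: count records in a FASTQ file, assuming every record
# is exactly four lines (no line-wrapped sequence or quality strings).
count_fastq_records <- function(path) {
    n <- length(readLines(path))
    stopifnot(n %% 4 == 0)
    n / 4
}

# Toy records written to two separate input files.
rec1 <- c("@read1", "ACGT", "+", "IIII")
rec2 <- c("@read2", "TTGA", "+", "IIII")
in1 <- tempfile(fileext = ".fastq")
in2 <- tempfile(fileext = ".fastq")
writeLines(rec1, in1)
writeLines(c(rec1, rec2), in2)

# Append both inputs to a single uncompressed destination, one after the other.
dest <- tempfile(fileext = ".fastq")
for (f in c(in1, in2)) {
    con <- file(dest, open = "a")   # "a" opens the file for appending
    writeLines(readLines(f), con)
    close(con)
}

n_in  <- count_fastq_records(in1) + count_fastq_records(in2)
n_out <- count_fastq_records(dest)
stopifnot(n_in == n_out)
```

This only checks plain-text files written by base R; it says nothing about how 
filterFastq writes or compresses its output.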

I'll make this a little more convenient (no need to replicate 'destinations') and 
document the behavior.
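
In the meantime, the replication can be done on the caller's side. A minimal 
sketch in base R, assuming ordinary recycling is what is wanted; the helper 
'group_by_destination' is hypothetical and not part of ShortRead:

```r
# Hypothetical helper (not part of ShortRead): recycle `destinations` to the
# length of `files`, then group the input files by destination.
group_by_destination <- function(files, destinations) {
    stopifnot(length(files) > 0, length(destinations) > 0,
              length(files) %% length(destinations) == 0)
    destinations <- rep_len(destinations, length(files))
    # Keep destinations in order of first appearance.
    split(files, factor(destinations, levels = unique(destinations)))
}

# One destination recycled across three inputs.
single <- group_by_destination(
    c("input1.fastq", "input2.fastq", "input3.fastq"),
    "output.fastq")

# Explicit per-file destinations: two samples, two files each.
paired <- group_by_destination(
    c("a_1.fastq", "a_2.fastq", "b_1.fastq", "b_2.fastq"),
    c("a.fastq", "a.fastq", "b.fastq", "b.fastq"))
```

Each element of the result is the set of inputs for one destination, so the 
vector rep_len(destinations, length(files)) is what would be passed as 
'destinations' alongside 'files' in a call like the one above.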

Martin

>
> could process both input files and write their combined output to the one
> specified output file. Normal recycling rules would apply for the "destinations"
> argument, and input files would be grouped by destination file and each
> processed sequentially into that destination file. (This design is kind of
> magic, but it avoids the annoying pattern of having to process files one-by-one
> in a loop with append=FALSE for the first file and append=TRUE for the rest.)
> (Also, appending to a compressed fastq might not work?)
>
> -Ryan
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


