[Bioc-devel] Append/combine option for filterFastq and similar?

Martin Morgan mtmorgan at fredhutch.org
Wed Apr 22 19:40:30 CEST 2015


On 04/22/2015 10:28 AM, Jim Hester wrote:
> I typically use pipe() in these circumstances which avoids using any
> additional storage
>
>    readLines(pipe("cat file1 file2"))
>
> It should work with filterFastq assuming it can read from connections
> rather than just files, but I have not tested it to be sure.

These solutions don't work on Windows or with compressed files (though
zcat *fastq | gzip > out.fastq.gz would, I guess) and don't filter reads (I
guess that's what Ryan means by 'duplicating storage', i.e., concatenate then
filter in two separate steps).
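
As a rough illustration of the portability point, here is a minimal base-R
sketch that concatenates plain or gzip-compressed text files into a single
gzip-compressed file without shelling out to cat or zcat. The function name
concat_fastq and its arguments are made up for this example; it does not
filter reads, so it still implies a separate filtering step.

```r
## Hypothetical helper (not part of any package): concatenate plain or
## gzip-compressed FASTQ files into one gzip-compressed file, chunk by chunk.
concat_fastq <- function(inputs, output, chunk_lines = 40000L) {
    out <- gzfile(output, open = "wt")
    on.exit(close(out))
    for (path in inputs) {
        ## gzfile() in read mode also opens uncompressed files
        con <- gzfile(path, open = "rt")
        repeat {
            lines <- readLines(con, n = chunk_lines)
            if (length(lines) == 0L) break
            writeLines(lines, out)
        }
        close(con)
    }
    invisible(output)
}
```

Only one chunk of lines is held in memory at a time, and it should behave the
same on Windows as elsewhere because it relies on R connections only.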

filterFastq expects character vectors of file names rather than connections
(at least for input), but accepting connections is, I think, straightforward
(the underlying FastqStreamer works on connections), so I'll update that...
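
In the meantime, something along these lines might serve as a workaround: a
sketch that streams each input with FastqStreamer, filters each chunk, and
appends the survivors to one output with writeFastq(mode = "a"). The function
name filter_and_append, the min_width filter and the chunk size are
assumptions chosen for illustration; this is not how filterFastq itself is
implemented.

```r
library(ShortRead)

## Hypothetical helper: filter several FASTQ files and append the reads that
## pass into a single output file, one chunk at a time.
filter_and_append <- function(inputs, output, min_width = 30L, chunk = 1e6) {
    if (file.exists(output))
        stop("'output' already exists; refusing to append to it")
    kept <- 0L
    for (path in inputs) {
        strm <- FastqStreamer(path, n = chunk)
        repeat {
            fq <- yield(strm)                   # next chunk of reads
            if (length(fq) == 0L) break         # end of this input
            fq <- fq[width(fq) >= min_width]    # example filter: read length
            if (length(fq) > 0L) {
                writeFastq(fq, output, mode = "a")  # append to the output
                kept <- kept + length(fq)
            }
        }
        close(strm)
    }
    kept
}
```

A call such as filter_and_append(c("a.fastq.gz", "b.fastq.gz"), "all.fastq.gz")
would return the number of reads written; the file names here are
placeholders.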

I think filterFastq should be relatively efficient in both space and time,
though obviously cat and friends are highly optimized and use minimal memory.

Martin

>
> On Wed, Apr 22, 2015 at 1:16 PM, Ryan C. Thompson <rct at thompsonclan.org>
> wrote:
>
>> That's not ideal because it's duplicating storage unnecessarily
>>
>>
>> On 04/22/2015 04:07 AM, Aedin wrote:
>>
>>> This is one instance where a system or simple Unix command is very easy
>>>
>>> system('cat *.fastq > all.fastq')
>>>
>>>
>>> ---
>>>
>>>   On Apr 22, 2015, at 6:00, bioc-devel-request at r-project.org wrote:
>>>>
>>>> Re: Append/combine option for filterFastq and similar?
>>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


