[BioC] writeFastq to gzip compressed files
Martin Morgan
mtmorgan at fhcrc.org
Wed Sep 25 20:01:04 CEST 2013
On 09/25/2013 09:09 AM, Michael Lawrence wrote:
> Ideally, it would be:
> writeFastq(rfq, gzfile(fout))
>
> But I don't think that works. In theory, it would be feasible to support
> this now that R has exposed its connection interface...
That'll be an interesting learning experience, and I'll come up with a proper
implementation. Here's a work-around, With
writeFastq_con(rfq, stdout())
yy = writeFastq_con(rfq, gzfile(tempfile()))
readFastq(yy)
Martin
.write1fastq <-
function(x, con, ..., full=FALSE)
{
i <- paste0("@", as.character(id(x)), "\n")
fill <- if (full)
paste0("\n+", as.character(id(x)), "\n")
else "\n+\n"
s <- as.character(sread(x))
q <- as.character(quality(quality(x)))
cat(paste(i, s, fill, q, sep=""), file=con, sep="\n")
length(i)
}
writeFastq_con <-
function(object, file, ..., chunkSize=1000000L)
## object: ShortReadQ instance
## file: connection, e.g., stdout(), gzfile(<...>, "wb")
## ...: additional arguments, in particular
## full=TRUE: logical(1), TRUE: replicate id tag on '+' line
## chunkSize: size of each chunk processed for output, smaller
## values use less memory but are slower
{
if (!isOpen(file)) {
open(file, "wb")
on.exit(close(file))
}
len <- length(object)
start <- 1L
tot <- 0L
while (start < len) {
end <- min(start + chunkSize - 1L, len)
tot <- tot + .write1fastq(object[seq(start, end)], file, ...)
start <- end + 1L
}
if (tot != len)
warning("output length does not equal input length")
summary(file)$description
}
>
> Michael
>
>
> On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86 at gmail.com> wrote:
>
>> Hi,
>> as fastq files are often compressed now, I would like to be able to write
>> out to compressed 'fq.gz' files, rather than raw text.
>>
>> I have searched the BioC threads and found a similar topic at;
>>
>>
>> http://article.gmane.org/gmane.science.biology.informatics.conductor/45306/match=writeFastq
>>
>> A poor example of what I would like to be able to achieve follows;
>>
>> example(readFastq)
>> print(rfq)
>> fout <- "test.fq"
>> writeFastq(rfq, fout)
>> system(paste("gzip", fout))
>>
>> but this does not compress the data as it is streaming out to file. I would
>> like to be able to do something like below, so this is a potential feature
>> request;
>>
>> fout <- "test.fq.gz"
>> writeFastq(rfq, fout, compress=TRUE)
>>
>> Is there an obvious way to compress using writeFastq that I am missing?
>>
>>
>> cheers,
>>
>> Marcus
>>
>>
>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel splines stats graphics grDevices utils datasets
>> [8] methods base
>>
>> other attached packages:
>> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5
>> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0
>> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0
>> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2
>> [13] Formula_1.1-1 survival_2.37-4
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list