[Bioc-devel] export() to 2bit file
Hervé Pagès
hpages at fhcrc.org
Mon May 12 23:10:54 CEST 2014
On 05/12/2014 12:23 PM, Michael Lawrence wrote:
>
>
>
> On Mon, May 12, 2014 at 11:41 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
>
> On 05/09/2014 04:39 PM, Michael Lawrence wrote:
>
> What would be the fastest way to do this with a DNAString? Just an
> alphabetFrequency?
>
>
> That would do it.
>
> A couple of other issues I ran into with the 2bit code:
>
> (1) It fails on empty sequences:
>
> > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
> Warning message:
> In (function (object, seqname) :
> needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
> Error in sapply(object, function(x) typeof(x) == "externalptr"
> && is(x, :
> error in evaluating the argument 'X' in selecting a method for
> function 'sapply': Error in (function (object, seqname) : UCSC
> library operation failed
>
>
> Thanks for catching this one.
>
> (2) Could be that internal helper rtracklayer:::.DNAString_to_twoBit()
> is introducing a memory leak as it doesn't seem that the memory
> the returned external pointer is pointing to (a struct twoBit) is
> ever released. The memory leak is minor if the sequence passed via
> 'object' has no masks but can be important if there are masks and
> if the masks are made of hundreds of thousands of ranges.
>
>
> Right now it is the responsibility of the caller to free that memory.
> Probably should have used a finalizer on the externalptr, but the way it
> works now is that the write function frees the object. So it's not
> leaking (as far as I know), but the design could be improved.
I see. So we're probably OK as long as the loop containing the calls
to .DNAString_to_twoBit() is successful and nothing goes wrong after
that (e.g. no user interrupt).
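
For illustration, here is a minimal base-R sketch of one way the cleanup could be made independent of the loop succeeding, using on.exit(). Everything in it is a hypothetical stand-in: acquire_resource() and release_resource() are made-up names that play the roles of the conversion helper and of the code that frees the struct, and they are not rtracklayer functions. It also assumes the write step does not free the handles itself, otherwise the release would need to be idempotent.

```r
# Hypothetical stand-ins, NOT rtracklayer functions: acquire_resource()
# mimics a helper that allocates something, release_resource() mimics
# the code that frees it.
released <- character(0)

acquire_resource <- function(id) {
  if (id == "bad") stop("conversion failed for ", id)
  structure(list(id = id), class = "fake_handle")
}

release_resource <- function(handle) {
  # Record the release so the behaviour can be inspected afterwards.
  released <<- c(released, handle$id)
  invisible(NULL)
}

convert_all <- function(ids) {
  handles <- list()
  # The on.exit() expression runs when the function exits, whether it
  # returns normally, throws an error, or is interrupted by the user.
  on.exit(for (h in handles) release_resource(h), add = TRUE)
  for (id in ids) {
    handles[[length(handles) + 1L]] <- acquire_resource(id)
  }
  # A "write" step would go here (assumed not to free the handles).
  length(handles)
}

# The third conversion fails; the two handles acquired before it are
# still released by the on.exit() handler.
result <- tryCatch(convert_all(c("a", "b", "bad", "c")),
                   error = function(e) conditionMessage(e))
```

After this runs, 'released' holds the ids of the handles that had been acquired before the failure. A finalizer registered on the external pointer (reg.finalizer) would be another option, at the cost of the memory only being freed at garbage collection time.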
Thanks,
H.
>
> Thanks,
> H.
>
>
>
> On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> library(rtracklayer)
> library(Biostrings)
> x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
>
>
> Then:
>
> > x
> A DNAStringSet instance of length 1
> width seq
> [1] 23 AAA-CCC-GGG-TTT-NNN-KKK
>
> > export(x, "x.2bit")
>
> > import("x.2bit")
> A DNAStringSet instance of length 1
> width seq names
> [1] 23 AAATCCCTGGGTTTTTNNNTTTT 1
>
> What about having the "export" method for TwoBitFile raise an error
> (or at least issue a warning) instead of silently turning everything
> that is not A, C, G, T, or N into a T?
>
> Thanks,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319