[Bioc-devel] export() to 2bit file

Hervé Pagès hpages at fhcrc.org
Mon May 12 23:10:54 CEST 2014


On 05/12/2014 12:23 PM, Michael Lawrence wrote:
>
>
>
> On Mon, May 12, 2014 at 11:41 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>
>     On 05/09/2014 04:39 PM, Michael Lawrence wrote:
>
>         What would be the fastest way to do this with a DNAString?  Just an
>         alphabetFrequency?
>
>
>     That would do it.
>
>     A couple of other issues I ran into with the 2bit code:
>
>     (1) It fails on empty sequences:
>
>          > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
>          Warning message:
>          In (function (object, seqname)  :
>            needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
>          Error in sapply(object, function(x) typeof(x) == "externalptr" && is(x,  :
>            error in evaluating the argument 'X' in selecting a method for
>            function 'sapply': Error in (function (object, seqname)  : UCSC
>            library operation failed
>
>
> Thanks for catching this one.
>
>     (2) It could be that the internal helper
>          rtracklayer:::.DNAString_to_twoBit() introduces a memory leak: the
>          memory that the returned external pointer points to (a struct
>          twoBit) never seems to be released. The leak is minor if the
>          sequence passed via 'object' has no masks, but it can be significant
>          if there are masks made of hundreds of thousands of ranges.
>
>
> Right now it is the responsibility of the caller to free that memory.
> Probably should have used a finalizer on the externalptr, but the way it
> works now is that the write function frees the object. So it's not
> leaking (as far as I know), but the design could be improved.

I see. So we're probably OK as long as the loop containing the calls
to .DNAString_to_twoBit() is successful and nothing goes wrong after
that (e.g. no user interrupt).
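
To make the cleanup independent of the loop completing, base R offers two standard tools. The following is a minimal sketch under stated assumptions, not rtracklayer code: `alloc_handle()`, `free_handle()`, `write_all()`, `alloc_handle_with_finalizer()` and `release_log` are hypothetical names, and an environment stands in for the external pointer (plain R code cannot create one). `on.exit()` releases whatever was allocated when the function exits, whether normally, on error, or on a user interrupt; `reg.finalizer()` is the finalizer approach mentioned above.

```r
## Hypothetical stand-ins for a native allocate/free pair (NOT rtracklayer
## functions). An environment plays the role of the external pointer, and
## 'release_log' records which handles have been freed.
release_log <- new.env()
release_log$freed <- character(0)

alloc_handle <- function(name) {
  h <- new.env()
  h$name <- name
  h$freed <- FALSE
  h
}

free_handle <- function(h) {
  if (!h$freed) {                      # freeing twice is a no-op
    h$freed <- TRUE
    release_log$freed <- c(release_log$freed, h$name)
  }
  invisible(NULL)
}

## Approach 1: on.exit() releases every handle allocated so far when the
## function exits, whether it returns normally, signals an error, or is
## interrupted by the user.
write_all <- function(seqnames, fail_at = NA) {
  handles <- list()
  on.exit(for (h in handles) free_handle(h), add = TRUE)
  for (nm in seqnames) {
    if (identical(nm, fail_at)) stop("conversion failed at ", nm)
    handles[[nm]] <- alloc_handle(nm)
  }
  ## ... the converted sequences would be written here ...
  length(handles)
}

## Approach 2: a finalizer releases the handle once it is unreachable and
## garbage collected (or when the R session ends, because onexit = TRUE).
alloc_handle_with_finalizer <- function(name) {
  h <- alloc_handle(name)
  reg.finalizer(h, free_handle, onexit = TRUE)
  h
}
```

Here `write_all(c("chrA", "chrB", "chrC"), fail_at = "chrC")` signals an error, yet its `on.exit()` handler still frees `chrA` and `chrB`. Because `free_handle()` ignores handles that are already freed, both mechanisms can be combined without double-freeing.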

Thanks,
H.

>
>     Thanks,
>     H.
>
>
>
>         On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>              Hi Michael,
>
>                 library(rtracklayer)
>                 library(Biostrings)
>                 x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
>
>
>              Then:
>
>                 > x
>                   A DNAStringSet instance of length 1
>                     width seq
>                 [1]    23 AAA-CCC-GGG-TTT-NNN-KKK
>
>                 > export(x, "x.2bit")
>
>                 > import("x.2bit")
>                   A DNAStringSet instance of length 1
>                     width seq                     names
>                 [1]    23 AAATCCCTGGGTTTTTNNNTTTT 1
>
>              What about having the "export" method for TwoBitFile raise an
>              error (or at least issue a warning) instead of silently turning
>              everything that is not A, C, G, T, or N into a T?
>
>              Thanks,
>              H.
>
>              --
>              Hervé Pagès
>
>              Program in Computational Biology
>              Division of Public Health Sciences
>              Fred Hutchinson Cancer Research Center
>              1100 Fairview Ave. N, M1-B514
>              P.O. Box 19024
>              Seattle, WA 98109-1024
>
>              E-mail: hpages at fhcrc.org
>              Phone: (206) 667-5791
>              Fax: (206) 667-1319
>
>              _______________________________________________
>              Bioc-devel at r-project.org mailing list
>              https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
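
The import result above is consistent with how the .2bit format stores sequences: according to the UCSC format description, bases are packed at 2 bits each (T = 00, C = 01, A = 10, G = 11), runs of N are listed separately as N blocks, and lowercase is recorded as mask blocks. A letter with no 2-bit code apparently gets packed as 00 and therefore reads back as T. The base-R sketch below is a simplified, hypothetical model (`pack_2bit()`, `unpack_2bit()` and `roundtrip_2bit()` are illustrative names; the file header, masks and run-length N blocks are omitted) that reproduces that output.

```r
## Simplified, hypothetical model of .2bit base packing (not the UCSC code):
## 2 bits per base, T = 00, C = 01, A = 10, G = 11, four bases per byte with
## the first base in the most significant bits.
twobit_code <- c(T = 0L, C = 1L, A = 2L, G = 3L)

pack_2bit <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  codes <- unname(twobit_code[chars])
  codes[is.na(codes)] <- 0L           # no 2-bit code (N, '-', K, ...) -> 00, i.e. T
  codes <- c(codes, rep(0L, (-length(codes)) %% 4L))  # pad to whole bytes
  m <- matrix(codes, nrow = 4L)       # one column per output byte
  as.raw(colSums(m * c(64L, 16L, 4L, 1L)))
}

unpack_2bit <- function(bytes, n) {
  v <- as.integer(bytes)
  codes <- as.vector(rbind(v %/% 64L, (v %/% 16L) %% 4L,
                           (v %/% 4L) %% 4L, v %% 4L))
  paste(c("T", "C", "A", "G")[codes[seq_len(n)] + 1L], collapse = "")
}

## N positions are kept aside (here as plain positions, not runs) and
## restored after unpacking, mimicking the role of the N blocks.
roundtrip_2bit <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  n_pos <- which(chars == "N")
  out <- strsplit(unpack_2bit(pack_2bit(s), length(chars)), "", fixed = TRUE)[[1]]
  out[n_pos] <- "N"
  paste(out, collapse = "")
}

roundtrip_2bit("AAA-CCC-GGG-TTT-NNN-KKK")  # "AAATCCCTGGGTTTTTNNNTTTT"
```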

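The check proposed above could run before anything reaches the UCSC library. This is a hypothetical sketch: `check_2bit_letters()` is not an rtracklayer function, and it takes a plain character vector; on a DNAStringSet the letter tally could instead come from Biostrings' `alphabetFrequency()`, as suggested earlier in the thread. It also rejects zero-width sequences up front, assuming that a clear early error is preferable to the `needLargeMem` failure shown earlier.

```r
## Hypothetical pre-export check; not part of rtracklayer.
## 'seqs' is a plain character vector of sequences.
check_2bit_letters <- function(seqs) {
  stopifnot(is.character(seqs))
  ## Fail early on zero-width sequences (an assumed policy, see above).
  empty <- which(nchar(seqs) == 0L)
  if (length(empty) > 0L)
    stop("cannot export to 2bit: zero-width sequence(s) at position(s) ",
         paste(empty, collapse = ", "))
  ## Tally all letters across all sequences, similar in spirit to
  ## Biostrings::alphabetFrequency(x, collapse = TRUE) on a DNAStringSet.
  ## toupper() lets lowercase (soft-masked) letters through.
  freq <- table(unlist(strsplit(toupper(seqs), "", fixed = TRUE)))
  bad <- setdiff(names(freq), c("A", "C", "G", "T", "N"))
  if (length(bad) > 0L)
    stop("cannot export to 2bit: unsupported letter(s) ",
         paste0("'", bad, "'", collapse = ", "))
  invisible(TRUE)
}
```

For example, `check_2bit_letters("AAA-CCC-GGG-TTT-NNN-KKK")` stops and names `-` and `K` as unsupported, while `check_2bit_letters(c("AA", "", "CC"))` stops on the empty second sequence.
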


