[Bioc-devel] export() to 2bit file

Hervé Pagès hpages at fhcrc.org
Mon May 12 20:41:03 CEST 2014


Hi Michael,

On 05/09/2014 04:39 PM, Michael Lawrence wrote:
> What would be the fastest way to do this with a DNAString?  Just an
> alphabetFrequency?

That would do it.
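
Something along these lines, as a rough sketch only: check_twobit_letters() is a hypothetical helper name, not an existing rtracklayer function, and it assumes that A, C, G, T and N are the only letters the 2bit format can represent (as in the example quoted below).

```r
library(Biostrings)

# Hypothetical helper (not part of rtracklayer): raise an error if 'x'
# contains any letter other than A, C, G, T, or N.
check_twobit_letters <- function(x) {
    af <- alphabetFrequency(x)
    # alphabetFrequency() returns a named vector for a DNAString and a
    # matrix (one row per sequence) for a DNAStringSet; reduce the matrix
    # to a single named vector of per-letter totals.
    if (is.matrix(af))
        af <- colSums(af)
    bad <- af[!(names(af) %in% c("A", "C", "G", "T", "N")) & af > 0]
    if (length(bad) > 0L)
        stop("letters not supported by the 2bit format: ",
             paste(names(bad), collapse=", "))
    invisible(TRUE)
}
```

Calling this on the object before export() would turn the silent conversion into an error; using warning() instead of stop() would be the softer variant.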

A couple of other issues I ran into with the 2bit code:

(1) It fails on empty sequences:

     > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
     Warning message:
     In (function (object, seqname)  :
       needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
     Error in sapply(object, function(x) typeof(x) == "externalptr" && is(x,  :
       error in evaluating the argument 'X' in selecting a method for
       function 'sapply': Error in (function (object, seqname)  : UCSC
       library operation failed

(2) The internal helper rtracklayer:::.DNAString_to_twoBit() may be
     introducing a memory leak: the memory that the returned external
     pointer points to (a struct twoBit) never seems to be released.
     The leak is minor if the sequence passed via 'object' has no
     masks, but it can be significant if there are masks made of
     hundreds of thousands of ranges.
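
For (1), a possible user-side workaround until this is fixed is to drop the empty sequences before exporting. This is a sketch only: the variable name 'nonempty' is just for illustration, and I am assuming that export() is otherwise fine with the remaining sequences.

```r
library(Biostrings)

x <- DNAStringSet(c("AA", "", "CC"))

# Keep only the non-empty sequences; width() gives the length of each one.
nonempty <- x[width(x) > 0L]

# Report how many sequences were dropped so the change is not silent.
if (length(nonempty) < length(x))
    warning(length(x) - length(nonempty), " empty sequence(s) dropped")

# export(nonempty, "ww.2bit")  # the rtracklayer call from the report above
```

The obvious drawback is that the resulting 2bit file no longer contains an entry for the empty sequence.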

Thanks,
H.

>
>
> On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>        library(rtracklayer)
>        library(Biostrings)
>        x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
>
>     Then:
>
>        > x
>          A DNAStringSet instance of length 1
>            width seq
>        [1]    23 AAA-CCC-GGG-TTT-NNN-KKK
>
>        > export(x, "x.2bit")
>
>        > import("x.2bit")
>          A DNAStringSet instance of length 1
>            width seq                                               names
>        [1]    23 AAATCCCTGGGTTTTTNNNTTTT                           1
>
>     What about having the "export" method for TwoBitFile raise an error
>     (or at least issue a warning) instead of silently turning everything
>     that is not A, C, G, T, or N into a T?
>
>     Thanks,
>     H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


