[BioC] Biostrings bug?

Hervé Pagès hpages at fhcrc.org
Sat Oct 9 00:13:08 CEST 2010


Hi Arne,

Thanks for catching this. I'm working on a fix.

A temporary workaround is to create smaller DNAStringSet objects and
combine them with c():

 > myseq.bs1 <- DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 1000000))
 > myseq.bs2 <- DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 1000000))
 > myseq.bs <- c(myseq.bs1, myseq.bs2)
 > myseq.bs
   A DNAStringSet instance of length 2000000
           width seq
       [1]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [2]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [3]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [4]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [5]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [6]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [7]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [8]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       [9]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
       ...   ... ...
[1999992]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999993]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999994]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999995]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999996]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999997]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999998]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999999]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[2000000]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
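A quick check of the numbers behind this workaround, assuming (as Arne
guessed, and not a confirmed diagnosis) that the failure comes from the
total letter count exceeding R's 32-bit integer limit. The
chunk_indices() helper is hypothetical and NOT part of Biostrings; it
just generalizes the "split, then c()" idea to arbitrary input:

```r
## Letters requested by the failing call: 2,000,000 sequences x 1,200 each.
total_letters <- 1200 * 2e6            # double arithmetic: 2.4e+09
total_letters > .Machine$integer.max   # TRUE: above 2^31 - 1 = 2147483647

## The same product in R's 32-bit integer arithmetic overflows;
## R yields NA (with a warning) instead of wrapping to a negative value.
is.na(suppressWarnings(1200L * 2000000L))   # TRUE

## Each half used in the workaround stays below the limit (1.2e+09).
1200 * 1e6 <= .Machine$integer.max      # TRUE

## Hypothetical helper (NOT part of Biostrings): split the indices of a
## character vector into consecutive chunks whose total number of
## characters is at most 'max_letters'. A single string longer than
## 'max_letters' still gets a chunk of its own.
chunk_indices <- function(x, max_letters = .Machine$integer.max) {
  n <- nchar(x, type = "bytes")
  chunks <- list()
  start <- 1L
  acc <- 0                              # running total, kept as double
  for (i in seq_along(n)) {
    if (acc + n[i] > max_letters && i > start) {
      chunks[[length(chunks) + 1L]] <- start:(i - 1L)
      start <- i
      acc <- 0
    }
    acc <- acc + n[i]
  }
  if (length(n) > 0L)
    chunks[[length(chunks) + 1L]] <- start:length(n)
  chunks
}

## Usage sketch (assumes Biostrings is loaded; relies on c() combining
## DNAStringSet objects, exactly as in the workaround above):
## idx <- chunk_indices(myseq)
## myseq.bs <- do.call(c, lapply(idx, function(i) DNAStringSet(myseq[i])))
```

The running total is kept as a double so the helper itself cannot hit
the integer overflow it is trying to avoid.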

I'll post here again when I've solved the problem.

Cheers,
H.


On 10/07/2010 09:18 AM, arne.mueller at novartis.com wrote:
> Hi,
>
> Sorry, the sequence in my original posting got garbled during copy/paste;
> this is the "real" sequence:
>
>>
> CTATGTGTGAGGGCAGCAACCAGAACTGTCTGCCCTGACTTCGCTCAGGATGCTGTGAACATGTGGCTCAGATGGTGCTA
> GGCATTTTCCTCTAGAGTCAGAAACGTGGACAGAGAGTCATCTCCTCTGGCTTCCCAGGCATGTCTGCCACTCTGAAGGT
> CTGAAGGTCTGGGTCTCCCTCCCATGGGATTTGAGTGCAGAGAGCTGTGTGACTGGGTCCCTTCAGATCCAGGTGGTGTC
> TGGACTGTAGCGTTGAGTGCCCTATCTTCCTGGTCTCAGAGCACCTATACAGTTTCCTCTTGGGCCAGGGATGTGGGCAG
> TGGTGGGCTGTACTGGAAGTCTCTCCTGTCCTGCAGTCTCAGGAGTGGCCACCTGTCTGGGTGGTGAGCTCTCTCTCCCA
> TGGGGTTAGGGAGCAGGGAGGTTTTGCAAGATTCAGATTTAAGGTCACATTTTATCATCATAATGGAGGACATTAGGAAG
> GTCAGAAATAACTCCCTTAAGGAAATACTTGACAACACAAGCAAACTAGTAGAAATCTTTTTAAAAGGAAACACAAAAGT
> ATTTTAAAGAATTACAGCAAACCACAACCAAATAGGAGAGGAAATTGAACAAAATCATCCAGGAGTTAAATATGGAAATA
> GAAACAATGAAGAGAGCACACAGCGAGACAACCCTGGAGATAGAAAATCTAAGGAAGAGATCAGGAGTCATAGATGCAAG
> CATCACTGACGGACTACATGAGATAGAAGAGAGAATTTTGGGAGCAGAAGATATCATAGAAAACATTGACACAACCTTCA
> AAGAGAACGTAAATAGGAAAAAGCTCCTAGCCCTAAACATGCAGGAAATCAGGAAACAAATCAAAGATCAAACCTAAATA
> TATCAGGTATAGAAGAGAGTGAAGACTCCCAACATAAAGGGATGGTAAATATCTTCAACAATATAAACAATATAAAGGAA
> AACATCCCTAACCAAAAGAAATAAATGTCCATAAATAGACATGAAGCCTGCAGAATTCCAAATAGAATGGACCAGAAAAT
> AAATTCCTCCTGTCACATAATAGTCAAAACACCAAATGCACAAAACAAAGAATGAATATTAAAAGCATTAACGGATAAAG
> GTCAAGTACATTTAAAGGCAGACATGTCAGAATTACACCAGAATTCTTACCATGGACTATGAAAGCCAGAAGACAGATGT
>
> It doesn't matter which sequence one uses to trigger the DNAStringSet
> error; it just has to be long, and there have to be many of them.
> Here's a more generic example:
>
>> myseq.bs = DNAStringSet(rep(paste(rep("A", 100), collapse=""), 2000))
>> myseq.bs = DNAStringSet(rep(paste(rep("A", 100), collapse=""), 2000000))
>> myseq.bs = DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 2000))
>> myseq.bs = DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 2000000))
> Error in .Call("new_SharedRaw_from_STRSXP", x, start(solved_SEW), width(solved_SEW),  :
>    negative length vectors are not allowed
>
> Arne
>
> arne.mueller at novartis.com
> Sent by: bioconductor-bounces at stat.math.ethz.ch
> 10/07/2010 05:55 PM
>
> To: bioconductor at stat.math.ethz.ch
> cc:
> Subject: [BioC] Biostrings bug?
>
> Dear All,
>
> I came across the following error in DNAStringSet from the Biostrings
> package:
>
>> myseq = "CTATGTGTGAGGGCAGCAACCAGAACTGTCTGCCCTGACTTCGCTCAGGATGCTGTGAACATGTGGCTCAGATGGTGCTAGGCATTTTCCTCTAGAGTCAGAAACGTGGACAGAGAGTCATCTCCTCTGGCTTCCCAGGCATGTCTGCCACTCTGAAGGTCTGAAGGTCTGGGTCTCCCTCCCATGGGATTTGAGTGCAGAGAGCTGTGTGACTGGGTCCCTTCAGATCCAGGTGGTGTCTGGACTGTAGCGTTGAGTGCCCTATCTTCCTGGTCTCAGAGCACCTATACAGTTTCCTCTTGGGCCAGGGATGTGGGCAGTGGTGGGCTGTACTGGAAGTCTCTCCTGTCCTGCAGTCTCAGGAGTGGCCACCTGTCTGGGTGGTGAGCTCTCTCTCCCATGGGGTTAGGGAGCAGGGAGGTTTTGCAAGATTCAGATTTAAGGTCACATTTTATCATCATAATGGAGGACATTAGGAAGGTCAGAAATAACTCCCTTAAGGAAATACTTGACAACACAAGCAAACTAGTAGAAATCTTTTTAAAAGGAAACACAAAAGTATTTTAAAGAATTACAGCAAACCACAACCAAATAGGAGAGGAAATTGAACAAAATCATCCAGGAGTTAAATATGGAAATAGAAACAATGAAGAGAGCACACAGCGAGACAACCCTGGAGATAGAAAATCTAAGGAAGAGATCAGGAGTCATAGATGCAAGCATCACTGACGGACTACATGAGATAGAAGAGAGAATTTTGGGAGCAGAAGATATCATAGAAAACATTGACACAACCTTCAAAGAGAACGTAAATAGGAAAAAGCTCCTAGCCCTAAACATGCAGGAAATCAGGAAACAAATCAAAGATCAAACCTAAATATATCAGGTATAGAAGAGAGTGAAGACTCCCAACATAAAGGGATGGTAAATATCTTCAACAATATAAACAATATAAAGGAAAACATCCCTAACCAAAAGAAATAAATGTCCATAAATAGACATGAAGCCTGCAGAATTCCAAATAGAATGGACCAGAAAATAAATTCCTCCTGTCACATAATAGTCAAAACACCAAATGCACAAAACAAAGAATGAATATTAAAAGCATTAACGGATAAAGGTCAAGTACATTTAAAGGCAGACATGTCAGAATTACACCAGAATTCTTACCATGGACTATGAAAGCCAGAAGACAGATGT"
>> mysDNA = DNAStringSet(myseq) # ok!
>> myseq = rep(myseq, 2000000)
>> myseq.bs = DNAStringSet(myseq)
> Error in .Call("new_SharedRaw_from_STRSXP", x, start(solved_SEW), width(solved_SEW),  :
>     negative length vectors are not allowed
>
> Enter a frame number, or 0 to exit
> 1: DNAStringSet(myseq)
> 2: XStringSet("DNA", x, start = start, end = end, width = width, use.names = u
> 3: XStringSet("DNA", x, start = start, end = end, width = width, use.names = u
> 4: .charToXStringSet(basetype, x, start, end, width, use.names)
> 5: .charToXString(basetype, x, solved_SEW)
>
> Selection: 0
>>
>
> Strangely, the following works:
>
> myseq.bs = c(DNAStringSet(myseq[1:1000000]),
> DNAStringSet(myseq[1000001:2000000]))
>
> There must be an overflow somewhere.
>
> Here's some more info on my system:
>
>> sessionInfo()
> R version 2.11.1 Patched (2010-06-20 r52342)
> x86_64-unknown-linux-gnu
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] BSgenome.Rnorvegicus.UCSC.rn4_1.3.16 BSgenome_1.16.4
> [3] Biostrings_2.16.5                    GenomicRanges_1.0.3
> [5] IRanges_1.6.11
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0 tools_2.11.1
>
> Linux version 2.6.18-92.el5 (brewbuilder at ls20-bc2-13.build.redhat.com)
> (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:16:15 EDT 2008
>
> 64 GB memory
>
>      Thanks for your help,
>      kind regards,
>
>      Arne
>
>                   [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


