[Bioc-sig-seq] short read quality score summary in ShortRead error

Martin Morgan mtmorgan at fhcrc.org
Tue Feb 1 20:26:12 CET 2011


On 02/01/2011 11:08 AM, Kunbin Qu wrote:
> Hi, all:
> 
> I am trying to assess the read quality score for each lane (mean and
> median etc.) from a Hi-Seq run which has about 70-90 million reads per
> lane. When I used as(quality(readFastQ), "numeric"), it gave an error
> "allocMatrix: too many elements specified". Is there a way to get around
> this? Thanks.

An R vector (including a matrix) can hold at most 2^31-1 elements, so a
work-around is to divide the ShortReadQ object into chunks such that
sum(width(s1q)) < 2^31-1 for each chunk.
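
A minimal sketch of the chunking arithmetic, assuming fixed-width reads as
in the post. The helper name chunk_ranges() is hypothetical, not part of
ShortRead, and the ShortRead usage is shown only as comments:

```r
## Hypothetical helper: split `n_reads` reads of fixed `read_width` into
## index ranges so each chunk holds fewer than `limit` quality values.
chunk_ranges <- function(n_reads, read_width, limit = 2^31 - 1) {
    per_chunk <- floor((limit - 1) / read_width)  # reads per chunk
    starts <- seq(1, n_reads, by = per_chunk)
    ends <- pmin(starts + per_chunk - 1, n_reads)
    data.frame(start = starts, end = ends)
}

## The lane from the original post: 76115764 reads of 50 cycles.
rng <- chunk_ranges(76115764, 50)

## Assumed usage with ShortRead loaded (not run here):
## for (i in seq_len(nrow(rng))) {
##     chunk <- s1q[rng$start[i]:rng$end[i]]
##     m <- as(chunk, "matrix")   # reads x cycles matrix of scores
##     ## accumulate per-chunk sums or counts from m here
## }
```

In practice a much smaller `limit` is probably wiser, since a chunk close
to 2^31-1 values still needs several GB of memory once converted.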

Depending on what you do down-stream, it might be sufficient to use
alphabetByCycle (alphabet use per cycle) or alphabetFrequency (alphabet
use per read).
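
As a rough illustration of the count-based route, the sketch below derives
the mean and median score from a table of counts per quality character,
without expanding every score into one long vector. summarize_counts() is a
hypothetical helper, and the offset of 64 is an assumption based on the
SFastqQuality encoding shown in the post:

```r
## Hypothetical helper: mean and median quality from a named vector of
## counts, where each name is a single quality character.
summarize_counts <- function(counts, offset = 64L) {
    counts <- counts[counts > 0]
    scores <- vapply(names(counts), utf8ToInt, integer(1)) - offset
    ord <- order(scores)
    scores <- scores[ord]
    n <- as.numeric(counts[ord])
    total <- sum(n)
    cum <- cumsum(n)
    ## positions of the middle value(s) in the sorted scores
    lo <- floor((total + 1) / 2)
    hi <- ceiling((total + 1) / 2)
    value_at <- function(p) scores[which(cum >= p)[1]]
    c(mean = sum(scores * n) / total,
      median = (value_at(lo) + value_at(hi)) / 2)
}

## Toy counts: 'B' twice, 'a' once, 'g' once (scores 2, 2, 33, 39).
res <- summarize_counts(c(B = 2, a = 1, g = 1))

## Assumed usage with ShortRead loaded (not run here):
## abc <- alphabetByCycle(quality(seqQ))  # alphabet x cycle counts
## summarize_counts(rowSums(abc))
```

The count matrix has one row per alphabet symbol and one column per cycle,
so it stays small regardless of the number of reads.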

Martin

> 
> -Kunbin
> 
>> seqQ
> class: ShortReadQ
> length: 76115764 reads; width: 50 cycles
>> s1q<-quality(seqQ)
>> s1q
> class: SFastqQuality
> quality:
>   A BStringSet instance of length 76115764
>            width seq
>        [1]    50 aa``_BcccccccccddddddadJdQTTTB]]SZY[\_PZbc_bbddad\
>        [2]    50 ]\SUTB_[[][]]]]dfecfef^cddffcedbd^ddeed`cd`dedeef`
>        [3]    50 aa```Bcccb\ccccggggggggegegefgggege^abbdeeabedgade
>        [4]    50 ^VZXWB_YN]TMUTYddadd_dcda[[R[T\``Wc^bc\bacaddd`a\d
>        [5]    50 cccccBc]cacbb]cgggdfggegegaggeeggggaeegggbecgefcZ\
>        [6]    50 ddccdbeeea_]]`Z\cc`c]SZ_]]S[K]ccbbWaTL_``V]^_cddWV
>        [7]    50 XYWWYB[]_]]]^\]cddddbfedfdddbacacacfff_edTd`aU[XYX
>        [8]    50 bbb`bBcccccddddgggggggggggegggfggdgggggegeggdggggg
>        [9]    50 bba`bBcccccddddgeggegggggdggffggcfe`edeegae`geddd`
>        ...   ... ...
> [76115756]    50 e`eeeBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115757]    50 aaVa`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115758]    50 ]ZXRERHZZFUZ[XDKKDDRHLPUQa```^XZ^HQ`]]T[WDXRH^YSWW
> [76115759]    50 eeededd_dcBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115760]    50 c[^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115761]    50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115762]    50 __BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115763]    50 _^X_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [76115764]    50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>> summary(as(s1q, "numeric"))
> Error in asMethod(object) : allocMatrix: too many elements specified
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
> [4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0 grid_2.11.0   hwriter_1.3   tools_2.11.0
>>
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


