[Bioc-sig-seq] Size of Illumina fastq files to be read in ShortRead

Kasper Daniel Hansen khansen at stat.berkeley.edu
Thu Jun 25 08:40:47 CEST 2009


Note that you are probably not using a 64-bit version of R, so you
cannot utilize all of your 8 GB. Check by looking at
.Machine$sizeof.pointer
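
A minimal sketch of that check follows. The variable names are illustrative only and are not part of R or ShortRead. A 32-bit process can address at most 2^32 bytes (4 GiB), and in practice less than that, so the extra memory in an 8 GB machine is out of reach.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build
ptr_bytes <- .Machine$sizeof.pointer
is_64bit <- ptr_bytes == 8
cat("Pointer size:", ptr_bytes, "bytes; 64-bit R:", is_64bit, "\n")
```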

At a minimum, upgrade to R 2.9 if you want to use Bioconductor for
short reads.
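
A quick way to see whether the running R meets that minimum is shown below. This is a sketch; the 2.9.0 threshold comes from the advice above, and the variable name is hypothetical.

```r
# Compare the running R version against the suggested minimum, 2.9.0
r_ok <- getRversion() >= "2.9.0"
if (!r_ok) {
  warning("R ", as.character(getRversion()),
          " is older than 2.9.0; consider upgrading for short-read work")
}
```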

Kasper

On Jun 24, 2009, at 11:37, Anastasia Gioti wrote:

> Dear list,
> I just started using the ShortRead package to read fastq files from
> the Illumina analyzer, and I have some issues. The most important is
> that readFastq crashes, I suppose because of memory, when I try to
> read files larger than 1 GB. For example:
> fqpattern='s_3_1_sequence.txt'
> > afrN=file.path(analysisPath(sp), fqpattern)
> > afrN
> [1] "/Users/nat/Data/Illumina/Solexa_disk_modforR/Data/HJSN_FC1_280409_3//Data/C1-C55Firecrest/Bustard1.3.2_06-05-2009_rdixon/GERALD_06-05-2009_rdixon/s_3_1_sequence.txt"
> > afrNq=readFastq(sp, fqpattern)
> Error: cannot allocate vector of size 27.0 Mb
> R(1337,0xa07a2720) malloc: *** mmap(size=28340224) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> R(1337,0xa07a2720) malloc: *** mmap(size=28340224) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
>
> I only succeeded in reading a file smaller than 1 GB, but I suppose
> that ShortRead is designed for big files ;-).
> Another minor issue is the names of the folders in the Illumina
> output directory that I need to designate in exptPath so that
> p=SolexaPath(exptPath) is parsed correctly. I finally managed to
> find the logic behind this, but I would like to confirm that the
> path absolutely needs to contain the string Data/C1-C(readlength)Firecrest.
> At least in my hands it would not work with other names (which are
> currently produced by Illumina, for example IPAR instead of Firecrest).
> Is that correct? Maybe this parser is hard-coded for previous versions
> of the Illumina output? In that case, is there any plan to update it?
> This is not very important, though.
>
> I use R 2.8 on Mac OS X Leopard with 8 GB of memory, so I don't think
> my problem with fastq comes from my computer...
> Any help/suggestions are welcome!
> Thank you,
>
> Anastasia Gioti
> Post-Doc, Evolutionary Biology Department
> Uppsala University
> Norbyvagen 18D
> SE-752 36  UPPSALA
> anastasia.gioti at ebc.uu.se
> Tel: +46-18-471 6465
> Fax: +46-18-471 6310