[Bioc-sig-seq] Seeking advice on hardware options

Martin Morgan mtmorgan at fhcrc.org
Fri Jun 6 19:27:27 CEST 2008


"Paul Leo" <p.leo at uq.edu.au> writes:

[snip]

> Also if anyone with real world experience can comment on the typical
> size of the alignment file (for paired end reads on a good day), that
> is the s_N_export.txt file generated by ELAND and the s*_sequences.txt
> file generated by GERALD that would be helpful too (I have the

Here are file sizes from a typical middling-quality recent run, in MB;
not too bad by this point. These are NOT paired-end reads.

> library(ShortRead)
> sp <- SolexaPath("/path/to/run")
> seqs <- list.files(analysisPath(sp), "s_[1-8]_sequence.txt", full.names=TRUE)
> exps <- list.files(analysisPath(sp), "s_[1-8]_export.txt", full.names=TRUE)
> file.info(seqs)$size/(1024^2)
[1] 421.8892 461.4935 362.4373 426.9607 628.7526 353.9122 441.2186 475.7593
> file.info(exps)$size/(1024^2) # lane 5 not mapped so no export
[1] 603.0924 646.3515 466.8608 570.8070 445.1177 602.2756 691.8376
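For a per-run total, the figures above can simply be summed. This is only a rough sketch that re-enters the printed sizes by hand rather than reading the files; the variable names are made up for illustration.

```r
# Sizes in MB, copied from the transcript above
seq_mb <- c(421.8892, 461.4935, 362.4373, 426.9607,
            628.7526, 353.9122, 441.2186, 475.7593)
# Seven values only: lane 5 was not mapped, so it has no export file
exp_mb <- c(603.0924, 646.3515, 466.8608, 570.8070,
            445.1177, 602.2756, 691.8376)

# Totals per file type, converted from MB to GB
total_gb <- c(sequence = sum(seq_mb), export = sum(exp_mb)) / 1024
round(total_gb, 2)
```

That comes to roughly 3.5 GB of sequence files and 3.9 GB of export files for this run.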

> standard product info). Are there other files generated by the
> pipeline that you have found particularly useful in downstream

I've sometimes found the image intensity (_int), unfiltered sequence
(_seq.txt) and base call probability (_prb) files useful, and also
the 'RunBrowser' files created during the run. The intensity and _prb
files are large (5-10 MB per tile x 300 tiles per lane). These and
other intermediate files are likely to be essential in any critical
assessment of the technology or methods (as opposed to downstream
application).
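To put the per-tile figure in storage terms, here is a back-of-the-envelope sketch. It assumes the 5-10 MB range applies to one file type at a time and that all 8 lanes of a flow cell are kept; both are assumptions for illustration, not measurements.

```r
mb_per_tile    <- c(low = 5, high = 10)  # range quoted above
tiles_per_lane <- 300                    # figure quoted above
lanes          <- 8                      # assumption: a full 8-lane run

# Storage per lane and per run for one file type, in GB
per_lane_gb <- mb_per_tile * tiles_per_lane / 1024
per_run_gb  <- per_lane_gb * lanes
round(rbind(per_lane_gb, per_run_gb), 1)
```

Under those assumptions one file type takes about 1.5-3 GB per lane, or on the order of 12-23 GB per run.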

Sean mentioned that multi-core processors require an appropriate
amount of memory per core. Other than PDict, I've found that
manipulating objects on either a per-tile or per-lane basis uses on
the order of 4-5 GB. Using an 8-core processor effectively therefore
makes 32 GB a kind of hard 'minimum'.
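The arithmetic behind that figure, as a small sketch. It assumes one independent R process per core, each holding its own 4-5 GB of objects; the names are illustrative only.

```r
cores         <- 8
gb_per_worker <- c(low = 4, high = 5)   # working set per process, from above

# RAM needed to keep every core busy at once
ram_needed_gb <- cores * gb_per_worker

# Conversely: how many such processes fit in a given amount of RAM
ram_gb          <- 32
workers_fitting <- floor(ram_gb / gb_per_worker)

ram_needed_gb
workers_fitting
```

So 32 GB keeps all 8 cores busy only at the low end of the range; at 5 GB per process it fits 6.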

Martin

> analysis or that are useful in other 3rd party applications that you
> have tried?

> Thanks in advance Paul

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


