[BioC] TEQC package very slow

nathalie nac at sanger.ac.uk
Wed Jun 13 13:07:49 CEST 2012


Hi,
I am analysing coverage data with the TEQC package from Bioconductor for 
quality assessment of a target enrichment experiment.
I am running the analysis on a compute cluster and requested a large 
memory allocation. My BAM files are 11 GB in size, and the analysis 
runs for several hours before my session exits. Do I need to submit this 
to a long queue, as a job of more than 12 hours? Do people use TEQC with 
large files? How can I make this analysis more efficient?
These are my commands:
# get reads
myread <- get.reads("reads.bam", filetype="bam")
# get read pairs: this is the step that fails. The documentation states:
# "To run the function can be quite time consuming, depending on
# the number of reads"
myreadpair <- reads2pairs(myread)

#drop single reads
myread<-myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]


I have used these functions efficiently on smaller files with MiSeq 
data, but not yet with HiSeq ...
Many thanks for sharing your experience of running QC on large files 
efficiently.
Nathalie

 > sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TEQC_2.4.0          hwriter_1.3         Rsamtools_1.8.4
[4] Biostrings_2.24.1   GenomicRanges_1.8.3 IRanges_1.14.2
[7] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0  zlibbioc_1.2.0





