[BioC] TEQC package very slow

nathalie nac at sanger.ac.uk
Wed Jun 13 16:53:43 CEST 2012


Hi,


This is the error message produced at the 
myreadpair<-reads2pairs(myread) stage after it had been running for 7 hours:
 > readpairs4_2_PigS<-reads2pairs(reads4_2_PigS)
[1] "there were 1453928 reads found without matching second read, or 
whose second read matches to a different chromosome"

Error in endoapply(reads, mergefun) :
   'FUN' did not produce an endomorphism
 > Terminated
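
The first message above counts reads without a matching second read on the same chromosome. As a rough illustration of what that check amounts to, here is a minimal base-R sketch on a toy data frame. It is not TEQC's implementation: the real object returned by get.reads() is not a plain data.frame, and the column names used here (chr, start, ID) are assumptions made for the example only.

```r
# Toy stand-in for a set of reads; column names (chr, start, ID) are assumed
# for illustration and do not come from TEQC itself.
reads <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  start = c(100L, 250L, 400L, 50L, 300L, 75L),
  ID    = c("r1", "r1", "r2", "r3", "r3", "r2"),
  stringsAsFactors = FALSE
)

# Count how often each read ID occurs on each chromosome.
key <- paste(reads$chr, reads$ID, sep = ":")
n_per_key <- table(key)

# Treat a read as paired only if its ID occurs exactly twice on one chromosome.
is_paired <- as.vector(n_per_key[key]) == 2L

paired  <- reads[is_paired, , drop = FALSE]   # both mates on the same chromosome
singles <- reads[!is_paired, , drop = FALSE]  # no mate, or mate on another chromosome
```

In this toy example read r2 has one mate on chr1 and one on chr3, so both of its rows end up in singles. Whether removing such reads before calling reads2pairs would avoid the endoapply error is something I have not verified.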

I hope the error message helps with the diagnosis.
Thanks,



On 13/06/12 12:07, nathalie wrote:
> Hi,
> I am analysing coverage data using the TEQC package from Bioconductor 
> for quality assessment of a target enrichment experiment.
> I am using a compute cluster to do the analysis and asked for a 
> large memory allocation. My BAM files are 11 GB in size and the 
> analysis takes very long, several hours, and then my session exits. 
> Do I need to ask for this to be put on a long queue, for jobs of 
> more than 12 hours? Do people use TEQC with large files? How can I 
> make this analysis more efficient?
> These are my commands:
> #get reads
> myread<-get.reads("reads.bam",filetype="bam")
> #get read pairs: this is the step that fails. The documentation
> #states: "To run the function can be quite time consuming, depending
> #on the number of reads"
> myreadpair<-reads2pairs(myread)
>
> #drop single reads
> myread<-myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]
>
>
> I have used these functions successfully on smaller files of MiSeq 
> data, but not yet with HiSeq data.
> Many thanks for sharing your experience in getting QC for large files 
> efficiently.
> Nathalie
>
> > sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] TEQC_2.4.0          hwriter_1.3         Rsamtools_1.8.4
> [4] Biostrings_2.24.1   GenomicRanges_1.8.3 IRanges_1.14.2
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0  zlibbioc_1.2.0
>





