[BioC] BitSeq getExpression crash

Maayan Kreitzman mkreitzman at bcgsc.ca
Mon Mar 25 23:00:09 CET 2013


Hi Peter,
I'm still having some difficulty with the first step of BitSeq.
I made myself a mini dataset with just 5 million reads so that I could run everything in my R console without writing an R script and submitting it to the cluster (just until I know what I'm doing).

But I got this error:
Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : 
  Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS-27c66ebf4a21.rpkm failed to open.

Since it seemed to be an issue with the temp directory, I tried changing TMPDIR in the shell to a location with plenty of space:
[mkreitzman at xhost09 ~]$ echo $TMPDIR
/projects/wtss_scratch/maayan

But this made no difference to where the temp files were created.
I copied the whole session below.

thanks,
Maayan


> res1 <- getExpression("/projects/mkreitzman_prj/expression_quantification_testing/testing/BitSeq/A08485_gr4.sam_mini.sam", "/projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa",
+ log = TRUE, seed=47)
## Computing alignment probabilities.
[time: +1.283333 m]
[time: +0.333333 m]
[time: +0.000000 m]
[time: +1.200000 m]
[time: +0.500000 m]
[time: +0.000000 m]
## Estimating transcript expression levels.
Mappings: 1606295
Ntotal: 2408007
  10000 [time: +0.000000 s]
  100000 [time: +0.000000 s]
  1000000 [time: +2.000000 s]
Finished Reading!
Total hits = 3212590
Isoforms: 183986
Burn in: 1000 DONE. [time: +6.633333 m]

Sampling DONE. [time: +9.600000 m]
rHat (for 1000 samples) 
   rHat (rHat from subset |    tid | mean theta)
    1.0040 ( 1.0040 |  36651 |  0.0000)
    1.0040 ( 1.0072 | 178210 |  0.0000)
    1.0036 ( 1.0088 | 148680 |  0.0000)
  Mean rHat of worst 10 transcripts: 1.003527
  Mean C0: (50 50 50 50 ). Nunmap: 801712

Producing 649 final samples.

Sampling DONE. [time: +6.833333 m]
rHat (for 649 samples) 
   rHat (rHat from subset |    tid | mean theta)
    1.0061 ( 1.0051 | 157831 |  0.0000)
    1.0059 ( 1.0048 | 104363 |  0.0000)
    1.0058 ( 1.0068 |  71659 |  0.0000)
  Mean rHat of worst 10 transcripts: 1.005543
  Mean C0: (50 50 50 50 ). Nunmap: 801712

Total samples: 6596
## Computing means.
Error in getMeanVariance(c(outFile), meanFile, log = log, pretend = pretend) : 
  Conditions: file /tmp/RtmpRHbr1N/A08485_gr4.sam_mini-BS-27c66ebf4a21.rpkm failed to open.
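
A possible check for the temp-directory question above, offered as a sketch rather than a diagnosis: R picks its per-session temporary directory once, at startup, from the TMPDIR, TMP and TEMP environment variables, falling back to /tmp. Setting TMPDIR only has an effect if it is exported in the shell before R is launched. The snippet uses base R only. The probe file name is made up for illustration, and whether the temp directory is really what makes getMeanVariance fail here is an assumption.

```r
# TMPDIR as seen by this R process. An empty string means the variable was
# not exported to R's environment (or was set only after R had started).
Sys.getenv("TMPDIR")

# The per-session temporary directory R is actually using
# (of the form <base>/RtmpXXXXXX).
td <- tempdir()
td

# Does that directory still exist, and is it writable?
# file.access() returns 0 when the requested access is allowed.
td_ok <- file.exists(td) && file.access(td, mode = 2) == 0
td_ok

# Try to create and remove a small file there (hypothetical probe file).
probe <- tempfile(pattern = "probe-", fileext = ".txt")
writeLines("test", probe)
probe_ok <- file.exists(probe)
unlink(probe)
probe_ok
```

If Sys.getenv("TMPDIR") comes back empty, running `export TMPDIR=/projects/wtss_scratch/maayan` in the shell and then starting a fresh R session should make tempdir() point under that directory.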
________________________________________
From: Peter Glaus [glaus at cs.man.ac.uk]
Sent: Friday, March 22, 2013 5:32 AM
To: Maayan Kreitzman
Subject: Re: BitSeq getExpression crash

Hi Maayan,
I believe the error is caused by the process running out of memory. I am
not 100% sure, but when I have seen this kind of error before, it was
caused by a lack of memory. The estimation can be quite CPU and memory
intensive, so I advise running it on a computing cluster instead of a
regular desktop/notebook machine.

Regarding your function call, when running an actual analysis (not just
testing/trying out), please use higher values for MCMC_burnIn,
MCMC_samplesN and MCMC_samplesSave than the ones in your call. The
default, used when these are left blank, is 1000 and is usually "good
enough". The computation will take longer, but the estimates will be
much more accurate as well.
(The values 200, 200, 50 are used in the vignette because the example
data is very small, and the vignette has to run within time limit.)
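
As a rough sketch of what that could look like, the call below passes the three MCMC settings explicitly at the default value of 1000 mentioned above, using the same argument names as in your call. The file paths are hypothetical placeholders, and the guard around the call is only there so the snippet does nothing when BitSeq or the input files are missing.

```r
# Hypothetical input paths; replace with your own alignment and transcript files.
align_file  <- "path/to/alignments.sam"
tr_seq_file <- "path/to/transcripts.fa"

# MCMC settings for a real analysis: 1000 each instead of the
# 200/200/50 used in the vignette.
mcmc_args <- list(MCMC_burnIn      = 1000,
                  MCMC_samplesN    = 1000,
                  MCMC_samplesSave = 1000)

# Only run when BitSeq is installed and both input files exist.
if (requireNamespace("BitSeq", quietly = TRUE) &&
    file.exists(align_file) && file.exists(tr_seq_file)) {
  res1 <- do.call(BitSeq::getExpression,
                  c(list(align_file, tr_seq_file, log = TRUE, seed = 47),
                    mcmc_args))
}
```

Leaving the three MCMC arguments out altogether should give the same settings, since 1000 is the default.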

Also, for future reference, when you have questions regarding
Bioconductor packages, please post to the Bioconductor user mailing list
(and CC the package author), as you might sometimes get replies from
other users, and your post might also help other users if they encounter
a similar problem in the future.

Best regards,
Peter.

On 21/03/13 21:46, Maayan Kreitzman wrote:
> Dear Peter,
> I'm trying to run BitSeq, and am running into a problem after the getExpression function has been running for several hours.
> The same thing happened twice, on different servers. What's weird is that not only does the function crash, it actually exits R.
> This is the error message:
>
> terminate called after throwing an instance of 'std::bad_alloc'
>    what():  St9bad_alloc
> Aborted
>
> I have no experience whatsoever with R, so this may be a novice mistake, but your help would be greatly appreciated.
> I've copied the whole session below.
>
> thanks in advance,
> Maayan
>
>
>> library("BitSeq")
> Loading required package: Rsamtools
> Loading required package: IRanges
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
>      xtabs
>
> The following object(s) are masked from ‘package:base’:
>
>      anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>      get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>      pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>      rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: GenomicRanges
> Loading required package: Biostrings
> Loading required package: zlibbioc
>> res1 <- getExpression("/projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/transcriptome/bowtie2transcriptome/A08473_gr4.sam",
> + "/projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/transcriptome/Homo_sapiens.GRCh37.69.cdna.all.fa",
> + log = TRUE, MCMC_burnIn=200,MCMC_samplesN=200,MCMC_samplesSave=50,seed=47)
> ## Computing alignment probabilities.
> [time: +1.866667 m]
> [time: +36.400000 m]
> [time: +0.000000 m]
> [time: +117.050000 m]
> [time: +0.500000 m]
> [time: +0.000000 m]
> ## Estimating transcript expression levels.
> Mappings: 71092830
> Ntotal: 123098679
>    10000 [time: +1.000000 s]
>    100000 [time: +0.000000 s]
>    1000000 [time: +3.000000 s]
>    10000000 [time: +25.000000 s]
> Read only 14186178 reads.
> Finished Reading!
> Total hits = 28372355
> Isoforms: 183985
> Burn in: 200 DONE. [time: +12.016667 m]
>
> Sampling DONE. [time: +12.850000 m]
> rHat (for 200 samples)
>     rHat (rHat from subset |    tid | mean theta)
>      1.0252 ( 1.1173 |  89080 |  0.0000)
>      1.0216 ( 1.1351 | 126802 |  0.0000)
>      1.0183 ( 1.0151 | 183201 |  0.0000)
>    Mean rHat of worst 10 transcripts: 1.018596
>    Mean C0: (3516 3520 3529 3518 ). Nunmap: 52005849
>
> Producing 33 final samples.
>
> Sampling DONE. [time: +2.166667 m]
> rHat (for 33 samples)
>     rHat (rHat from subset |    tid | mean theta)
>      1.1193 ( 1.1332 | 117458 |  0.0000)
>      1.1181 ( 1.1229 | 158878 |  0.0000)
>      1.1074 ( 1.1004 |  43840 |  0.0000)
>    Mean rHat of worst 10 transcripts: 1.108279
>    Mean C0: (3528 3512 3523 3520 ). Nunmap: 52005849
>
> Total samples: 932
> terminate called after throwing an instance of 'std::bad_alloc'
>    what():  St9bad_alloc
> Aborted


