[BioC] QuasR on Linux Cluster

Tue Oct 22 17:33:05 CEST 2013

Thanks Ugo, running the script directly on the server was a good idea -
something seems to have eaten the error messages before.

On 22.10.2013 15:08, Ugo Borello wrote:
> I will run more tests to understand where the problem is.
> 
> But I don't know if, in the meantime, this could help:
> when I run qAlign from the R console on the server I get:
> 
>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
> clObj=cl)
> alignment files missing - need to:
>     create 1 genomic alignment(s)
> will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
> Testing the compute nodes...OK
> Loading QuasR on the compute nodes...OK
> Available cores:
> nodeNames
> ccage014 
>        4 
> Performing genomic alignments for 1 samples. See progress in the log file:
> /sps/inter/isc/uborello/input/QuasR_log_14c95dfa8d40.txt
> Error in checkForRemoteErrors(val) :
>   one node produced an error: Error on ccage014 processing sample
> /sps/inter/isc/uborello/input/133het.fastq : error in evaluating the
> argument 'file' in selecting a method for function 'scanFaIndex': Error in
> value[[3L]](cond) : 'open' index failed
>   file: /tmp/RtmpnzdpVP/file722251e6f03c.fa
> Calls: open ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
The failure is in Rsamtools::scanFaIndex, which cannot open
/tmp/RtmpnzdpVP/file722251e6f03c.fa or its index. As mentioned before,
my guess is that /tmp/ has run out of disk space. Could you check that?
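
A quick way to check this from within R is sketched below. This is only a minimal sketch: free_gb is a hypothetical helper name (not part of QuasR or Rsamtools), and it assumes a POSIX `df` is available on the PATH, as on typical Linux systems.

```r
# Hypothetical helper: free space (in GB) on the filesystem holding `path`.
# Assumes a POSIX `df` on the PATH; `-P` keeps the output to one line per
# filesystem and `-k` reports sizes in 1K blocks.
free_gb <- function(path = tempdir()) {
  out <- system2("df", c("-Pk", shQuote(path)), stdout = TRUE)
  # The last line holds the values; its 4th column is "Available".
  fields <- strsplit(trimws(out[length(out)]), "[[:space:]]+")[[1]]
  as.numeric(fields[4]) / 1024^2
}

# Free space under this R session's temporary directory.
print(tempdir())
print(round(free_gb(), 1))
```

Because the default argument is evaluated where the function runs, calling clusterCall(cl, free_gb) on your cluster object should report the free space under each compute node's own tempdir().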

> 
> And when I drop the argument ' splicedAlignment=TRUE' I get:
> 
> Performing genomic alignments for 1 samples. See progress in the log file:
> /sps/inter/isc/uborello/input/QuasR_log_14c9af4e6be.txt
> sh: line 1:  7369 Aborted                 (core dumped)
> '/sps/inter/isc/uborello/software/lib64/R/library/Rbowtie/bowtie'
> '/sps/inter/isc/uborello/software/lib64/R/library/BSgenome.Mmusculus.UCSC.mm
> 10.Rbowtie/alignmentIndex/bowtieIndex'
> '/sps/inter/isc/uborello/input/133het.fastq' -m 1 --best --strata
> --phred33-quals -S -p 4 '/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam' 2>&1
> Error in checkForRemoteErrors(val) :
>   one node produced an error: Error on ccage014 processing sample
> /sps/inter/isc/uborello/input/133het.fastq : bowtie failed to perform the
> alignments
Assuming that bowtie and the genome index are fine, this could also be
related to the disk being full, since it fails to open the output file
/tmp/RtmpnzdpVP/133het.fastq7222669b51e6.sam
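
If /tmp/ does turn out to be the problem, one option is to point qAlign at a different location through its "cacheDir" parameter. The sketch below first checks that a candidate directory can actually be written to. It is only an illustration: can_write is a hypothetical helper, and the directory used here is a placeholder that you would replace with a large scratch area on your cluster.

```r
# Hypothetical helper: TRUE if `dir` exists (creating it if needed) and a
# small probe file can be written into it.
can_write <- function(dir) {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  probe <- tempfile(tmpdir = dir)
  ok <- tryCatch({
    writeLines("probe", probe)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
  unlink(probe)  # remove the probe file again
  ok
}

# Placeholder location; in practice use a directory with plenty of free space.
cacheDir <- file.path(tempdir(), "quasr_cache")
stopifnot(can_write(cacheDir))

# The alignment call would then become (not run here, needs QuasR and data):
# proj <- qAlign(sampleFile, genome = genomeName, cacheDir = cacheDir, clObj = cl)
```

The check only confirms that the directory is writable, not that it is large enough for the temporary fastq, sam and bam files.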

Michael

>> From: Michael Stadler <michael.stadler at fmi.ch>
>> Date: Tue, 22 Oct 2013 11:50:05 +0200
>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>> Subject: Re: [BioC] QuasR on Linux Cluster
>>
>> I can see from the intermediate files that SpliceMap was stopped halfway
>> through, before it could create the single sam file with spliced alignments.
>>
>> QuasR tries to detect such cases in the child R process (one of the R
>> processes spawned in your cluster object) and throws an error with a
>> descriptive message. However, you do not get this error message. Rather,
>> you get an error indicating that the parent R process lost its
>> connection to the child R process.
>>
>> It's hard to diagnose this from afar, so I'll have to guess wildly. Could
>> it be that the child R process is terminated and therefore neither able
>> to signal failure, nor to communicate with the parent R process? Can you
>> give more details about your setup, e.g. if you are running some batch
>> or queueing system that controls job execution?
>>
>> Another thing that may help to narrow down the problem is to rerun
>> qAlign() on a subset of the dataset, or without a cluster object. It may
>> also help to know a bit more about the sample you try to analyse (read
>> length, read number, sequence file format).
>>
>> Michael
>>
>>
>>
>>
>> On 22.10.2013 10:43, Ugo Borello wrote:
>>> Dear Michael,
>>> I think that the disk space is not an issue; anyway, I will double check
>>> with the administrator.
>>>
>>> I used 4 nodes and QuasR stopped at the .sam file. See the output files in
>>> attachment.
>>>
>>> When I use fewer than 4 nodes, it stops at the beginning of the process:
>>>
>>> [1] "Writing BSgenome to disk on ccwsge0144 :
>>> /scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"
>>>
>>>
>>>
>>> What am I missing?
>>>
>>> Thank you
>>>
>>> Ugo
>>>
>>>
>>>
>>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>>> Date: Mon, 21 Oct 2013 17:48:53 +0200
>>>> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
>>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>>
>>>> Your cluster object seems functional now.
>>>>
>>>> Another possible problem could be available diskspace in R's tempdir().
>>>> It is used by qAlign to temporarily store the uncompressed fastq files,
>>>> the sam files and the bam files (and thus needs several-fold more free
>>>> capacity than the size of your fastq.gz files). For more information,
>>>> see vignette section 4.1 "File storage locations".
>>>>
>>>> If tempdir() is too small, you can redirect R's tempdir() by setting
>>>> the TMPDIR environment variable, or just for one qAlign call by using
>>>> the "cacheDir" parameter of qAlign.
>>>>
>>>> If you are sure that diskspace is not the issue, could you give qAlign()
>>>> another try, using a cluster object with only 4 nodes to avoid any
>>>> memory issues?
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 21.10.2013 15:09, Ugo Borello wrote:
>>>>> Thank you Michael,
>>>>> My bad, I am not able to find the QuasR_log at the moment. Anyway, the last
>>>>> step was the .sam file. QuasR did not proceed to convert the .sam file
>>>>> to a .bam file.
>>>>> Attached is some more information on the running job before it died.
>>>>> Those refer to a case where cl<- makeCluster(1).
>>>>>
>>>>>
>>>>> I ran your test and got:
>>>>>> library(parallel)
>>>>>> cl<- makeCluster(detectCores())
>>>>>> info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>>> info
>>>>> [[1]]
>>>>>                              sysname                              release
>>>>>                              "Linux"                 "2.6.18-348.3.1.el5"
>>>>>                              version                             nodename
>>>>> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>>>>>                              machine                                login
>>>>>                             "x86_64"                            "unknown"
>>>>>                                 user                       effective_user
>>>>>                           "uborello"                           "uborello"
>>>>>
>>>>> The same for the 32 nodes.
>>>>>
>>>>> Then I ran:
>>>>>> library(parallel)
>>>>>> type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>>>>>> type
>>>>> [1] "PSOCK"
>>>>>> cores <- getOption("mc.cores", detectCores())
>>>>>> cl <- makeCluster(cores, type=type)
>>>>>> cl
>>>>> socket cluster with 32 nodes on host 'localhost'
>>>>>> results <- parLapply(cl, 1:100, sqrt)
>>>>>> sum(unlist(results))
>>>>> [1] 671.4629
>>>>>> stopCluster(cl)
>>>>>
>>>>> I don't know if this could help.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Ugo
>>>>>
>>>>>
>>>>>
>>>>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>>>>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>>>>>> To: <bioconductor at r-project.org>
>>>>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>>>>>
>>>>>> Hi Ugo,
>>>>>>
>>>>>> On 18.10.2013 13:56, Ugo Borello wrote:
>>>>>>> Hi all,
>>>>>>> I am trying to use QuasR on a Linux Cluster:1 machine/multiple cores.
>>>>>>>
>>>>>>> I run:
>>>>>>> library(QuasR)
>>>>>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>>>>>
>>>>>>> cl <- makeCluster(1)
>>>>>>>
>>>>>>> sampleFile <- "sampleFile.txt"
>>>>>>>
>>>>>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>>>>>
>>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>>> clObj=cl)
>>>>>>>
>>>>>>> And I get
>>>>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>>>>> clObj=cl)
>>>>>>> alignment files missing - need to:
>>>>>>>     create 1 genomic alignment(s)
>>>>>>> Testing the compute nodes...OK
>>>>>>> Loading QuasR on the compute nodes...OK
>>>>>>> Available cores:
>>>>>>> nodeNames
>>>>>>> ccwsge0155
>>>>>>>          1
>>>>>>> Performing genomic alignments for 1 samples. See progress in the log
>>>>>>> file:
>>>>>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>>>>>> Error in unserialize(node$con) : error reading from connection
>>>>>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>>>>>> Execution halted
>>>>>>
>>>>>> The error that you get is not created within QuasR; my guess is that it
>>>>>> comes from the "parallel" package, indicating that something goes wrong
>>>>>> when using your cluster object "cl".
>>>>>>
>>>>>> I would suggest testing whether your cluster object works fine. It would
>>>>>> help to know if the error message appears immediately after you call
>>>>>> qAlign(), or if it takes some time to process. Also, it would be great
>>>>>> to see the content of the QuasR log file.
>>>>>>
>>>>>> Here is a simple test you could try to check your cluster
>>>>>> object/connection:
>>>>>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>>>>>
>>>>>> As a result, you should get Sys.info() output from each of the cluster
>>>>>> nodes.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I also tried to modify the multicore option
>>>>>>>
>>>>>>> cl <- makeCluster(detectCores())
>>>>>>>
>>>>>>> And my job is killed because it uses more memory ( Max vmem = 17.118G)
>>>>>>> than
>>>>>>> allowed (16G)
>>>>>> With splicedAlignment=TRUE, QuasR will run SpliceMap for aligning your
>>>>>> reads, which may require several GB of memory per node in your cluster
>>>>>> object. You can avoid the memory overflow by reducing the number of
>>>>>> nodes in your cluster object, e.g. by:
>>>>>>
>>>>>> cl <- makeCluster(4)
>>>>>>
>>>>>> which should run through on your machine with 16GB of memory.
>>>>>>
>>>>>> Best,
>>>>>> Michael
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>
> 
>