[BioC] QuasR on Linux Cluster

Ugo Borello ugo.borello at inserm.fr
Tue Oct 22 10:43:04 CEST 2013


Dear Michael,
I think that the disk space is not an issue; anyway, I will double check
with the administrator.
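
A minimal sketch for checking how much space is free under R's tempdir(), assuming a Unix-like node where the `df` utility is available. The helper name free_gb is hypothetical and not part of QuasR:

```r
# Hypothetical helper (not part of QuasR): free space in GB at `path`,
# parsed from POSIX-format `df -P -k` output.
# Assumes a Unix-like system and a file system name without spaces.
free_gb <- function(path = tempdir()) {
  out <- system2("df", c("-P", "-k", shQuote(path)), stdout = TRUE)
  # The last output line describes the file system that holds `path`;
  # in POSIX format its 4th field is the available space in KB.
  fields <- strsplit(trimws(out[length(out)]), "[[:space:]]+")[[1]]
  as.numeric(fields[4]) / 1024^2
}

cat(sprintf("tempdir: %s, free: %.1f GB\n", tempdir(), free_gb()))
```

The number can then be compared with the size of the fastq.gz files, keeping in mind that several-fold more space is needed for the uncompressed intermediates.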

I used 4 nodes and QuasR stopped at the .sam file. See the attached
output files.

When I use fewer than 4 nodes, it stops at the beginning of the process:

[1] "Writing BSgenome to disk on ccwsge0144 :
/scratch/4847271.1.huge/Rtmp7nHkpp/file5971727e49b5.fa"
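
The path in that message sits under a session temporary directory (Rtmp...). A minimal sketch, using only the parallel package and the same parLapply pattern as the Sys.info() test, to see which temporary directory each worker uses. The 2-node cluster size is arbitrary, and the sketch does not establish whether qAlign writes to the master's or the workers' tempdir():

```r
library(parallel)

# Small cluster just for this check (node count chosen arbitrarily)
cl <- makeCluster(2)

# Ask every worker for its host name and its own tempdir()
worker_tmp <- parLapply(cl, seq_along(cl), function(i) {
  c(node = unname(Sys.info()["nodename"]), tmp = tempdir())
})
stopCluster(cl)

# One row per worker: host name and temporary directory
print(do.call(rbind, worker_tmp))
```

Each worker's directory is removed when the cluster is stopped, so the paths are only informative about the parent location (for example /scratch versus /tmp).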



What am I missing?

Thank you

Ugo



> From: Michael Stadler <michael.stadler at fmi.ch>
> Date: Mon, 21 Oct 2013 17:48:53 +0200
> To: Ugo Borello <ugo.borello at inserm.fr>, <bioconductor at r-project.org>
> Subject: Re: [BioC] QuasR on Linux Cluster
> 
> Your cluster object seems functional now.
> 
> Another possible problem could be available disk space in R's tempdir().
> It is used by qAlign to temporarily store the uncompressed fastq files,
> the sam files and the bam files (and thus needs several-fold more free
> capacity than the size of your fastq.gz files). For more information,
> see vignette section 4.1 "File storage locations".
> 
> If tempdir() is too small, you can redirect R's tempdir() by setting
> the TMPDIR environment variable, or just for one qAlign call by using
> the "cacheDir" parameter of qAlign.
> 
> If you are sure that disk space is not the issue, could you give qAlign()
> another try, using a cluster object with only 4 nodes to avoid any
> memory issues?
> 
> Michael
> 
> 
> On 21.10.2013 15:09, Ugo Borello wrote:
>> Thank you Michael,
>> My bad, I am not able to find the QuasR_log at the moment. Anyway, the last
>> step was the .sam file: QuasR did not proceed to convert the .sam file
>> to a .bam file.
>> Attached is some other info on the running job before it died.
>> It refers to a case where cl <- makeCluster(1).
>> 
>> 
>> I ran your test and got:
>>> library(parallel)
>>> cl<- makeCluster(detectCores())
>>> info<- parLapply(cl, seq_along(cl), function(i) Sys.info())
>>> info
>> [[1]]
>>                              sysname                              release
>>                              "Linux"                 "2.6.18-348.3.1.el5"
>>                              version                             nodename
>> "#1 SMP Tue Mar 5 13:19:32 EST 2013"                         "ccwsge0053"
>>                              machine                                login
>>                             "x86_64"                            "unknown"
>>                                 user                       effective_user
>>                           "uborello"                           "uborello"
>> 
>> The same for the 32 nodes.
>> 
>> Then I ran:
>>> library(parallel)
>>> type <- if (exists("mcfork", mode="function")) "FORK" else "PSOCK"
>>> type
>> [1] "PSOCK"
>>> cores <- getOption("mc.cores", detectCores())
>>> cl <- makeCluster(cores, type=type)
>>> cl
>> socket cluster with 32 nodes on host 'localhost'
>>> results <- parLapply(cl, 1:100, sqrt)
>>> sum(unlist(results))
>> [1] 671.4629
>>> stopCluster(cl)
>> 
>> I don't know if this could help.
>> 
>> Any suggestions?
>> 
>> Ugo
>> 
>> 
>> 
>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>> Date: Mon, 21 Oct 2013 11:30:27 +0200
>>> To: <bioconductor at r-project.org>
>>> Subject: Re: [BioC] QuasR on Linux Cluster
>>> 
>>> Hi Ugo,
>>> 
>>> On 18.10.2013 13:56, Ugo Borello wrote:
>>>> Hi all,
>>>> I am trying to use QuasR on a Linux cluster: 1 machine, multiple cores.
>>>> 
>>>> I run:
>>>> library(QuasR)
>>>> library(BSgenome.Mmusculus.UCSC.mm10)
>>>> 
>>>> cl <- makeCluster(1)
>>>> 
>>>> sampleFile <- "sampleFile.txt"
>>>> 
>>>> genomeName <- "BSgenome.Mmusculus.UCSC.mm10"
>>>> 
>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>> clObj=cl)
>>>> 
>>>> And I get
>>>>> proj <- qAlign(sampleFile, genome= genomeName, splicedAlignment=TRUE,
>>>> clObj=cl)
>>>> alignment files missing - need to:
>>>>     create 1 genomic alignment(s)
>>>> Testing the compute nodes...OK
>>>> Loading QuasR on the compute nodes...OK
>>>> Available cores:
>>>> nodeNames
>>>> ccwsge0155
>>>>          1
>>>> Performing genomic alignments for 1 samples. See progress in the log file:
>>>> /scratch/4401022.1.huge/QuasR_log_41394115a102.txt
>>>> Error in unserialize(node$con) : error reading from connection
>>>> Calls: qAlign ... FUN -> recvData -> recvData.SOCKnode -> unserialize
>>>> Execution halted
>>> 
>>> The error that you get is not created within QuasR; my guess is that it
>>> comes from the "parallel" package, indicating that something goes wrong
>>> when using your cluster object "cl".
>>> 
>>> I would suggest testing whether your cluster object works fine. It would
>>> help to know if the error message appears immediately after you call
>>> qAlign(), or if it takes some time to process. Also, it would be great
>>> to see the content of the QuasR log file.
>>> 
>>> Here is a simple test you could try to check your cluster object/connection:
>>> parLapply(cl, seq_along(cl), function(i) Sys.info())
>>> 
>>> As a result, you should get Sys.info() output from each of the cluster
>>> nodes.
>>> 
>>> 
>>>> 
>>>> I also tried to modify the multicore option:
>>>> 
>>>> cl <- makeCluster(detectCores())
>>>> 
>>>> And my job is killed because it uses more memory (Max vmem = 17.118G) than
>>>> allowed (16G).
>>> With splicedAlignment=TRUE, QuasR will run spliceMap for aligning your
>>> reads, which may require several GB of memory per node in your cluster
>>> object. You can avoid the memory overflow by reducing the number of
>>> nodes in your cluster object, e.g. by:
>>> 
>>> cl <- makeCluster(4)
>>> 
>>> which should run through on your machine with 16GB of memory.
>>> 
>>> Best,
>>> Michael
>>> 
>> 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: QuasR.out 4800353.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131022/d1ebb2e4/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: QuasR_log_37446ea52111.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131022/d1ebb2e4/attachment-0001.txt>