[R] writeLines + foreach/doMC

Mario Valle mvalle at cscs.ch
Mon Jul 4 14:55:20 CEST 2011


I have read, in discussions of parallel processing, that I/O should be done by a
single process.
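
A minimal sketch of the single-writer idea, in case it is useful. The workers only build the text lines and the parent process does all the writing, one batch at a time so the full table never sits in memory. Here uniq.Seq and gettable1() are small hypothetical stand-ins for the poster's data and function, and the worker count, chunk_size and three-column header are arbitrary choices for illustration.

```r
library(foreach)
library(doMC)
registerDoMC(2)  # number of worker processes; 2 is an arbitrary choice

# Hypothetical stand-ins for the poster's data and gettable1()
uniq.Seq  <- sprintf("SEQ%03d", 1:20)
gettable1 <- function(s) c(paste0("read_", s), nchar(s), s)

out_file   <- tempfile(fileext = ".txt")
chunk_size <- 5  # sequences handled per batch, to bound memory use
chunks     <- split(seq_along(uniq.Seq),
                    ceiling(seq_along(uniq.Seq) / chunk_size))

con <- file(out_file, open = "wt")
writeLines(paste("ReadID", "Freq", "Seq", sep = "\t"), con)  # header
for (idx in chunks) {
  # Workers only compute strings; nothing is written inside %dopar%
  lines <- foreach(i = idx, .combine = c) %dopar% {
    paste(gettable1(uniq.Seq[i]), collapse = "\t")
  }
  writeLines(lines, con)  # only the parent process touches the file
}
close(con)
```

Since foreach returns results in input order by default, the lines should come out in the same order as uniq.Seq.
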
Suggestion: write a different file from each worker, then combine the
results with cat or similar.
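
The one-file-per-worker idea could look roughly like the sketch below. It assumes the work is split into as many parts as there are workers, that each part goes to its own file, and that the parts are joined from R with file.append() instead of cat. Again, uniq.Seq and gettable1() are hypothetical stand-ins, and the file names and three-column header are made up for the example.

```r
library(foreach)
library(doMC)
n_workers <- 2  # arbitrary choice for the example
registerDoMC(n_workers)

# Hypothetical stand-ins for the poster's data and gettable1()
uniq.Seq  <- sprintf("SEQ%03d", 1:20)
gettable1 <- function(s) c(paste0("read_", s), nchar(s), s)

out_dir <- tempfile("parts_")
dir.create(out_dir)

# Split the indices into n_workers contiguous parts
parts <- split(seq_along(uniq.Seq),
               ceiling(seq_along(uniq.Seq) * n_workers / length(uniq.Seq)))

# Each task opens, writes and closes its own file, so no connection is shared
part_files <- foreach(p = seq_along(parts), .combine = c) %dopar% {
  f   <- file.path(out_dir, sprintf("part_%03d.txt", p))
  con <- file(f, open = "wt")
  for (i in parts[[p]]) {
    writeLines(paste(gettable1(uniq.Seq[i]), collapse = "\t"), con)
  }
  close(con)
  f  # return the file name to the parent process
}

# Combine in the parent: header first, then each part in order
final_file <- file.path(out_dir, "SeqID.txt")
writeLines(paste("ReadID", "Freq", "Seq", sep = "\t"), final_file)
for (f in part_files) file.append(final_file, f)
```

On a Unix shell the last step could equally be done with cat on the part files.
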
Hope it helps
                                  mario

On 04-Jul-11 11:58, Ramzi TEMANNI wrote:
> Hi
> I'm processing sequencing data, trying to collapse the locations of each
> unique sequence and write the results to a file (as storing that in a table
> would require at least 10 GB of memory),
> so I wrote a function that, given a sequence ID, provides the line to
> be stored:
> library(doMC) # load the library
> registerDoMC(12) # set the number of CPUs
>
>
> # open the connection
> fileConn <- file(paste(fq_file, "_SeqID.txt", sep=""), open = "at")
> # write the header
> writeLines(paste("ReadID","Freq","Seq","LOC_UG","Nb_UG_Seq",sep="\t"), fileConn)
> # for each unique sequence, write the results line
> foreach(i=1:length(uniq.Seq)) %dopar%
> {
> writeLines(paste(gettable1(uniq.Seq[i]),collapse="   "), fileConn)
> }
> close(fileConn)
>
> The code executes fine, but the problem is that some lines are weird.
> The header and a lot of lines are OK:
> ReadID    Freq    Seq    LOC_UG    Nb_UG_Seq
> HWI-EA332_0036:5:16:9530:21025#ATGC/1   XXXXXXXXXXXXXXXXXXXX  2
> XXXXX_10130:489:+,XXXXX_10130:489:+   2
> HWI-EA332_0036:5:117:6674:4940#ATGC/1   XXXXXXXXXXXXXXXXXXXX   1
> XXXXX:432:-,XXXXX:432:-   2
> HWI-EA332_0036:5:62:15592:7375#ATGC/1   XXXXXXXXXXXXXXXXXXXX   2
> XXXXX_22660:253:+,XXXXX_22660:253:+   2
> HWI-EA332_0036:5:110:14349:8422#ATGC/1   XXXXXXXXXXXXXXXXXXXX   4
> XXXXX_13806:399:+,XXXXX_13806:399:+,XXXXX_27263:481:+,XXXXX_27263:481:+   4
> Others look weird:
> HWI-EA332_0036:5:17:1400ReadID    Freq    Seq    LOC_UG    Nb_UG_Seq
> HWI-EA332_0036:5:61:7734:4201ReadID    Freq    Seq    LOC_UG    Nb_UG_Seq
> HWI-EA332_0036:5:117:5361:10666#ATGReadID    Freq    Seq    LOC_UG
> Nb_UG_Seq
> HWI-EA332_0036:5:115:7421:20664#ATGC/1   GATCReadID    Freq    Seq
> LOC_UG    Nb_UG_Seq
> HWI-EA332_0036:5:175:95:-   2
> HWI-EA332_0036:5JCVI_35536:444:+   2
> XXXXXXXXX   1  XXXXX_22484:571:-,XXXXX_22484:571:-   2
>
> Is this due to the fact that one process starts to write before the other has
> finished?
> Is there a way to solve this problem?
> Any suggestions would be greatly appreciated.
> Thanks and have a nice day.
>
>
> Best,
> Ramzi TEMANNI
> http://www.linkedin.com/in/ramzitemanni
>

-- 
Ing. Mario Valle
Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82


