[BioC] Curious error with 'subseq' function from BSgenome (IRanges)

J.delasHeras at ed.ac.uk J.delasHeras at ed.ac.uk
Fri Jun 4 12:52:31 CEST 2010


Hi everyone,

I am using the BSgenome package and annotations to retrieve several  
thousand sequences (22k) corresponding to a promoter microarray.

Basically, I loop over the whole list of chromosome names and start and  
stop coordinates, and retrieve each 1Kb sequence using the 'subseq'  
function.

When I run it, I get the following error *sometimes*:
Error in get(name, envir = .classTable) :
   formal argument "envir" matched by multiple actual arguments

The first time, I retrieved the index at which it had encountered the  
error, and ran the 'subseq' command alone. No problem. In fact, if I  
re-run the whole thing the error may occur at another point. Once it  
even ran the whole thing without a hitch.

I ended up putting the loop within a 'try' function, so that if there  
was an error, the loop could restart where it left off and  
eventually retrieve the whole list. The number of errors varies from  
run to run, and the error messages vary as well.

I just re-ran the loop again, just for fun. This is the code:

library(BSgenome.Mmusculus.UCSC.mm8)
# create vectors to store results in:
newseq2<-vector(mode="character", length=dim(UInfo)[1])
newstart2<-vector(mode="numeric", length=dim(UInfo)[1])
newstop2<-vector(mode="numeric", length=dim(UInfo)[1])
ambiguous.orientation<-c()

# UInfo is a data frame containing annotations. I extract
# chr,start,stop from it
j<-1
i<-1
while(i<=dim(UInfo)[1])
   {
   if (i==dim(UInfo)[1]) stop("finished")
   try(
   for (i in j:dim(UInfo)[1])
     {
     # first extract chromosome name from the "NimbleGenID" included
     # in the annotation.
     # It is in the same format as the BSgenome annotation package
     # for mouse, so it's a straight extraction:
     chr<-sub(":.+$","",unlist(strsplit(UInfo[i,"NimbleGenID"],split=" "))[1])
     if (chr=="NA") next
     # extract start and stop:
     start<-as.numeric(UInfo[i,"Start"])
     stop<-as.numeric(UInfo[i,"End"])
     # extract strand orientation:
     strand<-UInfo[i,"Frame"]
     # calculate the coordinates for the 1Kb upstream region:
     if (strand=="-")
       {
       upstart<-stop+1
       upstop<-min(upstart+1000,length(Mmusculus[[chr]]))
       }
     if (strand=="+")
       {
       upstart<-max(start-1000,1)
       upstop<-max(start-1,1)
       }
     if (!(strand %in% c("+","-")))
       {
       upstart<-upstop<-NA
       # when orientation is not clearly given, store indices for
       # further processing:
       ambiguous.orientation<-c(ambiguous.orientation,i)
       newseq2[i]<-"NNN"
       newstart2[i]<-upstart
       newstop2[i]<-upstop
       next
       }
     #extract sequence:
     sequence<-subseq(Mmusculus[[chr]],upstart,upstop)
     sequence<-as.character(sequence)
     #store results:
     newstart2[i]<-upstart
     newstop2[i]<-upstop
     newseq2[i]<-sequence
     })
   # check whether the last index done is the last in the list.
   # if not, it means there was an abnormal exit.
   # update "j" to the value of the last index "i", and the
   # loop will restart from the point it left earlier:
   if (i!=dim(UInfo)[1]) j<-i
   # write a tell-tale file so I can see where the problems occur as they
   # happen:
   write.table(1, paste(i,"_"))
   }
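
The coordinate step in the loop above can also be sketched without a loop. This is only a minimal illustration in base R, not something taken from BSgenome: 'upstream_window' and its arguments are hypothetical names, and 'chrlen' is assumed to hold the chromosome length for each row (e.g. looked up once per chromosome beforehand). It mirrors the posted logic as written, so the "-" branch spans upstart to upstart+1000 inclusive, while the "+" branch spans start-1000 to start-1.

```r
# Hypothetical helper (not part of BSgenome): compute the upstream window
# for every row at once, following the same rules as the loop above.
upstream_window <- function(start, end, strand, chrlen, width = 1000) {
  n <- length(start)
  upstart <- rep(NA_real_, n)
  upstop  <- rep(NA_real_, n)
  minus <- !is.na(strand) & strand == "-"
  plus  <- !is.na(strand) & strand == "+"
  # "-" strand: window begins just past the annotated end and is
  # clipped at the chromosome length
  upstart[minus] <- end[minus] + 1
  upstop[minus]  <- pmin(upstart[minus] + width, chrlen[minus])
  # "+" strand: window ends just before the annotated start and is
  # clipped at position 1
  upstart[plus] <- pmax(start[plus] - width, 1)
  upstop[plus]  <- pmax(start[plus] - 1, 1)
  # rows with any other strand value keep NA coordinates and are flagged
  data.frame(upstart = upstart, upstop = upstop,
             ambiguous = !(minus | plus))
}

# Toy input (made-up coordinates, not from UInfo)
toy <- upstream_window(start  = c(5000, 500, 2000),
                       end    = c(6000, 900, 2500),
                       strand = c("+", "-", "."),
                       chrlen = c(1e6, 1500, 1e6))
print(toy)
```

Computing the coordinates up front would leave only the sequence extraction itself inside the loop, which may make it easier to see which call triggers the intermittent errors.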


This time it produced an error 7 times. The errors reported were:
Error in get(name, envir = .classTable) :
   formal argument "envir" matched by multiple actual arguments
Error in assign(".target", method@target, envir = envir) :
   formal argument "envir" matched by multiple actual arguments
Error in assign(".defined", method@defined, envir = envir) :
   formal argument "envir" matched by multiple actual arguments
Error in assign("disabled", disabled, envir = .validity_options) :
   formal argument "envir" matched by multiple actual arguments
Error in assign(".defined", method@defined, envir = envir) :
   no function to return from, jumping to top level
Error in shift(restrict(nir, start = solved_start, end = solved_end),  :
   error in evaluating the argument 'x' in selecting a method for function 'shift'
Error in assign(".Method", method, envir = envir) :
   formal argument "envir" matched by multiple actual arguments
Error: finished

The last one is not really an error, I just used the 'stop' function  
to report the job was done, so it says "error"...

Clearly there is nothing wrong with the coordinates or other  
parameters in the subseq command, because I can repeat it.
I find it very strange that the errors will happen at different  
points... or sometimes (rarely) nowhere at all.

I got the result I was after by embedding the loop in a 'try' command,  
and that inside a 'while' loop... But I wonder why this happened in  
the first place.
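
For reference, the same restart-on-error idea can be sketched more compactly by wrapping each item in 'tryCatch' instead of wrapping the whole loop in 'try'. This is a minimal sketch in base R under stated assumptions: 'retry_each', 'max_tries' and the toy function 'flaky' are hypothetical names used only for illustration, and retrying only works around the intermittent errors, it does not explain them.

```r
# Hypothetical helper: call f(i) for each index and retry an index a
# limited number of times if the call signals an error.
retry_each <- function(indices, f, max_tries = 5) {
  results  <- vector("list", length(indices))
  failures <- character(0)
  for (k in seq_along(indices)) {
    for (attempt in seq_len(max_tries)) {
      # return the condition object instead of aborting the loop
      out <- tryCatch(f(indices[k]), error = function(e) e)
      if (!inherits(out, "error")) {
        results[[k]] <- out
        break
      }
      # keep a record of where and why each failure happened
      failures <- c(failures,
                    sprintf("index %s, attempt %d: %s",
                            indices[k], attempt, conditionMessage(out)))
    }
  }
  list(results = results, failures = failures)
}

# Toy stand-in for the sequence extraction: fails once, on its second call
calls <- 0
flaky <- function(i) {
  calls <<- calls + 1
  if (i == 2 && calls == 2) stop("transient failure")
  i * 10
}

res <- retry_each(1:3, flaky)
print(res$failures)
```

An index that still fails after 'max_tries' attempts is left as NULL in 'results', and the 'failures' vector plays the role of the tell-tale files written by the loop above.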

My session info follows:


> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm8_1.3.16 BSgenome_1.14.2
[3] Biostrings_2.14.12                 IRanges_1.4.16

loaded via a namespace (and not attached):
[1] Biobase_2.6.1 tools_2.10.0


Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6507095
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
*********************************************
Alternative email: nach.mcnach at gmail.com
*********************************************

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


