[BioC] Curious error with 'subseq' function from BSgenome (IRanges)
Martin Morgan
mtmorgan at fhcrc.org
Fri Jun 4 13:06:41 CEST 2010
On 06/04/2010 03:52 AM, J.delasHeras at ed.ac.uk wrote:
>
> Hi everyone,
>
> I am using the BSgenome package and annotations to retrieve several
> thousand sequences (22k) corresponding to a promoter microarray.
>
> Basically I run a loop through the whole list of chromosome name, start,
> and stop coordinates, and retrieve each 1Kb sequence using the 'subseq'
> function.
>
> When I run it, I get the following error *sometimes*:
> Error in get(name, envir = .classTable) :
> formal argument "envir" matched by multiple actual arguments
Hi Jose --
This sounds like a bug in R, fixed in the R-2.11.* series, and updating
your R (and packages, see http://bioconductor.org/docs/install/) should
fix this. If not, it would be great to hear...
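
A quick way to check whether a session predates that series, sketched with base R's getRversion(). The 2.11.0 threshold is taken from the note above, and `needs_update` is just an illustrative name:

```r
# TRUE when the running R is older than the 2.11 series mentioned above
needs_update <- getRversion() < "2.11.0"

if (needs_update) {
  message("R ", as.character(getRversion()),
          " predates the 2.11 series; consider updating R and packages")
}
```
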
Martin
>
> The first time, I retrieved the index at which it had encountered the
> error, and ran the 'subseq' command alone. No problem. In fact, if I
> re-run the whole thing the error may occur at another point. Once it
> even ran the whole thing without a hitch.
>
> I ended up putting the loop within a 'try' function, so that if there
> was an error, the loop could restart where it left off earlier and
> eventually retrieve the whole list. The number of times there's an error
> varies from run to run, and I see that the error messages are also varied.
>
> I just re-ran the loop again, just for fun. This is the code:
>
> library(BSgenome.Mmusculus.UCSC.mm8)
> # create vectors to store results in:
> newseq2<-vector(mode="character", length=dim(UInfo)[1])
> newstart2<-vector(mode="numeric", length=dim(UInfo)[1])
> newstop2<-vector(mode="numeric", length=dim(UInfo)[1])
> ambiguous.orientation<-c()
>
> # UInfo is a data frame containing annotations. I extract chr, start, stop
> # from it.
> j<-1
> i<-1
> while(i<=dim(UInfo)[1])
> {
>   if (i==dim(UInfo)[1]) stop("finished")
>   try(
>     for (i in j:dim(UInfo)[1])
>     {
>       # first extract chromosome name from the "NimbleGenID" included
>       # in the annotation.
>       # It is in the same format as the BSgenome annotation package
>       # for mouse, so it's a straight extraction:
>       chr<-sub(":.+$","",unlist(strsplit(UInfo[i,"NimbleGenID"],split=" "))[1])
>       if (chr=="NA") next
>       # extract start and stop:
>       start<-as.numeric(UInfo[i,"Start"])
>       stop<-as.numeric(UInfo[i,"End"])
>       # extract strand orientation:
>       strand<-UInfo[i,"Frame"]
>       # calculate the coordinates for the 1Kb upstream region:
>       if (strand=="-")
>       {
>         upstart<-stop+1
>         upstop<-min(upstart+1000,length(Mmusculus[[chr]]))
>       }
>       if (strand=="+")
>       {
>         upstart<-max(start-1000,1)
>         upstop<-max(start-1,1)
>       }
>       if (!(strand %in% c("+","-")))
>       {
>         upstart<-upstop<-NA
>         # when orientation is not clearly given, store indices for
>         # further processing:
>         ambiguous.orientation<-c(ambiguous.orientation,i)
>         newseq2[i]<-"NNN"
>         newstart2[i]<-upstart
>         newstop2[i]<-upstop
>         next
>       }
>       # extract sequence:
>       sequence<-subseq(Mmusculus[[chr]],upstart,upstop)
>       sequence<-as.character(sequence)
>       # store results:
>       newstart2[i]<-upstart
>       newstop2[i]<-upstop
>       newseq2[i]<-sequence
>     })
>   # check whether the last index done is the last in the list.
>   # if not, it means there was an abnormal exit.
>   # update "j" to the value of the last index "i", and the
>   # loop will restart from the point it left earlier:
>   if (i!=dim(UInfo)[1]) j<-i
>   # write a tell-tale file so I can see where the problems occur as they
>   # happen:
>   write.table(1, paste(i,"_"))
> }
>
>
> This time it produced an error 7 times. The errors reported were:
> Error in get(name, envir = .classTable) :
> formal argument "envir" matched by multiple actual arguments
> Error in assign(".target", method@target, envir = envir) :
> formal argument "envir" matched by multiple actual arguments
> Error in assign(".defined", method@defined, envir = envir) :
> formal argument "envir" matched by multiple actual arguments
> Error in assign("disabled", disabled, envir = .validity_options) :
> formal argument "envir" matched by multiple actual arguments
> Error in assign(".defined", method@defined, envir = envir) :
> no function to return from, jumping to top level
> Error in shift(restrict(nir, start = solved_start, end = solved_end), :
> error in evaluating the argument 'x' in selecting a method for
> function 'shift'
> Error in assign(".Method", method, envir = envir) :
> formal argument "envir" matched by multiple actual arguments
> Error: finished
>
> The last one is not really an error, I just used the 'stop' function to
> report the job was done, so it says "error"...
>
> Clearly there is nothing wrong with the coordinates or other parameters
> in the subseq command, because I can repeat it.
> I find it very strange that the errors will happen at different
> points... or sometimes (rarely) nowhere at all.
>
> I got the result I was after by embedding the loop in a 'try' command,
> and that inside a 'while' loop... But I wonder why this happened in the
> first place.
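
A minimal sketch, in base R, of handling errors per iteration instead of wrapping the whole loop in 'try': each failure is recorded and the loop simply moves on, so nothing has to be restarted. Here `fetch_one` is a hypothetical stand-in for the real per-row work (the 'subseq' call) and its failure is simulated. This only works around the symptom and says nothing about the underlying cause.

```r
# Hypothetical stand-in for the per-row work; fails on every 4th index
fetch_one <- function(i) {
  if (i %% 4 == 0) stop("simulated failure at index ", i)
  paste0("seq", i)
}

n <- 10
results <- rep(NA_character_, n)  # one slot per row, NA until filled
failed <- integer(0)              # indices whose call raised an error

for (i in seq_len(n)) {
  results[i] <- tryCatch(
    fetch_one(i),
    error = function(e) {
      # record the failing index and leave this slot as NA
      failed <<- c(failed, i)
      NA_character_
    }
  )
}
```

The indices collected in `failed` can then be retried in a second pass, without re-running the rows that already succeeded.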
>
> My session info follows:
>
>
>> sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm8_1.3.16 BSgenome_1.14.2
> [3] Biostrings_2.14.12 IRanges_1.4.16
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 tools_2.10.0
>
>
> Jose
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793