[BioC] rtracklayer carrying over query results

Janet Young jayoung at fhcrc.org
Tue Jul 30 21:38:34 CEST 2013


Hi again,

I'm digging into rtracklayer more and have found another weird issue.  I have a big query that I know fails. I then run a second, smaller query that succeeds. If I then run the first big query again, it appears to work but returns the result from the second small query, even if I delete the second small query and its result first. Again, I hope the full code below will explain. It seems like something is being kept in memory that shouldn't be. Does this make any sense?

thanks,

Janet


library(rtracklayer)
library(GenomicRanges)

session <- browserSession("UCSC")
genome(session) <- "hg19"

#### make some sample ranges - a large number of small ranges:
numRanges <- 50000
rangeWidths <- 50
myRanges <- GRanges( seqnames=rep("chr1",numRanges), 
              ranges=IRanges(start=1:numRanges*rangeWidths*2,width=rangeWidths) )

#### run a query - this one is too big, and fails (I already emailed about this error yesterday):
query <- ucscTableQuery (session, "cons46way", range=myRanges)
tableName(query) <- "phyloP46wayPrimates"
scores <- track(query)

## here's the error:
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  solving row 100001: range cannot be determined from the supplied arguments (too many NAs)
In addition: Warning messages:
1: In matrix(as.numeric(unlist(split_lines)), nrow = 2) :
  NAs introduced by coercion
2: In matrix(as.numeric(unlist(split_lines)), nrow = 2) :
  data length [200005] is not a sub-multiple or multiple of the number of rows [2]

### now run a small query that works
query1 <- ucscTableQuery (session, "cons46way", range=myRanges[201:210])
tableName(query1) <- "phyloP46wayPrimates"
scores1 <- track(query1)
length(scores1)
# [1] 500

### now run first query again (the one that failed) - this time it appears to work and returns the same result as query1
query2 <- ucscTableQuery (session, "cons46way", range=myRanges)
tableName(query2) <- "phyloP46wayPrimates"
scores2 <- track(query2)
length(scores2)
# [1] 500

identical(scores1, scores2)
# [1] TRUE

#### even if I remove all the queries and results from before, the big query that would normally fail is still returning results of the second small query. Something is not being reset that should be:

rm(query, query1, scores1, query2, scores2)
ls()
#[1] "myRanges"    "numRanges"   "rangeWidths" "session"   

query3 <- ucscTableQuery (session, "cons46way", range=myRanges)
tableName(query3) <- "phyloP46wayPrimates"
scores3 <- track(query3)
length(scores3)
# [1] 500
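
A quick sanity check on why a length of 500 cannot be the right answer for the full set of ranges. This is only a base-R sketch that needs no UCSC access. It assumes the phyloP table returns at most one score per requested base, which matches the 500 scores seen for the 10-range query but is otherwise an assumption; the names expectedSmall and expectedBig are made up for illustration.

```r
# Rebuild the widths used for myRanges above, using base R only.
numRanges   <- 50000
rangeWidths <- 50
widths <- rep(rangeWidths, numRanges)

# Upper bound on per-base scores for each query, ASSUMING one score per base.
expectedSmall <- sum(widths[201:210])   # 10 ranges * 50 bp = 500
expectedBig   <- sum(widths)            # 50000 ranges * 50 bp = 2500000

# The small query returned 500 scores, which matches its bound exactly.
500 == expectedSmall

# The "successful" rerun of the big query also returned 500 scores,
# a tiny fraction of what the full set of ranges asks for.
500 / expectedBig
```

So a result of length 500 for the big query is consistent with the small query's ranges and not with the 50000 ranges actually requested.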

##################

sessionInfo()

R version 3.0.1 Patched (2013-07-29 r63455)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.21.9    GenomicRanges_1.13.35 XVector_0.1.0        
[4] IRanges_1.19.19       BiocGenerics_0.7.3   

loaded via a namespace (and not attached):
[1] Biostrings_2.29.14 bitops_1.0-5       BSgenome_1.29.1    RCurl_1.95-4.1    
[5] Rsamtools_1.13.26  stats4_3.0.1       tools_3.0.1        XML_3.98-1.1      
[9] zlibbioc_1.7.0    


