[Bioc-devel] systemPipeR error - Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :

Mon Oct 26 20:15:46 CET 2015

Hi Thomas,

Thanks for the quick fix.

Sonali.

On 10/25/2015 1:06 PM, Thomas Girke wrote:
> I fixed this in systemPipeR versions 1.4.3/1.5.3. The reason for this error
> was that the tx_type column contains only NA values when a txdb is generated with
> makeTxDbFromUCSC(). Returning something more meaningful here may be useful,
> such as the transcript type information available when a txdb is generated
> from a GFF.
>
> Thanks,
>
> Thomas
>
> On Fri, Oct 23, 2015 at 12:49:09AM +0000, Thomas Girke wrote:
>> Thanks. Good to know. I have never tried this with a txdb instance
>> from makeTxDbFromUCSC(). Will fix this over the weekend.
>> Thomas
>>
>>
>>
>> On Thu, Oct 22, 2015 at 5:39 PM Arora, Sonali <sarora at fredhutch.org> wrote:
>>
>>
>> Hi Thomas,
>>
>> I get the following error when I try to obtain the feature types using
>> the function genFeatures()
>>
>>
>>> library(systemPipeR)
>>> library(GenomicFeatures)
>> Loading required package: AnnotationDbi
>>> txdb <- makeTxDbFromUCSC(genome = "hg19", tablename = "refGene")
>> Download the refGene table ... OK
>> Download the refLink table ... OK
>> Extract the 'transcripts' data frame ... OK
>> Extract the 'splicings' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TxDb object ... OK
>> Warning message:
>> In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
>> UCSC data anomaly in 359 transcript(s): the cds cumulative length is
>> not a multiple of 3 for transcripts 'NM_001037501' 'NM_001277444'
>> 'NM_001037675' 'NM_001271872' 'NM_001170637' 'NM_001300952'
>> 'NM_015326' 'NM_017940' 'NM_001271870' 'NM_001143962' 'NM_001305275'
>> 'NM_001146344' 'NM_001300891' 'NM_001010890' 'NM_001300891'
>> 'NM_001289974' 'NM_001291281' 'NM_001301371' 'NM_016178'
>> 'NM_001134939' 'NM_001080427' 'NM_001145710' 'NM_001291328'
>> 'NM_001271466' 'NM_001017915' 'NM_005541' 'NM_000348' 'NM_001145051'
>> 'NM_001135649' 'NM_001128929' 'NM_001080423' 'NM_001144382'
>> 'NM_001291661' 'NM_002958' 'NM_001005861' 'NM_004636' 'NM_001005914'
>> 'NM_001290060' 'NM_001290061' 'NM_001289930' 'NM_003715'
>> 'NM_001290049' 'NM_001286054' 'NM_001286053' 'NM_001286052'
>> 'NM_182524' 'NM_001075' 'NM_00 [... truncated]
>>> feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000,
>> +                     downstream=0, verbose=TRUE)
>> Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
>> subscript contains NAs
>>
>>
>> probably because -
>>
>> Browse[2]> tx
>> GRanges object with 54439 ranges and 3 metadata columns:
>>               seqnames           ranges strand   |      tx_name
>>                  <Rle>        <IRanges>  <Rle>   |  <character>
>>     [1]           chr1   [11874, 14409]      +   |    NR_046018
>>     [2]           chr1   [30366, 30503]      +   |    NR_036051
>>     [3]           chr1   [30366, 30503]      +   |    NR_036266
>>     [4]           chr1   [30366, 30503]      +   |    NR_036267
>>     [5]           chr1   [30366, 30503]      +   |    NR_036268
>>     ...            ...              ...    ... ...          ...
>> [54435] chrUn_gl000228 [112605, 114676]      +   | NM_001306068
>> [54436] chrUn_gl000228  [ 29339, 32226]      -   | NM_001005217
>> [54437] chrUn_gl000228  [ 29339, 32226]      -   | NM_001286820
>> [54438] chrUn_gl000241  [ 14739, 36767]      -   |    NR_132315
>> [54439] chrUn_gl000241  [ 16025, 36957]      -   |    NR_132320
>>                 gene_id     tx_type
>>         <CharacterList> <character>
>>     [1]       100287102        <NA>
>>     [2]       100302278        <NA>
>>     [3]       100422831        <NA>
>>     [4]       100422834        <NA>
>>     [5]       100422919        <NA>
>>     ...             ...         ...
>> [54435]       100288687        <NA>
>> [54436]          448831        <NA>
>> [54437]          448831        <NA>
>> [54438]       100289097        <NA>
>> [54439]       102723780        <NA>
>> -------
>> seqinfo: 93 sequences (1 circular) from hg19 genome
>> Browse[2]> unique(mcols(tx)$tx_type)
>> [1] NA
>> debug: tmp <- tx[mcols(tx)$tx_type == tx_type[i]]
>> Browse[2]>
>> Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
>> subscript contains NAs
>>
>>
>> Here is my sessionInfo
>>
>>> sessionInfo()
>> R Under development (unstable) (2015-10-15 r69519)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 14.04.2 LTS
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats4 stats graphics grDevices utils datasets
>> [8] methods base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.23.3 AnnotationDbi_1.33.0
>> [3] systemPipeR_1.5.1 RSQLite_1.0.0
>> [5] DBI_0.3.1 ShortRead_1.25.10
>> [7] GenomicAlignments_1.7.1 SummarizedExperiment_1.1.0
>> [9] Biobase_2.31.0 BiocParallel_1.5.0
>> [11] Rsamtools_1.23.0 Biostrings_2.39.0
>> [13] XVector_0.11.0 GenomicRanges_1.21.32
>> [15] GenomeInfoDb_1.7.1 IRanges_2.5.3
>> [17] S4Vectors_0.9.5 BiocGenerics_0.17.0
>>
>> loaded via a namespace (and not attached):
>> [1] Rcpp_0.12.1 lattice_0.20-33 GO.db_3.2.2
>> [4] digest_0.6.8 plyr_1.8.3 futile.options_1.0.0
>> [7] BatchJobs_1.6 ggplot2_1.0.1 zlibbioc_1.17.0
>> [10] annotate_1.49.0 Matrix_1.2-2 checkmate_1.6.2
>> [13] proto_0.3-10 GOstats_2.37.0 splines_3.3.0
>> [16] stringr_1.0.0 pheatmap_1.0.7 RCurl_1.95-4.7
>> [19] biomaRt_2.27.0 munsell_0.4.2 sendmailR_1.2-1
>> [22] rtracklayer_1.31.1 base64enc_0.1-3 BBmisc_1.9
>> [25] fail_1.3 edgeR_3.13.0 XML_3.98-1.3
>> [28] AnnotationForge_1.13.0 MASS_7.3-44 bitops_1.0-6
>> [31] grid_3.3.0 RBGL_1.47.0 xtable_1.7-4
>> [34] GSEABase_1.33.0 gtable_0.1.2 magrittr_1.5
>> [37] scales_0.3.0 graph_1.49.1 stringi_1.0-1
>> [40] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.53.0
>> [43] limma_3.27.0 latticeExtra_0.6-26 futile.logger_1.4.1
>> [46] brew_1.0-6 rjson_0.2.15 lambda.r_1.1.7
>> [49] RColorBrewer_1.1-2 tools_3.3.0 Category_2.37.0
>> [52] survival_2.38-3 colorspace_1.2-6
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Sonali
>>
>>
>>
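
For readers following the thread: the debug line quoted above subsets tx with
mcols(tx)$tx_type == tx_type[i], and when every tx_type is NA that comparison
is itself all NA. The sketch below illustrates this in base R with a toy
character vector, together with two common NA-safe alternatives. It is not the
actual change made in systemPipeR 1.4.3/1.5.3; the vector tx_type_col, the
helper label_tx_type() and the placeholder label "transcript" are hypothetical
names used only for illustration. A plain base R vector does not raise the
NSBS error itself; per the error message in the thread, that comes from the
Bioconductor subsetting code rejecting subscripts that contain NAs.

```r
# Toy stand-in for mcols(tx)$tx_type from a TxDb built with makeTxDbFromUCSC():
# every entry is NA (assumption: three transcripts are enough to show the idea).
tx_type_col <- c(NA_character_, NA_character_, NA_character_)
types <- unique(tx_type_col)            # a single NA

# Comparing with == propagates NA, so the logical subscript is all NA.
sel <- tx_type_col == types[1]

# Two NA-safe alternatives:
idx_which <- which(sel)                 # which() drops NA positions
sel_in <- tx_type_col %in% types[1]     # %in% never returns NA

# Hypothetical guard: replace missing types with a placeholder label
# before looping over the unique transcript types.
label_tx_type <- function(x, placeholder = "transcript") {
  x[is.na(x)] <- placeholder
  x
}
labelled <- label_tx_type(tx_type_col)
```

With a guard of this kind, unique(labelled) yields a single non-NA type, so a
subset of the form x[labelled == type] no longer involves NA subscripts.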