[Bioc-devel] systemPipeR error - Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :

Hervé Pagès hpages at fredhutch.org
Mon Oct 26 18:38:31 CET 2015


Hi Thomas,

On 10/25/2015 01:06 PM, Thomas Girke wrote:
> I fixed this in systemPipeR versions 1.4.3/1.5.3. The reason for this error
> was that the tx_type column contains only NA values when a txdb is generated with
> makeTxDbFromUCSC(). Returning something more meaningful here may be useful,
> such as the transcript type information available when a txdb is generated
> from a GFF.

We've considered this and might do it at some point. The difficulty
though is that UCSC does not provide this information as part of
the track itself so we'll have to go grab it from some other table
in their huge db through many joins. In the meantime, I'll try to
clarify this in the documentation.
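
Until then, code that consumes the tx_type column probably needs to
tolerate it being all NA. Below is a minimal sketch of one way to do
that. The toy GRanges, the which() guard and the "unknown" label are
assumptions made for illustration only; this is not necessarily how
systemPipeR 1.4.3/1.5.3 handles it.

```r
# Sketch only: a toy GRanges standing in for the transcripts of a
# TxDb built with makeTxDbFromUCSC(), where tx_type is all NA.
library(GenomicRanges)

tx <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(11874, 30366), end = c(14409, 30503)),
  strand   = "+",
  tx_name  = c("NR_046018", "NR_036051"),
  tx_type  = c(NA_character_, NA_character_)
)

tx_type <- unique(mcols(tx)$tx_type)     # a single NA
sel <- mcols(tx)$tx_type == tx_type[1]   # NA == NA gives NA, not TRUE

# A GRanges rejects a subscript containing NAs; capture the message
# instead of stopping.
res <- tryCatch(tx[sel], error = function(e) conditionMessage(e))

# One possible guard: which() keeps only the TRUE positions, so NA
# comparisons are dropped rather than passed on as subscripts.
tmp <- tx[which(sel)]                    # zero-length GRanges here

# Hypothetical fallback: label untyped transcripts so that they can
# still be grouped by type.
types <- mcols(tx)$tx_type
types[is.na(types)] <- "unknown"
by_type <- split(tx, types)
```

With the which() guard an all-NA column yields an empty selection
rather than an error; whether untyped transcripts should be skipped or
pooled under a placeholder label is a design choice for the caller.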

H.

>
> Thanks,
>
> Thomas
>
> On Fri, Oct 23, 2015 at 12:49:09AM +0000, Thomas Girke wrote:
>> Thanks. Good to know. I have never tried this with a txdb instance
>> from makeTxDbFromUCSC(). Will fix this over the weekend.
>> Thomas
>>
>>
>>
>> On Thu, Oct 22, 2015 at 5:39 PM Arora, Sonali <sarora at fredhutch.org> wrote:
>>
>>
>> Hi Thomas,
>>
>> I get the following error when I try to obtain the feature types using
>> the function genFeatures()
>>
>>
>>> library(systemPipeR)
>>> library(GenomicFeatures)
>> Loading required package: AnnotationDbi
>>> txdb <- makeTxDbFromUCSC(genome = "hg19", tablename = "refGene")
>> Download the refGene table ... OK
>> Download the refLink table ... OK
>> Extract the 'transcripts' data frame ... OK
>> Extract the 'splicings' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TxDb object ... OK
>> Warning message:
>> In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
>> UCSC data anomaly in 359 transcript(s): the cds cumulative length is
>> not a multiple of 3 for transcripts 'NM_001037501' 'NM_001277444'
>> 'NM_001037675' 'NM_001271872' 'NM_001170637' 'NM_001300952'
>> 'NM_015326' 'NM_017940' 'NM_001271870' 'NM_001143962' 'NM_001305275'
>> 'NM_001146344' 'NM_001300891' 'NM_001010890' 'NM_001300891'
>> 'NM_001289974' 'NM_001291281' 'NM_001301371' 'NM_016178'
>> 'NM_001134939' 'NM_001080427' 'NM_001145710' 'NM_001291328'
>> 'NM_001271466' 'NM_001017915' 'NM_005541' 'NM_000348' 'NM_001145051'
>> 'NM_001135649' 'NM_001128929' 'NM_001080423' 'NM_001144382'
>> 'NM_001291661' 'NM_002958' 'NM_001005861' 'NM_004636' 'NM_001005914'
>> 'NM_001290060' 'NM_001290061' 'NM_001289930' 'NM_003715'
>> 'NM_001290049' 'NM_001286054' 'NM_001286053' 'NM_001286052'
>> 'NM_182524' 'NM_001075' 'NM_00 [... truncated]
>>> feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE,
>> upstream=1000,
>> + downstream=0, verbose=TRUE)
>> Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
>> subscript contains NAs
>>
>>
>> probably because -
>>
>> Browse[2]> tx
>> GRanges object with 54439 ranges and 3 metadata columns:
>>               seqnames           ranges strand |      tx_name
>>                  <Rle>        <IRanges>  <Rle> |  <character>
>>     [1]           chr1   [11874, 14409]      + |    NR_046018
>>     [2]           chr1   [30366, 30503]      + |    NR_036051
>>     [3]           chr1   [30366, 30503]      + |    NR_036266
>>     [4]           chr1   [30366, 30503]      + |    NR_036267
>>     [5]           chr1   [30366, 30503]      + |    NR_036268
>>     ...            ...              ...    ... ...        ...
>> [54435] chrUn_gl000228 [112605, 114676]      + | NM_001306068
>> [54436] chrUn_gl000228  [ 29339, 32226]      - | NM_001005217
>> [54437] chrUn_gl000228  [ 29339, 32226]      - | NM_001286820
>> [54438] chrUn_gl000241  [ 14739, 36767]      - |    NR_132315
>> [54439] chrUn_gl000241  [ 16025, 36957]      - |    NR_132320
>>                 gene_id     tx_type
>>         <CharacterList> <character>
>>     [1]       100287102        <NA>
>>     [2]       100302278        <NA>
>>     [3]       100422831        <NA>
>>     [4]       100422834        <NA>
>>     [5]       100422919        <NA>
>>     ...             ...         ...
>> [54435]       100288687        <NA>
>> [54436]          448831        <NA>
>> [54437]          448831        <NA>
>> [54438]       100289097        <NA>
>> [54439]       102723780        <NA>
>>   -------
>>   seqinfo: 93 sequences (1 circular) from hg19 genome
>> Browse[2]> unique(mcols(tx)$tx_type)
>> [1] NA
>> debug: tmp <- tx[mcols(tx)$tx_type == tx_type[i]]
>> Browse[2]>
>> Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
>> subscript contains NAs
>>
>>
>> Here is my sessionInfo
>>
>>> sessionInfo()
>> R Under development (unstable) (2015-10-15 r69519)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 14.04.2 LTS
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats4 stats graphics grDevices utils datasets
>> [8] methods base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.23.3 AnnotationDbi_1.33.0
>> [3] systemPipeR_1.5.1 RSQLite_1.0.0
>> [5] DBI_0.3.1 ShortRead_1.25.10
>> [7] GenomicAlignments_1.7.1 SummarizedExperiment_1.1.0
>> [9] Biobase_2.31.0 BiocParallel_1.5.0
>> [11] Rsamtools_1.23.0 Biostrings_2.39.0
>> [13] XVector_0.11.0 GenomicRanges_1.21.32
>> [15] GenomeInfoDb_1.7.1 IRanges_2.5.3
>> [17] S4Vectors_0.9.5 BiocGenerics_0.17.0
>>
>> loaded via a namespace (and not attached):
>> [1] Rcpp_0.12.1 lattice_0.20-33 GO.db_3.2.2
>> [4] digest_0.6.8 plyr_1.8.3 futile.options_1.0.0
>> [7] BatchJobs_1.6 ggplot2_1.0.1 zlibbioc_1.17.0
>> [10] annotate_1.49.0 Matrix_1.2-2 checkmate_1.6.2
>> [13] proto_0.3-10 GOstats_2.37.0 splines_3.3.0
>> [16] stringr_1.0.0 pheatmap_1.0.7 RCurl_1.95-4.7
>> [19] biomaRt_2.27.0 munsell_0.4.2 sendmailR_1.2-1
>> [22] rtracklayer_1.31.1 base64enc_0.1-3 BBmisc_1.9
>> [25] fail_1.3 edgeR_3.13.0 XML_3.98-1.3
>> [28] AnnotationForge_1.13.0 MASS_7.3-44 bitops_1.0-6
>> [31] grid_3.3.0 RBGL_1.47.0 xtable_1.7-4
>> [34] GSEABase_1.33.0 gtable_0.1.2 magrittr_1.5
>> [37] scales_0.3.0 graph_1.49.1 stringi_1.0-1
>> [40] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.53.0
>> [43] limma_3.27.0 latticeExtra_0.6-26 futile.logger_1.4.1
>> [46] brew_1.0-6 rjson_0.2.15 lambda.r_1.1.7
>> [49] RColorBrewer_1.1-2 tools_3.3.0 Category_2.37.0
>> [52] survival_2.38-3 colorspace_1.2-6
>>
>>
>>
>>
>> --
>> Thanks and Regards,
>> Sonali
>>
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


