[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)
Marlin JL.M
marlin- at gmx.cn
Sun Nov 13 19:07:14 CET 2016
No, I did not get that warning. But only:
> Import genomic features from the file as a GRanges object ... OK
> Prepare the 'metadata' data frame ... OK
> Make the TxDb object ... OK
Further, I noticed that:
1. Those genes are not actually "dropped", they can be shown with
`transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`.
2. The behavior is not associated with multiple chromosome locations,
though they have a large intersection.
Hence, I am not clear what is the actual factor causing this somehow
weired behavior.
As I am not using the development version of bioconductor, if anyone
can help to reproduce it? I have uploaded a small dummy gtf file at
https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb .
This is what I get:
$ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf
$ R
> library(GenomicFeatures)
> txdb = makeTxDbFromGFF('dummy.gtf')
# Import genomic features from the file as a GRanges object ... OK
# Prepare the 'metadata' data frame ... OK
# Make the TxDb object ... OK
> genes(txdb)
# GRanges object with 1 range and 1 metadata column:
# seqnames ranges strand | gene_id
# <Rle> <IRanges> <Rle> | <character>
# TTTY5 chrY [24442945, 24445023] - | TTTY5
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths
> transcriptsBy(txdb, 'gene')
# GRangesList object of length 3:
# $TTTY17A
# GRanges object with 3 ranges and 2 metadata columns:
# seqnames ranges strand | tx_id tx_name
# <Rle> <IRanges> <Rle> | <integer> <character>
# [1] chrY [24997731, 24998862] + | 3 NR_001526_1
# [2] chrY [26631479, 26632610] + | 4 NR_001526_2
# [3] chrY [27329790, 27330920] - | 6 NR_001526
#
# $TTTY5
# GRanges object with 1 range and 2 metadata columns:
# seqnames ranges strand | tx_id tx_name
# [1] chrY [24442945, 24445023] - | 5 NR_001541
#
# $XGY2
# GRanges object with 2 ranges and 2 metadata columns:
# seqnames ranges strand | tx_id tx_name
# [1] chrX [2670337, 2693037] + | 1 NR_003254
# [2] chrY [2620337, 2643037] + | 2 NR_003254_1
#
# -------
# seqinfo: 2 sequences from an unspecified genome; no seqlengths
As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
> sessionInfo()
# R version 3.3.2 (2016-10-31)
# Platform: i686-pc-linux-gnu (32-bit)
# Running under: Ubuntu 16.04.1 LTS
#
# ......
#
# attached base packages:
# [1] stats4 parallel stats graphics grDevices utils datasets
# [8] methods base
#
# other attached packages:
# [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4 Biobase_2.32.0
# [4] GenomicRanges_1.24.3 GenomeInfoDb_1.8.3 IRanges_2.6.1
# [7] S4Vectors_0.10.3 BiocGenerics_0.18.0
On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
> Did you not see a message like:
>
> Import genomic features from the file as a GRanges object ... OK
> Prepare the 'metadata' data frame ... OK
> Make the TxDb object ... OK
> Warning message:
> In makeTxDbFromGRanges(gr, metadata = metadata) :
> The following transcripts were dropped because their exon ranks
> could
> not be inferred (either because the exons are not on the same
> chromosome/strand or because they are not separated by introns):
> NR_003254
>
> On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
> > Dear all,
> >
> >
> > When trying to import the the GTF file downloaded from iGenome
> > using
> > makeTxDbFromGFF, I figured out that many genes are dropped
> > silently,
> > probably because those genes happens to have multiple chromosome
> > locations in that GTF file.
> >
> > I have posted it at https://support.bioconductor.org/p/89401 with
> > no
> > reply yet. As I find it somehow a problem-causing bug, I decide to
> > send
> > the information here.
> >
> > Is there any suggested way to deal with the case?
> >
> >
> > Best regards,
> > Marlin
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list