[Bioc-devel] makeTxDbFromGFF drops genes which have multiple chromosome locations. (with iGenome GTF)
Martin Morgan
martin.morgan at roswellpark.org
Sun Nov 13 19:13:40 CET 2016
On 11/13/2016 01:07 PM, Marlin JL.M wrote:
> No, I did not get that warning. But only:
>
>> Import genomic features from the file as a GRanges object ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TxDb object ... OK
>
>
> Further, I noticed that:
>
> 1. Those genes are not actually "dropped", they can be shown with
> `transcriptsBy(txdb,'gene')` but can not be shown with `genes(txdb)`.
>
> 2. The behavior is not associated with multiple chromosome locations,
> though they have a large intersection.
>
> Hence, I am not clear what is the actual factor causing this somehow
> weired behavior.
>
> As I am not using the development version of bioconductor, if anyone
> can help to reproduce it? I have uploaded a small dummy gtf file at
> https://gist.github.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb .
In the current release version
> packageVersion("GenomicFeatures")
[1] '1.26.0'
look up the help for ?genes and use the argument
single.strand.genes.only=FALSE
Martin
>
>
> This is what I get:
>
> $ wget https://gist.githubusercontent.com/Marlin-Na/1eefedc4984e40b8ef76a0e1e7612dbb/raw/9977b699a9fcdf70c8860de985ff550934c2c4a7/dummy.gtf
> $ R
>
>> library(GenomicFeatures)
>
>> txdb = makeTxDbFromGFF('dummy.gtf')
> # Import genomic features from the file as a GRanges object ... OK
> # Prepare the 'metadata' data frame ... OK
> # Make the TxDb object ... OK
>
>> genes(txdb)
> # GRanges object with 1 range and 1 metadata column:
> # seqnames ranges strand | gene_id
> # <Rle> <IRanges> <Rle> | <character>
> # TTTY5 chrY [24442945, 24445023] - | TTTY5
> # -------
> # seqinfo: 2 sequences from an unspecified genome; no seqlengths
>
>> transcriptsBy(txdb, 'gene')
> # GRangesList object of length 3:
> # $TTTY17A
> # GRanges object with 3 ranges and 2 metadata columns:
> # seqnames ranges strand | tx_id tx_name
> # <Rle> <IRanges> <Rle> | <integer> <character>
> # [1] chrY [24997731, 24998862] + | 3 NR_001526_1
> # [2] chrY [26631479, 26632610] + | 4 NR_001526_2
> # [3] chrY [27329790, 27330920] - | 6 NR_001526
> #
> # $TTTY5
> # GRanges object with 1 range and 2 metadata columns:
> # seqnames ranges strand | tx_id tx_name
> # [1] chrY [24442945, 24445023] - | 5 NR_001541
> #
> # $XGY2
> # GRanges object with 2 ranges and 2 metadata columns:
> # seqnames ranges strand | tx_id tx_name
> # [1] chrX [2670337, 2693037] + | 1 NR_003254
> # [2] chrY [2620337, 2643037] + | 2 NR_003254_1
> #
> # -------
> # seqinfo: 2 sequences from an unspecified genome; no seqlengths
>
> As you can see, 'XGY2' and 'TTTY17A' are not shown with `genes(txdb)`.
>
>
>
>> sessionInfo()
> # R version 3.3.2 (2016-10-31)
> # Platform: i686-pc-linux-gnu (32-bit)
> # Running under: Ubuntu 16.04.1 LTS
> #
> # ......
> #
> # attached base packages:
> # [1] stats4 parallel stats graphics grDevices utils datasets
> # [8] methods base
> #
> # other attached packages:
> # [1] GenomicFeatures_1.24.5 AnnotationDbi_1.34.4 Biobase_2.32.0
> # [4] GenomicRanges_1.24.3 GenomeInfoDb_1.8.3 IRanges_2.6.1
> # [7] S4Vectors_0.10.3 BiocGenerics_0.18.0
>
>
>
> On Sun, 2016-11-13 at 08:24 -0500, Vincent Carey wrote:
>> Did you not see a message like:
>>
>> Import genomic features from the file as a GRanges object ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TxDb object ... OK
>> Warning message:
>> In makeTxDbFromGRanges(gr, metadata = metadata) :
>> The following transcripts were dropped because their exon ranks
>> could
>> not be inferred (either because the exons are not on the same
>> chromosome/strand or because they are not separated by introns):
>> NR_003254
>>
>> On Sun, Nov 13, 2016 at 5:25 AM, Marlin JL.M <marlin- at gmx.cn> wrote:
>>> Dear all,
>>>
>>>
>>> When trying to import the the GTF file downloaded from iGenome
>>> using
>>> makeTxDbFromGFF, I figured out that many genes are dropped
>>> silently,
>>> probably because those genes happens to have multiple chromosome
>>> locations in that GTF file.
>>>
>>> I have posted it at https://support.bioconductor.org/p/89401 with
>>> no
>>> reply yet. As I find it somehow a problem-causing bug, I decide to
>>> send
>>> the information here.
>>>
>>> Is there any suggested way to deal with the case?
>>>
>>>
>>> Best regards,
>>> Marlin
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
This email message may contain legally privileged and/or...{{dropped:2}}
More information about the Bioc-devel
mailing list