[BioC] rtracklayer gff import
Kathi Zarnack
zarnack at ebi.ac.uk
Thu Apr 14 14:16:08 CEST 2011
Hi,
I am using the package rtracklayer to import transcript.gtf files
produced by Cufflinks.
As I understand the gff3 specification, feature coordinates are given as
"start and end of the feature, in 1-based integer coordinates" (also
discussed in this mailing list lately), meaning that the line below from
my gtf file corresponds to an exon ranging from 1310534 to 1310771.
original line from the gtf file:
chr1 transcripts_C4 exon 1310534 1310771 78 - . Parent=CUFF.1065.1
However, upon rtracklayer import, the exon ends at 1310770 (see below).
Thus, as I understand it, rtracklayer's import.gff() interprets the gtf
coordinates as "1-based right-open" (upon export with export.gff3(), the end
becomes 1310771 again). I tried importing while explicitly specifying
version="3" and also updated to the latest rtracklayer version, but neither
helped.
Is this a bug in the rtracklayer function or am I interpreting the gff
coordinates wrongly? Any comments will be appreciated.
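To make the two readings concrete, here is a minimal base-R sketch of the arithmetic for the exon line quoted above. It does not call rtracklayer at all; the variable names are made up purely for illustration.

```r
# Coordinates copied from the gtf line quoted above.
gtf_start <- 1310534
gtf_end   <- 1310771

# Reading 1: 1-based, closed interval [start, end], as the gff spec describes.
last_base_closed <- gtf_end
width_closed     <- gtf_end - gtf_start + 1

# Reading 2: 1-based, right-open interval [start, end), so the last base is end - 1.
last_base_right_open <- gtf_end - 1
width_right_open     <- gtf_end - gtf_start

cat("closed:     last base", last_base_closed, "width", width_closed, "\n")
cat("right-open: last base", last_base_right_open, "width", width_right_open, "\n")
```

Under the closed reading the exon ends at 1310771 and is 238 bases wide; under the right-open reading it ends at 1310770 and is 237 bases wide, which is the end coordinate seen after import.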
Thanks for your help.
Best regards,
Kathi
> library(rtracklayer)
Loading required package: RCurl
Loading required package: bitops
> gff=import.gff("/nfs/research2/luscombe/kathi/data/expression_data/hnRNPC_mRNAseq/cufflinks_0.9.3/cufflinks_C4/transcripts_C4.gtf",
+ genome="hg19",asRangedData=FALSE)
> gff[177]
GRanges with 1 range and 11 elementMetadata values
    seqnames             ranges strand |        type       source       phase
       <Rle>          <IRanges>  <Rle> | <character>  <character> <character>
[1]     chr1 [1310534, 1310770]      - |        exon Cufflinks_C4          NA
      conf_hi   conf_lo       cov      FPKM      frac          ID      Parent
    <numeric> <numeric> <numeric> <numeric> <numeric> <character> <character>
[1]        NA        NA        NA        NA        NA          NA CUFF.1065.1
        score
    <numeric>
[1]        78
seqlengths
 chr1 chr10 chr11 chr12 chr13 chr14 ... chr7 chr8 chr9 chrM chrX chrY
   NA    NA    NA    NA    NA    NA ...   NA   NA   NA   NA   NA   NA
> export.gff3(gff[177],"test_export.gtf")
[zarnack at ebi-001 ~]$ more test_export.gtf
##gff-version 3
##date 2011-04-14
chr1 Cufflinks_C4 exon 1310534 1310771 78 - NA Parent=CUFF.1065.1
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.10.6 RCurl_1.5-0 bitops_1.0-4.1
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.1
[4] GenomicRanges_1.2.1 IRanges_1.8.9 tools_2.12.0
[7] XML_3.2-0
--
Dr. Kathi Zarnack
Luscombe Group
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK
tel +44 1223 494 526