some problems of easyRNASeq: about the gtf files
Hu Fuyan [guest]
guest at bioconductor.org
Tue Mar 19 05:27:52 CET 2013
I want to use easyRNASeq to get exon counts, but I found something strange.
I have two human annotation files from different sources: one (Homo_sapiens.GRCh37.70.gtf.gz) is from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens); the other (genes.gtf, Ensembl) is from Illumina iGenomes (http://tophat.cbcb.umd.edu/igenomes.html).
The two annotation files are almost identical, with only small differences such as the order of the exons and the attributes.
I ran easyRNASeq with each of the two GTF files to compare the results,
and I got different results for the SLC25A13 exons.
-- output of sessionInfo():
First, I got my BAM file from TopHat.
When I used Homo_sapiens.GRCh37.70.gtf as the annotation file in easyRNASeq, I got this result:
"\"ENSG00000004864\"_1" 2
"\"ENSG00000004864\"_2" 4
"\"ENSG00000004864\"_3" 16
"\"ENSG00000004864\"_4" 3
"\"ENSG00000004864\"_5" 7
"\"ENSG00000004864\"_6" 8
"\"ENSG00000004864\"_7" 5
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_9" 4
"\"ENSG00000004864\"_10" 1
"\"ENSG00000004864\"_11" 6
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_13" 4
"\"ENSG00000004864\"_14" 6
"\"ENSG00000004864\"_15" 8
"\"ENSG00000004864\"_16" 5
"\"ENSG00000004864\"_17" 3
"\"ENSG00000004864\"_18" 25
But when I used the GTF file from Illumina iGenomes, I got a wrong result (as can be checked by viewing the BAM file in IGV):
"\"ENSG00000004864\"_18" 25
"\"ENSG00000004864\"_17" 13
"\"ENSG00000004864\"_2" 11
"\"ENSG00000004864\"_16" 3
"\"ENSG00000004864\"_1" 8
"\"ENSG00000004864\"_15" 5
"\"ENSG00000004864\"_14" 8
"\"ENSG00000004864\"_6" 6
"\"ENSG00000004864\"_13" 6
"\"ENSG00000004864\"_5" 0
"\"ENSG00000004864\"_3" 4
"\"ENSG00000004864\"_4" 4
"\"ENSG00000004864\"_12" 4
"\"ENSG00000004864\"_11" 4
"\"ENSG00000004864\"_10" 6
"\"ENSG00000004864\"_9" 1
"\"ENSG00000004864\"_8" 4
"\"ENSG00000004864\"_7" 4
I am confused by the different results.
Here is my main call to easyRNASeq:
count_gene_gtf_ensembl.table <- easyRNASeq(filesDirectory=getwd(),
                                           filenames="accepted_hits.sorted.bam",
                                           organism="Hsapiens",
                                           chr.sizes="auto",
                                           annotationMethod="gtf",
                                           annotationFile="/x400ifs-accel/ntteam/hufuyan/humanindex/Ensembl/ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2012-03-09-04-49-46/Genes/genes.gtf",
                                           format="bam",
                                           gapped=TRUE,
                                           count="exon")
When I changed the order of the SLC25A13 exons in genes.gtf (Illumina) to match Homo_sapiens.GRCh37.70.gtf and ran easyRNASeq again, I got the right exon counts.
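The exon order of one gene in the two files can be compared with a few lines of base R. This is only a rough sketch: the helper names (read_gtf_exons, sort_exons, same_exon_order) are hypothetical and not part of easyRNASeq, and the toy coordinates are made up for illustration. It assumes a standard 9-column tab-separated GTF with a gene_id attribute. Whether sorting the exons is a sufficient workaround for easyRNASeq is an assumption, not something confirmed here.

```r
# Extract the exon rows of one gene from GTF lines (base R only).
# Assumes the standard 9-column, tab-separated GTF layout.
read_gtf_exons <- function(lines, gene_id) {
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]  # drop comments and blanks
  fields <- strsplit(lines, "\t", fixed = TRUE)
  gtf <- data.frame(
    seqname   = vapply(fields, `[`, character(1), 1),
    feature   = vapply(fields, `[`, character(1), 3),
    start     = as.integer(vapply(fields, `[`, character(1), 4)),
    end       = as.integer(vapply(fields, `[`, character(1), 5)),
    strand    = vapply(fields, `[`, character(1), 7),
    attribute = vapply(fields, `[`, character(1), 9),
    stringsAsFactors = FALSE
  )
  is_gene <- grepl(paste0('gene_id "', gene_id, '"'), gtf$attribute, fixed = TRUE)
  gtf[gtf$feature == "exon" & is_gene, c("seqname", "start", "end", "strand")]
}

# Sort exons by chromosome and genomic coordinates.
sort_exons <- function(exons) {
  sorted <- exons[order(exons$seqname, exons$start, exons$end), ]
  rownames(sorted) <- NULL
  sorted
}

# TRUE if both tables list the same exon coordinates in the same order.
same_exon_order <- function(a, b) {
  nrow(a) == nrow(b) && all(a$start == b$start) && all(a$end == b$end)
}

# Toy data (made-up coordinates): same three exons, different order.
toy_line <- function(start, end) {
  paste("7", "protein_coding", "exon", start, end, ".", "-", ".",
        'gene_id "ENSG00000004864";', sep = "\t")
}
toy_a <- toy_line(c(100, 300, 500), c(200, 400, 600))
toy_b <- toy_line(c(500, 100, 300), c(600, 200, 400))

exons_a <- read_gtf_exons(toy_a, "ENSG00000004864")
exons_b <- read_gtf_exons(toy_b, "ENSG00000004864")

print(same_exon_order(exons_a, exons_b))                          # order as given
print(same_exon_order(sort_exons(exons_a), sort_exons(exons_b)))  # after sorting
```

With the real files, the lines could be read with readLines("genes.gtf") and readLines(gzfile("Homo_sapiens.GRCh37.70.gtf.gz")); this loads the whole file into memory, which is slow but simple.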
Another problem is that I got the warning: "You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." I also got this warning when I used GTF files from UCSC.
How can I fix it?
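A first diagnostic step could be to look at the chromosome names used in the first column of each GTF file. Ensembl files name chromosomes "1", "2", "MT", while UCSC files use "chr1", "chr2", "chrM". The sketch below uses base R only; the helper names (gtf_seqnames, is_ucsc_style) are hypothetical and the toy lines are made up. It does not explain why the warning also appears with a UCSC file, it only shows which convention a file follows.

```r
# Return the distinct sequence names found in column 1 of GTF lines.
gtf_seqnames <- function(lines) {
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]  # drop comments and blanks
  unique(sub("\t.*$", "", lines))                      # keep text before first tab
}

# TRUE for names that carry the "chr" prefix used by UCSC.
is_ucsc_style <- function(seqnames) grepl("^chr", seqnames)

# Toy lines (made-up records) in the two naming conventions.
ensembl_like <- c(
  "1\tprotein_coding\texon\t1\t10\t.\t+\t.\tgene_id \"g1\";",
  "MT\tprotein_coding\texon\t1\t10\t.\t+\t.\tgene_id \"g2\";"
)
ucsc_like <- c(
  "chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"g1\";",
  "chrM\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"g2\";"
)

print(gtf_seqnames(ensembl_like))
print(is_ucsc_style(gtf_seqnames(ensembl_like)))
print(is_ucsc_style(gtf_seqnames(ucsc_like)))
```

The same check on the real files would use gtf_seqnames(readLines("genes.gtf")). It may also be worth comparing these names with the reference names in the BAM header, since the annotation and the alignments need to agree.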
--
Sent via the guest posting facility at bioconductor.org.