[BioC] some problems of easyRNASeq: about the gtf files
Nicolas Delhomme
delhomme at embl.de
Tue Mar 19 09:10:15 CET 2013
Hej Fuyan!
On 19 Mar 2013, at 05:27, Hu Fuyan [guest] wrote:
>
> I want to use easyRNASeq to get exon counts. But I found a strange thing:
>
> I have two human annotation files from different sources: one (Homo_sapiens.GRCh37.70.gtf.gz) is from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens); the other (genes.gtf, Ensembl) is from Illumina iGenomes (http://tophat.cbcb.umd.edu/igenomes.html).
>
> The two annotation files are almost the same, with only small differences such as the order of the exons and of the attributes.
> When I ran easyRNASeq, I used both GTF files to check the result.
>
> I got different results for the SLC25A13 exons.
>
>
This sounds strange, as I don't remember expecting any ordering. Thanks for the example files and the report, I'll check that.
> -- output of sessionInfo():
Can you paste the output? It's not in the file you sent off list either.
>
> First, I got my BAM file from TopHat.
>
> When I used Homo_sapiens.GRCh37.70.gtf as my annotation file in easyRNASeq, I got the result:
>
> "\"ENSG00000004864\"_1" 2
> "\"ENSG00000004864\"_2" 4
> "\"ENSG00000004864\"_3" 16
> "\"ENSG00000004864\"_4" 3
> "\"ENSG00000004864\"_5" 7
> "\"ENSG00000004864\"_6" 8
> "\"ENSG00000004864\"_7" 5
> "\"ENSG00000004864\"_8" 4
> "\"ENSG00000004864\"_9" 4
> "\"ENSG00000004864\"_10" 1
> "\"ENSG00000004864\"_11" 6
> "\"ENSG00000004864\"_12" 4
> "\"ENSG00000004864\"_13" 4
> "\"ENSG00000004864\"_14" 6
> "\"ENSG00000004864\"_15" 8
> "\"ENSG00000004864\"_16" 5
> "\"ENSG00000004864\"_17" 3
> "\"ENSG00000004864\"_18" 25
>
> But when I used the GTF file from Illumina iGenomes, I got a wrong result (as we can see by viewing the BAM file in IGV):
>
> "\"ENSG00000004864\"_18" 25
> "\"ENSG00000004864\"_17" 13
> "\"ENSG00000004864\"_2" 11
> "\"ENSG00000004864\"_16" 3
> "\"ENSG00000004864\"_1" 8
> "\"ENSG00000004864\"_15" 5
> "\"ENSG00000004864\"_14" 8
> "\"ENSG00000004864\"_6" 6
> "\"ENSG00000004864\"_13" 6
> "\"ENSG00000004864\"_5" 0
> "\"ENSG00000004864\"_3" 4
> "\"ENSG00000004864\"_4" 4
> "\"ENSG00000004864\"_12" 4
> "\"ENSG00000004864\"_11" 4
> "\"ENSG00000004864\"_10" 6
> "\"ENSG00000004864\"_9" 1
> "\"ENSG00000004864\"_8" 4
> "\"ENSG00000004864\"_7" 4
>
> I am confused by the different results.
>
> Here is my main program using easyRNASeq:
>
>
>
>
> count_gene_gtf_ensembl.table <- easyRNASeq(filesDirectory=getwd(),
> filenames="accepted_hits.sorted.bam",
> organism="Hsapiens",
> chr.sizes="auto",
> annotationMethod="gtf",
> annotationFile="/x400ifs-accel/ntteam/hufuyan/humanindex/Ensembl/ussd-ftp.illumina.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2012-03-09-04-49-46/Genes/genes.gtf",
> format="bam",
> gapped=TRUE,
> count="exon")
>
>
>
> When I changed the order of the exons of gene SLC25A13 in genes.gtf (Illumina) to match Homo_sapiens.GRCh37.70.gtf and ran easyRNASeq again, I got the right exon counts.
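The manual reordering described above could be scripted so that it covers every gene rather than one. The following is only a minimal base R sketch under stated assumptions: `exon_key()` and `reorder_like()` are hypothetical helper names, not part of easyRNASeq, and it assumes that the same record can be recognised in both files by its sequence name, feature type, coordinates, strand and `transcript_id`. Whether easyRNASeq actually depends on record order is not established here.

```r
# Build a key that identifies a GTF record independently of attribute order.
# Assumption: each line has the nine tab-separated GTF columns.
exon_key <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  col <- function(i) vapply(fields, `[`, character(1), i)
  # Extract the transcript_id from the attribute column (column 9).
  tx <- sub('.*transcript_id "([^"]+)".*', "\\1", col(9))
  paste(col(1), col(3), col(4), col(5), col(7), tx, sep = "|")
}

# Reorder the records of `target` so they follow the order found in `reference`.
# Records missing from `reference` are kept at the end, in their original order.
reorder_like <- function(target, reference) {
  pos <- match(exon_key(target), exon_key(reference))
  target[order(pos, seq_along(target), na.last = TRUE)]
}
```

With the non-comment lines of both files read via `readLines()`, something like `reorder_like(igenomes_lines, ensembl_lines)` (both variable names are placeholders) would give a reordered copy that can be written back with `writeLines()` and used for a second easyRNASeq run.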
>
>
>
> Another problem is that I got the warning: "You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it." I also got this warning when I used the GTF files from UCSC.
> How can I fix it?
>
You would need to change the chromosome names, i.e. prepend the "chr" prefix to follow the UCSC convention (e.g. 7 becomes chr7) and rename the mitochondrion to chrM, in both your alignment and your annotation file (BAM and GTF). In any case, this is just a warning to draw your attention to the essential point that both files need to share a common chromosome naming. I'm handling this differently in the next release of easyRNASeq, which no longer enforces the UCSC conventions. So in your current case, you can ignore that warning.
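For the annotation side, the renaming could look roughly like the base R sketch below. `to_ucsc()` and `rename_gtf_lines()` are hypothetical helper names introduced for illustration, and the sketch assumes Ensembl-style input in which the mitochondrion is called "MT". It only touches GTF text lines; the sequence names in the BAM header would need the same change with a separate tool.

```r
# Map Ensembl-style sequence names to UCSC style (e.g. "7" becomes "chr7").
to_ucsc <- function(seqnames) {
  # Leave names that already carry the "chr" prefix untouched.
  out <- ifelse(grepl("^chr", seqnames), seqnames, paste0("chr", seqnames))
  # Assumption: the mitochondrion is named "MT" in the input; UCSC uses "chrM".
  out[out == "chrMT"] <- "chrM"
  out
}

# Rewrite the first column of GTF lines, skipping comment and empty lines.
rename_gtf_lines <- function(lines) {
  skip <- grepl("^#", lines) | !nzchar(lines)
  fields <- strsplit(lines[!skip], "\t", fixed = TRUE)
  lines[!skip] <- vapply(fields, function(f) {
    f[1] <- to_ucsc(f[1])
    paste(f, collapse = "\t")
  }, character(1))
  lines
}
```

A typical use would be `writeLines(rename_gtf_lines(readLines("genes.gtf")), "genes.ucsc.gtf")`, where the output file name is just a placeholder.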
Cheers,
Nico
>
> --
> Sent via the guest posting facility at bioconductor.org.