[BioC] ChIPpeakAnno, getAnnotation question
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Mon Aug 29 16:56:47 CEST 2011
Dear Daria,
By default, getAnnotation uses featureType "TSS". Currently, featureType
accepts exactly one of the following feature types (case sensitive):
"TSS", "miRNA", "Exon", "5utr", "3utr" or "ExonPlusUtr". For example, use
"5utr" for the 5' UTR.
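Because featureType takes a single value, several feature types can be fetched with one getAnnotation call each. This is a minimal sketch, assuming ChIPpeakAnno and biomaRt are installed and Ensembl is reachable. The helper get_all_annotations and its fetch argument are hypothetical: fetch only exists so the loop can be exercised without a network connection, and it defaults to ChIPpeakAnno::getAnnotation.

```r
# Hypothetical helper: one getAnnotation() call per feature type,
# returned as a named list of data frames.
get_all_annotations <- function(mart,
                                types = c("TSS", "miRNA", "Exon",
                                          "5utr", "3utr", "ExonPlusUtr"),
                                fetch = ChIPpeakAnno::getAnnotation) {
  anno <- lapply(types, function(ft) {
    # Each call receives exactly one featureType value.
    as.data.frame(fetch(mart, featureType = ft))
  })
  names(anno) <- types
  anno
}

# Usage (needs ChIPpeakAnno, biomaRt and network access to Ensembl):
# library(biomaRt)
# mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# anno <- get_all_annotations(mart)
# head(anno[["5utr"]])
```

Keeping the results in a named list means each feature type stays separate, so gene-level TSS rows and transcript-level UTR rows are not mixed in one table.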
You are right that with featureType set to "TSS", getAnnotation returns
gene coordinates. If you think transcript coordinates would be useful, I
will be happy to add a "transcript" featureType. Thanks!
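Until such an option exists, one possible workaround (an assumption, not a ChIPpeakAnno feature) is to query BioMart for transcript coordinates with biomaRt's getBM and derive each transcript's TSS from its strand. The helper transcript_tss is hypothetical; ensembl_transcript_id, chromosome_name, transcript_start, transcript_end and strand are standard Ensembl BioMart attribute names, but it is worth checking listAttributes(mart) for the release in use.

```r
# Hypothetical helper: one TSS per transcript from BioMart coordinates.
# Plus-strand (1) transcripts start at transcript_start;
# minus-strand (-1) transcripts start at transcript_end.
transcript_tss <- function(tx) {
  tss <- ifelse(tx$strand == 1, tx$transcript_start, tx$transcript_end)
  data.frame(ensembl_transcript_id = tx$ensembl_transcript_id,
             chromosome_name = tx$chromosome_name,
             tss = tss,
             strand = tx$strand,
             stringsAsFactors = FALSE)
}

# Usage (needs biomaRt and network access to Ensembl):
# library(biomaRt)
# mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# tx <- getBM(attributes = c("ensembl_transcript_id", "chromosome_name",
#                            "transcript_start", "transcript_end", "strand"),
#             mart = mart)
# head(transcript_tss(tx))
```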
Best regards,
Julie
On 8/25/11 4:00 PM, "Daria Goranskaya" <daria.goranskaya at gmail.com> wrote:
> Dear Julie:
>
> I'm a PhD student in bioinformatics at Karolinska Institutet, Stockholm.
> I've been using ChIPpeakAnno for my data, and I found something strange
> when getting annotation with the getAnnotation function. Could you take
> a look at the following?
>
>> library("biomaRt")
>> library("ChIPpeakAnno")
>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>
> When I tried to get TSS, I got genes, not transcripts. And the last two
> commands gave the same results! That's strange, because the first command
> should return plenty of features other than TSS. Also, I got an error
> when asking for 5UTR.
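A likely explanation for the identical results, assuming getAnnotation validates featureType with base R's match.arg() (the reply does not say how the argument is checked): when the whole default vector is passed, match.arg() returns its first element, "TSS", and its case-sensitive matching rejects "5UTR". The function pick_feature below is a hypothetical stand-in, not ChIPpeakAnno code.

```r
# Hypothetical stand-in for getAnnotation's argument handling, ASSUMING it
# validates featureType with base R's match.arg().
pick_feature <- function(featureType = c("TSS", "miRNA", "Exon",
                                         "5utr", "3utr", "ExonPlusUtr")) {
  match.arg(featureType)
}

# Passing the full default vector silently selects the first element,
# so this behaves like featureType = "TSS".
pick_feature(c("TSS", "miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr"))
# [1] "TSS"

# Matching is case sensitive: "5utr" is accepted, "5UTR" raises an error.
pick_feature("5utr")
try(pick_feature("5UTR"))
```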
>
> How should I use this function to get the annotation features I need?
> Thank you in advance!
>
> Best regards,
> Daria
>
>
> P.S. Here is the whole R history:
>
>> library("biomaRt")
>> library("ChIPpeakAnno")
>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>> Ensembl5utr<-as.data.frame(getAnnotation(mart, featureType=c("5utr")))
> Warnings:
> 1: In getAnnotation(mart, featureType = c("5utr")) :
> Following duplicated IDs found, only one of entries of the
> duplicated id will be returned!
> 2: In getAnnotation(mart, featureType = c("5utr")) :
>
> ENST00000400678 ENST00000400776 ENST00000400776 ENST00000546775 ENST00000550740
> ENST00000546775 ENST00000550740 ENST00000546775 ENST00000451927 ENST00000400890
> ENST00000550740 ENST00000552764 ENST00000546832 ENST00000549120 ENST00000550740
> ENST00000546775 ENST00000552764 ENST00000546736 ENST00000346061 ENST00000447903
> ENST00000272035 ENST00000413237 ENST00000447903 ENST00000346061 ENST00000418749
> ENST00000400681 ENST00000418749 ENST00000346061 ENST00000400840 ENST00000262316
> ENST00000420545 ENST00000450643 ENST00000454039 ENST00000338527 ENST00000219431
> ENST00000436333 ENST00000397817 ENST00000551377 ENST00000368372 ENST00000314367
> ENST00000431099 ENST00000382389 ENST00000399951 ENST00000323434 ENST00000331302
> ENST00000399951 ENST00000551377 ENST00000456528 ENST00000521270 ENST00000521145
> ENST00000523418 ENST00000308811 ENST00000523162 ENST00000522866 ENST00000518414
> ENST00000521270 ENST00000320552 ENST00000398612 ENST00000325113 ENST00000525282
> ENST00000540150 ENST00000342593 ENST00000399012 ENST00000445062 ENST00000429181
> ENST00000399012 ENST000004
> [... truncated]
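The warning lists transcript IDs that occur in more than one row; getAnnotation keeps only one entry per ID. Since a 5' UTR can be split across several exons, it is plausible (not confirmed here) that BioMart returns one row per UTR segment. A quick base R check of which IDs repeat, using the first seven IDs from the warning:

```r
# The first seven transcript IDs printed in the 5utr warning.
ids <- c("ENST00000400678", "ENST00000400776", "ENST00000400776",
         "ENST00000546775", "ENST00000550740", "ENST00000546775",
         "ENST00000550740")

# IDs occurring more than once, each reported a single time.
dup_ids <- unique(ids[duplicated(ids)])
dup_ids
# [1] "ENST00000400776" "ENST00000546775" "ENST00000550740"

# Number of rows each duplicated ID contributes.
table(ids)[dup_ids]
```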
>> head(EnsemblAnnotation)
> space start end width names strand
> 1 1 11869 14412 2544 ENSG00000223972 1
> 2 1 14363 29806 15444 ENSG00000227232 -1
> 3 1 29554 31109 1556 ENSG00000243485 1
> 4 1 30366 30503 138 ENSG00000221311 1
> 5 1 34554 36081 1528 ENSG00000237613 -1
> 6 1 62948 63887 940 ENSG00000240361 1
>
> description
> 1 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
> [Source:HGNC Symbol;Acc:37102]
> 2 WAS protein family homolog 7 pseudogene
> [Source:HGNC Symbol;Acc:38034]
> 3 microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 4 microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 5 family with sequence similarity 138, member A
> [Source:HGNC Symbol;Acc:32334]
> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
> [Source:HGNC Symbol;Acc:31276]
>> head(EnsemblTSS)
> space start end width names strand
> 1 1 11869 14412 2544 ENSG00000223972 1
> 2 1 14363 29806 15444 ENSG00000227232 -1
> 3 1 29554 31109 1556 ENSG00000243485 1
> 4 1 30366 30503 138 ENSG00000221311 1
> 5 1 34554 36081 1528 ENSG00000237613 -1
> 6 1 62948 63887 940 ENSG00000240361 1
>
> description
> 1 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
> [Source:HGNC Symbol;Acc:37102]
> 2 WAS protein family homolog 7 pseudogene
> [Source:HGNC Symbol;Acc:38034]
> 3 microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 4 microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 5 family with sequence similarity 138, member A
> [Source:HGNC Symbol;Acc:32334]
> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
> [Source:HGNC Symbol;Acc:31276]
>> head(Ensembl5utr)
> space start end width names strand
> 1 1 35737 36081 345 ENST00000417324 -1
> 2 1 367640 367658 19 ENST00000426406 1
> 3 1 622035 622053 19 ENST00000332831 -1
> 4 1 721320 721405 86 ENST00000358533 1
> 5 1 860260 860328 69 ENST00000420190 1
> 6 1 860530 860569 40 ENST00000437963 1
>
> description
> 1 family with sequence similarity 138, member A
> [Source:HGNC Symbol;Acc:32334]
> 2 olfactory receptor, family 4, subfamily F, member 29
> [Source:HGNC Symbol;Acc:31275]
> 3 olfactory receptor, family 4, subfamily F, member 16
> [Source:HGNC Symbol;Acc:15079]
> 4 Transmembrane protein FLJ78588
> [Source:UniProtKB/Swiss-Prot;Acc:A6NHI5]
> 5 sterile alpha motif domain containing 11
> [Source:HGNC Symbol;Acc:28706]
> 6 sterile alpha motif domain containing 11
> [Source:HGNC Symbol;Acc:28706]