[BioC] ChIPpeakAnno, getAnnotation question
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Mon Aug 29 18:46:21 CEST 2011
Daria,
The warnings you experienced with 5utr have been fixed, and transcript has
been added as an option for featureType. Please download version 2.0.2.
Thanks for your input!
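
Since featureType takes one value per call, one way to collect several feature types is to call getAnnotation once for each type. The following is only a minimal sketch under that assumption: fetch_by_feature_type and valid_types are hypothetical names introduced for illustration and are not part of ChIPpeakAnno, and the list of accepted values is copied from this thread rather than from the package documentation.

```r
# Feature types named in this thread (case sensitive). "transcript" is the
# option mentioned as added in version 2.0.2. This list is an assumption
# taken from the thread, not read from the package itself.
valid_types <- c("TSS", "miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr",
                 "transcript")

# Hypothetical helper: call `fetch` once per feature type and return the
# results as a list named by feature type.
fetch_by_feature_type <- function(fetch, feature_types) {
  unknown <- setdiff(feature_types, valid_types)
  if (length(unknown) > 0) {
    stop("Unsupported featureType: ", paste(unknown, collapse = ", "))
  }
  results <- lapply(feature_types, fetch)
  names(results) <- feature_types
  results
}

# Intended use with a live connection (not run here, needs network access):
# library("biomaRt")
# library("ChIPpeakAnno")
# mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# annotations <- fetch_by_feature_type(
#   function(ft) as.data.frame(getAnnotation(mart, featureType = ft)),
#   c("TSS", "5utr", "3utr")
# )
# head(annotations[["5utr"]])
```

Keeping one result per feature type in a named list avoids the situation described below, where a vector of feature types gave the same result as TSS alone.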
Best regards,
Julie
On 8/29/11 10:56 AM, "Julie Zhu" <julie.zhu at umassmed.edu> wrote:
> Dear Daria,
>
> By default, getAnnotation assumes featureType TSS. Currently, the parameter
> featureType accepts a single value from the following feature types (case
> sensitive): "TSS", "miRNA", "Exon", "5utr", "3utr" or "ExonPlusUtr". For
> example, use "5utr" for the 5' UTR.
>
> You were right that with parameter featureType set to TSS, getAnnotation
> returns the gene coordinates. If you think it is useful to have transcript
> coordinates, I will be happy to add featureType transcript. Thanks!
>
> Best regards,
>
> Julie
>
> On 8/25/11 4:00 PM, "Daria Goranskaya" <daria.goranskaya at gmail.com> wrote:
>
>> Dear Julie:
>>
>> I'm a PhD student in bioinformatics at Karolinska Institutet, Stockholm.
>> I've been using ChIPpeakAnno for my data and I found something strange
>> when getting annotation with the getAnnotation function. Could you take
>> a look at the following?
>>
>>> library("biomaRt")
>>> library("ChIPpeakAnno")
>>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>>
>> When I tried to get TSS, I got genes rather than transcripts. And the
>> last two commands gave the same results! That's strange, because the
>> first command should return plenty of other features besides TSS.
>> Also, I got an error when asking for 5UTR.
>>
>> How should I use this function to get necessary annotation features?
>> Thank you in advance!
>>
>> Best regards,
>> Daria
>>
>>
>> P.S. Here is the whole R history:
>>
>>> library("biomaRt")
>>> library("ChIPpeakAnno")
>>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>>> Ensembl5utr<-as.data.frame(getAnnotation(mart, featureType=c("5utr")))
>> Warnings:
>> 1: In getAnnotation(mart, featureType = c("5utr")) :
>> Following duplicated IDs found, only one of entries of the
>> duplicated id will be returned!
>> 2: In getAnnotation(mart, featureType = c("5utr")) :
>> ENST00000400678ENST00000400776ENST00000400776ENST00000546775ENST00000550740
>> ENST00000546775ENST00000550740ENST00000546775ENST00000451927ENST00000400890
>> ENST00000550740ENST00000552764ENST00000546832ENST00000549120ENST00000550740
>> ENST00000546775ENST00000552764ENST00000546736ENST00000346061ENST00000447903
>> ENST00000272035ENST00000413237ENST00000447903ENST00000346061ENST00000418749
>> ENST00000400681ENST00000418749ENST00000346061ENST00000400840ENST00000262316
>> ENST00000420545ENST00000450643ENST00000454039ENST00000338527ENST00000219431
>> ENST00000436333ENST00000397817ENST00000551377ENST00000368372ENST00000314367
>> ENST00000431099ENST00000382389ENST00000399951ENST00000323434ENST00000331302
>> ENST00000399951ENST00000551377ENST00000456528ENST00000521270ENST00000521145
>> ENST00000523418ENST00000308811ENST00000523162ENST00000522866ENST00000518414
>> ENST00000521270ENST00000320552ENST00000398612ENST00000325113ENST00000525282
>> ENST00000540150ENST00000342593ENST00000399012ENST00000445062ENST00000429181
>> ENST00000399012ENST000004
>> [... truncated]
>>> head(EnsemblAnnotation)
>>   space start   end width           names strand
>> 1     1 11869 14412  2544 ENSG00000223972      1
>> 2     1 14363 29806 15444 ENSG00000227232     -1
>> 3     1 29554 31109  1556 ENSG00000243485      1
>> 4     1 30366 30503   138 ENSG00000221311      1
>> 5     1 34554 36081  1528 ENSG00000237613     -1
>> 6     1 62948 63887   940 ENSG00000240361      1
>>
>>   description
>> 1 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1 [Source:HGNC Symbol;Acc:37102]
>> 2 WAS protein family homolog 7 pseudogene [Source:HGNC Symbol;Acc:38034]
>> 3 microRNA 1302-10 [Source:HGNC Symbol;Acc:38233]
>> 4 microRNA 1302-10 [Source:HGNC Symbol;Acc:38233]
>> 5 family with sequence similarity 138, member A [Source:HGNC Symbol;Acc:32334]
>> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene [Source:HGNC Symbol;Acc:31276]
>>> head(EnsemblTSS)
>>   space start   end width           names strand
>> 1     1 11869 14412  2544 ENSG00000223972      1
>> 2     1 14363 29806 15444 ENSG00000227232     -1
>> 3     1 29554 31109  1556 ENSG00000243485      1
>> 4     1 30366 30503   138 ENSG00000221311      1
>> 5     1 34554 36081  1528 ENSG00000237613     -1
>> 6     1 62948 63887   940 ENSG00000240361      1
>>
>>   description
>> 1 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1 [Source:HGNC Symbol;Acc:37102]
>> 2 WAS protein family homolog 7 pseudogene [Source:HGNC Symbol;Acc:38034]
>> 3 microRNA 1302-10 [Source:HGNC Symbol;Acc:38233]
>> 4 microRNA 1302-10 [Source:HGNC Symbol;Acc:38233]
>> 5 family with sequence similarity 138, member A [Source:HGNC Symbol;Acc:32334]
>> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene [Source:HGNC Symbol;Acc:31276]
>>> head(Ensembl5utr)
>>   space  start    end width           names strand
>> 1     1  35737  36081   345 ENST00000417324     -1
>> 2     1 367640 367658    19 ENST00000426406      1
>> 3     1 622035 622053    19 ENST00000332831     -1
>> 4     1 721320 721405    86 ENST00000358533      1
>> 5     1 860260 860328    69 ENST00000420190      1
>> 6     1 860530 860569    40 ENST00000437963      1
>>
>>   description
>> 1 family with sequence similarity 138, member A [Source:HGNC Symbol;Acc:32334]
>> 2 olfactory receptor, family 4, subfamily F, member 29 [Source:HGNC Symbol;Acc:31275]
>> 3 olfactory receptor, family 4, subfamily F, member 16 [Source:HGNC Symbol;Acc:15079]
>> 4 Transmembrane protein FLJ78588 [Source:UniProtKB/Swiss-Prot;Acc:A6NHI5]
>> 5 sterile alpha motif domain containing 11 [Source:HGNC Symbol;Acc:28706]
>> 6 sterile alpha motif domain containing 11 [Source:HGNC Symbol;Acc:28706]
>>
>>
>>
>>
>