[Bioc-sig-seq] ChIPpeakAnno, BioMart, getAnnotation 'Exon' error message
Wolfgang Huber
whuber at embl.de
Wed Mar 17 15:48:25 CET 2010
Julie,
why do you say that "the database contains errors"? I had a look at
http://gbrowse.arabidopsis.org/cgi-bin/gbrowse/arabidopsis/?name=AT1G68552.1
and while this is perhaps a complex locus whose expression we have not
yet fully understood, or not yet properly formalised into the database's
ontology of genomic features and gene products, I am not sure "error" is
the right term for that.
Arabidopsis people might have more insight on that.
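To see what kind of duplication this is, here is a minimal sketch in base R. It assumes a data frame shaped like the TSS output quoted below (the two rows for AT1G68552.1-E12203, with the descriptions shortened); the helper names coord_cols, dup_ids, n_coord, same_coords and exons are made up for illustration. It checks whether a repeated exon ID differs only in its description, and if so collapses the rows so that the IDs can serve as row names. This is a possible workaround on the user side, not what getAnnotation does internally.

```r
# Toy data frame mirroring the two rows quoted for AT1G68552.1-E12203.
# Descriptions are shortened here; column names follow the quoted output.
TSS <- data.frame(
  ensembl_exon_id  = c("AT1G68552.1-E12203", "AT1G68552.1-E12203"),
  chromosome_name  = c("1", "1"),
  exon_chrom_start = c(25727627, 25727627),
  exon_chrom_end   = c(25727701, 25727701),
  strand           = c(-1, -1),
  description      = c("CPuORF53 (Conserved peptide upstream open reading frame 53)",
                       "AP2 domain-containing transcription factor, putative"),
  stringsAsFactors = FALSE
)

# Columns that define the exon itself (everything except the description).
coord_cols <- c("ensembl_exon_id", "chromosome_name",
                "exon_chrom_start", "exon_chrom_end", "strand")

# Exon IDs that occur more than once.
dup_ids <- unique(TSS$ensembl_exon_id[duplicated(TSS$ensembl_exon_id)])

# For each repeated ID, count the distinct coordinate tuples.
# A count of 1 means the rows differ only in their description.
n_coord <- sapply(dup_ids, function(id) {
  nrow(unique(TSS[TSS$ensembl_exon_id == id, coord_cols]))
})
same_coords <- names(n_coord)[n_coord == 1]

# Collapse to one row per exon and keep all descriptions, joined by " | ".
exons <- unique(TSS[, coord_cols])
exons$description <- vapply(exons$ensembl_exon_id, function(id) {
  paste(unique(TSS$description[TSS$ensembl_exon_id == id]), collapse = " | ")
}, character(1), USE.NAMES = FALSE)

# Only safe if no ID is left with conflicting coordinates.
stopifnot(!anyDuplicated(exons$ensembl_exon_id))
rownames(exons) <- exons$ensembl_exon_id
```

If the coordinate check passes for all repeated IDs, the same exon is simply attached to more than one annotated feature, and no row needs to be chosen over the other.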
Wolfgang
Zhu, Julie wrote on 16/03/10 22:56:
> Hi,
>
> I obtained the exon sequences and here are the duplicate exon IDs with different descriptions.
>
> TSS[duplicated(TSS[,1]), 1]
> [1] "AT1G68552.1-E12203" "AT1G64140.1-E14755" "AT1G64140.1-E14756" "AT1G70780.1-E4116"
> [5] "AT1G75390.1-E22428" "AT1G06149.1-E1988" "AT1G36730.1-E35050" "AT1G36730.1-E35051"
> [9] "AT1G29952.1-E5728" "AT1G29952.1-E5730" "AT1G29952.1-E5732" "AT1G29970.2-E8863"
> [13] "AT1G29970.2-E8864" "AT1G64628.1-E10574" "AT1G25470.1-E20679" "AT1G58120.1-E18468"
> [17] "AT1G29041.1-E15117" "AT1G23149.1-E13728" "AT1G29952.1-E5728" "AT1G29952.1-E5732"
> [21] "AT2G18162.1-E49029" "AT3G51632.1-E98183" "AT3G22970.1-E89708" "AT3G45240.2-E86808"
> [25] "AT3G18000.1-E98438" "AT3G59052.1-E77046" "AT3G62422.1-E76351" "AT3G25570.1-E88575"
> [29] "AT3G25570.1-E88576" "AT3G10910.1-E77164" "AT3G02468.1-E88931" "AT3G12010.1-E78704"
> [33] "AT3G01470.1-E92685" "AT3G53402.1-E93478" "AT3G26430.1-E85151" "AT3G26430.1-E85154"
> [37] "AT4G19110.1-E121565" "AT4G22592.1-E113550" "AT4G22592.1-E113551" "AT4G22592.1-E113552"
> [41] "AT4G12430.1-E113931" "AT4G12430.1-E113932" "AT4G12430.1-E113933" "AT4G25670.1-E111076"
> [45] "AT4G25670.1-E111077" "AT4G36990.1-E122859" "AT4G14620.1-E120308" "AT4G34590.1-E116802"
> [49] "AT5G09460.1-E136355" "AT5G09460.1-E136357" "AT5G50010.1-E151574" "AT5G50010.1-E151576"
> [53] "AT5G50010.1-E151574" "AT5G50011.1-E153108" "AT5G50011.1-E153110" "AT5G09460.1-E136355"
> [57] "AT5G09463.1-E151757" "AT5G09463.1-E151758" "AT5G52552.1-E136887" "AT5G52552.1-E136888"
> [61] "AT5G41992.1-E154552" "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64341.1-E144373"
> [65] "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64343.1-E148873" "AT5G64341.1-E144373"
> [69] "AT5G09460.1-E136355" "AT5G09463.1-E151757" "AT5G09460.1-E136357" "AT5G09463.1-E151758"
> [73] "AT5G49448.1-E171824" "AT5G05282.1-E152619" "AT5G53588.1-E159453" "AT5G09670.2-E157563"
> [77] "AT5G01710.1-E140929" "AT5G64341.1-E144370" "AT5G64343.1-E148873" "AT5G61230.1-E153842"
> [81] "AT5G61230.1-E153843" "AT5G60550.1-E140873" "AT5G64552.1-E148753" "AT5G64552.1-E148754"
> [85] "AT5G45430.1-E151338"
>
> For example,
>
> TSS[TSS[,1]=="AT1G68552.1-E12203",]
>          ensembl_exon_id chromosome_name exon_chrom_start exon_chrom_end strand
> 3125  AT1G68552.1-E12203               1         25727627       25727701     -1
> 15537 AT1G68552.1-E12203               1         25727627       25727701     -1
> description
> 3125 CPuORF53 (Conserved peptide upstream open reading frame 53); Upstream open reading frames (uORFs) are small open reading frames found in the 5' UTR of a mature mRNA, and can potentially mediate translational regulation of the largest, or major, ORF (mORF). CPuORF53 represents a conserved upstream opening reading frame relative to major ORF AT1G68550.1
> 15537 AP2 domain-containing transcription factor, putative; encodes a member of the ERF (ethylene response factor) subfamily B-6 of ERF/AP2 transcription factor family. The protein contains one AP2 domain. There are 12 members in this subfamily including RAP2.11.
>
> So I think the database contains errors. In this case, it will require manual curation to determine which row to choose. Did you contact Ensembl about this? Thanks!
>
> Best regards,
>
> Julie
>
>
> *******************************************
> Lihua Julie Zhu, Ph.D
> Research Associate Professor
> Program Gene Function and Expression
> University of Massachusetts Medical School
> 364 Plantation Street, Room 613
> Worcester, MA 01605
> 508-856-5256
> http://www.umassmed.edu/pgfe/faculty/zhu.cfm
> *******************************************
>
> On 3/5/10 6:46 PM, "pterry at huskers.unl.edu" <pterry at huskers.unl.edu> wrote:
>
>
>
> Dear bioc-sig-sequencing,
>
> I would like to annotate ChIP-seq peaks for the Arabidopsis genome. "TSS" and "Exon" are two of the values accepted by the featureType argument of the 'getAnnotation' function. The "TSS" call succeeded, but the "Exon" call failed.
>
> ...
>> arabdset<-useMart(biomart="plant_mart_4", dataset = "athaliana_eg_gene")
> Checking attributes ... ok
> Checking filters ... ok
>> ExonArabAnno<-getAnnotation(arabdset, featureType="Exon")
> Error in `rownames<-`(`*tmp*`, value = c("ATCG00010.1-E176369", "ATMG00010.1-E176520", :
> duplicate rownames not allowed
>
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-02-28 r51186)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ChIPpeakAnno_1.3.4 org.Hs.eg.db_2.3.6
> [3] GO.db_2.3.5 RSQLite_0.8-3
> [5] DBI_0.2-5 AnnotationDbi_1.9.4
> [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.15.11
> [9] Biostrings_2.15.22 IRanges_1.5.51
> [11] multtest_2.3.0 Biobase_2.7.4
> [13] biomaRt_2.3.4
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-5 RCurl_1.3-1 splines_2.11.0 survival_2.35-8
> [5] tools_2.11.0 XML_2.6-0
>
> Can someone comment?
>
>
> Thanks,
> P. Terry
> pterry at huskers.unl.edu
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact