[BioC] Where to get BAM files for easyRNASeq human use case ALSO ANNOTATION

Thu Aug 16 19:34:52 CEST 2012

On 08/16/2012 10:29 AM, Richard Friedman wrote:
> Steve,
>
> 	Thanks. I use annaffy for microarrays and was hoping for an
> already-worked-out protocol. I will, however, look into the package
> you recommend if no more explicit protocol is available.

Not so much an already-worked-out protocol as an elaboration of Steve's bet:

An AnnotateSeq package would be a useful addition. In the meantime, the
information that annaffy reports is in the org packages: you can discover
the available fields with 'cols' and 'keytypes' (the two often list the
same names) and retrieve them with 'select'. For the next release the plan
is OrganismDb objects, which make transparent the merge one would otherwise
do by hand across, say, the org*, TxDb*, and GO.db packages.

 > library(org.Dm.eg.db)
 > cols(org.Dm.eg.db)
  [1] "ENTREZID"     "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"
  [6] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"         "PMID"
 [11] "REFSEQ"       "SYMBOL"       "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"
 [16] "ENSEMBLTRANS" "GENENAME"     "UNIPROT"      "GO"           "EVIDENCE"
 [21] "ONTOLOGY"     "FLYBASE"      "FLYBASECG"    "FLYBASEPROT"
 > select(org.Dm.eg.db, "FBtr0005009", c("GENENAME", "SYMBOL"), "ENSEMBLTRANS")
   ENSEMBLTRANS          GENENAME SYMBOL
1  FBtr0005009 Muscle protein 20   Mp20

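Scaling this up from one ID to a whole count table might look like the
sketch below. It assumes org.Dm.eg.db is installed and that the FlyBase
transcript IDs are the row names of your transcriptCounts result; 'ids' and
the output file name are placeholders, and for the human use case you would
swap in org.Hs.eg.db with the matching keytype.

```r
## Sketch only: assumes org.Dm.eg.db is installed. 'ids' is a placeholder
## for the transcript IDs from your own count table (e.g. its row names).
library(org.Dm.eg.db)

ids <- c("FBtr0005009")  # hypothetical: replace with rownames(<your counts>)

## Arguments are given positionally (keys, columns, keytype), matching the
## select() call above, so the argument naming in the installed
## AnnotationDbi does not matter.
ann <- select(org.Dm.eg.db, ids,
              c("ENTREZID", "SYMBOL", "GENENAME"),
              "ENSEMBLTRANS")

## Write a comma-separated file, which opens directly in Excel.
write.csv(ann, file = "transcript_annotation.csv", row.names = FALSE)
```

Adding "GO" or "PATH" from the cols() list should also work, but select then
returns one row per ID/term combination, so the table grows quickly.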
Martin

>
> Best wishes,
> Rich
>
> On Aug 16, 2012, at 1:25 PM, Steve Lianoglou wrote:
>
>> Hi,
>>
>> On Thu, Aug 16, 2012 at 1:17 PM, Richard Friedman
>> <friedman at cancercenter.columbia.edu> wrote:
>> [snip]
>>>         I would then like to ask a broader question, one that I was
>>> going to ask after completing the vignette:
>>> Is it possible to obtain annotation for RNASeq data analogous
>>> to the kind obtained for microarrays?
>>> To be specific, when I analyze Affymetrix microarrays I get, for
>>> each probeset, the Entrez gene symbol and a description of the gene
>>> (which could be several words long), as well as Gene Ontology categories
>>> and pathways. I can output this information as an Excel spreadsheet.
>>> When I worked through the Drosophila vignette with transcriptCounts or
>>> geneCounts, I got accession numbers (e.g., "FBtr0005009") but no gene
>>> symbols etc.
>>>
>>> Do you have any suggestions as to how to get Entrez Gene Symbols,
>>> descriptions, etc., for RNASeq output with easyRNASeq?
>> [/snip]
>>
>> Perhaps I'm missing something, but given accession numbers (or other
>> gene identifiers), it should be pretty straightforward to jimmy up
>> something using the org.*.eg.db packages, no?
>>
>> I suspect you won't get gene descriptions there -- but if I were a
>> gambling man, I would bet you can probably get that last piece of the
>> puzzle from biomaRt.
>>
>> HTH,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793