[BioC] Where to get BAM files for easyRNASeq human use case ALSO ANNOTATION

Richard Friedman friedman at cancercenter.columbia.edu
Thu Aug 16 19:17:10 CEST 2012


Dear Nico,

	Thanks for offering to revise the vignette. I always
find it best to work through an example on its original dataset.
I am sure that it will be useful to many other workers in this
field.
	I would then like to ask a broader question, one that I was
going to ask after I completed the vignette:
Is it possible to obtain annotation for RNASeq data analogous
to the kind obtained for microarrays?
To be specific, when I analyze Affymetrix microarrays I get, for
each probeset, the Entrez gene symbol and a description of the gene,
which can be several words long, as well as gene ontology categories
and pathways. I can output this information as an Excel spreadsheet.
When I work through the Drosophila vignette with transcriptCounts or
geneCounts I get accession numbers (e.g., "FBtr0005009") but no gene
symbols, descriptions, etc.

Do you have any suggestions as to how to get Entrez Gene symbols,
descriptions, etc., for RNASeq output with easyRNASeq?
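
For illustration, the kind of annotated table described above can be sketched in base R as follows. This is only a sketch under stated assumptions: it assumes the counts have already been put into a data frame keyed by transcript ID and that an annotation table has been obtained from some source (for example an annotation package or biomaRt, neither of which is shown here). The object names, the count values, the second ID, and the symbol and description strings are all hypothetical placeholders, not real annotation.

```r
# Hypothetical count table keyed by transcript accession.
# Only "FBtr0005009" comes from the message above; the rest is made up.
counts <- data.frame(
  transcript_id = c("FBtr0005009", "FBtrPLACEHOLDER"),
  sample1       = c(12L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table; symbol and description are placeholders.
annot <- data.frame(
  transcript_id = "FBtr0005009",
  symbol        = "placeholder_symbol",
  description   = "placeholder description of the gene",
  stringsAsFactors = FALSE
)

# Left join: keep every counted transcript, with NA where no annotation exists.
annotated <- merge(counts, annot, by = "transcript_id", all.x = TRUE)

# Write a CSV file that a spreadsheet program such as Excel can open.
out_file <- file.path(tempdir(), "annotated_counts.csv")
write.csv(annotated, out_file, row.names = FALSE)
```

Using all.x = TRUE keeps transcripts that have no annotation instead of silently dropping them, which makes gaps in the annotation visible in the output.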

Thanks and best wishes,
Rich 


On Aug 16, 2012, at 12:17 PM, Nicolas Delhomme wrote:

> Dear Richard,
> 
> Sorry that this information is missing. I've added this use case after discussing with Francesco Lescai, see http://permalink.gmane.org/gmane.science.biology.informatics.conductor/38858. The point of that use case is to explain the importance of having consistent annotations and I was not expecting it to be used as a tutorial. 
> 
> From the email exchange with Francesco, I recall that the data is public and was retrieved from the ENA (SRA). One accession number I found is SRR349689.
> 
> I'll try to look up more information about it, but I'm afraid that there are no readily available bam files for it. 
> 
> In any case, thanks for pointing that out. I'll try to find a dataset that could be used for that use case and I'll update the vignette as well.
> 
> Thanks,
> 
> Nico
> 
> ---------------------------------------------------------------
> Nicolas Delhomme
> 
> Genome Biology Computational Support
> 
> European Molecular Biology Laboratory
> 
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
> 
> 
> 
> 
> 
> On Aug 16, 2012, at 6:02 PM, Richard Friedman wrote:
> 
>> Dear List,
>> 
>> 	I am working through the use case in the easyRNASeq 
>> vignette with the human data (section 6 of the vignette).
>> I am not sure where the bam files are for the use case. 
>> 
>> Here is the record of my session:
>> 
>>> library(easyRNASeq)
>> Loading required package: parallel
>> Loading required package: genomeIntervals
>> Loading required package: intervals
>> Loading required package: BiocGenerics
>> 
>> Attaching package: ‘BiocGenerics’
>> 
>> The following object(s) are masked from ‘package:stats’:
>> 
>>   xtabs
>> 
>> The following object(s) are masked from ‘package:base’:
>> 
>>   anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map,
>>   mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>>   rownames, sapply, setdiff, table, tapply, union, unique
>> 
>> Loading required package: Biobase
>> Welcome to Bioconductor
>> 
>>   Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
>>   see 'citation("Biobase")', and for packages 'citation("pkgname")'.
>> 
>> Loading required package: biomaRt
>> Loading required package: edgeR
>> Loading required package: limma
>> Loading required package: Biostrings
>> Loading required package: IRanges
>> 
>> Attaching package: ‘IRanges’
>> 
>> The following object(s) are masked from ‘package:intervals’:
>> 
>>   reduce
>> 
>> 
>> Attaching package: ‘Biostrings’
>> 
>> The following object(s) are masked from ‘package:intervals’:
>> 
>>   type
>> 
>> Loading required package: BSgenome
>> Loading required package: GenomicRanges
>> Loading required package: DESeq
>> Loading required package: locfit
>> locfit 1.5-8 	 2012-04-25
>> 
>> Attaching package: ‘locfit’
>> 
>> The following object(s) are masked from ‘package:GenomicRanges’:
>> 
>>   left, right
>> 
>> Loading required package: Rsamtools
>> Loading required package: ShortRead
>> Loading required package: lattice
>> Loading required package: latticeExtra
>> Loading required package: RColorBrewer
>> Warning messages:
>> 1: replacing previous import ‘coerce’ when loading ‘intervals’
>> 2: replacing previous import ‘initialize’ when loading ‘intervals’
>>> library(BSgenome.Hsapiens.UCSC.hg19)
>>> chr.sizes=as.list(seqlengths(Hsapiens))
>>> class(chr.sizes)
>> [1] "list"
>>> bamfiles=dir(getwd(),pattern="*\\.bam$")
>>> bamfiles
>> character(0)
>>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> 
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> 
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
>> 
>> other attached packages:
>> [1] BSgenome.Hsapiens.UCSC.hg19_1.3.17 easyRNASeq_1.2.3                   ShortRead_1.14.4                  
>> [4] latticeExtra_0.6-19                RColorBrewer_1.0-5                 lattice_0.20-6                    
>> [7] Rsamtools_1.8.5                    DESeq_1.8.3                        locfit_1.5-8                      
>> [10] BSgenome_1.24.0                    GenomicRanges_1.8.7                Biostrings_2.24.1                 
>> [13] IRanges_1.14.4                     edgeR_2.6.10                       limma_3.12.1                      
>> [16] biomaRt_2.12.0                     Biobase_2.16.0                     genomeIntervals_1.12.0            
>> [19] BiocGenerics_0.2.0                 intervals_0.13.3                  
>> 
>> loaded via a namespace (and not attached):
>> [1] annotate_1.34.1      AnnotationDbi_1.18.1 bitops_1.0-4.1       DBI_0.2-5            genefilter_1.38.0   
>> [6] geneplotter_1.34.0   grid_2.15.1          hwriter_1.3          RCurl_1.91-1         RSQLite_0.11.1      
>> [11] splines_2.15.1       stats4_2.15.1        survival_2.36-14     XML_3.9-4            xtable_1.7-0        
>> [16] zlibbioc_1.2.0      
>>> 
>> 
>> THANKS!
>> Rich
>> 
>> 
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>> 
>> "School is an evil plot to suppress my individuality"
>> 
>> Rose Friedman, age15
>> 
>> 
> 