[BioC] ReportingTools gene IDs

Fri Apr 25 15:43:32 CEST 2014

Hi Assa,

Gabriel actually already gave you the answer, and it is yes. You just
have to add your own function(s) to the list passed as the .modifyDF
argument. There are several examples in

http://www.bioconductor.org/packages/release/bioc/vignettes/ReportingTools/inst/doc/basicReportingTools.pdf

and here is one (untested) that should apply to your situation:

library(org.Mm.eg.db)   # mouse annotation; also attaches AnnotationDbi for select()
library(hwriter)        # hwrite() builds the HTML links

fun <- function(df, object, ...){
     ## Stop early if there is no ENSEMBL column to work with
     if(!"ENSEMBL" %in% names(df))
         stop("The column name for ensembl ids has to be 'ENSEMBL'!")
     ensids <- as.character(df$ENSEMBL)
     whichcol <- which(names(df) == "ENSEMBL")
     ## Look up symbol and gene name for each Ensembl ID
     annot <- select(org.Mm.eg.db, ensids, c("SYMBOL","GENENAME"), "ENSEMBL")
     ## Naive handling of one-to-many mappings: keep the first row per ID
     if(nrow(annot) > nrow(df)) annot <- annot[!duplicated(annot[,1]),]
     df <- data.frame(annot, df[,-whichcol, drop = FALSE])
     ## Link each ID to its (mouse) Ensembl gene page
     df$ENSEMBL <- hwrite(as.character(df$ENSEMBL),
                          link = paste0("http://www.ensembl.org/Mus_musculus/Gene/Summary?g=",
                                        as.character(df$ENSEMBL)),
                          table = FALSE)
     df
}

This function assumes (and checks) that there is an ENSEMBL column in
your data.frame that it can use to extract the Ensembl IDs. It also
assumes that your species is mouse, and that you have the org.Mm.eg.db
package already loaded. It then gets the symbol and gene name for those
IDs, and does a really naive subsetting of the annotation if there are
duplicates (it keeps only the first match for each ID). Other, more
sophisticated things are possible, but I leave it to you to make any
such modifications.
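If you want something less naive than keeping the first hit per ID, one option is to collapse multiple matches into one string and then line the annotation up with your results by ID (using match()) rather than by position, so an ID with no annotation gets NA instead of shifting the rows below it. Below is a minimal base-R sketch; the annot data.frame is a made-up stand-in for what select(org.Mm.eg.db, ...) returns, its symbols and gene names are placeholders rather than real annotation, and collapseBy() is a hypothetical helper.

```r
## Made-up stand-in for select(org.Mm.eg.db, ensids, c("SYMBOL", "GENENAME"), "ENSEMBL"):
## the second ID maps to two rows, and the third results ID has no row at all.
annot <- data.frame(
  ENSEMBL  = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000028"),
  SYMBOL   = c("symbolA", "symbolB", "symbolB2"),
  GENENAME = c("gene name A", "gene name B", "gene name B2"),
  stringsAsFactors = FALSE
)

## Illustrative results table with the IDs in an ENSEMBL column.
res <- data.frame(
  ENSEMBL        = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000031"),
  log2FoldChange = c(0.5, -1.2, 2.0),
  padj           = c(0.01, 0.03, 0.04),
  stringsAsFactors = FALSE
)

## Hypothetical helper: collapse one-to-many mappings into one
## ";"-separated string per ID, returning a vector named by ID.
collapseBy <- function(values, ids) {
  vapply(split(values, ids), paste, character(1), collapse = ";")
}
symbols   <- collapseBy(annot$SYMBOL, annot$ENSEMBL)
genenames <- collapseBy(annot$GENENAME, annot$ENSEMBL)

## Look each results ID up by name, so rows keep the order of res and
## IDs without annotation get NA instead of shifting the rows below.
idx <- match(res$ENSEMBL, names(symbols))
merged <- data.frame(
  ENSEMBL  = res$ENSEMBL,
  SYMBOL   = unname(symbols[idx]),
  GENENAME = unname(genenames[idx]),
  res[, setdiff(names(res), "ENSEMBL"), drop = FALSE],
  stringsAsFactors = FALSE
)
print(merged)
```

Inside fun() the same idea would replace the duplicated()/data.frame() lines, with ensids in place of res$ENSEMBL. Whether collapsing with ";" or keeping only the first hit reads better in the report is a matter of taste.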

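You would use this (as Gabriel already said) as one element of the list
passed in via .modifyDF. You also need modifyReportDF, placed first in
that list so that the default conversion of your object still happens.
Since fun() does the annotation itself, annotation.db can be set to NULL
(the setting you found that avoids the Entrez ID check). So your publish
call would now look like

publish(fit, des2Report, pvalueCutoff = 0.05, annotation.db = NULL,
        factor = colData(fit)$condition, reportDir = "./reports",
        .modifyDF = list(modifyReportDF, fun))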

That at least is the basic idea, and you might need to play around to 
make things work correctly.
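To make the ordering Gabriel describes concrete (functions "called in order, each accepting the output from the last"), here is a base-R sketch of applying a list of modifier functions one after another. applyModifiers(), ensureEnsemblColumn() and keepSignificant() are hypothetical helpers for illustration, not ReportingTools functions. ensureEnsemblColumn() assumes the Ensembl IDs might end up in the row names rather than in an ENSEMBL column, which is exactly the kind of thing you may need to check when playing around.

```r
## Hypothetical helper: copy Ensembl IDs from the row names into an
## ENSEMBL column if that column is missing, so that fun() can find it.
ensureEnsemblColumn <- function(df, object, ...) {
  if (!"ENSEMBL" %in% names(df)) {
    ids <- rownames(df)
    if (!all(grepl("^ENSMUSG[0-9]+", ids)))
      stop("No ENSEMBL column, and the row names are not mouse Ensembl gene IDs")
    df <- data.frame(ENSEMBL = ids, df, stringsAsFactors = FALSE)
  }
  df
}

## Hypothetical second step: keep rows below an (arbitrary) p-value cutoff.
keepSignificant <- function(df, object, cutoff = 0.05, ...) {
  df[!is.na(df$padj) & df$padj < cutoff, , drop = FALSE]
}

## Apply the functions in list order; each one receives the previous
## function's output, and the last output is what would be rendered.
## This only mimics the behaviour described for .modifyDF; it is not
## ReportingTools' own code.
applyModifiers <- function(df, object, fns, ...) {
  extra <- list(...)
  for (f in fns) {
    df <- do.call(f, c(list(df, object), extra))
  }
  df
}

res <- data.frame(padj = c(0.2, 0.01, NA),
                  row.names = c("ENSMUSG00000000001",
                                "ENSMUSG00000000028",
                                "ENSMUSG00000000031"))
out <- applyModifiers(res, object = NULL,
                      fns = list(ensureEnsemblColumn, keepSignificant))
print(out)
```

In the real call this would correspond to something like .modifyDF = list(modifyReportDF, ensureEnsemblColumn, fun). Whether the IDs really land in the row names depends on what modifyReportDF returns for your object, so it is worth inspecting that data.frame before relying on this.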

Best,

Jim

On 4/25/2014 4:21 AM, Assa Yeroslaviz wrote:
> Hi Gabriel,
>
> Thanks for the quick answer. I will look into that as soon as I have
> the time.
> Another question was whether it is possible to work directly with the
> Ensembl IDs.
>
> I have a table of ~37K Ensembl IDs, of which almost 50% have no
> Entrez IDs, so I can't convert them. Is there a way to work directly
> with the Ensembl IDs and still benefit from the annotation.db
> possibilities?
>
> Thanks
>
> Assa
>
>
> On Thu, Apr 24, 2014 at 4:48 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
>
>     I wrote my previous message too quickly. Apologies.
>
>     Your functions must have the signature
>
>     function(df, object, ...)
>
>     df is the current data.frame representation of the object,
>     object is the *original* object (so that the class can be identified),
>     ... are passed in from the call to publish
>
>     And you can just place the generic modifyReportDF function at the
>     beginning of the list, rather than using getMethod. The getMethod
>     thing I said is for when you want to apply the default handling
>     for a *different* class to your object. It is a rare use-case, but
>     came up recently so it was on my mind.
>
>     That will teach me to respond quickly to emails early in the morning.
>
>     Sorry about that.
>
>     ~G
>
>
>     On Thu, Apr 24, 2014 at 7:18 AM, Gabriel Becker
>     <gmbecker at ucdavis.edu> wrote:
>
>         Assa,
>
>         In general yes, if you want to add to the table you will be
>         working with the data.frame.
>
>         You can do so after the initial conversion, though, so you
>         don't have to recreate the wheel to get from your object to an
>         initial data.frame.
>
>         To modify the default table (data.frame) generated for an
>         object, you can pass publish()'s .modifyDF parameter a
>         function or list of functions, each of which should accept
>         object (the data.frame) and "..." and return a data.frame.
>
>         These will be called in order, each accepting the output from
>         the last. The output of the final function is what will be
>         transformed into HTML and inserted into the report.
>
>         You'll probably want to add the default handling of your
>         object type, which you can do by putting
>         getMethod("modifyReportDF", "<your object's class>") at the
>         beginning of the list.
>
>         See section 4 of the ReportingTools basics vignette for
>         example code.
>
>         HTH,
>         ~G
>
>
>         On Thu, Apr 24, 2014 at 6:54 AM, Assa Yeroslaviz
>         <frymor at gmail.com> wrote:
>
>             Thanks Jim,
>
>             I have found in one of the forums a response from Jason
>             (thanks again) about the option to set annotation.db=NULL
>             and thus force the publish command to work with the IDs I
>             provide in the DESeqDataSet object.
>
>             So this is now working, but I would also like to have the
>             option to add some annotations to the table.
>
>             Is this only possible when working directly with a
>             data.frame?
>
>             Thanks again
>             Assa
>
>             On Thu, Apr 24, 2014 at 3:45 PM, James W. MacDonald
>             <jmacdon at uw.edu> wrote:
>
>             > Hi Assa,
>             >
>             > There may well be a way to work with Ensembl IDs, and you
>             > will likely get an answer to your direct question from one
>             > of the maintainers.
>             >
>             > However you should note that ReportingTools simply takes the
>             > input object and then coerces the data to a data.frame, which
>             > is then used to create the HTML table. You can always create
>             > the data.frame to your own liking up front, and then pass
>             > that to publish(). While this is more work than just passing
>             > in the DESeqDataSet, you do have complete control over the
>             > output.
>             >
>             > Best,
>             >
>             > Jim
>             >
>             >
>             >
>             > On 4/24/2014 8:50 AM, Assa Yeroslaviz wrote:
>             >
>             >> Hi,
>             >>
>             >> Is it necessary to have Entrez Gene IDs to work with this
>             >> package?
>             >>
>             >> I am working on a dataset with Ensembl IDs. Do I need to
>             >> convert them to Entrez?
>             >>
>             >> When trying to create a report for a DESeqDataSet or
>             >> DESeqResults object, I am getting the error message:
>             >>
>             >> Error: Ids do not appear to be Entrez Ids for the specified species.
>             >>
>             >> Is there a way to work straight with the Ensembl IDs?
>             >>
>             >> Thanks
>             >>
>             >> Assa
>             >>
>             >> my script:
>             >>
>             >> head(Counts_set)
>             >>                    A_pKO_aV_FCS G_pKO_aV_FCS M_pKO_aV_FCS D_pKO_aV J_pKO_aV
>             >> ENSMUSG00000000001         4744         4632         4535     4748     3736
>             >> ENSMUSG00000000003            0            0            0        0        0
>             >> ENSMUSG00000000028         1246         1420         1429     2304     1261
>             >> ENSMUSG00000000031            3           25           65        0       50
>             >> ENSMUSG00000000037            0            0            0        0        0
>             >> ENSMUSG00000000049            0            0            3        1        3
>             >>
>             >> cds <- DESeqDataSetFromMatrix(
>             >>      countData = Counts_set,
>             >>      colData   = colData,
>             >>      design    = ~ condition
>             >>      )
>             >>
>             >> fit = DESeq(cds)
>             >> des2Report <- HTMLReport(shortName = paste('RNAseq_analysis_', group1, "_",
>             >>                                            group2, sep = ""),
>             >>                          title = 'RNA-seq analysis of differential expression using DESeq2',
>             >>                          reportDirectory = "./reports")
>             >> publish(fit, des2Report, pvalueCutoff = 0.05, annotation.db = "org.Mm.eg.db",
>             >>         factor = colData(fit)$condition, reportDir = "./reports")
>             >> Error: Ids do not appear to be Entrez Ids for the specified species.
>             >> finish(des2Report)
>             >>
>             >> sessionInfo()
>             >> R version 3.1.0 (2014-04-10)
>             >> Platform: x86_64-pc-linux-gnu (64-bit)
>             >>
>             >> locale:
>             >>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>             >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>             >>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>             >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>             >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>             >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>             >>
>             >> attached base packages:
>             >> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>             >> [8] base
>             >>
>             >> other attached packages:
>             >>  [1] org.Mm.eg.db_2.14.0 ReportingTools_2.4.0 AnnotationDbi_1.26.0
>             >>  [4] Biobase_2.24.0      RSQLite_0.11.4       DBI_0.2-7
>             >>  [7] knitr_1.5           DESeq2_1.4.0         RcppArmadillo_0.4.200.0
>             >> [10] Rcpp_0.11.1         GenomicRanges_1.16.2 GenomeInfoDb_1.0.2
>             >> [13] IRanges_1.22.3      BiocGenerics_0.10.0
>             >>
>             >> loaded via a namespace (and not attached):
>             >>  [1] annotate_1.42.0     AnnotationForge_1.6.0   BatchJobs_1.2
>             >>  [4] BBmisc_1.5          BiocParallel_0.6.0      biomaRt_2.20.0
>             >>  [7] Biostrings_2.32.0   biovizBase_1.12.0       bitops_1.0-6
>             >> [10] brew_1.0-6          BSgenome_1.32.0         Category_2.30.0
>             >> [13] cluster_1.14.4      codetools_0.2-8         colorspace_1.2-4
>             >> [16] dichromat_2.0-0     digest_0.6.4            edgeR_3.6.0
>             >> [19] evaluate_0.5.3      fail_1.2                foreach_1.4.2
>             >> [22] formatR_0.10        Formula_1.1-1           genefilter_1.46.0
>             >> [25] geneplotter_1.42.0  GenomicAlignments_1.0.0 GenomicFeatures_1.16.0
>             >> [28] ggbio_1.12.0        ggplot2_0.9.3.1         GO.db_2.14.0
>             >> [31] GOstats_2.30.0      graph_1.42.0            grid_3.1.0
>             >> [34] gridExtra_0.9.1     GSEABase_1.26.0         gtable_0.1.2
>             >> [37] Hmisc_3.14-4        hwriter_1.3             iterators_1.0.7
>             >> [40] lattice_0.20-24     latticeExtra_0.6-26     limma_3.20.1
>             >> [43] locfit_1.5-9.1      MASS_7.3-29             Matrix_1.1-2
>             >> [46] munsell_0.4.2       PFAM.db_2.14.0          plyr_1.8.1
>             >> [49] proto_0.3-10        RBGL_1.40.0             RColorBrewer_1.0-5
>             >> [52] RCurl_1.95-4.1      reshape2_1.2.2          R.methodsS3_1.6.1
>             >> [55] R.oo_1.18.0         Rsamtools_1.16.0        rtracklayer_1.24.0
>             >> [58] R.utils_1.29.8      scales_0.2.4            sendmailR_1.1-2
>             >> [61] splines_3.1.0       stats4_3.1.0            stringr_0.6.2
>             >> [64] survival_2.37-7     tools_3.1.0             VariantAnnotation_1.10.0
>             >> [67] XML_3.98-1.1        xtable_1.7-3            XVector_0.4.0
>             >> [70] zlibbioc_1.10.0
>             >>
>             >>
>             >> _______________________________________________
>             >> Bioconductor mailing list
>             >> Bioconductor at r-project.org
>             >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>             >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>             >>
>             >
>             > --
>             > James W. MacDonald, M.S.
>             > Biostatistician
>             > University of Washington
>             > Environmental and Occupational Health Sciences
>             > 4225 Roosevelt Way NE, # 100
>             > Seattle WA 98105-6099
>             >
>             >
>
>
>
>
>
>
>         -- 
>         Gabriel Becker
>         Graduate Student
>         Statistics Department
>         University of California, Davis
>
>
>
>
>     -- 
>     Gabriel Becker
>     Graduate Student
>     Statistics Department
>     University of California, Davis
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099