[BioC] ReportingTools - trouble incorporating annotations
James W. MacDonald
jmacdon at uw.edu
Wed Jun 19 19:55:40 CEST 2013
Hi Sam,
Sorry, my bad. One of the arguments for DGEList is 'genes', which is
genes: data frame containing annotation information for the
tags/transcripts/genes.
So yeah, when you are creating your DGELists, add in the annotation data
for each gene/transcript/whatever, and then go from there.
Best,
Jim
On 6/19/2013 1:37 PM, Sam McInturf wrote:
> Jim,
> When I look at my DGELRT objects I don't have a $genes field: (to
> be verbose, DGELists is a list of DGELRT objects (list(glmLRT(),
> glmLRT(),...))
>
> > DGELists$Roots$genes
> NULL
>
> Looking at what fields I do have: coefficients, fitted.values,
> deviance, df.residual, design, offset, dispersion, method, samples,
> AveLogCPM, table, comparison, df.test
>
> Looking at the DGEGLM which my DGELists were made, I have no gene slot
> either. I added the rownames of the counts table in the DGEGLM to a
> new $genes:
>
> >fit$genes <- rownames(fit$counts)
>
> but the the $genes slot did not get passed onto the DGELRTs. Is it
> best just to add the $genes field to each DGELRT and then follow what
> you stated before?
>
> Thanks,
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4
> [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0
> [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0
> [10] edgeR_3.2.3 limma_3.16.5 BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
> [1] annotate_1.38.0 AnnotationForge_1.2.1 biomaRt_2.16.0
> [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5
> [7] BSgenome_1.28.0 Category_2.26.0 cluster_1.14.4
> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3
> [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3
> [16] ggplot2_0.9.3.1 GO.db_2.9.0 GOstats_2.26.0
> [19] graph_1.38.2 grid_3.0.1 gridExtra_0.9.1
> [22] GSEABase_1.22.0 gtable_0.1.2 Hmisc_3.10-1.1
> [25] hwriter_1.3 labeling_0.1 lattice_0.20-15
> [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0
> [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2
> [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2
> [37] R.methodsS3_1.4.2 R.oo_1.13.0 Rsamtools_1.12.3
> [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3
> [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2
> [46] survival_2.37-4 tools_3.0.1
> VariantAnnotation_1.6.6
> [49] XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0
>
>
>
>
> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>> wrote:
>
> Hi Sam,
>
> First, please always give us the results of sessionInfo(). This is
> especially critical in the case of ReportingTools, which has been
> fundamentally altered between the previous and current versions of
> BioC.
>
>
> On 6/19/2013 11:12 AM, Sam McInturf wrote:
>
> Bioconductors,
> I am working on a RNA seq analysis project and am having
> trouble
> publishing an HTML report for it. I am unsure of how to make
> my DE genes
> have the same ID as what publish() will accept when passing an
> argument to
> 'annotation'.
> I mapped the reads using tophat and passed the TAIR 10
> gtf file to the
> -G option. When i counted my reads I used the
> summarizeOverlaps function
> from GenomicRanges and again used this same file. I called
> differential
> expression in edgeR using the GLM methods. So the rownames
> of my DE table
> are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db
> annotations and passed it to HTMLReport in:
>
> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat,
> conditions =
> group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc
> =2, n = 1000,
> name = "RootsLRT")
> Error: More than half of your IDs could not be mapped.
> In addition: Warning message:
> In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion
>
> which makes sense, because publish() is looking for Entrez IDs
> (right?)
>
> How do I proceed?
>
>
> Here I assume you are running R-3.0.x and the current release of BioC.
>
> When you run publish() on anything but a data.frame, the first
> step is to coerce to a data.frame using a set of assumptions that
> might not hold in your case (or there may be defaults that you
> don't like). Because of this, I tend to just coerce to a
> data.frame myself and then publish() that directly. This also
> allows you to pass in arguments to .modifyDF which is kind of sweet.
>
> In the case of a DGELRT or DEGExact object, there is a 'genes'
> slot that will be used to annotate the output of topTags().
> Ideally you would just add the annotation you want to that slot
> first. So you could do something like
>
> annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<Tair
> column goes here>], c("SYMBOL","GENENAME","OTHERSTUFF"))
>
> and then put that in your DGEobjects. Now you can do something like
>
> outlst <- lapply(DGELists, topTags, otherargsgohere)
>
> htmlst <- lapply(seq_len(length(DGELists)) function(x)
> HTMLReport(namevector[x], titlevector[x], otherargs))
>
> and you can define a function similar to this function I use for
> Entrez Gene IDs:
>
> entrezLinks <- function (df, ...){
> df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID),
> link = paste0("http://www.ncbi.nlm.nih.gov/gene/",
> as.character(df$ENTREZID)),
> table = FALSE)
> return(df)
> }
>
> but modified for the Tair equivalent and then
>
> lapply(seq_len(length(htmlst)), function(x) publish(outlst[[x]],
> htmlst[[x]], .modifyDF = samsTairLinkFun)))
> lapply(htmlst, finish)
>
> et voila!
>
> You can also then use htmlst to make a bunch of links in an
> index.html page.
>
> indx <- HTMLReport("index", "A bunch of links for this expt",
> reportDirectory=".", baseUrl = "")
> publish(hwriter::hwrite("Here are links", page(indx), header=2,
> br=TRUE), indx)
> publish(Link(htmlst, report=indx), indx)
> finish(indx)
>
> Best,
>
> Jim
>
>
>
> Thanks in advance!
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
> --
> Sam McInturf
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list