[BioC] ReportingTools - trouble incorporating annotations
James W. MacDonald
jmacdon at uw.edu
Wed Jun 19 17:55:23 CEST 2013
Hi Sam,
First, please always give us the results of sessionInfo(). This is
especially critical in the case of ReportingTools, which has been
fundamentally altered between the previous and current versions of BioC.
On 6/19/2013 11:12 AM, Sam McInturf wrote:
> Bioconductors,
> I am working on an RNA-seq analysis project and am having trouble
> publishing an HTML report for it. I am unsure how to give my DE genes
> the same IDs that publish() will accept when passing an argument to
> 'annotation'.
> I mapped the reads using TopHat and passed the TAIR10 GTF file to the
> -G option. When I counted my reads I used the summarizeOverlaps function
> from GenomicRanges and again used this same file. I called differential
> expression in edgeR using the GLM methods. So the rownames of my DE table
> are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db
> annotations and passed them to my HTMLReport in:
>
> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, conditions =
> group, annotation = "org.At.tair.db", pvalueCutoff = 0.01, lfc = 2, n = 1000,
> name = "RootsLRT")
> Error: More than half of your IDs could not be mapped.
> In addition: Warning message:
> In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion
>
> which makes sense, because publish() is looking for Entrez IDs (right?)
>
> How do I proceed?
Here I assume you are running R-3.0.x and the current release of BioC.
When you run publish() on anything but a data.frame, the first step is
to coerce to a data.frame using a set of assumptions that might not hold
in your case (or there may be defaults that you don't like). Because of
this, I tend to just coerce to a data.frame myself and then publish()
that directly. This also lets you pass your own function to the .modifyDF
argument, which is kind of sweet.
In the case of a DGELRT or DGEExact object, there is a 'genes' slot that
will be used to annotate the output of topTags(). Ideally you would just
add the annotation you want to that slot first. So you could do
something like
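library(org.At.tair.db)
## keys are the AGI codes; the third argument lists the annotation you want back
annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[, <Tair column goes here>],
                c("SYMBOL", "GENENAME", "OTHERSTUFF"), keytype = "TAIR")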
and then put that in your DGE objects. Now you can do something like
outlst <- lapply(DGELists, function(x) as.data.frame(topTags(x, otherargsgohere)))
htmlst <- lapply(seq_along(DGELists), function(x)
    HTMLReport(namevector[x], titlevector[x], otherargs))
and you can define a function similar to this function I use for Entrez
Gene IDs:
## Turn the ENTREZID column into HTML links to the NCBI Gene page
entrezLinks <- function(df, ...) {
    df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID),
                                   link = paste0("http://www.ncbi.nlm.nih.gov/gene/",
                                                 as.character(df$ENTREZID)),
                                   table = FALSE)
    return(df)
}
but modified for the TAIR equivalent, and then
lapply(seq_along(htmlst), function(x) publish(outlst[[x]],
    htmlst[[x]], .modifyDF = samsTairLinkFun))
lapply(htmlst, finish)
et voila!
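Two pieces are left open above: getting the select() results into the genes slot, and samsTairLinkFun itself. Here is a minimal base-R sketch of both, under some assumptions. select() can return several rows per AGI code when the mapping is 1:many, and those extra rows would no longer line up with the genes in your DGE object, so the hypothetical helper collapseAnnot() pastes multiple values together to give exactly one row per gene, in the original order. The hypothetical tairLinks() stands in for samsTairLinkFun; it assumes the AGI codes end up in a column named TAIR and that TAIR locus pages follow the URL pattern shown, so check that pattern against the TAIR website before relying on it. It builds the anchor tag with paste0() rather than hwriter::hwrite() only so the sketch has no package dependencies.

```r
## Hypothetical helpers (not part of ReportingTools, edgeR or AnnotationDbi).

## Collapse select()-style output to exactly one row per ID, in the order of
## 'ids'. Assumes the key column is named "TAIR". Multiple values for one ID
## are pasted together with "; ", and IDs with no annotation get NA.
collapseAnnot <- function(annot, ids, keycol = "TAIR") {
    ids <- as.character(ids)
    out <- data.frame(ids, stringsAsFactors = FALSE)
    names(out) <- keycol
    for (cl in setdiff(names(annot), keycol)) {
        out[[cl]] <- vapply(ids, function(id) {
            v <- annot[[cl]][annot[[keycol]] %in% id]
            v <- unique(as.character(v[!is.na(v)]))
            if (length(v) == 0) NA_character_ else paste(v, collapse = "; ")
        }, character(1), USE.NAMES = FALSE)
    }
    out
}

## A TAIR analogue of entrezLinks(), for use as .modifyDF.
## ASSUMPTIONS: the AGI codes are in df$TAIR, and the TAIR locus URL
## pattern below is still valid.
tairLinks <- function(df, ...) {
    ids <- as.character(df$TAIR)
    url <- paste0("http://www.arabidopsis.org/servlets/TairObject?type=locus&name=", ids)
    ## leave missing IDs as NA instead of linking to "NA"
    df$TAIR <- ifelse(is.na(ids), NA_character_,
                      paste0('<a href="', url, '">', ids, '</a>'))
    df
}
```

You would then assign collapseAnnot(annot, <your AGI codes>) to the genes slot of each DGE object before calling topTags(), and pass .modifyDF = tairLinks in the publish() call. Pasting multiple values together is just one choice; keeping only the first match per ID is the other common option.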
You can also then use htmlst to make a bunch of links in an index.html page.
indx <- HTMLReport("index", "A bunch of links for this expt",
reportDirectory=".", baseUrl = "")
publish(hwriter::hwrite("Here are links", heading = 2, br = TRUE), indx)
publish(Link(htmlst, report=indx), indx)
finish(indx)
Best,
Jim
>
> Thanks in advance!
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099