James,




On Wed, Jun 19, 2013 at 1:50 PM, James W. MacDonald <jmacdon@uw.edu> wrote:

> Hi Gabriel,
>
>
> On 6/19/2013 3:22 PM, Gabriel Becker wrote:
>
>> James,
>>
>> Thanks for responding to Sam's question.
>>
>> One small point, you can control the coercion of the object to a
>> data.frame within the publish call via the .toDF argument, which is applied
>> directly before .modifyDF.
>>
>
> Nice! I think I missed that one. I'll take a closer look.
>
>
>
>> The defaults of these steps (coercion and modification of data.frames)
>> can also be extended on an S4 class basis, by creating new methods for
>> toReportDF and modifyReportDF if you find yourself always using the same
>> customization across multiple scripts (this approaches creating a package
>> which depends on ReportingTools, however, and is likely overkill in most
>> situations).
>>
>
> Well, I have a package that I am converting from using annaffy to using
> ReportingTools, so it already depends on ReportingTools. I certainly have
> need to modify things.
>
> As an example, it is not uncommon for me to do a GO hypergeometric test
> for multiple different contrasts, and want to output the results. However
> the default in ReportingTools for these objects is to create a table with a
> bunch of glyphs, and then automatically make tables for all the significant
> GO terms that contain the underlying genes.


> This is OK as long as you don't try to A.) Do a bunch of contrasts and
> then go GO analyses on all of them (thereby creating literally thousands of
> little files) and then B.) zip up and send to a collaborator who is on
> Windows.
>
> Turns out the Windows zip utility will silently fail when unzipping a file
> with thousands of files within it. No error, nothing. Just truncated
> output. And a PI who wants to know where the rest of it is...
>
>
It is actually possible to include images directly in the HTML without
creating external files (or rather, with creating them, but then not
requiring they continue to exist) using data URIs. See
http://css-tricks.com/data-uris/

Since we give you full control over what comes out, you can add logic
similar to the following:

df$Image = sapply(rows, function(r)
{
   fname = glyphFileName(r)
   png(file = fname, ...)
   makeGlyph(r)
   dev.off()
   img(fname) #from base64 package, seems to work but I haven't tested it
extensively
}

in your .modifyDF function, with appropriate definitions for glyphFileName
and makeGlyph.

We don't do this by default because it can make the html files extremely
large without much benefit when hosting the files on a server,  but when
you are transporting them manually to individual collaborators instead of
serving them over the web, this may not be an issue.

A couple of notes about this approach:

You'll want to remove the files if they aren't needed

The base64 package is listed as having no maintainer. If that proves to be
the case and people want to use it I'm sure a solution can be found.

~G


> So having my own personal way of outputting GO results will be quite
> useful. Thanks for the heads-up.
>
> Best,
>
> Jim
>
>
>
>> There is a flowchart for the full publish mechanism in inst/examples/**publishFlowchart.svg,
>> which I am attaching here as well for convenience.
>>
>> ~G
>>
>>
>> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon@uw.edu<mailto:
>> jmacdon@uw.edu>> wrote:
>>
>>     Hi Sam,
>>
>>     Sorry, my bad. One of the arguments for DGEList is 'genes', which is
>>
>>        genes: data frame containing annotation information for the
>>               tags/transcripts/genes.
>>
>>     So yeah, when you are creating your DGELists, add in the
>>     annotation data for each gene/transcript/whatever, and then go
>>     from there.
>>
>>     Best,
>>
>>     Jim
>>
>>
>>
>>     On 6/19/2013 1:37 PM, Sam McInturf wrote:
>>
>>         Jim,
>>            When I look at my DGELRT objects I don't have a $genes
>>         field: (to be verbose, DGELists is a list of DGELRT objects
>>         (list(glmLRT(), glmLRT(),...))
>>
>>         > DGELists$Roots$genes
>>         NULL
>>
>>         Looking at what fields I do have: coefficients, fitted.values,
>>         deviance, df.residual, design, offset, dispersion, method,
>>         samples, AveLogCPM, table, comparison, df.test
>>
>>         Looking at the DGEGLM which my DGELists were made, I have no
>>         gene slot either.  I added the rownames of the counts table in
>>         the DGEGLM to a new $genes:
>>
>>         >fit$genes <- rownames(fit$counts)
>>
>>         but the the $genes slot did not get passed onto the DGELRTs.
>>          Is it best just to add the $genes field to each DGELRT and
>>         then follow what you stated before?
>>
>>         Thanks,
>>
>>         > sessionInfo()
>>         R version 3.0.1 (2013-05-16)
>>         Platform: x86_64-pc-linux-gnu (64-bit)
>>
>>         locale:
>>          [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>          [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>          [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>          [7] LC_PAPER=C                 LC_NAME=C
>>          [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>         [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>         attached base packages:
>>         [1] parallel  stats     graphics  grDevices utils     datasets
>>          methods
>>         [8] base
>>
>>         other attached packages:
>>          [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4
>>          [4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
>>          [7] GenomicRanges_1.12.4 IRanges_1.18.1       BiocGenerics_0.6.0
>>         [10] edgeR_3.2.3          limma_3.16.5
>>  BiocInstaller_1.10.2
>>
>>         loaded via a namespace (and not attached):
>>          [1] annotate_1.38.0         AnnotationForge_1.2.1
>>  biomaRt_2.16.0
>>          [4] Biostrings_2.28.0       biovizBase_1.8.1        bitops_1.0-5
>>          [7] BSgenome_1.28.0         Category_2.26.0
>>  cluster_1.14.4
>>         [10] colorspace_1.2-2        dichromat_2.0-0         digest_0.6.3
>>         [13] genefilter_1.42.0       GenomicFeatures_1.12.2  ggbio_1.8.3
>>         [16] ggplot2_0.9.3.1         GO.db_2.9.0
>>  GOstats_2.26.0
>>         [19] graph_1.38.2            grid_3.0.1
>> gridExtra_0.9.1
>>         [22] GSEABase_1.22.0         gtable_0.1.2
>> Hmisc_3.10-1.1
>>         [25] hwriter_1.3             labeling_0.1
>> lattice_0.20-15
>>         [28] MASS_7.3-26             munsell_0.4             PFAM.db_2.9.0
>>         [31] plyr_1.8                proto_0.3-10            RBGL_1.36.2
>>         [34] RColorBrewer_1.0-5      RCurl_1.95-4.1
>> reshape2_1.2.2
>>         [37] R.methodsS3_1.4.2       R.oo_1.13.0
>>  Rsamtools_1.12.3
>>         [40] rtracklayer_1.20.2      R.utils_1.23.2          scales_0.2.3
>>         [43] splines_3.0.1           stats4_3.0.1            stringr_0.6.2
>>         [46] survival_2.37-4         tools_3.0.1
>>  VariantAnnotation_1.6.6
>>         [49] XML_3.96-1.1            xtable_1.7-1
>> zlibbioc_1.6.0
>>
>>
>>
>>
>>         On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald
>>         <jmacdon@uw.edu <mailto:jmacdon@uw.edu> <mailto:jmacdon@uw.edu
>>
>>         <mailto:jmacdon@uw.edu>>> wrote:
>>
>>             Hi Sam,
>>
>>             First, please always give us the results of sessionInfo().
>>         This is
>>             especially critical in the case of ReportingTools, which
>>         has been
>>             fundamentally altered between the previous and current
>>         versions of
>>             BioC.
>>
>>
>>             On 6/19/2013 11:12 AM, Sam McInturf wrote:
>>
>>                 Bioconductors,
>>                      I am working on a RNA seq analysis project and am
>>         having
>>                 trouble
>>                 publishing an HTML report for it.  I am unsure of how
>>         to make
>>                 my DE genes
>>                 have the same ID as what publish() will accept when
>>         passing an
>>                 argument to
>>                 'annotation'.
>>                      I mapped the reads using tophat and passed the
>>         TAIR 10
>>                 gtf file to the
>>                 -G option.  When i counted my reads I used the
>>                 summarizeOverlaps function
>>                 from GenomicRanges and again used this same file.  I
>>         called
>>                 differential
>>                 expression in edgeR using the GLM methods.  So the
>>         rownames
>>                  of my DE table
>>                 are the AGI identifiers (AT#G#####).  I loaded the
>>         org.At.tair.db
>>                 annotations and passed it to HTMLReport in:
>>
>>                 publish(DGELists[["Roots"]], myHTML, countTable = cpmMat,
>>                 conditions =
>>                 group, annotation = "org.At.tair.db", pvaueCutoff =
>>         0.01, lfc
>>                 =2, n = 1000,
>>                 name = "RootsLRT")
>>                 Error: More than half of your IDs could not be mapped.
>>                 In addition: Warning message:
>>                 In .DGELRT.to.data.frame(object, ...) : NAs introduced
>>         by coercion
>>
>>                 which makes sense, because publish() is looking for
>>         Entrez IDs
>>                 (right?)
>>
>>                 How do I proceed?
>>
>>
>>             Here I assume you are running R-3.0.x and the current
>>         release of BioC.
>>
>>             When you run publish() on anything but a data.frame, the first
>>             step is to coerce to a data.frame using a set of
>>         assumptions that
>>             might not hold in your case (or there may be defaults that you
>>             don't like). Because of this, I tend to just coerce to a
>>             data.frame myself and then publish() that directly. This also
>>             allows you to pass in arguments to .modifyDF which is kind
>>         of sweet.
>>
>>             In the case of a DGELRT or DEGExact object, there is a 'genes'
>>             slot that will be used to annotate the output of topTags().
>>             Ideally you would just add the annotation you want to that
>>         slot
>>             first. So you could do something like
>>
>>             annot <- select(org.At.tair.db,
>>         DGELists[["Roots"]]$genes[,<**Tair
>>             column goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF"))
>>
>>
>>             and then put that in your DGEobjects. Now you can do
>>         something like
>>
>>             outlst <- lapply(DGELists, topTags, otherargsgohere)
>>
>>             htmlst <- lapply(seq_len(length(**DGELists)) function(x)
>>
>>             HTMLReport(namevector[x], titlevector[x], otherargs))
>>
>>             and you can define a function similar to this function I
>>         use for
>>             Entrez Gene IDs:
>>
>>             entrezLinks <- function (df, ...){
>>                 df$ENTREZID <- hwriter::hwrite(as.character(**
>> df$ENTREZID),
>>                     link = paste0("http://www.ncbi.nlm.**nih.gov/gene/<http://www.ncbi.nlm.nih.gov/gene/>
>> ",
>>
>>             as.character(df$ENTREZID)),
>>                     table = FALSE)
>>                 return(df)
>>             }
>>
>>             but modified for the Tair equivalent and then
>>
>>             lapply(seq_len(length(htmlst))**, function(x)
>>         publish(outlst[[x]],
>>
>>             htmlst[[x]], .modifyDF = samsTairLinkFun)))
>>             lapply(htmlst, finish)
>>
>>             et voila!
>>
>>             You can also then use htmlst to make a bunch of links in an
>>             index.html page.
>>
>>             indx <- HTMLReport("index", "A bunch of links for this expt",
>>             reportDirectory=".", baseUrl = "")
>>             publish(hwriter::hwrite("Here are links", page(indx),
>>         header=2,
>>             br=TRUE), indx)
>>             publish(Link(htmlst, report=indx), indx)
>>             finish(indx)
>>
>>             Best,
>>
>>             Jim
>>
>>
>>
>>                 Thanks in advance!
>>
>>
>>             --     James W. MacDonald, M.S.
>>             Biostatistician
>>             University of Washington
>>             Environmental and Occupational Health Sciences
>>             4225 Roosevelt Way NE, # 100
>>             Seattle WA 98105-6099
>>
>>
>>
>>
>>         --         Sam McInturf
>>
>>
>>     --     James W. MacDonald, M.S.
>>     Biostatistician
>>     University of Washington
>>     Environmental and Occupational Health Sciences
>>     4225 Roosevelt Way NE, # 100
>>     Seattle WA 98105-6099
>>
>>     ______________________________**_________________
>>     Bioconductor mailing list
>>     Bioconductor@r-project.org <mailto:Bioconductor@r-**project.org<Bioconductor@r-project.org>
>> >
>>
>>     https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>     Search the archives:
>>     http://news.gmane.org/gmane.**science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>>
>>
>> --
>> Gabriel Becker
>> Graduate Student
>> Statistics Department
>> University of California, Davis
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>


-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis

	[[alternative HTML version deleted]]

