[BioC] ReportingTools and gene annotation
James W. MacDonald
jmacdon at uw.edu
Mon Feb 10 16:28:08 CET 2014
Hi Ugo,
On 2/10/2014 6:02 AM, Ugo Borello wrote:
> Good morning,
> I am using ReportingTools with DESeq2 and I am not able to add the gene
> annotation to my final report.
> I have Ensembl gene IDs as identifiers, not Entrez IDs!
>
> I followed Jason's suggestions as described here:
>
> http://article.gmane.org/gmane.science.biology.informatics.conductor/51995/match=
>
> But the add.anns() function doesn't work in my hands.
>
>> mart <- useMart("ensembl",dataset="mmusculus_gene_ensembl")
>> add.anns <- function(df, mart, ...)
> + {
> + nm <- rownames(df)
> + anns <- getBM(
> + attributes = c("ensembl_gene_id", "external_gene_id", "description"),
> + filters = "ensembl_gene_id", values = nm, mart = mart)
> + anns <- anns[match(nm, anns[, 1]), ]
> + colnames(anns) <- c("ID", "Gene Symbol", "Gene Description")
> + df <- cbind(anns, df[, 2:nrow(df)])
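Note that in the line above you are subsetting 'df' by column, but you are
building the index from the number of rows (nrow), not the number of
columns (ncol). With more rows than columns, that asks for columns that
don't exist, hence the 'undefined columns selected' error. I am not sure
whether you want to eliminate the first column here (you are using the
rownames to annotate, so I don't know what the first column contains).
If you do, it is simpler to drop the first column than to keep columns
2:ncol(df):
df <- cbind(anns, df[,-1])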
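To see the difference on a toy example, here is a minimal sketch in base R. The data frame and annotation table below are made up for illustration (hypothetical IDs and symbols); they are not what publish() actually hands to the .modifyDF functions, and no biomaRt query is involved.

```r
# Toy stand-in for 'df': 6 genes (rows) but only 3 columns,
# so nrow(df) > ncol(df), as is typical for a results table.
df <- data.frame(
  ID     = paste0("gene", 1:6),
  logFC  = c(1.2, -0.8, 0.3, 2.1, -1.5, 0.0),
  pvalue = c(0.01, 0.20, 0.70, 0.001, 0.03, 0.95),
  row.names = paste0("ENSMUSG", 1:6),
  stringsAsFactors = FALSE
)

# Indexing columns with 2:nrow(df) asks for columns 2..6, which do not exist.
# Capture the error message instead of stopping the script.
bad <- tryCatch(df[, 2:nrow(df)], error = function(e) conditionMessage(e))
print(bad)

# A negative index drops the first column without counting anything.
kept <- df[, -1]

# Hypothetical annotation table: scrambled order, one gene (ENSMUSG4) missing.
anns <- data.frame(
  ID     = c("ENSMUSG3", "ENSMUSG1", "ENSMUSG2", "ENSMUSG5", "ENSMUSG6"),
  Symbol = c("c", "a", "b", "e", "f"),
  stringsAsFactors = FALSE
)

# Reorder the annotation to the row order of 'df'; unmatched IDs become NA rows.
nm   <- rownames(df)
anns <- anns[match(nm, anns$ID), ]

out <- cbind(anns, kept)
rownames(out) <- nm
print(out)
```

Genes without an annotation hit end up as NA rows rather than being dropped, so the annotated table keeps the same number of rows, in the same order, as the input.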
Best,
Jim
> + rownames(df) <- nm
> + df
> + }
>> publish(dds, des2Report, factor = colData(dds)$condition,
> +     .modifyDF = list(add.anns, modifyReportDF), mart = mart)
> ## dds is the DESeqDataSet object
>
> Error in `[.data.frame`(df, , 2:nrow(df)) : undefined columns selected
>
> I also tried this:
>> publish(dds, des2Report, factor = colData(dds)$condition,
> +     .modifyDF = list(add.anns, modifyReportDF), mart = mart, df = counts(dds))
> ## dds is the DESeqDataSet object
>
> Error in df[, 2:nrow(df)] : subscript out of bounds
>
>
> What am I doing wrong?
>
> Is there a simple way of adding my annotation to the HTML report?
>
>              ENSEMBL ENTREZID SYMBOL GENENAME
> 1 ENSMUSG00000000001    14679   Gnai guanine nucleotide binding protein (G protein), alpha inhibiting 3
> 2 ENSMUSG00000000028    12544  Cdc45 cell division cycle 45
> 3 ENSMUSG00000000031       NA     NA NA
> 4 ENSMUSG00000000037   107815  Scml2 sex comb on midleg-like 2 (Drosophila)
> 5 ENSMUSG00000000049    11818   Apoh apolipoprotein H
> 6 ENSMUSG00000000056    67608   Narf nuclear prelamin A recognition factor
>
>
> Thank you
>
> Ugo
>
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ReportingTools_2.2.0 knitr_1.5 org.Mm.eg.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7
> [6] AnnotationDbi_1.24.0 Biobase_2.22.0 DESeq2_1.2.9 RcppArmadillo_0.4.000.2 Rcpp_0.10.6
> [11] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 biomaRt_2.18.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.40.0 AnnotationForge_1.4.4 Biostrings_2.30.1 biovizBase_1.10.7 bitops_1.0-6
> [6] BSgenome_1.30.0 Category_2.28.0 cluster_1.14.4 colorspace_1.2-4 dichromat_2.0-0
> [11] digest_0.6.4 edgeR_3.4.2 evaluate_0.5.1 formatR_0.10 Formula_1.1-1
> [16] genefilter_1.44.0 GenomicFeatures_1.14.2 ggbio_1.10.10 ggplot2_0.9.3.1 GO.db_2.10.1
> [21] GOstats_2.28.0 graph_1.40.1 grid_3.0.2 gridExtra_0.9.1 GSEABase_1.24.0
> [26] gtable_0.1.2 Hmisc_3.14-0 hwriter_1.3 labeling_0.2 lattice_0.20-24
> [31] latticeExtra_0.6-26 limma_3.18.10 locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-2
> [36] munsell_0.4.2 PFAM.db_2.10.1 plyr_1.8 proto_0.3-10 R.methodsS3_1.6.1
> [41] R.oo_1.17.0 R.utils_1.28.4 RBGL_1.38.0 RColorBrewer_1.0-5 RCurl_1.95-4.1
> [46] reshape2_1.2.2 Rsamtools_1.14.2 rtracklayer_1.22.3 scales_0.2.3 splines_3.0.2
> [51] stats4_3.0.2 stringr_0.6.2 survival_2.37-7 tools_3.0.2 VariantAnnotation_1.8.10
> [56] XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099