[BioC] ChIPpeakAnno as.data.frame error

Mark Cowley m.cowley at garvan.org.au
Wed Mar 7 23:53:55 CET 2012


Dear list,
as usual this was a PEBKAC error.
My package was missing the importMethodsFrom(IRanges, as.data.frame) in NAMESPACE, and Imports: IRanges in DESCRIPTION. What threw me off was by taking working code from an example within ChIPpeakAnno which worked because that package had imports(IRanges) in its NAMESPACE...

Mark

On 07/03/2012, at 9:57 AM, Mark Cowley wrote:

> Dear Julie,
> I already had ChIPpeakAnno in the Depends section of DESCRIPTION. here's my search() output after loading my 'macsR' package & thus the state of the search() when my function gets run:
>> search()
> [1] ".GlobalEnv"                          
> [2] "package:macsR"                       
> [3] "package:ChIPpeakAnno"                
> [4] "package:limma"                       
> [5] "package:org.Hs.eg.db"                
> [6] "package:GO.db"                       
> [7] "package:RSQLite"                     
> [8] "package:DBI"                         
> [9] "package:AnnotationDbi"               
> [10] "package:BSgenome.Ecoli.NCBI.20080805"
> [11] "package:BSgenome"                    
> [12] "package:GenomicRanges"               
> [13] "package:Biostrings"                  
> [14] "package:IRanges"                     
> [15] "package:multtest"                    
> [16] "package:Biobase"                     
> [17] "package:biomaRt"                     
> [18] "package:plyr"                        
> [19] "package:stats"                       
> [20] "package:graphics"                    
> [21] "package:grDevices"                   
> [22] "package:utils"                       
> [23] "package:datasets"                    
> [24] "package:methods"                     
> [25] "Autoloads"                           
> [26] "package:base"                     
> 
> I did add another library(ChIPpeakAnno) command in the relevant function, but since it was already loaded, there's no effect.
> The error is due to not finding the S4 method defined in IRanges. I tried moving IRanges up the search() list, but it's a dependency of too many packaged to be moved.
> 
> I'll upgrade R (I've been putting this off for too long) and will report back
> 
> cheers,
> Mark
> 
> On 07/03/2012, at 2:37 AM, Zhu, Lihua (Julie) wrote:
> 
>> Dear Mark,
>> 
>> Sorry that I meant adding library(ChIPpeakAnno) after importing all other
>> packages. Also, I noticed that you are using an old version of ChIPpeakAnno.
>> I am wondering what happens if you update your R and ChIPpeakAnno.
>> 
>> Best regards,
>> 
>> Julie
>> 
>> On 3/6/12 9:49 AM, "Julie Zhu" <julie.zhu at umassmed.edu> wrote:
>> 
>>> Dear Mark,
>>> 
>>> I am wondering what happens if you add library(RangedData) after all the
>>> packages have been imported.
>>> 
>>> Best regards,
>>> 
>>> Julie
>>> 
>>> 
>>> On 3/5/12 10:31 PM, "Mark Cowley" <m.cowley at garvan.org.au> wrote:
>>> 
>>>> Dear list,
>>>> i've had an odd error for a few days now, which i'm really struggling to
>>>> pinpoint. 
>>>> Essentially, i'm trying to annotate ChIP-Seq peaks using ChIPpeakAnno. after
>>>> doing annotatePeakInBatch, I convert to data.frame & I get this error:
>>>> 
>>>>> peaks.bed <- read.delim(bed.file, header=F)
>>>>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE)
>>>>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData =
>>>>> annoData, output="both", multiple=TRUE, maxgap=0)
>>>>> peaks.annot <- as.data.frame(peaks.annot)
>>>> Error in as.data.frame.default(peaks.annot) :
>>>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a
>>>> data.frame
>>>> 
>>>> A combination of debug, getMethod & traceback (see below) prove that my
>>>> peaks.annot object is of class RangedData, the S4 method "as.data.frame" for
>>>> "RangedData" exists, but is not actually called; instead
>>>> as.data.frame.default
>>>> is called, causing the error.
>>>> Oddly enough, the error only occurs when run inside my function, which i
>>>> think
>>>> implies that my package NAMESPACE is incorrectly setup. To this end, I put
>>>> this code in its own package, deliberately do not import any code from my
>>>> other [potentially dodgy] packages & I still get this error.
>>>> 
>>>> My code & the debug trace/getMethod/traceback is below & any advice would be
>>>> greatly recommended.
>>>> cheers,
>>>> Mark
>>>> 
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> The function:
>>>> #' annotate MACS peaks using ChIPpeakAnno
>>>> #' 
>>>> #' This compares the BED regions from running MACS to the nearest ENSEMBL
>>>> #' Gene, and to CpG islands. This uses
>>>> \code{\link[ChIPpeakAnno]{annotatePeakInBatch}}
>>>> #' from ChIPpeakAnno, which has a very loose definition of close, ie
>>>> #' within 0.5Mb.
>>>> #' 
>>>> #' This uses pre-built CpG island definitions downloaded from UCSC.
>>>> #'
>>>> #' @param dir the path to a MACS result directory
>>>> #' @param genome one of \sQuote{hg18}, or \sQuote{hg19}
>>>> #' @return nothing. it writes a <prefix>_peaks_annot.xls file. To interpret
>>>> #' the new columns, see \code{\link[ChIPpeakAnno]{annotatePeakInBatch}}
>>>> #' @author Mark Cowley, 2012-03-01
>>>> #' @importFrom ChIPpeakAnno BED2RangedData annotatePeakInBatch
>>>> #' @importFrom plyr join
>>>> #' @export
>>>> #' @examples
>>>> #' \dontrun{
>>>> #' dir <- "./MACS/TAMR.H3k4Me3.vs.MCF7.H3k4Me3/"
>>>> #' macs.ChIPpeakAnno(dir, "hg19")
>>>> #' }
>>>> annotate.macs.output <- function(dir, genome=c("hg19", "hg18", "mm9", "rn4"),
>>>> tss=TRUE, cpg=TRUE) {
>>>> length(dir) == 1 && is.dir(dir) || stop("dir must be the path to a macs
>>>> result
>>>> directory.")
>>>> SUPPORTED.GENOMES <- c("hg19", "hg18")
>>>> genome <- genome[1]
>>>> genome %in% SUPPORTED.GENOMES || stop("genome is unsupported.")
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> saf <- getOption("stringsAsFactors")
>>>> on.exit(options(stringsAsFactors=saf))
>>>> options(stringsAsFactors=FALSE)
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> # which genome?
>>>> annoData <- switch(genome,
>>>> hg19={data(TSS.human.GRCh37);  TSS.human.GRCh37},
>>>> hg18={data(TSS.human.NCBI36);  TSS.human.NCBI36},
>>>> mm9={data(TSS.mouse.NCBIM37); TSS.mouse.NCBIM37},
>>>> rn4={data(TSS.rat.RGSC3.4);   TSS.rat.RGSC3.4},
>>>> stop("unsupported genome")
>>>> )
>>>> cpgData <- switch(genome,
>>>> hg19={data(CpG.human.GRCh37);  CpG.human.GRCh37},
>>>> hg18={data(CpG.human.NCBI36);  CpG.human.NCBI36},
>>>> mm9={data(CpG.mouse.NCBIM37); CpG.mouse.NCBIM37},
>>>> rn4={data(CpG.rat.RGSC3.4);   CpG.rat.RGSC3.4},
>>>> stop("unsupported genome")
>>>> )
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> # setup the input/output file paths
>>>> # bed.file <- "TAMR.vs.MCF7.MBD2_peaks.bed"
>>>> bed.file <- dir(dir, pattern="_peaks.bed", full=TRUE)
>>>> file.exists(bed.file) || stop("bed.file must exist")
>>>> 
>>>> # macs.peaks.file <- "TAMR.vs.MCF7.MBD2_peaks.xls"
>>>> macs.peaks.file <- rev(dir(dir, pattern="_peaks.xls", full=TRUE))[1]
>>>> file.exists(macs.peaks.file) || stop("macs.peaks.file must exist")
>>>> 
>>>> result.file <- sub("_peaks.xls", "_peaks_annot.xls", macs.peaks.file)
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> # import peaks
>>>> peaks.bed <- read.delim(bed.file, header=F)
>>>> # head(peaks.bed)
>>>> 
>>>> peaks.RangedData <- BED2RangedData(peaks.bed, FALSE)
>>>> # peaks.RangedData
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> #
>>>> # import peaks & join to closest gene and CpG island
>>>> # 
>>>> peaks.stats <- import.macs.peaks.file(macs.peaks.file)
>>>> # head(peaks.stats)
>>>> 
>>>> res <- peaks.stats
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> message("Annotating peaks to nearest ENSG TSS...")
>>>> if( tss ) {
>>>> peaks.annot <- annotatePeakInBatch(peaks.RangedData, AnnotationData =
>>>> annoData, output="both", multiple=TRUE, maxgap=0)
>>>> message("done")
>>>> 
>>>> peaks.annot <- as.data.frame(peaks.annot)
>>>> peaks.annot <- peaks.annot[,-match(c("space", "start", "end", "width",
>>>> "names"), colnames(peaks.annot))]
>>>> library(org.Hs.eg.db)
>>>> peaks.annot$GeneSymbol <- mget.chain(as.character(peaks.annot$feature),
>>>> org.Hs.egENSEMBL2EG, org.Hs.egSYMBOL)
>>>> peaks.annot <- rename.column(peaks.annot, "feature", "Ensembl ID")
>>>> peaks.annot <- rename.column(peaks.annot, "peak", "name")
>>>> peaks.annot <- move.column(peaks.annot, "strand", "insideFeature")
>>>> peaks.annot <- move.column(peaks.annot, "GeneSymbol", "start_position")
>>>> 
>>>> res <- plyr::join(peaks.stats, peaks.annot, by="name", type="full")
>>>> }
>>>> else {
>>>> message("skipping")
>>>> }
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> message("Annotating peaks to nearest CpG...")
>>>> if( cpg ) {
>>>> peaks.annot.cpg <- annotatePeakInBatch(peaks.RangedData, AnnotationData =
>>>> cpgData, output="both", multiple=T, maxgap=0)
>>>> message("done")
>>>> 
>>>> peaks.annot.cpg <- as.data.frame(peaks.annot.cpg)
>>>> peaks.annot.cpg <- peaks.annot.cpg[,-match(c("space", "start", "end",
>>>> "width",
>>>> "names"), colnames(peaks.annot.cpg))]
>>>> library(org.Hs.eg.db)
>>>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "feature", "CpG island")
>>>> peaks.annot.cpg <- rename.column(peaks.annot.cpg, "peak", "name")
>>>> peaks.annot.cpg <- move.column(peaks.annot.cpg, "strand", "insideFeature")
>>>> 
>>>> res <- plyr::join(res, peaks.annot.cpg, by="name", type="full")
>>>> }
>>>> else {
>>>> message("skipping")
>>>> }
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> write.xls(res, result.file)
>>>> }
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> 
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> Selection from Debug showing that the method is at least visible:
>>>> ...
>>>> Browse[2]> class(peaks.annot)
>>>> [1] "RangedData"
>>>> attr(,"package")
>>>> [1] "IRanges"
>>>> Browse[2]> getMethod("as.data.frame", "RangedData")
>>>> Method Definition:
>>>> 
>>>> function (x, row.names = NULL, optional = FALSE, ...)
>>>> {
>>>>   if (!(is.null(row.names) || is.character(row.names)))
>>>>       stop("'row.names'  must be NULL or a character vector")
>>>>   if (!missing(optional) || length(list(...)))
>>>>       warning("'optional' and arguments in '...' ignored")
>>>>   data.frame(as.data.frame(ranges(x)), as.data.frame(values(x))[-1L],
>>>>       row.names = row.names, stringsAsFactors = FALSE)
>>>> }
>>>> <environment: namespace:IRanges>
>>>> 
>>>> Signatures:
>>>>       x       
>>>> target  "RangedData"
>>>> defined "RangedData"
>>>> Browse[2]> 
>>>> Error in as.data.frame.default(peaks.annot) :
>>>> cannot coerce class 'structure("RangedData", package = "IRanges")' into a
>>>> data.frame
>>>> traceback()
>>>> 4: stop(gettextf("cannot coerce class '%s' into a data.frame",
>>>> deparse(class(x))),
>>>>      domain = NA)
>>>> 3: as.data.frame.default(peaks.annot)
>>>> 2: as.data.frame(peaks.annot)
>>>> 1: annotate.macs.output("MACS/TAMR.MBD2.vs.MCF7.MBD2")
>>>> 
>> #############################################################################>>
>> #
>>>> ##
>>>> 
>>>> sessionInfo()
>>>> R version 2.13.1 (2011-07-08)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> 
>>>> locale:
>>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>> 
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> 
>>>> other attached packages:
>>>> [1] gdata_2.8.1                         macsR_1.0
>>>> [3] ChIPpeakAnno_1.9.0                  limma_3.8.3
>>>> [5] org.Hs.eg.db_2.5.0                  GO.db_2.5.0
>>>> [7] RSQLite_0.9-4                       DBI_0.2-5
>>>> [9] AnnotationDbi_1.14.1                BSgenome.Ecoli.NCBI.20080805_1.3.17
>>>> [11] BSgenome_1.20.0                     GenomicRanges_1.4.6
>>>> [13] Biostrings_2.20.2                   IRanges_1.10.5
>>>> [15] multtest_2.8.0                      Biobase_2.12.2
>>>> [17] biomaRt_2.8.1                       plyr_1.6
>>>> 
>>>> loaded via a namespace (and not attached):
>>>> [1] gtools_2.6.2    MASS_7.3-14     RCurl_1.6-7     splines_2.13.1
>>>> [5] survival_2.36-9 tools_2.13.1    XML_3.4-2
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list