[Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2

Thu Sep 1 16:09:54 CEST 2011

Hi Sonia,

Could you tell me how you generated the annotation file Annots/UCSC_knownGene.hg19.bed?

Yours sincerely,

Jianhong Ou

jianhong.ou at umassmed.edu

On Aug 31, 2011, at 4:17 PM, Zhu, Lihua (Julie) wrote:

> 
> ------ Forwarded Message
> From: Sonia Leach <sonia.leach at gmail.com>
> Date: Wed, 31 Aug 2011 15:52:43 -0400
> To: "bioc-sig-sequencing at r-project.org" <bioc-sig-sequencing at r-project.org>
> Subject: [Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in
> R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2
> 
> I had a problem with the original ChIPpeakAnno distribution
> ChIPpeakAnno_1.8.0 for R2.13 where depending on the number of spaces
> in the RangedData Annotation object sent to annotatePeakInBatch, I
> would get the error:
>       Error in FUN(1L[[1L]], ...) : object 'r' not found
> (see Problem 1 below), which went away when I downloaded the
> development version, R2.13/ChIPpeakAnno_2.0.2.
> 
> However, I then found that calling annotatePeakInBatch(...,
> output="overlapping", multiple=FALSE) returned the same number of
> rows as annotatePeakInBatch(..., output="overlapping",
> multiple=TRUE) (see Problem 2 below). The obvious workaround is to
> take one hit from among the multiples returned, but this should be
> fixed.
> 
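A minimal sketch of that workaround in base R, keeping the first hit reported for each peak. It runs on a toy data frame standing in for as.data.frame(annotationU); the column names "peak" and "feature" and all values are assumptions made for illustration, not output from the runs described here.

```r
# Toy stand-in for as.data.frame(annotationU). The column names and
# values are hypothetical, chosen only to illustrate the idea.
hits <- data.frame(
  peak    = c("ID1_exon", "ID1_exon", "ID2_exon", "ID3_exon", "ID3_exon"),
  feature = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later rows for each peak name,
# so negating it keeps only the first overlapping feature per peak.
one_hit <- hits[!duplicated(hits$peak), ]

print(one_hit)
```

Which hit survives depends only on row order, so the rows would need sorting first if a particular feature (for example, the closest one) should be the one kept.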
> The annotation file I used is just a bed6 dump from UCSC goldenpath.
> 
> ============ Problem 1
> library(ChIPpeakAnno)
> 
> myPeak = RangedData(IRanges(start = c(17208381), end = c(17208381),
>                             names = c("Site1")),
>                     space = c("chr1"), strand = c('+'))
> 
> ## This object has 25 spaces for chr1..22,X,Y,M
> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
> UCSC_rangeD = RangedData(IRanges(start = UCSC[,2], end = UCSC[,3],
>                                  names = UCSC[,4]),
>                          space = as.character(UCSC[,1]), strand = UCSC[,6])
> 
> ## This object has just 1 space but the same data as UCSC_rangeD[868,]
> feature = RangedData(IRanges(start = c(17066767), end = c(17267729),
>                              names = c("Site1")),
>                      space = c("chr1"), strand = c('+'))
> 
> ## with UCSC_rangeD[868,], gives error in R2.13/ChIPpeakAnno_1.8.0
> ##         Error in FUN(1L[[1L]], ...) : object 'r' not found
> annotation = annotatePeakInBatch(myPeak, AnnotationData=UCSC_rangeD[868,],
>                                  output="overlapping", maxgap=0, multiple=FALSE)
> 
> ## with 1-space feature, no error
> annotation = annotatePeakInBatch(myPeak, AnnotationData=feature,
>                                  output="overlapping", maxgap=0, multiple=FALSE)
> 
> <sorry, I no longer have the session info for this run - but it is the
> basic R2.13 install plus biocLite(ChIPpeakAnno), and should have the
> same versions as the session info shown for problem 2 below, minus the
> new dev version for ChIPpeakAnno (i.e. everything the same as below,
> except ChIPpeakAnno_2.0.2.tar.gz, gplots_2.8.0.tar.gz,
> caTools_1.12.tar.gz, gdata_2.8.2.tar.gz, gtools_2.6.2.tar.gz)
>> 
> 
> ======== Problem 2
> R version 2.13.0 (2011-04-13)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
>> library(ChIPpeakAnno)
> Warning message:
> replacing previous import 'space' when loading 'IRanges'
>> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
>> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
> names=UCSC[,4]), space=as.character(UCSC[,1]),strand=UCSC[,6])
>> data = unique(read.table(file[i], sep="\t", header=FALSE))
>> ids = sub("ID=(\\d+);.+", "ID\\1", data[,9], perl=TRUE)
>> data_rangeD = RangedData(IRanges(start=data$V4, end=data$V5,
> names=paste(ids,data$V3, sep="_")), space=data$V1, strand="+")
>> dim(data_rangeD)
> [1] 19501     1
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=FALSE)
>> dim(annotationU)
> [1] 16777     9
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=TRUE)
>> dim(annotationU)
> [1] 16777     9
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] ChIPpeakAnno_2.0.2                  gplots_2.8.0
> [3] caTools_1.12                        bitops_1.0-4.1
> [5] gdata_2.8.2                         gtools_2.6.2
> [7] limma_3.8.3                         org.Hs.eg.db_2.5.0
> [9] GO.db_2.5.0                         RSQLite_0.9-4
> [11] DBI_0.2-5                           AnnotationDbi_1.14.1
> [13] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.20.0
> [15] GenomicRanges_1.4.8                 Biostrings_2.20.2
> [17] IRanges_1.10.6                      multtest_2.8.0
> [19] Biobase_2.12.2                      biomaRt_2.8.1
> 
> loaded via a namespace (and not attached):
> [1] MASS_7.3-12     RCurl_1.6-9     splines_2.13.0  survival_2.36-5
> [5] XML_3.4-2
>> 
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 
> ------ End of Forwarded Message
>