[Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2
Ou, Jianhong
Jianhong.Ou at umassmed.edu
Thu Sep 1 16:09:54 CEST 2011
Hi Sonia,
Could you tell me in detail how you generated the annotation file Annots/UCSC_knownGene.hg19.bed?
Yours sincerely,
Jianhong Ou
jianhong.ou at umassmed.edu
On Aug 31, 2011, at 4:17 PM, Zhu, Lihua (Julie) wrote:
>
> ------ Forwarded Message
> From: Sonia Leach <sonia.leach at gmail.com>
> Date: Wed, 31 Aug 2011 15:52:43 -0400
> To: "bioc-sig-sequencing at r-project.org" <bioc-sig-sequencing at r-project.org>
> Subject: [Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in
> R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2
>
> I had a problem with the original ChIPpeakAnno distribution
> ChIPpeakAnno_1.8.0 for R2.13 where depending on the number of spaces
> in the RangedData Annotation object sent to annotatePeakInBatch, I
> would get the error:
> Error in FUN(1L[[1L]], ...) : object 'r' not found
> (see Problem 1 below), which went away when I downloaded the
> development version, R2.13/ChIPpeakAnno_2.0.2.
>
> However, then I had the problem that calling annotatePeakInBatch(...,
> output="overlapping", multiple=FALSE) returned the same number of
> answers as annotatePeakInBatch(..., output="overlapping",
> multiple=TRUE) (see Problem 2 below). The obvious workaround is to
> take one hit from among the multiple hits returned, but this should be
> fixed.
>
> The annotation file I used is just a bed6 dump from UCSC goldenpath.
>
> ============ problem 1:
> library(ChIPpeakAnno)
>
> myPeak = RangedData(IRanges(start = c(17208381), end = c(17208381),
>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>
> ## This object has 25 spaces for chr1..22,X,Y,M
> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
>     names=UCSC[,4]), space=as.character(UCSC[,1]), strand=UCSC[,6])
>
> ## This object has just 1 space but the same data as UCSC_rangeD[868,]
> feature = RangedData(IRanges(start = c(17066767), end = c(17267729),
>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>
> ## with UCSC_rangeD[868,], gives error in R2.13/ChIPpeakAnno_1.8.0
> ## Error in FUN(1L[[1L]], ...) : object 'r' not found
> annotation = annotatePeakInBatch(myPeak, AnnotationData=UCSC_rangeD[868,],
>     output="overlapping", maxgap=0, multiple=FALSE)
>
> ## with 1-space feature, no error
> annotation = annotatePeakInBatch(myPeak, AnnotationData=feature,
>     output="overlapping", maxgap=0, multiple=FALSE)
>
> <sorry, I no longer have the session info for this run - but it is the
> basic R2.13 install plus biocLite(ChIPpeakAnno), and should have the
> same versions as the session info shown for problem 2 below, minus the
> new dev version for ChIPpeakAnno (i.e. everything the same as below,
> except ChIPpeakAnno_2.0.2.tar.gz, gplots_2.8.0.tar.gz,
> caTools_1.12.tar.gz, gdata_2.8.2.tar.gz, gtools_2.6.2.tar.gz)
>>
>
> ======== Problem 2
> R version 2.13.0 (2011-04-13)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
>> library(ChIPpeakAnno)
> Warning message:
> replacing previous import 'space' when loading 'IRanges'
>> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
>> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
> names=UCSC[,4]), space=as.character(UCSC[,1]),strand=UCSC[,6])
>> data = unique(read.table(file[i], sep="\t", header=FALSE))
>> ids = sub("ID=(\\d+);.+", "ID\\1", data[,9], perl=TRUE)
>> data_rangeD = RangedData(IRanges(start=data$V4, end=data$V5,
> names=paste(ids,data$V3, sep="_")), space=data$V1, strand="+")
>> dim(data_rangeD)
> [1] 19501 1
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=FALSE)
>> dim(annotationU)
> [1] 16777 9
>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
> output="overlapping", maxgap=0, multiple=TRUE)
>> dim(annotationU)
> [1] 16777 9
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] ChIPpeakAnno_2.0.2 gplots_2.8.0
> [3] caTools_1.12 bitops_1.0-4.1
> [5] gdata_2.8.2 gtools_2.6.2
> [7] limma_3.8.3 org.Hs.eg.db_2.5.0
> [9] GO.db_2.5.0 RSQLite_0.9-4
> [11] DBI_0.2-5 AnnotationDbi_1.14.1
> [13] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.20.0
> [15] GenomicRanges_1.4.8 Biostrings_2.20.2
> [17] IRanges_1.10.6 multtest_2.8.0
> [19] Biobase_2.12.2 biomaRt_2.8.1
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-12 RCurl_1.6-9 splines_2.13.0 survival_2.36-5
> [5] XML_3.4-2
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> ------ End of Forwarded Message
>
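The workaround mentioned in the forwarded message (keeping one hit per peak
when multiple=FALSE returns as many rows as multiple=TRUE) could be sketched
in base R roughly as follows. This is only an illustration under stated
assumptions: it supposes the annotation result has already been converted to
a plain data frame (for example with as.data.frame), and the column names
"peak" and "feature" and all values are hypothetical, not taken from the
actual annotatePeakInBatch output.

```r
# Hypothetical stand-in for an annotation result with multiple hits per peak:
# one row per (peak, overlapping feature) pair. Column names and values are
# assumptions made for illustration only.
hits <- data.frame(
  peak    = c("ID1_exon", "ID1_exon", "ID2_exon", "ID3_exon", "ID3_exon"),
  feature = c("featA", "featB", "featC", "featD", "featE"),
  stringsAsFactors = FALSE
)

# Keep only the first row listed for each peak; duplicated() flags every
# later occurrence of a peak name, so negating it keeps the first one.
one_hit <- hits[!duplicated(hits$peak), ]

nrow(hits)     # number of rows with all hits
nrow(one_hit)  # number of rows after keeping one hit per peak
```

Which hit is kept depends only on row order here, so sorting the data frame
first (for instance by distance to the feature, if such a column is
available) would make the choice explicit.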