[Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2

Sonia Leach sonia.leach at gmail.com
Thu Sep 1 17:12:42 CEST 2011


Oh, I thought I mentioned it was just a straight table dump from UCSC
goldenpath, but to be more specific:

1. go to http://genome.ucsc.edu/cgi-bin/hgGateway
2. Click the 'Tables' tab
3. Set genome: Human, assembly: Feb 2009 (hg19), group: Genes and
Gene Prediction Tracks, track: UCSC Genes, table: knownGene
4. Set output format to "BED - browser extensible data" and enter
"UCSC_knownGene.hg19.bed" as the output file
5. Click 'get output', choose 'Create one BED record per: Whole
Gene', and then hit getBed.
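
For anyone reproducing this, a minimal base-R sketch of reading such a BED6 dump and sanity-checking it before building the RangedData object. The two records, the temporary file, and the column names are made up for illustration (they are not taken from the real knownGene table). Note that BED start coordinates are 0-based with an exclusive end, while IRanges uses 1-based inclusive coordinates, so it may be worth adding 1 to the start column:

```r
# Hypothetical two-record BED6 file standing in for UCSC_knownGene.hg19.bed
bed_lines <- c(
  "chr1\t100\t200\texampleA\t0\t+",
  "chr2\t500\t900\texampleB\t0\t-"
)
bed_path <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_path)

# Read the tab-delimited file; BED files have no header row
bed <- read.delim(bed_path, header = FALSE, stringsAsFactors = FALSE)

# Column names follow the BED6 field order (names chosen here for readability)
colnames(bed) <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand")

# Convert the 0-based BED start to a 1-based inclusive start;
# the exclusive BED end already equals the 1-based inclusive end
bed$start1 <- bed$chromStart + 1L

# Basic checks: six BED columns plus the derived one, and valid strands
stopifnot(ncol(bed) == 7, all(bed$strand %in% c("+", "-")))
```

The `start1` and `chromEnd` columns could then be passed as `start=` and `end=` to `IRanges()`.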

Thanks for looking into this.
Sonia

On Thu, Sep 1, 2011 at 7:08 AM, Ou, Jianhong <Jianhong.Ou at umassmed.edu> wrote:
> Hi Sonia,
>
> Could you tell me the details about how to generate the annotation file Annots/UCSC_knownGene.hg19.bed?
>
> Yours sincerely,
>
> Jianhong Ou
>
> jianhong.ou at umassmed.edu
>
>
> On Aug 31, 2011, at 4:17 PM, Zhu, Lihua (Julie) wrote:
>
>>
>> ------ Forwarded Message
>> From: Sonia Leach <sonia.leach at gmail.com>
>> Date: Wed, 31 Aug 2011 15:52:43 -0400
>> To: "bioc-sig-sequencing at r-project.org" <bioc-sig-sequencing at r-project.org>
>> Subject: [Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch problems in
>> R2.13/ChIPpeakAnno_1.8.0 and R2.13/ChIPpeakAnno_2.0.2
>>
>> I had a problem with the original ChIPpeakAnno distribution
>> ChIPpeakAnno_1.8.0 for R2.13 where depending on the number of spaces
>> in the RangedData Annotation object sent to annotatePeakInBatch, I
>> would get the error:
>>         Error in FUN(1L[[1L]], ...) : object 'r' not found
>> (see Problem 1 below) which went away when I downloaded the
>> development version R2.13/ChIPpeakAnno_2.0.2
>>
>> However, then I had the problem that calling annotatePeakInBatch(...,
>> output="overlapping", multiple=FALSE) returned the same number of
>> answers as annotatePeakInBatch(..., output="overlapping",
>> multiple=TRUE) (see Problem 2 below). Obviously, the workaround is to
>> take one hit from among the multiples returned, but this should be
>> fixed.
>>
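
The workaround mentioned above (keeping one hit per peak) can be sketched in base R roughly as follows. The `hits` data frame and its `peak` and `feature` columns are hypothetical stand-ins for the annotation result after conversion to a data frame; the real column names may differ:

```r
# Hypothetical overlap table: one row per (peak, feature) hit,
# standing in for what multiple=TRUE might return
hits <- data.frame(
  peak    = c("ID1_peak", "ID1_peak", "ID2_peak", "ID3_peak", "ID3_peak"),
  feature = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  stringsAsFactors = FALSE
)

# Keep only the first hit listed for each peak;
# duplicated() flags every repeat of a peak after its first occurrence
one_hit <- hits[!duplicated(hits$peak), ]
```

This keeps whichever hit happens to come first for each peak, so sorting the table beforehand (for example by distance) would control which one survives.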
>> The annotation file I used is just a bed6 dump from UCSC goldenpath.
>>
>> ============ problem 1:
>> library(ChIPpeakAnno)
>>
>> myPeak = RangedData(IRanges(start = c(17208381), end = c(17208381),
>>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>>
>> ## This object has 25 spaces for chr1..22,X,Y,M
>> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
>> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
>>     names=UCSC[,4]), space=as.character(UCSC[,1]), strand=UCSC[,6])
>>
>> ## This object has just 1 space but the same data as UCSC_rangeD[868,]
>> feature = RangedData(IRanges(start = c(17066767), end = c(17267729),
>>     names = c("Site1")), space = c("chr1"), strand = c('+'))
>>
>> ## with UCSC_rangeD[868,], gives error in R2.13/ChIPpeakAnno_1.8.0
>> ##         Error in FUN(1L[[1L]], ...) : object 'r' not found
>> annotation = annotatePeakInBatch(myPeak, AnnotationData=UCSC_rangeD[868,],
>>     output="overlapping", maxgap=0, multiple=FALSE)
>>
>> ## with 1-space feature, no error
>> annotation = annotatePeakInBatch(myPeak, AnnotationData=feature,
>>     output="overlapping", maxgap=0, multiple=FALSE)
>>
>> <sorry, I no longer have the session info for this run - but it is the
>> basic R2.13 install plus biocLite(ChIPpeakAnno), and should have the
>> same versions as the session info shown for problem 2 below, minus the
>> new dev version for ChIPpeakAnno (i.e. everything the same as below,
>> except ChIPpeakAnno_2.0.2.tar.gz, gplots_2.8.0.tar.gz,
>> caTools_1.12.tar.gz, gdata_2.8.2.tar.gz, gtools_2.6.2.tar.gz)
>>>
>>
>> ======== Problem 2
>> R version 2.13.0 (2011-04-13)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>> library(ChIPpeakAnno)
>> Warning message:
>> replacing previous import 'space' when loading 'IRanges'
>>> UCSC = read.delim('Annots/UCSC_knownGene.hg19.bed',header=FALSE)
>>> UCSC_rangeD = RangedData(IRanges(start= UCSC[,2], end= UCSC[,3],
>> names=UCSC[,4]), space=as.character(UCSC[,1]),strand=UCSC[,6])
>>> data = unique(read.table(file[i], sep="\t", header=FALSE))
>>> ids = sub("ID=(\\d+);.+", "ID\\1", data[,9], perl=TRUE)
>>> data_rangeD = RangedData(IRanges(start=data$V4, end=data$V5,
>> names=paste(ids,data$V3, sep="_")), space=data$V1, strand="+")
>>> dim(data_rangeD)
>> [1] 19501     1
>>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
>> output="overlapping", maxgap=0, multiple=FALSE)
>>> dim(annotationU)
>> [1] 16777     9
>>> annotationU = annotatePeakInBatch(data_rangeD, AnnotationData=UCSC_rangeD,
>> output="overlapping", maxgap=0, multiple=TRUE)
>>> dim(annotationU)
>> [1] 16777     9
>>> sessionInfo()
>> R version 2.13.0 (2011-04-13)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] ChIPpeakAnno_2.0.2                  gplots_2.8.0
>> [3] caTools_1.12                        bitops_1.0-4.1
>> [5] gdata_2.8.2                         gtools_2.6.2
>> [7] limma_3.8.3                         org.Hs.eg.db_2.5.0
>> [9] GO.db_2.5.0                         RSQLite_0.9-4
>> [11] DBI_0.2-5                           AnnotationDbi_1.14.1
>> [13] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.20.0
>> [15] GenomicRanges_1.4.8                 Biostrings_2.20.2
>> [17] IRanges_1.10.6                      multtest_2.8.0
>> [19] Biobase_2.12.2                      biomaRt_2.8.1
>>
>> loaded via a namespace (and not attached):
>> [1] MASS_7.3-12     RCurl_1.6-9     splines_2.13.0  survival_2.36-5
>> [5] XML_3.4-2
>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> ------ End of Forwarded Message
>>
>
>
>


