[BioC] ChIPpeakAnno 2.12.2 bug
Zinkgraf, Matthew S -FS
mszinkgraf at fs.fed.us
Mon Aug 18 20:49:07 CEST 2014
Hi
Thanks for working on this but I am still getting the same result even with 2.13.1. The annotatePeakInBatch is not assigning peaks to features correctly when genes are located on the negative strand. Below is an example that produces an incorrect result when using annotatePeakInBatch. I think the problem is located in the output= "nearestStart" because "overlapping" produces the correct result. Specifically the second line of the annPeak output should be assigning peak1 to overlapEnd instead of overlapStart because the gene feature is located on the negative strand.
Matt
> library(ChIPpeakAnno)
>
>
> peak<-RangedData(IRanges(start=as.numeric(as.character(c(1650,2806,8361))), end=as.numeric(as.character(c(1860,3006,8591)))), space=c("Chr01","Chr01","Chr01"),strand=as.integer(1))
> row.names(peak)<-c("peak1","peak2","peak3")
>
>
> #load subset of data from Populus v210 gff3 file
> genome<-as.data.frame(rbind(c("Chr01","phytozome8_0", "gene",1660,2502, ".", "-", ".","Potri.001G000100"),c("Chr01", "phytozome8_0", "gene",2906,6646, ".", "-", ".","Potri.001G000200"),c("Chr01", "phytozome8_0", "gene",8391,8860, ".", "+", ".","Potri.001G000300")))
>
>
> GENOME<-GFF2RangedData(genome)
Warning message:
In DataFrame(...) : NAs introduced by coercion
> row.names(GENOME)<-genome[,9]
>
> annPeak = annotatePeakInBatch(peak, AnnotationData=GENOME, PeakLocForDistance ="middle",output="both")
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by coercion
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by coercion
>
> View(annPeak)
space
start
end
width
names
peak
strand
feature
start_position
end_position
insideFeature
distancetoFeature
shortestDistance
Chr01
8361
8591
231
peak3 Potri.001G000300
peak3
+
Potri.001G000300
8391
8860
overlapStart
85
30
Chr01
1650
1860
211
peak1 Potri.001G000100
peak1
-
Potri.001G000100
1660
2502
overlapStart
747
10
Chr01
2806
3006
201
peak2 Potri.001G000100
peak2
-
Potri.001G000100
1660
2502
downstream
-404
304
Chr01
2806
3006
201
peak2 Potri.001G000200
peak2
-
Potri.001G000200
2906
6646
overlapEnd
3740
100
>
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=C LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] ChIPpeakAnno_2.13.1 Biostrings_2.32.1
[3] XVector_0.4.0 IRanges_1.22.10
[5] biomaRt_2.20.0 VennDiagram_1.6.7
[7] GOSemSim_1.22.0 Rcpp_0.11.2
[9] BiocInstaller_1.14.2 edgeR_3.6.7
[11] limma_3.20.8 hash_2.2.6
[13] GSEABase_1.26.0 annotate_1.42.1
[15] GOstats_2.30.0 graph_1.42.0
[17] Category_2.30.0 GO.db_2.14.0
[19] RSQLite_0.11.4 DBI_0.2-7
[21] Matrix_1.1-4 AnnotationDbi_1.26.0
[23] GenomeInfoDb_1.0.2 Biobase_2.24.0
[25] BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] AnnotationForge_1.6.1 BBmisc_1.7
[3] BSgenome_1.32.0 BatchJobs_1.3
[5] BiocParallel_0.6.1 GenomicAlignments_1.0.5
[7] GenomicFeatures_1.16.2 GenomicRanges_1.16.4
[9] MASS_7.3-33 RBGL_1.40.0
[11] RCurl_1.95-4.3 Rsamtools_1.16.1
[13] XML_3.98-1.1 bitops_1.0-6
[15] brew_1.0-6 checkmate_1.2
[17] codetools_0.2-8 digest_0.6.4
[19] fail_1.2 foreach_1.4.2
[21] genefilter_1.46.1 iterators_1.0.7
[23] lattice_0.20-29 multtest_2.20.0
[25] rtracklayer_1.24.2 sendmailR_1.1-2
[27] splines_3.1.0 stats4_3.1.0
[29] stringr_0.6.2 survival_2.37-7
[31] tools_3.1.0 xtable_1.7-3
[33] zlibbioc_1.10.0
>
From: Zhu, Lihua (Julie) [mailto:Julie.Zhu at umassmed.edu]
Sent: Wednesday, August 13, 2014 2:47 PM
To: Zinkgraf, Matthew S -FS
Cc: bioconductor at r-project.org
Subject: Re: ChIPpeakAnno 2.12.2 bug
Matt,
Many thanks for helping us identifying the bug introduced in 2.12.2! The bug has been fixed in version 2.13.1. Please let me know if you have any issues.
For future correspondence, could you please keep the discussion in the bioconductor list so others can contribute or benefit? Thanks!
Best regards,
Julie
On 8/13/14 1:18 PM, "Zinkgraf, Matthew S -FS" <mszinkgraf at fs.fed.us> wrote:
Hello Julie
I recently updated ChIPpeakAnno from version 2.10 to 2.12.2 and I am getting incorrect results from annotatePeakInBatch.
More specifically the output column 'insideFeature' is not taking feature strand into account when it is assigning peaks to gene features, the function thinks all gene features are on the positive strand even though my genome ranged data contains strand information. Also all other calculations from annotatePeakInBatch gives the same results as in 2.10.
Please let me know if you would like more information about the bug I am getting.
Thanks
Matt
---
Matthew S. Zinkgraf, PhD
Postdoctoral Researcher
USDA, Forest Service
Davis, CA 95618
530-759-1739
mszinkgraf at fs.fed.us <mailto:mszinkgraf at fs.fed.us>
This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
[[alternative HTML version deleted]]
More information about the Bioconductor
mailing list