[BioC] Query on ChipPeakAnno: AnnotatePeakinBatch input

Jailwala, Parthav (NIH/NCI) [C] parthav.jailwala at nih.gov
Tue Dec 3 16:10:19 CET 2013


Julie,

Thanks for your response. Attached are my input file of 'peaks' (2070_lincRNA_mergedGTF.txt) and the features annotation file that I am using (23188_PCGgroup_EnsemblGTFwithstrand.txt; it has strand information coded as +,-).

Also attached is the output file, which shows the strand as positive for every row (2070lincRNAmergedGTF.annout).

Here is the sessionInfo()


> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel grid stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] ChIPpeakAnno_2.10.0 GenomicFeatures_1.14.2
[3] limma_3.18.3 org.Hs.eg.db_2.10.1
[5] GO.db_2.10.1 RSQLite_0.11.4
[7] DBI_0.2-7 AnnotationDbi_1.24.0
[9] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.30.0
[11] GenomicRanges_1.14.3 Biostrings_2.30.1
[13] XVector_0.2.0 IRanges_1.20.6
[15] multtest_2.18.0 Biobase_2.22.0
[17] biomaRt_2.18.0 BiocGenerics_0.8.0
[19] VennDiagram_1.6.5

loaded via a namespace (and not attached):
[1] MASS_7.3-29 RCurl_1.95-4.1 Rsamtools_1.14.2 XML_3.98-1.1
[5] bitops_1.0-6 rtracklayer_1.22.0 splines_3.0.2 stats4_3.0.2
[9] survival_2.37-4 tools_3.0.2 zlibbioc_1.8.0
>


On 12/3/13 9:52 AM, "Zhu, Lihua (Julie)" <Julie.Zhu at umassmed.edu> wrote:

Parthav,

Could you please send us the code snippets, a test BED file and the
sessionInfo? Thanks!

Best regards,

Julie


On 12/3/13 9:43 AM, "Jailwala, Parthav (NIH/NCI) [C]"
<parthav.jailwala at nih.gov> wrote:

Hi Julie,
I have a strand issue with the annotatePeakInBatch function in the
ChIPpeakAnno package and am reaching out to see if you can help me
figure out what the issue is.
I am trying to find the distance to the TSS for a set of lincRNAs. To do
this, I am using my own BED file as the 'background', or annotation. The BED
file looks like this:
Y       597158  623056  Ddx3y   -
Y       346986  365290  Eif2s3y +
Y       2118049 2129045 Gm10256 +
Y       2156899 2168120 Gm10352 +
Y       1976249 1976584 Gm16501 -
Y       2390390 2398856 Gm3376  +
As you can see, there is no header row for the column names, and the
fifth column is the strand of the feature.
Now, when I run the command, the 'strand' column in the output file is always
+ve (always +, even though the feature is on the -ve strand).
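
A minimal base R sketch, for illustration only: standard BED puts score in column 5 and strand in column 6, whereas the file above carries strand in column 5. The sketch reads the six rows shown above and rearranges them into the six-column layout. That this column order is the reason for the all-+ output is an assumption, not something established in this thread; the placeholder score of 0 and the names ann5 and ann6 are illustrative.

```r
# The six annotation rows quoted above: chrom, start, end, name, strand.
# There is no header row, so column names are supplied explicitly.
bed_lines <- c(
  "Y 597158 623056 Ddx3y -",
  "Y 346986 365290 Eif2s3y +",
  "Y 2118049 2129045 Gm10256 +",
  "Y 2156899 2168120 Gm10352 +",
  "Y 1976249 1976584 Gm16501 -",
  "Y 2390390 2398856 Gm3376 +"
)

ann5 <- read.table(
  text = bed_lines,
  header = FALSE,
  col.names = c("chrom", "start", "end", "name", "strand"),
  colClasses = c("character", "integer", "integer", "character", "character")
)

# Rearrange into the standard six-column BED order
# (chrom, start, end, name, score, strand).
# The score of 0 is a placeholder, not real data.
ann6 <- data.frame(
  ann5[, c("chrom", "start", "end", "name")],
  score = 0L,
  strand = ann5$strand,
  stringsAsFactors = FALSE
)

# Both strands should still be present after the rearrangement.
print(table(ann6$strand))
```

A table laid out like ann6 is what a reader that looks for strand in column 6 would expect; whether that applies to the conversion step used here would need checking.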
Here is a sample from the output file:
"","space","start","end","width","names","peak","strand","feature","start_position","end_position","insideFeature","distancetoFeature","shortestDistance","fromOverlappingOrNearest"
"1","1",9708702,9782003,73302,"0001 23152","0001","+","23152",9708703,9738463,"includeFeature",-1,1,"NearestStart"
"2","1",134088012,134153958,65947,"0002 22624","0002","+","22624",134088013,134153958,"overlapStart",-1,0,"NearestStart"
"3","1",171899539,172040632,141094,"0003 22283","0003","+","22283",171902439,172040632,"overlapStart",-2900,0,"NearestStart"
"4","1",195333431,195335997,2567,"0004 22164","0004","+","22164",195172540,195196491,"downstream",160891,136940,"NearestStart"
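
To check the symptom programmatically, here is a small base R sketch that parses the four output rows above (with the mail line wraps undone) and tabulates the strand column. The object names are illustrative. On the real output one would pass the file path to read.csv instead of text =, assuming the file is comma-separated as in the sample.

```r
# Header and the four sample rows from the output, one record per line.
out_lines <- c(
  '"","space","start","end","width","names","peak","strand","feature","start_position","end_position","insideFeature","distancetoFeature","shortestDistance","fromOverlappingOrNearest"',
  '"1","1",9708702,9782003,73302,"0001 23152","0001","+","23152",9708703,9738463,"includeFeature",-1,1,"NearestStart"',
  '"2","1",134088012,134153958,65947,"0002 22624","0002","+","22624",134088013,134153958,"overlapStart",-1,0,"NearestStart"',
  '"3","1",171899539,172040632,141094,"0003 22283","0003","+","22283",171902439,172040632,"overlapStart",-2900,0,"NearestStart"',
  '"4","1",195333431,195335997,2567,"0004 22164","0004","+","22164",195172540,195196491,"downstream",160891,136940,"NearestStart"'
)

# The first (unnamed) column holds the row names.
ann_out <- read.csv(text = out_lines, row.names = 1, stringsAsFactors = FALSE)

# Count how many rows report each strand value.
print(table(ann_out$strand))
```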
I would really appreciate it if you could tell me what is wrong with my inputs.
Thanks
Parthav Jailwala
Parthav Jailwala [Contractor]
Bioinformatics Analyst, CCRIFX Bioinformatics Core
Advanced Biomedical Computing Center (ABCC)
Information Systems Program
Leidos Biomedical Research, Inc.
(formerly SAIC-Frederick, Inc.)
Frederick National Laboratory for Cancer Research (FNLCR)
P. O. Box B, Frederick, MD 21702
Building 41-B620, NIH, Bethesda, MD
E-mail: parthav.jailwala at nih.gov
Bethesda: 301.451.3455
Frederick: 301.846.5664
Fax (Bethesda): 301.480.0391
http://ccrifx.cancer.gov/


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 23188_PCGgroup_EnsemblGTFwithstrand.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131203/1d2601d4/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2070_lincRNA_mergedGTF.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131203/1d2601d4/attachment-0003.txt>

