[BioC] Query on ChipPeakAnno: AnnotatePeakinBatch input

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Tue Dec 3 17:09:25 CET 2013


Parthav,

Your annotation file is not in BED format: strand information needs to be
in the 6th column (http://genome.ucsc.edu/FAQ/FAQformat#format1). You can
fix it by adding a score as the 5th column.
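
As a rough illustration of that fix, here is a minimal Python sketch (it is
not ChIPpeakAnno code; the helper names to_bed6 and convert_lines are
hypothetical, and the placeholder score of 0 is an assumption). It inserts a
score as the 5th column so that strand moves to the 6th, assuming each input
line has exactly five whitespace-separated fields: chromosome, start, end,
name, strand.

```python
def to_bed6(line, score="0"):
    """Turn 'chrom start end name strand' into 6-column BED.

    The score is only a placeholder (assumed "0" here); the point is to
    shift strand from the 5th to the 6th column.
    """
    fields = line.split()
    if len(fields) != 5 or fields[4] not in {"+", "-", "."}:
        raise ValueError(f"expected chrom, start, end, name, strand: {line!r}")
    chrom, start, end, name, strand = fields
    return "\t".join([chrom, start, end, name, score, strand])


def convert_lines(lines, score="0"):
    """Convert every non-blank line; blank lines are skipped."""
    return [to_bed6(line, score) for line in lines if line.strip()]


# Two rows taken from the annotation sample quoted later in this thread.
example = [
    "Y\t597158\t623056\tDdx3y\t-",
    "Y\t346986\t365290\tEif2s3y\t+",
]
fixed = convert_lines(example)
for row in fixed:
    print(row)
```

Raising an error on unexpected input, rather than guessing, keeps a
malformed line from silently producing a wrong strand.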

Please let me know if you still have a problem after fixing the annotation
file. Thanks!

Best regards,

Julie


On 12/3/13 10:10 AM, "Jailwala, Parthav (NIH/NCI) [C]"
<parthav.jailwala at nih.gov> wrote:

> Julie,
> 
> Thanks for your response. Attached are my input file of 'peaks' (2070
> lincRNA_mergedGTF.txt) and the feature annotation file that I am using
> (23188PCGgroupEnsemblGTFwithstrand.txt; it has strand information coded as
> +,-).
> 
> Also attached is the output file that shows the strand information as all
> positive (2070lincRNAmergedGTF.annout)
> 
> Here is the sessionInfo()
> 
> 
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] parallel grid stats graphics grDevices utils datasets
> [8] methods base
> 
> other attached packages:
> [1] ChIPpeakAnno_2.10.0 GenomicFeatures_1.14.2
> [3] limma_3.18.3 org.Hs.eg.db_2.10.1
> [5] GO.db_2.10.1 RSQLite_0.11.4
> [7] DBI_0.2-7 AnnotationDbi_1.24.0
> [9] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.30.0
> [11] GenomicRanges_1.14.3 Biostrings_2.30.1
> [13] XVector_0.2.0 IRanges_1.20.6
> [15] multtest_2.18.0 Biobase_2.22.0
> [17] biomaRt_2.18.0 BiocGenerics_0.8.0
> [19] VennDiagram_1.6.5
> 
> loaded via a namespace (and not attached):
> [1] MASS_7.3-29 RCurl_1.95-4.1 Rsamtools_1.14.2 XML_3.98-1.1
> [5] bitops_1.0-6 rtracklayer_1.22.0 splines_3.0.2 stats4_3.0.2
> [9] survival_2.37-4 tools_3.0.2 zlibbioc_1.8.0
>> 
> 
> 
> On 12/3/13 9:52 AM, "Zhu, Lihua (Julie)"
> <Julie.Zhu at umassmed.edu> wrote:
> 
> Parthav,
> 
> Could you please send us the code snippets, a test BED file, and the
> sessionInfo? Thanks!
> 
> Best regards,
> 
> Julie
> 
> 
> On 12/3/13 9:43 AM, "Jailwala, Parthav (NIH/NCI) [C]"
> <parthav.jailwala at nih.gov> wrote:
> 
> Hi Julie,
> I have a strand issue with the annotatePeakInBatch function in the
> ChIPpeakAnno package and am reaching out to see if you can help figure out
> what the issue is.
> I am trying to find the distance to the TSS for a set of lincRNAs. To do
> this, I am using my own BED file as the 'background', or annotation. The BED
> file looks like this:
> Y       597158  623056  Ddx3y   -
> Y       346986  365290  Eif2s3y +
> Y       2118049 2129045 Gm10256 +
> Y       2156899 2168120 Gm10352 +
> Y       1976249 1976584 Gm16501 -
> Y       2390390 2398856 Gm3376  +
> As you can see, there is no header row for the column names, and the
> fifth column is the strand of the feature.
> Now, when I run the command, the 'strand' column in the output file is always
> +ve (always +, even though the feature is on the -ve strand).
> Here is a sample from the output file:
> "","space","start","end","width","names","peak","strand","feature","start_position","end_position","insideFeature","distancetoFeature","shortestDistance","fromOverlappingOrNearest"
> "1","1",9708702,9782003,73302,"0001 23152","0001","+","23152",9708703,9738463,"includeFeature",-1,1,"NearestStart"
> "2","1",134088012,134153958,65947,"0002 22624","0002","+","22624",134088013,134153958,"overlapStart",-1,0,"NearestStart"
> "3","1",171899539,172040632,141094,"0003 22283","0003","+","22283",171902439,172040632,"overlapStart",-2900,0,"NearestStart"
> "4","1",195333431,195335997,2567,"0004 22164","0004","+","22164",195172540,195196491,"downstream",160891,136940,"NearestStart"
> I would really appreciate it if you could tell me what is wrong with my inputs.
> Thanks
> Parthav Jailwala
> Parthav Jailwala [Contractor]
> Bioinformatics Analyst, CCRIFX Bioinformatics Core
> Advanced Biomedical Computing Center (ABCC)
> Information Systems Program
> Leidos Biomedical Research, Inc.
> (formerly SAIC-Frederick, Inc.)
> Frederick National Laboratory for Cancer Research (FNLCR)
> P. O. Box B, Frederick, MD 21702
> Building 41-B620, NIH, Bethesda, MD
> E-mail: parthav.jailwala at nih.gov
> Bethesda: 301.451.3455
> Frederick: 301.846.5664
> Fax (Bethesda): 301.480.0391
> http://ccrifx.cancer.gov
> 
> 


