[BioC] Query on ChipPeakAnno: AnnotatePeakinBatch input
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Tue Dec  3 18:04:24 CET 2013

Parthav,
Great to know that you got the correct strand information now.
To understand the meaning of each output variable, please type
help(annotatePeakInBatch) in R. Under the value section, you will see the
description for each output variable. For example, distancetoFeature is
described as "distance to the nearest feature such as transcription start
site. By default, the distance is calculated as the distance between the
start of the binding site and the TSS that is the gene start for genes
located on the forward strand and the gene end for genes located on the
reverse strand."
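
To make that definition concrete, here is a minimal base-R sketch. It is not
the code used inside annotatePeakInBatch; the helper names tss_of and
distance_to_tss are hypothetical, and the sign convention on the reverse
strand (negative meaning upstream in the direction of transcription) is an
assumption made for illustration.

```r
# Hypothetical helpers illustrating the help-page definition; this is not
# the ChIPpeakAnno implementation.

# TSS is the gene start on the forward strand and the gene end on the
# reverse strand.
tss_of <- function(gene_start, gene_end, strand) {
  ifelse(strand == "+", gene_start, gene_end)
}

# Signed distance from the start of the binding site to the TSS.
# Assumption: the sign is flipped on the reverse strand so that a negative
# value always means "upstream in the direction of transcription".
distance_to_tss <- function(peak_start, gene_start, gene_end, strand) {
  tss <- tss_of(gene_start, gene_end, strand)
  ifelse(strand == "+", peak_start - tss, tss - peak_start)
}

# First row of the sample output quoted further down (reported on "+").
distance_to_tss(9708702, 9708703, 9738463, "+")   # -1

# Ddx3y from the annotation file (reverse strand), with a made-up peak start.
tss_of(597158, 623056, "-")                       # 623056
distance_to_tss(623556, 597158, 623056, "-")      # -500
```

The first call reproduces the -1 shown for peak 0001 in the sample output;
the reverse-strand numbers are invented purely to show the sign flip.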
Please see additional inline comments below.
Best regards,
Julie
On 12/3/13 11:37 AM, "Jailwala, Parthav (NIH/NCI) [C]"
<parthav.jailwala at nih.gov> wrote:
> Hi Julie,
> 
> Thanks !
> I fixed the strand information in the annotation file and now I do get
> correct strand information in the output.
> 
> However, when looking at the output, I am still confused about the
> 'upstream/downstream' determination for features that are on -ve strand.
> My understanding is that for genes on the reverse strand, the Start = 3'
> end of the gene and the End= 5' end of the gene. Hence, when I chose 'TSS'
> as the option, all distances should have been calculated from the TSS,
> that is the 'End' coordinate for that gene.
Correct.
> Also, for features on the
> negative strand, if the start of the peak is higher than the TSS of the
> feature, then actually, the peak is 'Upstream' of the feature. However, in
> the output, for features on -ve strand, when the start of the peak is
> higher than the TSS of the feature, the peak is determined to be
> 'Downstream' of the feature.
Could you please send me an example output row? Also, which version of
ChIPpeakAnno did you use? Please type sessionInfo() in R and copy the
output.
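
For comparison with the output row, the strand-aware expectation described
in the question can be written as a small base-R check. This is only a
sketch of the expected behaviour, assuming the TSS is the End coordinate on
the reverse strand; relative_position is a hypothetical helper, not a
ChIPpeakAnno function, and it ignores overlap categories such as
overlapStart or inside.

```r
# Hypothetical helper: classify a peak start relative to a feature's TSS,
# taking strand into account. Not part of ChIPpeakAnno.
relative_position <- function(peak_start, gene_start, gene_end, strand) {
  tss <- ifelse(strand == "+", gene_start, gene_end)
  # Offset measured along the direction of transcription.
  offset <- ifelse(strand == "+", peak_start - tss, tss - peak_start)
  ifelse(offset < 0, "upstream",
         ifelse(offset > 0, "downstream", "atTSS"))
}

# Made-up coordinates: a gene spanning 1000-2000.
relative_position(2500, 1000, 2000, "-")  # "upstream" (past End on the reverse strand)
relative_position(2500, 1000, 2000, "+")  # "downstream" of the Start on the forward strand
relative_position(500,  1000, 2000, "+")  # "upstream"
```

If an output row for a reverse-strand feature disagrees with this kind of
check, that row is the one worth sending.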
> 
> I would really appreciate it if you could advise whether my understanding
> is incorrect.
> 
> Thanks
> Parthav
>  
> 
> 
> 
> On 12/3/13 11:09 AM, "Zhu, Lihua (Julie)" <Julie.Zhu at umassmed.edu> wrote:
> 
>> Parthav,
>> 
>> Your annotation file is not in BED format, i.e., the strand information
>> needs to be in the 6th column
>> (http://genome.ucsc.edu/FAQ/FAQformat#format1). You can fix it by adding
>> a score as the 5th column.
>> 
>> Please let me know if you still have a problem after fixing the
>> annotation file. Thanks!
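
One way to apply that fix in base R is sketched below. It assumes the
annotation file is tab-separated with the five columns shown in the thread
(chromosome, start, end, name, strand); the dummy score of 0, the column
names and the output file name are all illustrative choices, not
requirements of ChIPpeakAnno.

```r
# A few of the 5-column annotation rows from this thread
# (chrom, start, end, name, strand); tab separation is an assumption.
lines5 <- c(
  "Y\t597158\t623056\tDdx3y\t-",
  "Y\t346986\t365290\tEif2s3y\t+",
  "Y\t1976249\t1976584\tGm16501\t-"
)
ann <- read.table(text = lines5, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE,
                  col.names = c("chrom", "start", "end", "name", "strand"),
                  colClasses = c("character", "integer", "integer",
                                 "character", "character"))

# Insert a placeholder score as the 5th column so strand lands in the 6th.
bed6 <- data.frame(ann[, c("chrom", "start", "end", "name")],
                   score = 0,
                   strand = ann$strand,
                   stringsAsFactors = FALSE)

# Write without header, quotes or row names (hypothetical output file name).
out_file <- file.path(tempdir(), "annotation_bed6.bed")
write.table(bed6, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```

With a real file, the `text = lines5` argument would be replaced by the path
to the annotation file.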
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 12/3/13 10:10 AM, "Jailwala, Parthav (NIH/NCI) [C]"
>> <parthav.jailwala at nih.gov> wrote:
>> 
>>> Julie,
>>> 
>>> Thanks for your response. Attached are my input file of 'peaks' (2070
>>> lincRNA_mergedGTF.txt) and the feature annotation file that I am using
>>> (23188PCGgroupEnsemblGTFwithstrand.txt; it has strand information coded
>>> as +,-).
>>> 
>>> Also attached is the output file that shows the strand information as
>>> all positive (2070lincRNAmergedGTF.annout).
>>> 
>>> Here is the sessionInfo()
>>> 
>>> 
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] parallel grid stats graphics grDevices utils datasets
>>> [8] methods base
>>> 
>>> other attached packages:
>>> [1] ChIPpeakAnno_2.10.0 GenomicFeatures_1.14.2
>>> [3] limma_3.18.3 org.Hs.eg.db_2.10.1
>>> [5] GO.db_2.10.1 RSQLite_0.11.4
>>> [7] DBI_0.2-7 AnnotationDbi_1.24.0
>>> [9] BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.30.0
>>> [11] GenomicRanges_1.14.3 Biostrings_2.30.1
>>> [13] XVector_0.2.0 IRanges_1.20.6
>>> [15] multtest_2.18.0 Biobase_2.22.0
>>> [17] biomaRt_2.18.0 BiocGenerics_0.8.0
>>> [19] VennDiagram_1.6.5
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-29 RCurl_1.95-4.1 Rsamtools_1.14.2 XML_3.98-1.1
>>> [5] bitops_1.0-6 rtracklayer_1.22.0 splines_3.0.2 stats4_3.0.2
>>> [9] survival_2.37-4 tools_3.0.2 zlibbioc_1.8.0
>>>> 
>>> 
>>> 
>>> On 12/3/13 9:52 AM, "Zhu, Lihua (Julie)"
>>> <Julie.Zhu at umassmed.edu> wrote:
>>> 
>>> Parthav,
>>> 
>>> Could you please send us the code snippets,  a test bed file and the
>>> sessionInfo? Thanks!
>>> 
>>> Best regards,
>>> 
>>> Julie
>>> 
>>> 
>>> On 12/3/13 9:43 AM, "Jailwala, Parthav (NIH/NCI) [C]"
>>> <parthav.jailwala at nih.gov> wrote:
>>> 
>>> Hi Julie,
>>> I have a strand issue with the annotatePeakInBatch function in the
>>> ChIPpeakAnno package and am reaching out to see if you can help figure
>>> out what the issue is.
>>> I am trying to find the distance to the TSS for a set of lincRNAs. To do
>>> this, I am using my own BED file as the 'background', or annotation. The
>>> BED file looks like this:
>>> Y       597158  623056  Ddx3y   -
>>> Y       346986  365290  Eif2s3y +
>>> Y       2118049 2129045 Gm10256 +
>>> Y       2156899 2168120 Gm10352 +
>>> Y       1976249 1976584 Gm16501 -
>>> Y       2390390 2398856 Gm3376  +
>>> As you can see, there is no header row for the column names, and the
>>> fifth column is the strand of the feature.
>>> Now, when I run the command, the 'Strand' column in the output file is
>>> always +ve (always + even though the feature is on the -ve strand).
>>> Here is a sample from the output file:
>>> 
>>> "","space","start","end","width","names","peak","strand","feature","start_position","end_position","insideFeature","distancetoFeature","shortestDistance","fromOverlappingOrNearest"
>>> "1","1",9708702,9782003,73302,"0001 23152","0001","+","23152",9708703,9738463,"includeFeature",-1,1,"NearestStart"
>>> "2","1",134088012,134153958,65947,"0002 22624","0002","+","22624",134088013,134153958,"overlapStart",-1,0,"NearestStart"
>>> "3","1",171899539,172040632,141094,"0003 22283","0003","+","22283",171902439,172040632,"overlapStart",-2900,0,"NearestStart"
>>> "4","1",195333431,195335997,2567,"0004 22164","0004","+","22164",195172540,195196491,"downstream",160891,136940,"NearestStart"
>>> I would really appreciate it if you could tell me what is wrong with my
>>> inputs.
>>> Thanks
>>> Parthav Jailwala
>>> Parthav Jailwala [Contractor]
>>> Bioinformatics Analyst, CCRIFX Bioinformatics Core
>>> Advanced Biomedical Computing Center (ABCC)
>>> Information Systems Program
>>> Leidos Biomedical Research, Inc.
>>> (formerly SAIC-Frederick, Inc.)
>>> Frederick National Laboratory for Cancer Research (FNLCR)
>>> P. O. Box B, Frederick, MD 21702
>>> Building 41-B620, NIH, Bethesda, MD
>>> E-mail: parthav.jailwala at nih.gov
>>> Bethesda: 301.451.3455
>>> Frederick: 301.846.5664
>>> Fax (Bethesda): 301.480.0391
>>> http://ccrifx.cancer.gov
>>> 
>>> 
>> 
> 
    
    