[BioC] Chip-Seq tag density plot

Wed Mar 30 22:21:07 CEST 2011

Hi Eva,

One solution that involves a little bit of programming from your part
is to make those bins "by hand" by using things like
'start(gr) <- value' and 'end(gr) <- value' to adjust the starts
and ends of your GRanges object. For example:

   refseqGR1 <- refseqGR
   ## Adjust 'end(refseqGR1)' so that each range in it corresponds to
   ## the region of length 1000 starting at 'start(refseqGR))' (let's
   ## call this region the "1st bin"):
   end(refseqGR1) <- start(refseqGR1) + 999
   olaps1 <- findOverlaps(refseqGR1, simTags)

   ## Then for the "2nd bin":
   refseqGR2 <- shift(refseqGR1, 1000)
   olaps2 <- findOverlaps(refseqGR2, simTags)

   ## etc...

You might want to put this in a loop, decide how many loops you want
to do, or maybe define some criteria to exit the loop prematurely...

An important thing to be aware of though is that 'start(refseqGR)' is
always the position of the left-most base of the transcript, that is,
it is its 5' end if it's on the + strand and its 3' end if it's on the
- strand. So you will probably need to adapt the code above to do
something like:

   ## 1st bin:
   refseqGR1 <- refseqGR
   idx <- strand(refseqGR) == "+"
   end(refseqGR1[idx]) <- start(refseqGR1[idx]) + 999
   start(refseqGR1[!idx]) <- end(refseqGR1[!idx]) - 999

   ## 2nd bin:
   refseqGR2 <- shift(refseqGR1, as.vector(ifelse(strand(refseqGR1) == 
"+", 1000, -1000)))

Cheers,
H.

On 03/30/2011 08:37 AM, Eva Benito Garagorri wrote:
> Dear list,
>
> I am trying to generate a tag density plot from a ChIP-Seq experiment for regions around the TSS of known transcripts. I suppose this is not too complicated, but I am having difficulties approaching this issue. I think maybe the most straightforward way is to use the GRanges capabilities. I generated a GRanges object by downloading the mouse refseq database from UCSC (see code below). This gives me a GRanges object containing the start and end coordinate of each refseq. I could cross this with my tag file with "findOverlaps", but that would give me the overlap of my tags with the entire span of the transcript and I would like to just keep (and make bins around) the region at either side of the TSS. I couldn't find a way to generate a sliding window in the refseq database to calculate the overlap between each bin and my tag file.
>
> If anyone could point me to some package/functionality/reading material or maybe to a previous thread (I did search the mailing list but maybe I missed something) or else help with the strategy, I would be most grateful.
>
> Thank you very much in advance.
>
> Best regards,
>
>     Eva
>
>
>
> library(GenomicRanges)
> library(GenomicFeatures)
>
> refseq = makeTranscriptDbFromUCSC('mm9', tablename='refGene')
>
> refseqGR = transcripts(refseq)
>
> head(refseqGR)
>
> GRanges with 6 ranges and 2 elementMetadata values
>      seqnames             ranges strand |     tx_id      tx_name
>         <Rle>           <IRanges>   <Rle>  |<integer>   <character>
> [1]     chr1 [4797974, 4836816]      + |       490    NM_008866
> [2]     chr1 [4847775, 4887990]      + |        73    NM_011541
> [3]     chr1 [4847775, 4887990]      + |        78 NM_001159750
> [4]     chr1 [4848409, 4887990]      + |        75 NM_001159751
> [5]     chr1 [5073254, 5152630]      + |        77    NM_133826
> [6]     chr1 [5578574, 5596214]      + |       494 NM_001204371
>
> seqlengths
>           chr1         chr2         chr3         chr4         chr5         chr6 ...  chr7_random  chr8_random  chr9_random chrUn_random  chrX_random  chrY_random
>      197195432    181748087    159599783    155630120    152537259    149517037 ...       362490       849593       449403      5900358      1785075     58682461
>
> ### Simulate a tag file as a sample of the above
>
> simTags = sample(refseqGR, 1000)
>
>
> #### It would now be possible to get the overlap between the two by doing:
>
> olaps = findOverlaps(refseqGR, simTags)
>
> ### But how do I divide the refseqGR into bins of equal size around the start coordinate?
>
>
> sessionInfo()
>
>
> R version 2.12.0 (2010-10-15)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3   IRanges_1.8.9
>
> loaded via a namespace (and not attached):
>   [1] Biobase_2.10.0     biomaRt_2.6.0      Biostrings_2.18.0  BSgenome_1.18.3    DBI_0.2-5          RCurl_1.4-3        RSQLite_0.9-3      rtracklayer_1.10.6
>   [9] tools_2.12.0       XML_3.2-0
>
> ----------
> Eva Benito Garagorri
> PhD program in Neurosciences
> Institute for Neurosciences in Alicante
> UMH-CSIC
> San Juan de Alicante
> 03550
> Spain
> ebenito at umh.es
> (34) 965 91 92 33
>
>
>
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319