[BioC] Chip-Seq tag density plot
Hervé Pagès
hpages at fhcrc.org
Wed Mar 30 22:21:07 CEST 2011
Hi Eva,
One solution that involves a little bit of programming from your part
is to make those bins "by hand" by using things like
'start(gr) <- value' and 'end(gr) <- value' to adjust the starts
and ends of your GRanges object. For example:
refseqGR1 <- refseqGR
## Adjust 'end(refseqGR1)' so that each range in it corresponds to
## the region of length 1000 starting at 'start(refseqGR))' (let's
## call this region the "1st bin"):
end(refseqGR1) <- start(refseqGR1) + 999
olaps1 <- findOverlaps(refseqGR1, simTags)
## Then for the "2nd bin":
refseqGR2 <- shift(refseqGR1, 1000)
olaps2 <- findOverlaps(refseqGR2, simTags)
## etc...
You might want to put this in a loop, decide how many loops you want
to do, or maybe define some criteria to exit the loop prematurely...
An important thing to be aware of though is that 'start(refseqGR)' is
always the position of the left-most base of the transcript, that is,
it is its 5' end if it's on the + strand and its 3' end if it's on the
- strand. So you will probably need to adapt the code above to do
something like:
## 1st bin:
refseqGR1 <- refseqGR
idx <- strand(refseqGR) == "+"
end(refseqGR1[idx]) <- start(refseqGR1[idx]) + 999
start(refseqGR1[!idx]) <- end(refseqGR1[!idx]) - 999
## 2nd bin:
refseqGR2 <- shift(refseqGR1, as.vector(ifelse(strand(refseqGR1) ==
"+", 1000, -1000)))
Cheers,
H.
On 03/30/2011 08:37 AM, Eva Benito Garagorri wrote:
> Dear list,
>
> I am trying to generate a tag density plot from a ChIP-Seq experiment for regions around the TSS of known transcripts. I suppose this is not too complicated, but I am having difficulties approaching this issue. I think maybe the most straightforward way is to use the GRanges capabilities. I generated a GRanges object by downloading the mouse refseq database from UCSC (see code below). This gives me a GRanges object containing the start and end coordinate of each refseq. I could cross this with my tag file with "findOverlaps", but that would give me the overlap of my tags with the entire span of the transcript and I would like to just keep (and make bins around) the region at either side of the TSS. I couldn't find a way to generate a sliding window in the refseq database to calculate the overlap between each bin and my tag file.
>
> If anyone could point me to some package/functionality/reading material or maybe to a previous thread (I did search the mailing list but maybe I missed something) or else help with the strategy, I would be most grateful.
>
> Thank you very much in advance.
>
> Best regards,
>
> Eva
>
>
>
> library(GenomicRanges)
> library(GenomicFeatures)
>
> refseq = makeTranscriptDbFromUCSC('mm9', tablename='refGene')
>
> refseqGR = transcripts(refseq)
>
> head(refseqGR)
>
> GRanges with 6 ranges and 2 elementMetadata values
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> |<integer> <character>
> [1] chr1 [4797974, 4836816] + | 490 NM_008866
> [2] chr1 [4847775, 4887990] + | 73 NM_011541
> [3] chr1 [4847775, 4887990] + | 78 NM_001159750
> [4] chr1 [4848409, 4887990] + | 75 NM_001159751
> [5] chr1 [5073254, 5152630] + | 77 NM_133826
> [6] chr1 [5578574, 5596214] + | 494 NM_001204371
>
> seqlengths
> chr1 chr2 chr3 chr4 chr5 chr6 ... chr7_random chr8_random chr9_random chrUn_random chrX_random chrY_random
> 197195432 181748087 159599783 155630120 152537259 149517037 ... 362490 849593 449403 5900358 1785075 58682461
>
> ### Simulate a tag file as a sample of the above
>
> simTags = sample(refseqGR, 1000)
>
>
> #### It would now be possible to get the overlap between the two by doing:
>
> olaps = findOverlaps(refseqGR, simTags)
>
> ### But how do I divide the refseqGR into bins of equal size around the start coordinate?
>
>
> sessionInfo()
>
>
> R version 2.12.0 (2010-10-15)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.0 BSgenome_1.18.3 DBI_0.2-5 RCurl_1.4-3 RSQLite_0.9-3 rtracklayer_1.10.6
> [9] tools_2.12.0 XML_3.2-0
>
> ----------
> Eva Benito Garagorri
> PhD program in Neurosciences
> Institute for Neurosciences in Alicante
> UMH-CSIC
> San Juan de Alicante
> 03550
> Spain
> ebenito at umh.es
> (34) 965 91 92 33
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list