[Bioc-sig-seq] AlignedRead and IRanges
Martin Morgan
mtmorgan at fhcrc.org
Fri Aug 7 01:03:21 CEST 2009
Hi Jim --
James Bullard wrote:
> I believe that this should be a relatively straightforward thing to do,
> but I am having trouble finding out the best bioC way to perform this
> computation. I have an AlignedRead object from a call to readAligned.
> Additionally I have a set of IRanges (rngs). I want to determine the
> number of reads which are within each range. Specifically, I would like
> a vector of length length(rngs) where each element represents the total
> number of reads within that range.
I wrote a small function that identifies the subset of 'aln' that I'm
interested in, creates an IRanges object of the position and width of
aligned reads, uses IRanges::overlap, and tallies the results
nInRanges <- function(rngs, aln, chr) {
idx <- chromosome(aln) == chr
orng <- IRanges(start=position(aln)[idx], width=width(aln)[idx])
as.table(overlap(orng, rngs))
}
and then with this data
> example(readAligned)
> aln <- readAligned(sp, "s_2_export.txt", filter=filt)
> rngs <- IRanges(start=1000000 * 1:100, width=1000000)
I do
> nInRanges(rngs, aln, "chr1.fa")
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
0 0 0 1 0 0 0 0 0 0 0 0 1 2 0
91 92 93 94 95 96 97 98 99 100
1 0 0 1 0 0 0 0 0 0
If the alignments and ranges were on different 'chromosomes', then I
might coerce my AligendRead data to a RangedData or RangedList object,
represent my query ranges as a RangedList, and call overlap on the two
of those. This
.AlignedRead_to_RangesList <- function(aln) {
chr <- chromosome(aln)
rngs <- mapply(IRanges,
start=split(position(aln), chr),
width=split(width(aln), chr))
do.call(RangesList, rngs)
}
coerces AlignedRead to RangesList and then
alnList <= .AlignedRead_to_RangesList(aln)
lapply(overlap(alnList, rngList), as.table)
The coercion function above is not always appropriate, e.g., if strand
information were important.
Martin
>
> thanks, jim
>
> ps. I am working with the current release version of R/Bioc
>
>
> R version 2.9.1 Patched (2009-07-31 r49063)
> i386-apple-darwin9.7.0
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid stats graphics
> [4] grDevices utils datasets
> [7] methods base
>
> other attached packages:
> [1] Genominator_1.0.0
> [2] RSQLite_0.7-1
> [3] DBI_0.2-4
> [4] GenomeGraphs_1.4.1
> [5] biomaRt_2.0.0
> [6] ShortRead_1.2.1
> [7] lattice_0.17-25
> [8] BSgenome.Scerevisiae.UCSC.sacCer1_1.3.13
> [9] BSgenome_1.12.2
> [10] Biostrings_2.12.7
> [11] IRanges_1.2.3
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.3 hwriter_1.1
> [3] RCurl_0.98-1 XML_2.5-1
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list