[Bioc-sig-seq] AlignedRead and IRanges

Martin Morgan mtmorgan at fhcrc.org
Fri Aug 7 01:03:21 CEST 2009


Hi Jim --

James Bullard wrote:
> I believe that this should be a relatively straightforward thing to do,
> but I am having trouble finding out the best bioC way to perform this
> computation. I have an AlignedRead object from a call to readAligned.
> Additionally I have a set of IRanges (rngs). I want to determine the
> number of reads which are within each range. Specifically, I would like
> a vector of length length(rngs) where each element represents the total
> number of reads within that range.

I wrote a small function that identifies the subset of 'aln' that I'm
interested in, creates an IRanges object of the position and width of
aligned reads, uses IRanges::overlap, and tallies the results

nInRanges <- function(rngs, aln, chr) {
    idx <- chromosome(aln) == chr
    orng <- IRanges(start=position(aln)[idx], width=width(aln)[idx])
    as.table(overlap(orng, rngs))
}

and then with this data

> example(readAligned)
> aln <- readAligned(sp, "s_2_export.txt", filter=filt)
> rngs <- IRanges(start=1000000 * 1:100, width=1000000)

I do

> nInRanges(rngs, aln, "chr1.fa")

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0   0   1   0   0   0   0   0   0   0   0   1   0   0   0
 16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
  0   0   0   0   0   0   0   0   1   0   1   0   0   0   0
 31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  0   0   0   1   0   0   0   0   0   0   0   0   1   2   0
 91  92  93  94  95  96  97  98  99 100
  1   0   0   1   0   0   0   0   0   0

If the alignments and ranges were on different 'chromosomes', then I
might coerce my AligendRead data to a RangedData or RangedList object,
represent my query ranges as a RangedList, and call overlap on the two
of those. This

.AlignedRead_to_RangesList <- function(aln) {
    chr <- chromosome(aln)
    rngs <- mapply(IRanges,
                   start=split(position(aln), chr),
                   width=split(width(aln), chr))
    do.call(RangesList, rngs)
}

coerces AlignedRead to RangesList and then

  alnList <= .AlignedRead_to_RangesList(aln)
  lapply(overlap(alnList, rngList), as.table)

The coercion function above is not always appropriate, e.g., if strand
information were important.

Martin



> 
> thanks, jim
> 
> ps. I am working with the current release version of R/Bioc
> 
> 
> R version 2.9.1 Patched (2009-07-31 r49063)
> i386-apple-darwin9.7.0
> 
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] grid      stats     graphics
> [4] grDevices utils     datasets
> [7] methods   base
> 
> other attached packages:
>  [1] Genominator_1.0.0
>  [2] RSQLite_0.7-1
>  [3] DBI_0.2-4
>  [4] GenomeGraphs_1.4.1
>  [5] biomaRt_2.0.0
>  [6] ShortRead_1.2.1
>  [7] lattice_0.17-25
>  [8] BSgenome.Scerevisiae.UCSC.sacCer1_1.3.13
>  [9] BSgenome_1.12.2
> [10] Biostrings_2.12.7
> [11] IRanges_1.2.3
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.3 hwriter_1.1
> [3] RCurl_0.98-1  XML_2.5-1
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list