[Bioc-sig-seq] getting integers from SimpleRleViewsList

Janet Young jayoung at fhcrc.org
Wed Jun 9 02:10:36 CEST 2010


Hi,

This is probably a simple question, but I can't quite figure it out  
for myself.  I've been using RangedData (etc) quite a lot and have  
often found myself looping through the spaces when I probably don't  
need to. I'm trying not to do that now...

I have some scores on a bunch of genomic intervals I'm interested in  
(they're conservation scores in 4kb promoter regions, e.g. phastCons  
scores).  I've stored these as a SimpleRleList object.

I also have some smaller subregions (matches to transcription factor  
motifs) and I want to compare the distribution of scores inside those  
subregions to scores outside those subregions.  I have the regions as  
RangedData objects, over multiple spaces. I want to do this on  
individual base pair scores, rather than by taking subregion means and  
comparing those (so I don't want to use viewSums).

I've managed to get the scores inside and outside the regions using  
the RleViewsList function (following the example in this thread: http://thread.gmane.org/gmane.comp.lang.r.sequencing/403/focus=408 
  ).  The scores are now as SimpleRleViewsList.

I'm currently looping through the scores (the SimpleRleViewsList)  
space by space, converting to simple vector and concatenating the  
scores, but I imagine there's a way to do it without looping. Any ideas?

Below is a simple example (extended from the older thread).  My real  
scores are not integers and there's many more of them.

#########################
library(IRanges)

### generate some arbitrary scores
track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6),  
width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3,  
4))) )
trackCoverage <- coverage(track,  
weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )

### define subregions
exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2,  
2)),chrB = IRanges(start = 3, width = 5))

###get exon and intron scores
exonView <- RleViewsList(rleList=trackCoverage, rangesList=exons)
intronView <- RleViewsList(rleList=trackCoverage,  
rangesList=gaps(ranges(RangedData(exons)),start=1,end=9))

### now convert those to simple integer objects.  Loop through  
chromosomes. Can I do this without looping?
### (I tried using unlist, but it does something weird to the scores,  
as it assumes they're all from the same space)
exonScores <- integer()
intronScores <- integer()
for (chr in names(exonView) ) {
      thischrexonscores <- as.integer(seqselect(trackCoverage[[chr]],  
start=start(exonView[[chr]]),end=end(exonView[[chr]]) ))
      exonScores <- c(exonScores,thischrexonscores)

      thischrintronscores <-  
as.integer(seqselect(trackCoverage[[chr]],  
start=start(intronView[[chr]]),end=end(intronView[[chr]]) ))
      intronScores <- c(intronScores,thischrintronscores)

}

###and do something with the scores to compare distributions
wilcox.test(exonScores,intronScores)

#########################

thanks,

Janet

-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org

http://www.fhcrc.org/labs/trask/



More information about the Bioc-sig-sequencing mailing list