[BioC] loop over IRanges spaces
Yvan
yvan.strahm at uni.no
Tue May 11 11:15:38 CEST 2010
Hello,
I tried your code below, but and iI am not able to create the
scoreRleList as weight could be only a integer list.
> read_gff
function(file="gff3.txt") {
gff <- read.delim(file,header=FALSE)
colnames(gff) <- c("seqname", "source", "feature", "start",
"end", "score", "strand","frame", "comments")
return(gff)
}
> library(IRanges)
Attaching package: 'IRanges'
The following object(s) are masked from 'package:base':
cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
rbind, rep.int, table
>
gff<-read_gff("/home/yvan/Lundalm/R/projects/coen/gff_original/50Lines.gff")
>
rd<-RangedData(IRanges(gff$start,width=1),score=gff$score,start=gff$start,space=gff$seqname)
> scoreRleList<-coverage(rd, weight ="score",width =
as.list(table(rd$space)))
Error in .local(x, shift, width, weight, ...) :
'weight' must be a non-empty list of integers
>
Can I change/force the type of 'weight' ?
In a different approach, I try to use different views on the data set.
This was motivated by the fact that the difference on the start of each
probes could indicate either a change of scaffold or a gap inside a
scaffold.
> first_view<-slice(diff(rd$start), lower=10, upper=200)
> first_view
Views on a 60-integer XInteger subject
subject: 28 24 25 37 23 24 32 34 25 23 63 ... 24 35 24 29 31 28 19 29 27
29 37
views:
start end width
[1] 1 19 19 [28 24 25 37 23 24 32 34 25 23 63 21 36 27 19 33 33
24 32]
[2] 21 39 19 [ 31 22 34 18 36 25 ... 31 108 34 28 26
23 33]
[3] 41 60 20 [32 28 26 20 33 25 30 30 24 ... 29 31 28 19 29 27 29
37 22]
>
so here the three views are the three scaffold. but if I want to change
the subject of the view to the score
>
second_view<-Views(subject=rd$score,start=first_view$start,end=first_view$end)
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "Views", for
signature "numeric"
>
Again can I change the expected type of the signature in the Views function.
Thanks again for your help.
Cheers,
yvan
On 04/26/2010 09:08 PM, Patrick Aboyoun wrote:
> Yvan,
> It appears to me that you are trying to perform two conflicting
> activities:
>
> 1) Calculate the running sum of a metric over an annotated sequence
> (as evidenced by your aggregate function call)
> 2) Find the sum for a metric across specified intervals on the
> annotated sequence (as evidenced by your desire to assign the
> aggregated sums into an existing RangedData object)
>
> Taking a step back, I am guessing that you are trying to transform
> something akin to a UCSC bed file into something else that is UCSC bed
> file like. If you are using rtracklayer, this means your initial data
> are stored in a RangedData object. To create a RangedData object
> containing the running sum of a values column from an initial
> RangedData object, I recommend:
>
> 1) Creating an RleList object from the RangedData object using the
> coverage function. Make sure to specify the metric of interest in the
> weight argument to coverage.
> 2) Using the runsum function on the RleList object to calculate your
> running sums.
> 3) Creating a RangedData object from the RleList object in step 2
> using as(<<obj>>, "RangedData")
>
> Here is an example:
>
> > # Step 1: create an RleList representation of the metric
> > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7,
> 14, 21, 3, 6, 9)),
> score = 1:6, space = rep(c("A", "B"), each = 3))
> > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30,
> B = 10))
> > scoreRleList
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 6 runs
> Lengths: 4 3 2 5 7 9
> Values : 0 1 0 2 3 0
>
> $B
> 'integer' Rle of length 10 with 6 runs
> Lengths: 1 2 3 1 2 1
> Values : 0 4 5 0 6 0
>
> > # Step 2: calculate the running sums
> > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant")
> > scoreRunsum
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 15 runs
> Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8
> Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0
>
> $B
> 'integer' Rle of length 10 with 7 runs
> Lengths: 2 1 1 1 1 1 3
> Values : 8 13 14 15 10 11 12
>
>
> > # Step 3: Create a RangedData representation of the running sums
> > rdRunsum <- as(scoreRunsum, "RangedData")
> > rdRunsum
> RangedData with 22 rows and 1 value column across 2 spaces
> space ranges | score
> <character> <IRanges> | <integer>
> 1 A [ 1, 3] | 0
> 2 A [ 4, 4] | 1
> 3 A [ 5, 5] | 2
> 4 A [ 6, 6] | 3
> 5 A [ 7, 7] | 2
> 6 A [ 8, 8] | 1
> 7 A [ 9, 9] | 2
> 8 A [10, 10] | 4
> 9 A [11, 13] | 6
> ... ... ... ... ...
> 14 A [22, 22] | 3
> 15 A [23, 30] | 0
> 16 B [ 1, 2] | 8
> 17 B [ 3, 3] | 13
> 18 B [ 4, 4] | 14
> 19 B [ 5, 5] | 15
> 20 B [ 6, 6] | 10
> 21 B [ 7, 7] | 11
> 22 B [ 8, 10] | 12
>
> > sessionInfo()
> R version 2.11.0 Patched (2010-04-24 r51820)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
>
>
> On 4/23/10 6:26 AM, Michael Lawrence wrote:
>> Also note that it's not really necessary to loop here, as is often
>> the case
>> with IRanges:
>>
>> rd$windo<- unlist(runmean(RleList(values(rd)[,"score"])))
>>
>> On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence<michafla at gene.com>
>> wrote:
>>
>>>
>>> On Fri, Apr 23, 2010 at 1:49 AM, Yvan<yvan.strahm at uni.no> wrote:
>>>
>>>> Hello,
>>>> Thank you both of you.
>>>>
>>>> I could could calculate the sliding window, but not as a Rle
>>>> object, could
>>>> not append values for the last w-1 position in the Rle object in
>>>> order to
>>>> take care of the size problem.
>>>>
>>>>
>>> Why not? Rle supports all the normal vector operations. And runsum or
>>> runmean will output a vector of the same size as the input, using a
>>> choice
>>> of two endrules. If you want 0's at the end, try something like:
>>>
>>> rle[(nrow(rd)-w+1):nrow(rd)]<- 0
>>>
>>>
>>>
>>>> So I did it like that:
>>>>
>>>> params<-RDApplyParams(rd,function(rd)
>>>> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after=(length(rd$score)-w+1)))
>>>>
>>>>
>>>> But when I try to add the new values to the rangedData object I got
>>>> these
>>>> error.
>>>>
>>>>> values(rd)[,"windo"]<-rdapply(params)
>>>> Error in `[<-`(`*tmp*`, , j, value =<S4 object of class
>>>> "DataFrame">) :
>>>> ncol(x[j]) != ncol(value)
>>>> In addition: Warning messages:
>>>> 1: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>> longer argument not a multiple of length of shorter
>>>> 2: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>> longer argument not a multiple of length of shorter
>>>>
>>>> But when I check the size, they are the same, here for one space
>>>>
>>>>> x<-rdapply(params)
>>>>> length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo)
>>>> [1] TRUE
>>>>
>>>>
>>> It may be that that type of insertion is unsupported. Why not just do
>>> something like:
>>>
>>> rd$window<- unlist(x)
>>>
>>>
>>>> Maybe params miss a parameter or the way I try to update the rd
>>>> object is
>>>> wrong. Anyway form the rdapply output a vector could be created and
>>>> so a new
>>>> rd object with the new value column.
>>>>
>>>> yvan
>>>>
>>>>
>>>> On 22/04/10 15:51, Michael Lawrence wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Apr 22, 2010 at 5:49 AM, Michael
>>>> Dondrup<Michael.Dondrup at uni.no>wrote:
>>>>
>>>>> Hi,
>>>>> how about funtion rdapply (not lapply) which is for that?
>>>>>
>>>>>
>>>> lapply() should apply per-space as well, basically providing a
>>>> short-cut
>>>> for the more complicated rdapply().
>>>>
>>>>> lapply(rd, function(x) sum(x$score))
>>>> $chr1
>>>> [1] 3
>>>>
>>>> $chr2
>>>> [1] 0
>>>>
>>>> sapply() also works:
>>>>> sapply(rd, function(x) sum(x$score))
>>>> chr1 chr2
>>>> 3 0
>>>>
>>>> Another choice is tapply:
>>>>> tapply(rd$score, space(rd), sum)
>>>> chr1 chr2
>>>> 3 0
>>>>
>>>> Michael
>>>>
>>>> The code below computes the sum score for each space in the
>>>> RangedData:
>>>>> # taken from the examples mostly:
>>>>>> ranges<- IRanges(c(1,2,3),c(4,5,6))
>>>>>> score<- c(2L, 0L, 1L)
>>>>>> rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1"))
>>>>>> rd
>>>>> RangedData with 3 rows and 1 value column across 2 spaces
>>>>> space ranges | score
>>>>> <character> <IRanges> |<integer>
>>>>> 1 chr1 [1, 4] | 2
>>>>> 2 chr1 [3, 6] | 1
>>>>> 3 chr2 [2, 5] | 0
>>>>>> params<- RDApplyParams(rd, function(rd) sum(score(rd)))
>>>>>> rdapply(params)
>>>>> $chr1
>>>>> [1] 3
>>>>>
>>>>> $chr2
>>>>> [1] 0
>>>>>
>>>>>
>>>>> Cheers
>>>>> Michael
>>>>>
>>>>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan:
>>>>>
>>>>>> On 21/04/10 18:43, Michael Lawrence wrote:
>>>>>>>
>>>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan<yvan.strahm at uni.no
>>>>>>> <mailto:yvan.strahm at uni.no>> wrote:
>>>>>>>
>>>>>>> Hello List,
>>>>>>>
>>>>>>> I am confused about how to loop over a rangedData object.
>>>>>>> I have this rangedData
>>>>>>>
>>>>>>> RangedData with 61 rows and 1 value column across 3 spaces
>>>>>>> space ranges | score
>>>>>>> <character> <IRanges> |<numeric>
>>>>>>> 1 SCAFFOLD_1 [ 8, 8] | -0.09405
>>>>>>>
>>>>>>> and the spaces are
>>>>>>>
>>>>>>> "SCAFFOLD_1" "SCAFFOLD_10" "SCAFFOLD_100"
>>>>>>>
>>>>>>> using aggregate it is possible to apply a function to one of
>>>>>>> the
>>>>> space
>>>>>>> aggregate(rd["SCAFFOLD_1"]$score, start =
>>>>>>> 1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum)
>>>>>>>
>>>>>>> but how can I apply the aggregate to all space without a for
>>>>>>> loop ?
>>>>>>>
>>>>>>>
>>>>>>> It looks like you're attempting a running window sum of the score
>>>>>>> vector. There are more efficient ways of doing this besides
>>>>>>> aggregate(). If you convert the score into an Rle, you can use
>>>>> runsum().
>>>>>>> Anyway, to do this over each space individually, use lapply().
>>>>>>>
>>>>>>> This would come out to something like:
>>>>>>>
>>>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x)
>>>>>>> runsum(Rle(x$score), w))
>>>>>>>
>>>>>>> Probably not exactly right, but it gets you in the right
>>>>>>> direction...
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>> Hello Michael,
>>>>>>
>>>>>> Thanks for the answer and the tip about runsum!
>>>>>> I try with lapply but could not get it working right, the main
>>>>>> problem
>>>>>> is that the runsum is calculated on all values and not for a each
>>>>>> specific spaces.
>>>>>> Sorry, I should have been more precise in the problem description.
>>>>>> The runsum should be calculated in a space specific manner, let
>>>>>> say w=2
>>>>>>
>>>>>> space score cumsum
>>>>>> 1 space1 1 3
>>>>>> 2 space1 2 4
>>>>>> 3 space1 2 NA
>>>>>> 4 space2 10 21
>>>>>> 5 space2 11 22
>>>>>> 6 space2 11 NA
>>>>>>
>>>>>> Is it possible to do it with lapply?
>>>>>> Thanks again for your help
>>>>>> cheers,
>>>>>> yvan
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list