Thank you all.

On Thu, Sep 6, 2012 at 11:33 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:

> On 09/06/2012 11:33 AM, d r wrote:
>
>> Thank you Mike and Valarie,
>>
>> Just to be sure I understood you clearly: I understand there is no
>> possibilty to use  subdivideGRanges() to split a GRanges to a given number
>> of elements.
>> If I understood you correctly, it is also not possible to read a vector
>> into the subsize argument. is taht true or is ther some possibilty to do
>> that?
>>
>
>
> Not sure if this is the best way, but for a GRanges instance 'gr' dividing
> it in to 'n' bins, you might start by making sure there won't be any
> zero-width or necessarily overlapping bins
>
>
>     if (any(width(gr) < n))
>         stop("all 'width(gr)' must be >= 'n'")
>
> then figure out the width, in numeric values, of each bin
>
>     d <- width(gr) / n
>
> To make a vector of offsets from the start of each original bin, you want
> to do this in a 'vectorized' fashion, rather than a loop, so I replicated
> each element of d n times, calculated the cumulative sum, and then
> subtracted for each of the d x n elements the value of the cumulative sum
> at the start of each replicate, a little obscure I guess...
>
>     dd <- cumsum(rep(d, each=n))
>     mask <- logical(n); mask[1] <- TRUE  # every n'th is base for range
>     dd <- dd - rep(dd[mask], each=n)
>
> To get the new starts, replicate the original starts and add the offsets,
> using round() to approximate to the nearest integer value
>
>     starts <- round(rep(start(gr), each=n) + dd)
>
> the ends will be the starts of the next range, minus 1 (the 'L' says to
> represent 0 and 1 as integer values, rather than numeric, and avoids
> unnecessary coercion from integer to numeric and back)
>
>     ends <- c(starts[-1] - 1L, 0L)
>
> except for each of the n'th (including last) elements, which will be the
> end of the original range
>
>     ends[rev(mask)] <- end(gr)
>
> we can then replicate our original data and update the ranges
>
>     gr <- gr[rep(seq_along(gr), each=n)]
>     ranges(gr) <- IRanges(starts, ends)
>
> getting the result
>
>     gr
>
> Here's the above as a function
>
> intoNbins <-
>     function(gr, n = 10)
> {
>     if (any(width(gr) < n))
>         stop("all 'width(gr)' must be >= 'n'")
>     d <- width(gr) / n
>     dd <- cumsum(rep(d, each=n))
>     mask <- logical(n); mask[1] <- TRUE
>     dd <- dd - rep(dd[mask], each=n)
>
>     starts <- round(rep(start(gr), each=n) + dd)
>     ends <- c(starts[-1], 0) - 1L
>     ends[rev(mask)] <- end(gr)
>
>     gr <- gr[rep(seq_along(gr), each=n)]
>     ranges(gr) <- IRanges(starts, ends)
>     gr
> }
>
> and in action
>
> > gr <- GRanges("A", IRanges(c(1, 21, 21), width=c(10, 20, 30)))
> > table(width(intoNbins(gr, 10)))
>
>  1  2  3
> 10 10 10
>
> It might make sense to split the GRanges into a GRangesList, one element
> for each CpG island, replacing the last line of intoNbins with
>
>     split(gr, rep(seq_along(d), each=n))
>
> Martin
>
>
>> thanks again
>> Dolev
>>
>> On Thu, Sep 6, 2012 at 5:17 PM, Michael Love <love@molgen.mpg.de> wrote:
>>
>>  Hi Dolev,
>>>
>>> The subdivideGRanges() function cannot accomplish this task.
>>>
>>> from the manual:
>>>
>>> "Takes an input GRanges object and, splits each range into multiple
>>> ranges
>>> of nearly equal width. For an input range of width w and subdividing size
>>> s, it will subdivide the range into max(1,floor(w/s)) nearly equal width
>>> ranges."
>>>
>>> So the function tries to keep the widths of the subdivided ranges around
>>> s
>>> basepairs, rather than keeping the number of subdivided ranges constant.
>>>
>>> I will include a more helpful error message if a vector is provided for
>>> the subsize argument.
>>>
>>> cheers,
>>>
>>> Mike
>>>
>>>
>>>
>>> On 09/06/12 16:02, Valerie Obenchain wrote:
>>>
>>>  Hi Dolev,
>>>>
>>>> I am copying the maintainer of exomeCopy which is the package
>>>> subdivideGRanges() comes from. Please provide a small reproducable
>>>> example
>>>> of the problem you are having (i.e., a few lines of code that someone
>>>> else
>>>> can run to get the same error). This makes it easier for others to help.
>>>>
>>>> Valerie
>>>>
>>>> On 09/06/12 04:39, d r wrote:
>>>>
>>>>  Hello
>>>>>
>>>>> I have a GRanges object representing CpG islands, that I want to
>>>>> subdivide
>>>>> so that each island will be represented by 10 ranges of equal width.
>>>>>
>>>>> To do this, I created a vector containing the widths of the islands
>>>>> divided
>>>>> by ten:
>>>>>
>>>>> swidth<-as.integer(width(****islandshg19)/10)
>>>>>
>>>>>
>>>>>
>>>>> and then called subdivideGRanges():
>>>>>
>>>>>
>>>>> islandshg19<-subdivideGRanges(****islandshg19,
>>>>>
>>>>> subsize=as.integer(width/10))
>>>>>
>>>>>
>>>>> and got a multitude of warnings.
>>>>>
>>>>>
>>>>> This approach did work when I tried it on a GRanges object containing a
>>>>> single range, and a width vector of length 1. When I tried to extend
>>>>> the
>>>>> sample to include two objetcs, I got these warning message:
>>>>>
>>>>> 1: In if (widths[i]<  2 * subsize) { ... :
>>>>>
>>>>>     the condition has length>  1 and only the first element will be
>>>>> used
>>>>>
>>>>> 2: In if (width<  2 * subsize) { ... :
>>>>>
>>>>>     the condition has length>  1 and only the first element will be
>>>>> used
>>>>>
>>>>> 3: In 0:(nchunks - 1) : numerical expression has 2 elements: only the
>>>>> first used
>>>>>
>>>>>
>>>>>
>>>>> Which makes me think that the problem may be related to assigning the
>>>>> right
>>>>> subszie to the the rigth range.
>>>>>
>>>>> Does anyone have an idea how to do that?
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Dolev Rahat
>>>>>
>>>>>
>>>>>
>>>>> sessionInfo(if required):
>>>>>
>>>>> R version 2.15.1 (2012-06-22)
>>>>>
>>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>>
>>>>>
>>>>>
>>>>> locale:
>>>>>
>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>> States.1252    LC_MONETARY=English_United States.1252
>>>>>
>>>>> [4] LC_NUMERIC=C LC_TIME=English_United
>>>>> States.1252
>>>>>
>>>>>
>>>>>
>>>>> attached base packages:
>>>>>
>>>>> [1] stats     graphics  grDevices utils     datasets  methods base
>>>>>
>>>>>
>>>>>
>>>>> other attached packages:
>>>>>
>>>>>    [1] exomeCopy_1.2.0      Rsamtools_1.8.6 Biostrings_2.24.1
>>>>> rtracklayer_1.16.3   taRifx_1.0.4         reshape2_1.2.1
>>>>> plyr_1.7.1
>>>>>
>>>>>    [8] XML_3.9-4.1          GenomicRanges_1.9.54 IRanges_1.15.40
>>>>> BiocGenerics_0.3.1   BiocInstaller_1.4.7
>>>>>
>>>>>
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>
>>>>> [1] bitops_1.0-4.1  BSgenome_1.25.7 RCurl_1.91-1.1 stats4_2.15.1
>>>>> stringr_0.6.1   tools_2.15.1    zlibbioc_1.2.0
>>>>>
>>>>>      [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________****_________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor@r-project.org
>>>>> https://stat.ethz.ch/mailman/****listinfo/bioconductor<https://stat.ethz.ch/mailman/**listinfo/bioconductor>
>>>>> <https:/**/stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>>> >
>>>>> Search the archives: http://news.gmane.org/gmane.**
>>>>> science.biology.informatics.****conductor<http://news.gmane.**
>>>>> org/gmane.science.biology.**informatics.conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>>> >
>>>>>
>>>>>
>>>>
>>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>

	[[alternative HTML version deleted]]

