[Bioc-devel] Efficient Random Sampling of Positions in GRanges

Hervé Pagès hpages at fredhutch.org
Thu Feb 18 18:11:46 CET 2016


Hi,

On 02/18/2016 03:00 AM, Dario Strbenac wrote:
> Good day,
>
> Thank you for these two suggestions. I have questions about both options.
>
> GPos seems to have a limit on the number of bases it can represent. I get the error "Error in GPos(samplingAreas) : too many genomic positions in 'pos_runs'". What exactly is this limit? Could it be added to the documentation?

It is documented. The man page for GPos says:

   Note:

      Like for any Vector derivative, the length of a GPos object cannot
      exceed ‘.Machine$integer.max’ (i.e. 2^31 on most platforms).
      GPos() will return an error if 'pos_runs' contains too many
      genomic positions.
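For reference, .Machine$integer.max is exactly 2^31 - 1 (2147483647). The
human genome is roughly 3.1e9 bases, so a GPos covering all chromosomes
exceeds this limit. The following base R sketch checks this ahead of time.
It assumes the start/end coordinates are available as plain numeric
vectors; with a GRanges object, sum(as.numeric(width(gr))) gives the same
count. The helper names count_positions() and fits_in_vector() are
hypothetical and not part of any Bioconductor package.

```r
# Number of genomic positions that expanding a set of ranges would
# produce. Computed in double precision so the sum itself cannot
# overflow the integer type.
count_positions <- function(starts, ends) {
  sum(as.numeric(ends) - as.numeric(starts) + 1)
}

# Would these positions fit in one (non-long) Vector derivative,
# i.e. an object of length <= .Machine$integer.max?
fits_in_vector <- function(starts, ends) {
  count_positions(starts, ends) <= .Machine$integer.max
}

print(.Machine$integer.max)                     # 2147483647
count_positions(c(1, 1), c(2e9, 1.5e9))         # 3.5e9 positions
fits_in_vector(c(1, 1), c(2e9, 1.5e9))          # FALSE
```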

I've started to believe that with GPos we might have a use case that is
strong enough to justify adding support for long Vector objects. This
is a big change to our infrastructure though and it won't happen before
the next release.
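One possible workaround until then (not something proposed in this
thread, just a sketch under stated assumptions) is to avoid materializing
the positions at all. You pick a range with probability proportional to its
width using the 'prob' argument of sample.int(), then pick a uniform
offset within that range. This samples positions uniformly *with
replacement* only. sample_positions() is a hypothetical helper. With a
GRanges object, the inputs could be start(gr) and end(gr), and the
returned range index can be used to recover seqnames(gr) and strand(gr).

```r
# Sample 'size' positions uniformly, with replacement, from a set of
# ranges without expanding them, so the total number of positions may
# exceed .Machine$integer.max. 'starts' and 'ends' are assumed to be
# 1-based, inclusive coordinates.
sample_positions <- function(starts, ends, size) {
  widths <- as.numeric(ends) - as.numeric(starts) + 1
  # For each draw, choose a range with probability proportional to width.
  idx <- sample.int(length(widths), size, replace = TRUE, prob = widths)
  # Choose an offset in 0 .. width - 1 within the chosen range
  # (runif() does not return the endpoints 0 or 1).
  offset <- floor(runif(size) * widths[idx])
  data.frame(range = idx, pos = starts[idx] + offset)
}

set.seed(1)
head(sample_positions(c(1, 101), c(10, 200), size = 5))
```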

> I used all of the human chromosomes. There is also no documentation of the sample function for GPos objects.

That's because there is no sample() method specific to GPos objects; the
regular sample() function from base R is used:

   > sample
   function (x, size, replace = FALSE, prob = NULL)
   {
     if (length(x) == 1L && is.numeric(x) && x >= 1) {
         if (missing(size))
             size <- x
         sample.int(x, size, replace, prob)
     }
     else {
         if (missing(size))
             size <- length(x)
         x[sample.int(length(x), size, replace, prob)]
     }
   }

As you can see, sample() works on anything that has a length() and
is subsettable (i.e. any "vector-like" object). See ?sample for more
information.
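One caveat is visible in the source above. When x is a single number
>= 1, sample() treats it as sample.int(x) rather than as a one-element
vector. ?sample suggests a resample() wrapper for this case, which always
indexes into x. The small base R illustration below uses plain numeric
vectors rather than GPos objects.

```r
set.seed(1)
x <- c(10)

# A length-1 numeric x >= 1 takes the sample.int() branch:
# this is a permutation of 1:10, not the single element 10.
length(sample(x))        # 10

# Wrapper along the lines of the one suggested in ?sample: always
# subset x, so the result only ever contains elements of x.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x)              # 10
resample(1:5, 3)         # 3 elements drawn from 1:5
```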

H.

>
> Could regioneR be improved to take strand into account? It generates regions with no strand. I would like the regions to have a strand, since I have a strand-specific sequencing dataset. I would also like regions to be eligible for selection on the strand opposite a masked region, for example when the genome is masked by transcripts in order to choose intergenic sequences.
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


