[Bioc-devel] Efficient Random Sampling of Positions in GRanges
Hervé Pagès <hpages at fredhutch.org>
Thu Feb 18 18:11:46 CET 2016

Hi,
On 02/18/2016 03:00 AM, Dario Strbenac wrote:
> Good day,
>
> Thank you for these two suggestions. I have questions about both options.
>
> GPos seems to have a limit on the number of bases it can represent. I get an error: Error in GPos(samplingAreas) : too many genomic positions in 'pos_runs'. What exactly is this limit? Could it be added to the documentation?
It is documented. The man page for GPos says:
   Note:
      Like for any Vector derivative, the length of a GPos object cannot
      exceed ‘.Machine$integer.max’ (i.e. 2^31 on most platforms).
      GPos() will return an error if 'pos_runs' contains too many
      genomic positions.
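
To check ahead of time whether a set of regions would hit this limit, something like the following minimal base R sketch could be used. The widths are made-up numbers, and 'region_widths' and 'total_positions' are hypothetical names, not part of any package. Note that the exact value of .Machine$integer.max is 2^31 - 1, i.e. 2147483647.

```r
# Largest length a regular (non-long) vector-like object can have.
.Machine$integer.max

# Hypothetical region widths (made-up numbers, not a real genome).
region_widths <- c(1500000000, 1200000000, 400000000)

# Sum as doubles: summing integers this large would overflow to NA.
total_positions <- sum(as.numeric(region_widths))

# TRUE here means one object holding every position would be too long.
total_positions > .Machine$integer.max
```

With real data the widths would come from the ranges passed to GPos(); the comparison itself stays the same.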
I've started to believe that with GPos we might have a use case that is
strong enough to justify adding support for long Vector objects. This
is a big change to our infrastructure though and it won't happen before
the next release.
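
In the meantime, one possible workaround (a sketch only, not something GPos or any package does for you) is to sample positions without ever building the full set: draw indices over the total number of positions, then map each index back to its region. The 'regions' data frame and all variable names below are hypothetical, and the sketch assumes the regions do not overlap. It also relies on sample.int() accepting a total larger than .Machine$integer.max, which ?sample describes for recent versions of R (uniform sampling only).

```r
# Hypothetical sampling regions (toy coordinates, assumed non-overlapping).
regions <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(101, 5001, 1),
  end      = c(200, 5050, 1000)
)

widths  <- as.numeric(regions$end - regions$start + 1)
offsets <- cumsum(widths)            # last global index covered by each region
total   <- offsets[length(offsets)]  # total number of positions, kept as a double

set.seed(42)
n <- 10
global_idx <- sample.int(total, n)   # uniform draw over all positions, no replacement

# Map each global index to the region that contains it ...
region_idx <- findInterval(global_idx - 1, offsets) + 1
# ... and then to a genomic coordinate inside that region.
before <- c(0, offsets[-length(offsets)])[region_idx]
pos    <- regions$start[region_idx] + (global_idx - before - 1)

sampled <- data.frame(seqnames = regions$seqnames[region_idx], pos = pos)
sampled
```

Memory use depends on the number of regions and on n, not on the total number of bases.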
> I used all of the human chromosomes. There is also no documentation of the sample function for GPos objects.
That's because there is no sample() method specific to GPos objects. The base sample() function is used as is:
   > sample
   function (x, size, replace = FALSE, prob = NULL)
   {
     if (length(x) == 1L && is.numeric(x) && x >= 1) {
         if (missing(size))
             size <- x
         sample.int(x, size, replace, prob)
     }
     else {
         if (missing(size))
             size <- length(x)
         x[sample.int(length(x), size, replace, prob)]
     }
   }
As you can see, sample() works on any object that has a length() and
can be subsetted with "[" (a.k.a. a "vector-like" object). See ?sample
for more information.
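
A small illustration of that point, using a plain character vector as a stand-in for a vector-like object (the labels are made up): calling sample() gives the same result as doing the subsetting by hand with sample.int().

```r
# Stand-in for a vector-like object: anything with length() and "[" will do.
x <- c("chr1:1", "chr1:2", "chr1:3", "chr2:1", "chr2:2")

set.seed(1)
via_sample <- sample(x, 3)

set.seed(1)
via_subset <- x[sample.int(length(x), 3)]

# Same seed, same code path, same result.
identical(via_sample, via_subset)
```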
H.
>
> Could regioneR be improved to consider strand? It generates regions with no strand. I would like the regions to have a strand, since I have a strand-specific sequencing dataset, and for regions to possibly be chosen on the opposite strand to a masked region, such as when the genome is masked by transcripts for the purpose of choosing intergenic sequences.
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
    
    