[Bioc-devel] Efficient Random Sampling of Positions in GRanges
Hervé Pagès
hpages at fredhutch.org
Thu Feb 18 18:11:46 CET 2016
Hi,
On 02/18/2016 03:00 AM, Dario Strbenac wrote:
> Good day,
>
> Thank you for these two suggestions. I have questions about both options.
>
> GPos seems to have a limit on the number of bases it can represent. I get an error: Error in GPos(samplingAreas) : too many genomic positions in 'pos_runs'. What exactly is this limit? Could it be added to the documentation?
It is documented. The man page for GPos says:

  Note:
    Like for any Vector derivative, the length of a GPos object cannot
    exceed ‘.Machine$integer.max’ (i.e. 2^31 on most platforms).
    GPos() will return an error if 'pos_runs' contains too many
    genomic positions.
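
To make the limit concrete, here is a minimal base-R sketch that checks up front whether a set of regions would fit. The widths are hypothetical numbers chosen for illustration; with a GRanges object 'gr' they would come from width(gr). Note that R integers are 32-bit, so .Machine$integer.max is 2147483647, i.e. 2^31 - 1.

```r
# Hypothetical region widths in bases (illustration only, not real data).
# With a GRanges object 'gr' these would be width(gr).
region_widths <- c(2.0e9, 1.5e9)

# Sum as doubles: summing very large integer vectors can overflow to NA.
total_positions <- sum(as.numeric(region_widths))

# TRUE only if one position per base would fit in an ordinary R vector.
fits_in_gpos <- total_positions <= .Machine$integer.max

print(.Machine$integer.max)
print(total_positions)
print(fits_in_gpos)
```

If fits_in_gpos is FALSE, expanding the regions into one element per base is expected to fail in the way described in the note.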
I've started to believe that with GPos we might have a use case that is
strong enough to justify adding support for long Vector objects. This
is a big change to our infrastructure though and it won't happen before
the next release.
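
In the meantime, one possible workaround (not something proposed in this thread, just a sketch) is to avoid materializing every position: draw indices into the cumulative widths of the regions and map each index back to a region and an offset. The starts and widths below are hypothetical, and the sketch assumes the regions do not overlap. It relies on sample.int() accepting 'n' as a double, which ?sample documents for recent versions of R.

```r
# Hypothetical, non-overlapping regions on one sequence (illustration only).
region_starts <- c(101, 5001, 20001)
region_widths <- c(50, 200, 1000)

# Cumulative widths as doubles, so the total may exceed integer range.
cum_widths <- cumsum(as.numeric(region_widths))
total <- cum_widths[length(cum_widths)]

set.seed(1)
n_draws <- 10
# Draw distinct 1-based indices into the virtual vector of all positions.
idx <- sample.int(total, n_draws)

# Locate the region containing each index, then the offset inside it.
breaks <- c(0, cum_widths)
region <- findInterval(idx - 1, breaks)
offset <- idx - 1 - breaks[region]
sampled_pos <- region_starts[region] + offset

print(data.frame(region = region, pos = sampled_pos))
```

Memory use here scales with the number of draws rather than with the number of bases, which is the point of the exercise. Strand or sequence names would have to be carried along per region in the same way as the starts.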
> I used all of the human chromosomes. There is also no documentation of the sample function for GPos objects.
That's because there is no sample() function for GPos objects:
  > sample
  function (x, size, replace = FALSE, prob = NULL)
  {
      if (length(x) == 1L && is.numeric(x) && x >= 1) {
          if (missing(size))
              size <- x
          sample.int(x, size, replace, prob)
      }
      else {
          if (missing(size))
              size <- length(x)
          x[sample.int(length(x), size, replace, prob)]
      }
  }
As you can see, sample() works on anything that has a length() and
is subsettable with [ (i.e. any "vector-like" object). See ?sample
for more information.
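
A small sketch of what that means in practice, using a plain character vector of made-up position labels as a stand-in for a vector-like object: with the same seed, sample() and the explicit subsetting from the function body above give the same result.

```r
# Made-up position labels standing in for any vector-like object.
positions <- c("chr1:100", "chr1:250", "chr2:75", "chr2:900", "chrX:42")

set.seed(42)
via_sample <- sample(positions, 3)

set.seed(42)
# The same thing sample() does internally when length(x) is not 1.
via_index <- positions[sample.int(length(positions), 3)]

print(identical(via_sample, via_index))
```

The same reasoning is why no dedicated method is needed for an object that already supports length() and [.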
H.
>
> Could regioneR be improved to consider strand? It generates regions with no strand. I would like the regions to have a strand, since I have a strand-specific sequencing dataset, and for regions to be possibly chosen on the opposite strand to a masked region, such as when the genome is masked by transcripts for the purpose of choosing intergenic sequences.
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319