[Bioc-devel] Issue with change in random sampling

Martin Morgan mtmorg@n@b|oc @end|ng |rom gm@||@com
Fri Sep 24 20:15:36 CEST 2021


This does sound like a BiocParallel side effect, and I would suggest
holding off for another week or so for the BiocParallel changes to be
finalized.

On 9/24/21, 2:05 PM, "Bioc-devel" <bioc-devel-bounces using r-project.org> wrote:

Hello,

My package `clusterExperiment` has not changed but is hitting errors on the
devel branch. I’ve pinpointed it to the fact that a small dataset I am
running the tests on is randomly subsetted from a larger dataset and is no
longer choosing the same observations. I have already, in a previous version,
corrected the tests for the change in random number generation in R 4.0.x.
I am wondering if it is related to the changes in BiocParallel (
https://community-bioc.slack.com/archives/CEQ04GKEC/p1631903391030800?thread_ts=1631881095.027600&cid=CEQ04GKEC
).

It was unexpected for me that this would affect these results. My package
doesn’t use BiocParallel or depend on it. But it turns out the code in
question does make a call to BiocSingular to run a PCA, and BiocSingular
does make calls to BiocParallel. What is strange to me is that even if I
don’t directly use the results of runPCA, but simply make the call to
runPCA before running the code in question, the output of that code is
changed. So this seems to me to indicate that the sequence of random
numbers is being globally affected by the change, and not just internally
to the results of calls to BiocParallel. I didn’t realize this was the case
from the above discussion (I thought it would only affect output that
directly relied on calls to BiocParallel), and I was hoping someone could
confirm that this is what is happening and/or give me an explicit way to
check that this is the source of my errors.
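
One way to check this directly is to compare `.Random.seed` before and after a call. The following is only a minimal sketch: `rng_changed` is a hypothetical helper name, not a function from BiocParallel or BiocSingular, and `runif(1)` merely stands in for the code that ends up calling runPCA.

```r
# Hypothetical helper: TRUE if evaluating `expr` alters the global RNG state.
rng_changed <- function(expr) {
  # Make sure .Random.seed exists (it is absent in a fresh session).
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    invisible(runif(1))
  }
  before <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  force(expr)  # `expr` is a lazy promise, so it is only evaluated here
  after <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  !identical(before, after)
}

set.seed(23)
draws_random <- rng_changed(runif(1))     # consumes random numbers
no_random    <- rng_changed(sum(1:10))    # does not touch the RNG
```

Wrapping the line that computes `clusterIds` in such a helper, on both release and devel, would show whether that call moves the global stream on one version but not the other.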

Here’s the basic setup. I have a setup file that sets up a lot of objects
for my tests (setup_create_objects.R). The relevant parts look something
like this (I’ve simplified it from what’s actually in the file so it more
clearly shows the progression):

data(simData)
suppressWarnings(RNGversion("3.5.0"))
set.seed(23)

… # bunch of code

clusterIds<- … # code that internally calls BiocSingular::runPCA

… # bunch of code

### sample 3 observations from each cluster:
whSamp <- unlist(tapply(seq_len(ncol(simData)), clusterIds,
                        function(x) sample(x = x, size = 3)))
smSimData <- simData[1:20, whSamp]

This results in different values of clusterIds and thus different whSamp on
the release and the devel version.

The unexpected part was that even if I add a line that manually overwrites
clusterIds with the values of the vector `clusterIds` from the release
version (copied manually from a run on a different computer that does not
have the devel version), I don’t get the same result for whSamp (I still run
the code for `clusterIds`, so BiocSingular::runPCA is still being called).
If, however, in addition to manually feeding the correct clusterIds on the
devel version, I ALSO put a new call to `set.seed` in the line before
computing whSamp, then both the devel and the release version give the same
result, as I would expect. This makes me think that the random seed has
been affected globally. Further, the second entry of .Random.seed after
running setup_create_objects.R is not the same on the devel version as on
the release version.
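
To make the tests independent of whatever an upstream call does to the global stream, one option besides re-seeding right before the sampling step is to save and restore `.Random.seed` around that call. This is a sketch under assumptions: `with_preserved_rng` is a hypothetical helper, and `runif(5)` stands in for the code that reaches BiocSingular::runPCA.

```r
# Hypothetical helper: evaluate `expr`, then restore the global RNG state
# to what it was before the call.
with_preserved_rng <- function(expr) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    invisible(runif(1))  # initialise the RNG so there is a state to save
  }
  old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit(assign(".Random.seed", old_seed, envir = globalenv()))
  expr  # the promise is evaluated here, before on.exit() runs
}

# Reference: sample straight after seeding.
set.seed(23)
expected <- sample(10)

# Same seed, but with an RNG-consuming call in between, wrapped in the helper.
set.seed(23)
ignored  <- with_preserved_rng(runif(5))
observed <- sample(10)

same_sample <- identical(expected, observed)
```

Note that this only shields the code that follows the wrapped call. The wrapped call itself may still return different values on release and devel if its own use of random numbers has changed.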

Thanks,
Elizabeth Purdom



        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
