[Bioc-devel] Confusing namespace issue with IRanges 1.99.17

Hervé Pagès hpages at fhcrc.org
Tue Jul 8 17:15:25 CEST 2014


Hi guys,

On 07/08/2014 05:29 AM, Michael Lawrence wrote:
> This is why I tell people not to use require(). But what's with needing to
> load IRanges to subset an Rle? Is that temporary?

Very temporary. The source code of the "extractROWS" and "replaceROWS"
methods for Rle objects actually contains the following comment:

   ## FIXME: Right now, the subscript 'i' is turned into an IRanges
   ## object so we need stuff that lives in the IRanges package for this
   ## to work. This is ugly/hacky and needs to be fixed (thru a redesign
   ## of this method).
   if (!suppressWarnings(require(IRanges, quietly=TRUE)))
     stop(...)
   ...

I introduced this hack last week when I moved the Rle code from IRanges
to S4Vectors. It's temporary. The 2 methods need to be refactored, which
I'm planning to do this week.
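
For anyone following along, here is a minimal sketch of the difference
between a package being "loaded via a namespace" and being attached,
which is what the require() call above changes. It is only an
illustration, not the actual S4Vectors code: the 'splines' package
(shipped with R) stands in for IRanges so the snippet runs without
Bioconductor, and is_attached() is a hypothetical helper defined just
for this example.

```r
# 'splines' stands in for IRanges here (assumption made for portability).
pkg <- "splines"

# Hypothetical helper: TRUE if the package is on the search path.
is_attached <- function(p) paste0("package:", p) %in% search()

# Start from a clean state in case the package is already attached.
if (is_attached(pkg)) detach(paste0("package:", pkg), character.only = TRUE)

# requireNamespace() loads the namespace but leaves the search path alone.
ok <- requireNamespace(pkg, quietly = TRUE)
loaded_only <- c(loaded = pkg %in% loadedNamespaces(),
                 attached = is_attached(pkg))

# Functions stay reachable through '::' without attaching anything.
basis <- splines::bs(1:10, df = 3)

# require() on the same package attaches it, which is the side effect
# reported in this thread.
suppressWarnings(require(pkg, quietly = TRUE, character.only = TRUE))
after_require <- c(loaded = pkg %in% loadedNamespaces(),
                   attached = is_attached(pkg))

print(loaded_only)
print(after_require)
```

Comparing the two printed vectors shows the package moving from loaded
to attached only after the require() call.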

Cheers,
H.


>
> Limiting imports is unlikely to reduce loading time. It may actually
> increase it. There are good reasons for it though.
>
>
>
> On Tue, Jul 8, 2014 at 5:21 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>> Hi Leonardo --
>>
>>
>> On 07/07/2014 03:27 PM, Leonardo Collado Torres wrote:
>>
>>> Hello BioC-devel list,
>>>
>>> I am currently confused by a namespace issue that I haven't been able
>>> to solve. To reproduce it, I made the simplest example I could think of.
>>>
>>>
>>> Step 1: make some toy data and save it on your desktop
>>>
>>> library(IRanges)
>>> DF <- DataFrame(x = Rle(0, 10), y = Rle(1, 10))
>>> save(DF, file="~/Desktop/DF.Rdata")
>>>
>>> Step 2: install the toy package on R 3.1.x
>>>
>>> library(devtools)
>>> install_github("lcolladotor/fooPkg")
>>> # Note that it passes R CMD check
>>>
>>> Step 3: on a new R session run
>>>
>>> example("foo", "fooPkg")
>>> # Change the location of DF.Rdata if necessary
>>>
>>>
>>> You will see that when running the example, the session information is
>>> printed listing:
>>>
>>> other attached packages:
>>> [1] fooPkg_0.0.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BiocGenerics_0.11.3 IRanges_1.99.17     parallel_3.1.0
>>> S4Vectors_0.1.0     stats4_3.1.0        tools_3.1.0
>>>
>>>
>>> Then the message for loading IRanges is shown, which is something I
>>> was not expecting, and the following session info then shows:
>>>
>>> other attached packages:
>>> [1] IRanges_1.99.17     S4Vectors_0.1.0     BiocGenerics_0.11.3
>>> fooPkg_0.0.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] stats4_3.1.0 tools_3.1.0
>>>
>>> Meaning that IRanges, S4Vectors and BiocGenerics all went from "loaded
>>> via a namespace" to "other attached packages".
>>>
>>>
>>>
>>> All the fooPkg::foo() is doing is using a mapply() to go through a
>>> DataFrame and a list of indices to subset the data as shown at
>>> https://github.com/lcolladotor/fooPkg/blob/master/R/foo.R#L26 That is:
>>>
>>> res <- mapply(function(x, y) { x[y] }, DF, index)
>>>
>>> I thus thought that the only thing I would need to specify on the
>>> namespace is to import the '[' IRanges method.
>>>
>>> Checking with BiocCheck and codetoolsBioC suggests importing the
>>> method for mapply() from BiocGenerics. Doing so doesn't affect things
>>> and R still loads IRanges on that mapply() call. Importing the '['
>>> method from S4Vectors doesn't help either. Most intriguing, importing
>>> the whole S4Vectors, BiocGenerics and IRanges still doesn't change the
>>> fact that IRanges is loaded when evaluating the same line of code
>>> shown above.
>>>
>>> Any clues on what I am missing or doing wrong?
>>>
>>>
>> This comes from S4Vectors::extractROWS
>>
>>> selectMethod(extractROWS, c("Rle", "integer"))
>> Method Definition:
>>
>> function (x, i)
>> {
>>      if (!suppressWarnings(require(IRanges, quietly = TRUE)))
>>          stop("Couldn't load the IRanges package. You need to install ",
>>              "the IRanges\n  package in order to subset an Rle object.")
>>
>> ...
>>
>> which moves the IRanges package from loaded to attached. Maybe that should
>> be 'suppressPackageStartupMessages' or if (!"IRanges" %in%
>> loadedNamespaces()) and functions referenced by IRanges:::...
>>
>>
>>
>>
>>
>>>
>>> In my use case, I'm trying to keep the namespace as small as possible
>>> (to minimize loading time) because it's for a tiny package that has a
>>> single function. This tiny package is then loaded on a
>>> BiocParallel::bplapply() call using BiocParallel::SnowParam() which
>>> performs much better than BiocParallel::MulticoreParam() in terms of
>>> keeping the memory under control.
>>>
>>
>> probably it is not desirable to move packages from loaded to attached, but
>> I don't think this influences performance in a meaningful way?
>>
>> Martin
>>
>>
>>
>>>
>>>
>>>
>>> Thank you for your help!
>>> Leo
>>>
>>> Leonardo Collado Torres, PhD student
>>> Department of Biostatistics
>>> Johns Hopkins University
>>> Bloomberg School of Public Health
>>> Website: http://www.biostat.jhsph.edu/~lcollado/
>>> Blog: http://lcolladotor.github.io/
>>>
>>>
>>> Full output from running the example:
>>>
>>>
>>>
>>>
>>> > example("foo", "fooPkg")
>>>
>>> foo> ## Initial info
>>> foo> sessionInfo()
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] fooPkg_0.0.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BiocGenerics_0.11.3 IRanges_1.99.17     parallel_3.1.0
>>> S4Vectors_0.1.0     stats4_3.1.0        tools_3.1.0
>>>
>>> foo> ## Load data
>>> foo> load("~/Desktop/DF.Rdata")
>>>
>>> foo> ## Run function
>>> foo> result <- foo(DF)
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] fooPkg_0.0.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BiocGenerics_0.11.3 IRanges_1.99.17     parallel_3.1.0
>>> S4Vectors_0.1.0     stats4_3.1.0        tools_3.1.0
>>> Loading required package: parallel
>>>
>>> Attaching package: ‘BiocGenerics’
>>>
>>> The following objects are masked from ‘package:parallel’:
>>>
>>>       clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>> clusterExport, clusterMap, parApply, parCapply, parLapply,
>>>       parLapplyLB, parRapply, parSapply, parSapplyLB
>>>
>>> The following object is masked from ‘package:stats’:
>>>
>>>       xtabs
>>>
>>> The following objects are masked from ‘package:base’:
>>>
>>>       anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
>>> do.call, duplicated, eval, evalq, Filter, Find, get,
>>>       intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
>>> paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
>>>       rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
>>> tapply, union, unique, unlist
>>>
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>> methods   base
>>>
>>> other attached packages:
>>> [1] IRanges_1.99.17     S4Vectors_0.1.0     BiocGenerics_0.11.3
>>> fooPkg_0.0.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] stats4_3.1.0 tools_3.1.0
>>>
>>>
>>>
>>> The same thing happens with the following setup:
>>>
>>> R version 3.1.1 RC (2014-07-07 r66083)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices datasets  utils     methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] IRanges_1.99.17     S4Vectors_0.1.0     BiocGenerics_0.11.3
>>> [4] fooPkg_0.0.1        colorout_1.0-2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] stats4_3.1.1 tools_3.1.1
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


