[Bioc-devel] NAMESPACE best practices

Hervé Pagès hpages.on.github at gmail.com
Tue May 24 19:43:23 CEST 2022


Hi Alex,

On 24/05/2022 03:56, Alexander Blume wrote:
> Dear All,
>
> I recently took over maintenance of the “fastseg” package (http://bioconductor.org/packages/3.16/bioc/html/fastseg.html) and after fixing the issues reported by `R CMD check` I wanted to optimize the package's NAMESPACE file and the Depends/Imports given in the DESCRIPTION file.
>
> Replacing the blanket `import` of whole dependent packages with more fine-grained `importFrom` calls is rather obvious.
> However, I was wondering if there are any reasons that speak against doing so?

In my experience, doing selective imports for core packages like methods,
BiocGenerics, S4Vectors, IRanges, and GenomicRanges is almost never
worth it. It's just one more maintenance burden for virtually zero benefit.
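
For reference, here is a rough sketch of the two alternative styles in a
NAMESPACE file. The symbols listed in the importFrom() lines are only
illustrative and are not taken from fastseg's actual NAMESPACE:

```r
## Option A, full imports: one line, nothing to update when the code
## starts using another function from these packages.
import(methods, BiocGenerics, S4Vectors, IRanges, GenomicRanges)

## Option B, selective imports: every symbol used must be listed and
## kept in sync with the code (illustrative symbols only).
importFrom(S4Vectors, mcols, Rle)
importFrom(IRanges, IRanges)
importFrom(GenomicRanges, GRanges)
```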

However, the following 'R CMD check' NOTES:

     Namespace in Imports field not imported from: ‘stats’

and

     Consider adding
       importFrom("grDevices", "dev.cur", "dev.interactive", "dev.new")

reveal real problems that should be addressed.
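
Assuming the package really does call those grDevices functions, the
second NOTE is addressed by adding the suggested line to NAMESPACE. For
the first one, either import what is actually used from stats or, if
nothing is used, drop stats from the Imports field in DESCRIPTION. A
sketch (the stats symbol below is a placeholder, not checked against
fastseg's code):

```r
## Suggested verbatim by 'R CMD check':
importFrom("grDevices", "dev.cur", "dev.interactive", "dev.new")

## Placeholder: replace 'median' with the stats functions actually used.
importFrom("stats", "median")
```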

>
> Concerning the DESCRIPTION file, given that the functions used are already specified in the NAMESPACE, I was planning to edit the DESCRIPTION file and move the “GenomicRanges” and “Biobase” dependencies from Depends to Imports.
> In the package, the Biobase functions are used to query supported ExpressionSet objects, while GenomicRanges is used to support GRanges objects and to create the final output as a GRanges object.
> Is it legitimate to have GenomicRanges “only” as Imports, even if the main function's output is in GRanges format?

The consequence of moving GenomicRanges from Depends to Imports is that
basic GRanges functionality would no longer be available to your users
unless they load GenomicRanges themselves, so it would feel like you're
returning objects that "don't work". Unfortunately I see many
Bioconductor packages doing similar things, e.g. some packages return
SummarizedExperiment derivatives but don't depend on the
SummarizedExperiment package (they only import it). As a consequence,
basic things like assay() or colData() don't work on the object until
the user attaches SummarizedExperiment.

Here is a concrete example:

   library(AUCell)
   exprMatrix <- cbind(cell1=100*4:0, cell2=c(500, 0, 90, 0, 750))
   rownames(exprMatrix) <- sprintf("gene%02d", seq_len(nrow(exprMatrix)))
   rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE,
                                    verbose=FALSE)  # a SummarizedExperiment derivative

   assay(rankings)
   # Error in assay(rankings) : could not find function "assay"

   colData(rankings)
   # Error in colData(rankings) : could not find function "colData"

   library(SummarizedExperiment)
   assay(rankings)
   #           cells
   #   genes    cell1 cell2
   #     gene01     1     2
   #     gene02     2     4
   #     gene03     3     3
   #     gene04     4     5
   #     gene05     5     1

>
> I want to keep the “Depends” field as small as possible, so that downstream packages and users are not forced to attach everything and end up with masked functions.

Keeping Depends as small as possible is definitely something to aim for,
as long as your users can still "operate" on the objects that you expose
to them. For example, your users should not need to guess which package
to load before they can use the accessor functions defined for the
object you returned to them.

>   Is this reasonable, or should I just import “GenomicRanges” plus all required packages from the beginning and live with it? I hope there are some general guidelines to follow.

Definitely keep GenomicRanges in Depends.
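
In DESCRIPTION terms that could look roughly like the fragment below.
The exact package lists are an assumption for illustration, not
fastseg's actual fields, and Biobase is shown in Imports only because
that is the move you proposed:

```
Depends: GenomicRanges
Imports: methods, stats, grDevices, BiocGenerics, S4Vectors, IRanges,
    Biobase
```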

Cheers,

H.


>
> Best
> Alex
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github at gmail.com


