[Bioc-devel] Moving minfi classes definition to a lighter package

Robert Castelo robert.castelo at upf.edu
Wed Mar 3 14:42:40 CET 2021


Hi,

About a year ago we had a developers' forum session devoted to this
subject; you might find the discussion starting at minute 29 useful:

https://www.youtube.com/watch?v=xsM4nN85cok

Part of the outcome of that discussion is in section 7 of this vignette:

http://bioconductor.org/packages/release/bioc/vignettes/BiocPkgTools/inst/doc/BiocPkgTools.html#dependency-burden

which illustrates how to compute some metrics of a package's dependency
burden using functionality we implemented in the BiocPkgTools package.
For minfi, this is the output:

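library(BiocPkgTools)
## data frame of Depends/Imports relationships among Bioconductor
## software and CRAN packages
depdf <- buildPkgDependencyDataFrame(repo=c("BioCsoft", "CRAN"),
                                     dependencies=c("Depends", "Imports"))
## dependency-burden metrics for each direct dependency of minfi
minfidepmetrics <- pkgDepMetrics("minfi", depdf)
minfidepmetrics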
                     ImportedAndUsed Exported  Usage DepOverlap DepGainIfExcluded
DelayedArray                       1      188   0.53       0.11                 0
grDevices                          1      112   0.89       0.01                 0
data.table                         1      100   1.00       0.01                 1
MASS                               1       78   1.28       0.04                 0
limma                              4      310   1.29       0.04                 0
reshape                            1       67   1.49       0.03                 2
nlme                               2      109   1.83       0.05                 1
utils                              4      216   1.85       0.01                 0
lattice                            3      144   2.08       0.05                 0
BiocGenerics                       5      141   3.55       0.04                 0
stats                             16      449   3.56       0.01                 0
siggenes                           2       51   3.92       0.13                 3
genefilter                         2       49   4.08       0.38                 3
Biobase                            6      128   4.69       0.05                 0
GenomeInfoDb                       3       60   5.00       0.09                 0
preprocessCore                     2       39   5.13       0.02                 1
GEOquery                           1       17   5.88       0.32                 4
HDF5Array                          5       72   6.94       0.15                 4
bumphunter                         1       14   7.14       0.76                25
BiocParallel                       6       68   8.82       0.07                 0
Biostrings                        23      240   9.58       0.11                 0
graphics                           9       87  10.34       0.01                 0
IRanges                           40      254  15.75       0.06                 0
S4Vectors                         47      278  16.91       0.05                 0
DelayedMatrixStats                14       74  18.92       0.14                 2
GenomicRanges                     23      106  21.70       0.12                 0
RColorBrewer                       1        4  25.00       0.01                 1
SummarizedExperiment              23       82  28.05       0.19                 0
illuminaio                         1        3  33.33       0.04                 2
quadprog                           1        2  50.00       0.01                 1
beanplot                           1        1 100.00       0.01                 1
mclust                            NA      271     NA       0.04                 1
nor1mix                           NA       38     NA       0.02                 1
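
For reference, as far as I can tell from these numbers, 'Usage' is the
percentage of a dependency's exported functions that minfi imports and
uses, i.e., 100 * ImportedAndUsed / Exported (for limma, 100 * 4 / 310
gives 1.29). A minimal base-R check, assuming that definition and using a
few rows copied from the output above:

```r
## A few rows copied from the pkgDepMetrics("minfi", depdf) output above
m <- data.frame(
    ImportedAndUsed = c(1, 4, 40, 23, 1),
    Exported        = c(188, 310, 254, 82, 1),
    Usage           = c(0.53, 1.29, 15.75, 28.05, 100.00),
    row.names = c("DelayedArray", "limma", "IRanges",
                  "SummarizedExperiment", "beanplot"))

## Assumed definition: percentage of the dependency's exported
## functions that minfi imports and uses, rounded to 2 decimals
recomputed <- round(100 * m$ImportedAndUsed / m$Exported, 2)

## TRUE if the assumed formula reproduces the reported Usage column
isTRUE(all.equal(recomputed, m$Usage))
```

The same formula reproduces every row of the table that has a non-NA
ImportedAndUsed value, so a low Usage simply means minfi pulls in a large
package for a small fraction of what it exports.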

So, with the exception of 'bumphunter' (DepGainIfExcluded = 25), it
doesn't look like removing any single dependency would gain you much. It
seems that minfi imports a single function from bumphunter:

imp <- pkgDepImports("minfi")
imp[imp$pkg %in% "bumphunter", ]
# A tibble: 1 x 2
   pkg        fun
   <chr>      <chr>
1 bumphunter bumphunter

You can explore the gain from excluding combinations of package
dependencies with the function 'pkgCombDependencyGain()':

pcd <- pkgCombDependencyGain("minfi", depdf, maxNbr=2L)
dim(pcd)
[1] 561   3
head(pcd[order(pcd$DepGain, decreasing = TRUE), ])
                           Packages NbrExcl DepGain
160           bumphunter, GEOquery       2      43
175         bumphunter, genefilter       2      40
98        BiocParallel, bumphunter       2      31
161          bumphunter, HDF5Array       2      29
165           bumphunter, siggenes       2      28
157 bumphunter, DelayedMatrixStats       2      27
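
Going by the numbers alone, two things stand out here. First, the 561
rows match every subset of one or two of the 33 direct dependencies
listed in the pkgDepMetrics() table (33 singletons plus 528 pairs).
Second, the pairwise gains are not simply additive: excluding bumphunter
alone gains 25 and GEOquery alone gains 4, yet excluding both gains 43,
presumably because dependencies shared by the two packages (and by no
other direct dependency) only disappear when both go. A small base-R
sketch of that arithmetic, using only numbers copied from the two outputs
above:

```r
## maxNbr=2L: all subsets of size 1 or 2 of the 33 direct dependencies
nSubsets <- sum(choose(33, 1:2))   # 33 + 528
nSubsets                           # same as the number of rows of pcd

## DepGain of each pair (from head(pcd)) versus the sum of the two
## single-package DepGainIfExcluded values (from pkgDepMetrics())
bumphunterAlone <- 25
pairs <- data.frame(
    partner  = c("GEOquery", "genefilter", "BiocParallel",
                 "HDF5Array", "siggenes", "DelayedMatrixStats"),
    pairGain = c(43, 40, 31, 29, 28, 27),
    alone    = c(4, 3, 0, 4, 3, 2))

## extra > 0: dependencies removed only when both packages are excluded
pairs$extra <- pairs$pairGain - (bumphunterAlone + pairs$alone)
pairs
```

In other words, dropping bumphunter together with GEOquery, genefilter or
BiocParallel removes more than the single-package numbers suggest, while
pairing it with HDF5Array, siggenes or DelayedMatrixStats adds nothing
beyond their individual gains.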

Have fun with the dependency exploration game! :)

robert.

On 3/3/21 1:28 PM, Kasper Daniel Hansen wrote:
> I am happy to engage in a discussion about this, although I'm not sure that
> I am ultimately interested in having two packages.
>
> But first I would like to look at some dependency graphs. I am wondering
> what makes the dependency tree this big (and my tree is smaller than yours,
> but still big: library(minfi) gives me 16 attached packages and 89 loaded
> packages for the current release). This includes some parts of the tidyverse,
> which we don't really use much (and which could probably be removed
> from the package with almost no work).
>
> What's the current best tool for dependency graphs in Bioconductor?
> pkgDepTools?
>
> Best,
> Kasper
>
> On Mon, Mar 1, 2021 at 6:24 PM Carlos Ruiz <carlos.ruiz at isglobal.org> wrote:
>
>> Dear Bioc developers,
>>
>> I have been developing several packages to analyze DNA methylation. In
>> all of them, I have used minfi's class GenomicRatioSet to manage DNA
>> methylation data, in order to take advantage of the features of
>> RangedSummarizedExperiment.
>>
>> Although I am very happy with the potential of the class, importing its
>> definition from minfi forces me to add the package to Imports. As minfi has
>> a high number of dependencies (129 in the current release), my packages end
>> up having hundreds of dependencies too. This is particularly problematic as
>> I do not use any of minfi's other functions.
>>
>> I am wondering whether it would be possible to move minfi's classes (or at
>> least GenomicRatioSet) to a lighter package, so that people developing
>> packages for DNA methylation could rely on this class without having to
>> import the whole minfi package and its dependencies.
>>
>> Thank you very much,
>> --
>>
>> Carlos Ruiz
>>
>
-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550


