[Bioc-devel] Moving minfi classes definition to a lighter package

顾祖光 jokergoo @end|ng |rom gm@||@com
Thu Mar 4 10:29:08 CET 2021


Hi all,

I have a package `pkgndep` (
https://cran.r-project.org/web/packages/pkgndep/index.html) which draws a
dependency heatmap for a given package. The following figure shows the
heatmap for the minfi package (version 1.30.0):

https://jokergoo.github.io/pkgndep/stat/image/minfi.png

Rows are the packages minfi depends on, grouped by whether they are listed
in the Depends, Imports or Suggests field, and columns are the namespaces
each package brings into the R session. For this version of minfi (1.30.0),
loading minfi brings in 89 namespaces; if the packages in Suggests are also
loaded, 117 namespaces are loaded.
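
A rough way to check such a count yourself is to compare `loadedNamespaces()`
before and after loading a package. This is only a minimal sketch:
`new_namespaces` is a helper name made up for illustration, the result depends
on what is already loaded (so use a fresh R session), and "tools" merely
stands in for minfi so that the snippet runs without extra installation.

```r
# Return the namespaces that loading `pkg` adds to the current session.
# `new_namespaces` is a hypothetical helper, not part of pkgndep or minfi.
new_namespaces <- function(pkg) {
  before <- loadedNamespaces()
  loadNamespace(pkg)                    # load without attaching
  setdiff(loadedNamespaces(), before)   # namespaces that were not loaded before
}

# "tools" ships with R; replace it with "minfi" if minfi is installed.
added <- new_namespaces("tools")
length(added)
```

Namespaces that were already loaded are not counted, so the number is always
relative to the state of the session at the time of the call.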

If we look only at Depends and Imports, three packages (bumphunter,
GEOquery and genefilter) account for most of minfi's heavy dependency
load. If these three packages could be moved to Suggests, the number of
dependencies would drop to just over 30.
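
For reference, the usual pattern for calling a package that is only in
Suggests is to check for it at run time with `requireNamespace()`. The sketch
below is an assumption about how this could look, not how minfi does it:
`call_suggested` is a hypothetical helper name, and the example call uses
`stats::median` only so that it runs anywhere.

```r
# Call `fun` from `pkg` only if `pkg` is installed; otherwise fail with a
# clear message. `call_suggested` is a hypothetical helper for illustration.
call_suggested <- function(pkg, fun, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function; please install it.",
         call. = FALSE)
  }
  getExportedValue(pkg, fun)(...)   # same as pkg::fun(...)
}

# Runs everywhere because 'stats' ships with R.
call_suggested("stats", "median", c(1, 3, 5))
```

For minfi the equivalent call would be
`call_suggested("bumphunter", "bumphunter", ...)`, so that bumphunter is only
loaded when that functionality is actually used.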

Best,
Zuguang


On Wed, 3 Mar 2021 at 14:43, Robert Castelo <robert.castelo using upf.edu> wrote:

> hi,
>
> about a year ago we had a developer's forum session devoted to this
> subject; you might find the discussion useful, starting at minute 29
> here:
>
> https://www.youtube.com/watch?v=xsM4nN85cok
>
> part of the result of that discussion is in section 7 of this vignette:
>
>
> http://bioconductor.org/packages/release/bioc/vignettes/BiocPkgTools/inst/doc/BiocPkgTools.html#dependency-burden
>
> which illustrates how to calculate some metrics on the dependency burden
> of a package using functionality we implemented in the package
> BiocPkgTools, in the case of minfi, this is the output:
>
> library(BiocPkgTools)
> depdf <- buildPkgDependencyDataFrame(repo=c("BioCsoft", "CRAN"),
> dependencies=c("Depends", "Imports"))
> minfidepmetrics <- pkgDepMetrics("minfi", depdf)
> minfidepmetrics
>                      ImportedAndUsed Exported  Usage DepOverlap DepGainIfExcluded
> DelayedArray                       1      188   0.53       0.11                 0
> grDevices                          1      112   0.89       0.01                 0
> data.table                         1      100   1.00       0.01                 1
> MASS                               1       78   1.28       0.04                 0
> limma                              4      310   1.29       0.04                 0
> reshape                            1       67   1.49       0.03                 2
> nlme                               2      109   1.83       0.05                 1
> utils                              4      216   1.85       0.01                 0
> lattice                            3      144   2.08       0.05                 0
> BiocGenerics                       5      141   3.55       0.04                 0
> stats                             16      449   3.56       0.01                 0
> siggenes                           2       51   3.92       0.13                 3
> genefilter                         2       49   4.08       0.38                 3
> Biobase                            6      128   4.69       0.05                 0
> GenomeInfoDb                       3       60   5.00       0.09                 0
> preprocessCore                     2       39   5.13       0.02                 1
> GEOquery                           1       17   5.88       0.32                 4
> HDF5Array                          5       72   6.94       0.15                 4
> bumphunter                         1       14   7.14       0.76                25
> BiocParallel                       6       68   8.82       0.07                 0
> Biostrings                        23      240   9.58       0.11                 0
> graphics                           9       87  10.34       0.01                 0
> IRanges                           40      254  15.75       0.06                 0
> S4Vectors                         47      278  16.91       0.05                 0
> DelayedMatrixStats                14       74  18.92       0.14                 2
> GenomicRanges                     23      106  21.70       0.12                 0
> RColorBrewer                       1        4  25.00       0.01                 1
> SummarizedExperiment              23       82  28.05       0.19                 0
> illuminaio                         1        3  33.33       0.04                 2
> quadprog                           1        2  50.00       0.01                 1
> beanplot                           1        1 100.00       0.01                 1
> mclust                            NA      271     NA       0.04                 1
> nor1mix                           NA       38     NA       0.02                 1
>
> so, with the exception of 'bumphunter', it doesn't look like the removal
> of a single dependency will give you much gain. it seems that minfi
> imports a single functionality from bumphunter:
>
> imp <- pkgDepImports("minfi")
> imp[imp$pkg %in% "bumphunter", ]
> # A tibble: 1 x 2
>    pkg        fun
>    <chr>      <chr>
> 1 bumphunter bumphunter
>
> you can explore the gain by excluding combinations of package
> dependencies with the function 'pkgCombDependencyGain()':
>
> pcd <- pkgCombDependencyGain("minfi", depdf, maxNbr=2L)
> dim(pcd)
> [1] 561   3
> head(pcd[order(pcd$DepGain, decreasing = TRUE), ])
>                            Packages NbrExcl DepGain
> 160           bumphunter, GEOquery       2      43
> 175         bumphunter, genefilter       2      40
> 98        BiocParallel, bumphunter       2      31
> 161          bumphunter, HDF5Array       2      29
> 165           bumphunter, siggenes       2      28
> 157 bumphunter, DelayedMatrixStats       2      27
>
> have fun with the dependency exploration game! :)
>
> robert.
>
> On 3/3/21 1:28 PM, Kasper Daniel Hansen wrote:
> > I am happy to engage in a discussion about this, although I'm not sure
> > that I am ultimately interested in having two packages.
> >
> > But first I would like to look at some dependency graphs. I am wondering
> > what makes the dependency tree this big (and my tree is smaller than
> > yours, but still big: library(minfi) gives me 16 attached packages and
> > 89 loaded packages for the current release). This includes some part of
> > the tidyverse which we don't really use much though (and which could
> > probably get removed from the package with almost no work).
> >
> > What's the current best tool for dependency graphs in Bioconductor?
> > pkgDepTools?
> >
> > Best,
> > Kasper
> >
> > On Mon, Mar 1, 2021 at 6:24 PM Carlos Ruiz <carlos.ruiz using isglobal.org>
> > wrote:
> >
> >> Dear Bioc developers,
> >>
> >> I have been developing different packages to analyze DNA methylation. In
> >> all of them, I have used minfi's class GenomicRatioSet to manage DNA
> >> methylation data, in order to take advantage of the features of
> >> RangedSummarizedExperiment.
> >>
> >> Although I am very happy with the potential of the class, importing its
> >> definition from minfi forces me to add the package to Imports. As minfi
> >> has a high number of dependencies (129 in the current release), my
> >> packages end up having hundreds of dependencies too. This is
> >> particularly problematic as I do not use any of the other functions of
> >> minfi.
> >>
> >> I am wondering whether it would be possible to move minfi's classes (or
> >> at least GenomicRatioSet) to a lighter package, so people developing
> >> packages on DNA methylation could rely on this class without having to
> >> import the whole minfi package and its dependencies.
> >>
> >> Thank you very much,
> >> --
> >>
> >> Carlos Ruiz
> >>
> >>
> >> _______________________________________________
> >> Bioc-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Experimental and Health Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514
> fax: +34.933.160.550
>
