[Bioc-devel] new package: annotate function interaction from Reactome DB
Martin Morgan
martin.morgan at roswellpark.org
Thu Apr 7 11:08:18 CEST 2016
As we discussed in separate email, I think that the SIZE of this data
does NOT require that you produce an annotation package.
The package guidelines
(http://bioconductor.org/developers/package-guidelines/#correctness) say
that a package should occupy less than 4MB on disk. Your package occupies

    your-pkg$ du -sh
    134M .

The biggest files are

    your-pkg$ find -type f -size +1M | xargs ls -sh
    52M ./inst/extdata/all_gene_disease_associations.txt
    7.4M ./inst/extdata/FIsInGene_121514_with_annotations.txt
    5.0M ./inst/extdata/imgs/demoReactomeCmp.gif
    17M ./inst/extdata/ListProfData.RData
The file that you wish to turn into an annotation package is about 7 MB.
Stored as RDS rather than text, it is only 478K. Saving as RDS (or
RData) also makes input fast.
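The size difference between text and RDS is easy to check. The sketch below is illustrative and not taken from the package: it builds a synthetic table with the same five columns as the Reactome FI data (the gene names, values and row count are made up), writes it once as tab-delimited text and once with saveRDS(), which gzip-compresses by default, then compares sizes and checks the round trip.

```r
# Illustrative only: a synthetic table shaped like the Reactome FI data
set.seed(1)
n <- 10000
genes <- paste0("GENE", 1:500)  # made-up gene symbols
fi <- data.frame(
  Gene1      = sample(genes, n, replace = TRUE),
  Gene2      = sample(genes, n, replace = TRUE),
  Annotation = sample(c("predicted", "complex", "activate"), n, replace = TRUE),
  Direction  = sample(c("-", "->"), n, replace = TRUE),
  Score      = sample(c(0.59, 0.82, 1.00), n, replace = TRUE),
  stringsAsFactors = FALSE
)

txt <- tempfile(fileext = ".txt")
rds <- tempfile(fileext = ".rds")

# Plain tab-delimited text, as currently stored in inst/extdata
write.table(fi, txt, sep = "\t", quote = FALSE, row.names = FALSE)
# Serialized R object; saveRDS() compresses with gzip by default
saveRDS(fi, rds)

c(text = file.size(txt), rds = file.size(rds))  # bytes on disk

# Loading the RDS restores the identical data.frame, with no text parsing
identical(fi, readRDS(rds))
```

For the real file the conversion would be along the lines of saveRDS(read.delim(txt_path), rds_path), with readRDS() used inside the package to load it; txt_path and rds_path are placeholders.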
The largest file (all_gene_disease_associations.txt) is used as a
data.frame; saved as RDS it is only 3.5M. It might be that THIS data is
a candidate for an annotation package, but the better solution is likely
to develop the R script at
http://www.disgenet.org/web/DisGeNET/menu/downloads#r into a function
that returns web service requests as R objects for interactive use (see
https://gist.github.com/mtmorgan/ea10d0d424bf7e414d8e064d903f026d).
The GIF can be stored as PNG, which reduces it to 156K.
The ListProfData.RData file seems to contain an environment with
function definitions, etc. This file probably contains much more
information than you intended; it is hard to know what its actual size
should be.
I know your package also contained MathJax, at about 33M on disk. As you
have discovered, it is not necessary to include MathJax.
It seems that by appropriately representing the data, you will have a
package that is close to the guidelines, and at the same time faster
when accessing the data.
It might be argued that the file FIsInGene_121514_with_annotations.txt
is useful in general, and for that reason it should be an annotation
package. But it is so easy and quick to obtain
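
    tmp <- tempfile(fileext = ".zip")
    # mode = "wb" keeps the binary zip intact on Windows
    download.file("http://reactomews.oicr.on.ca:8080/caBigR3WebApp2014/FIsInGene_121514_with_annotations.txt.zip",
                  tmp, mode = "wb")
    xx <- read.delim(unzip(tmp, exdir = tempdir()))
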
that it doesn't seem to justify the additional package infrastructure.
Finally, for the benefit of other package authors, we also mentioned in
our off-list email the importance of clearly attributing data sources
(in the DESCRIPTION file and/or in the man pages describing the data)
and of ensuring that your use is consistent with how the data are
licensed (via the License: field in the DESCRIPTION file and/or the
LICENSE file).
So please reconsider the need for an annotation package for this data.
Your reviewer recognized that your package was much too large; make the
changes above and it will no longer be much too large, so you will not
need to make an annotation package.
Martin
On 04/07/2016 02:19 AM, Karim Mezhoud wrote:
> Dear bioC devel,
> I wrote an annotation package named reactomeFI to avoid the large files
> in the /extdata folder.
> In the end, converting the txt file to RDS format reduced the file sizes
> enough.
>
> reactomeFI provides annotations that do not exist in any other package
> (to my knowledge).
> Neither reactome.db nor PSICQUIC provides the arrow direction and the
> type of interaction.
>
>> library(reactomeFI)
>> dim(ld_reactomeFI(2014))
> [1] 217249      5
>> dim(ld_reactomeFI(2015))
> [1] 229300      5
>> head(ld_reactomeFI(version= 2015))
>                   Gene1  Gene2        Annotation Direction Score
> 1                16-5-5  CDC42         predicted         -  0.82
> 2                16-5-5   RHOJ         predicted         -  0.82
> 3                16-5-5   RHOQ         predicted         -  0.82
> 4 <DELTA>FAS/APO-1/CD95    BID          activate        ->  1.00
> 5 <DELTA>FAS/APO-1/CD95 CASP10           complex         -  1.00
> 6 <DELTA>FAS/APO-1/CD95   DAXX complex; reaction         -  1.00
>
>> tail(ld_reactomeFI(2015))
>         Gene1  Gene2     Annotation Direction Score
> 229295    ZP3    ZP4        complex         -  1.00
> 229296   ZPR1    ZYX      predicted         -  0.59
> 229297   ZW10 ZWILCH complex; input         -  1.00
> 229298   ZW10  ZWINT complex; input         -  1.00
> 229299 ZWILCH  ZWINT complex; input         -  1.00
> 229300   ZXDA   ZXDC      predicted         -  0.59
>
> I can add other arguments to specify the type of interaction or the
> direction, for example
>
>     ld_reactomeFI(version = 2014, type = c("activated", "complex"),
>                   direction = "arrowhead")
>
> I am ready to submit this package if you consider it to provide new
> annotation information.
> Thank you,
> Karim
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
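The filtering interface proposed in the quoted message does not by itself call for a separate package; once the table is loaded (for example from RDS), selecting by interaction type or direction is ordinary data.frame subsetting. A minimal sketch, assuming a data.frame with the columns Gene1, Gene2, Annotation, Direction and Score shown above; the function name filter_fi and its arguments are hypothetical, and the example rows are copied from the head()/tail() output. It matches the term values that actually appear in the table (e.g. "activate", "->") and treats semicolon-separated annotations such as "complex; input" as several terms.

```r
# Hypothetical helper: subset a Reactome FI table by interaction type and direction.
# Assumes columns Gene1, Gene2, Annotation, Direction, Score as printed above.
filter_fi <- function(fi, type = NULL, direction = NULL) {
  keep <- rep(TRUE, nrow(fi))
  if (!is.null(type)) {
    # Annotation can hold several terms, e.g. "complex; input"
    terms <- strsplit(fi$Annotation, ";[[:space:]]*")
    keep <- keep & vapply(terms, function(t) any(t %in% type), logical(1))
  }
  if (!is.null(direction)) {
    keep <- keep & fi$Direction %in% direction
  }
  fi[keep, , drop = FALSE]
}

# A few rows copied from the quoted head()/tail() output
fi <- data.frame(
  Gene1 = c("16-5-5", "<DELTA>FAS/APO-1/CD95", "<DELTA>FAS/APO-1/CD95",
            "<DELTA>FAS/APO-1/CD95", "ZW10", "ZXDA"),
  Gene2 = c("CDC42", "BID", "CASP10", "DAXX", "ZWILCH", "ZXDC"),
  Annotation = c("predicted", "activate", "complex", "complex; reaction",
                 "complex; input", "predicted"),
  Direction = c("-", "->", "-", "-", "-", "-"),
  Score = c(0.82, 1.00, 1.00, 1.00, 1.00, 0.59),
  stringsAsFactors = FALSE
)

filter_fi(fi, type = c("activate", "complex"))  # rows whose terms include either type
filter_fi(fi, direction = "->")                 # directed interactions only
```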