[Bioc-devel] new package: annotate function interaction from Reactome DB

Martin Morgan martin.morgan at roswellpark.org
Thu Apr 7 11:08:18 CEST 2016


As we exchanged in separate email, I think that the SIZE of this data 
does NOT require that you produce an annotation package.

The package guidelines 
http://bioconductor.org/developers/package-guidelines/#correctness say 
that the package should occupy less than 4MB on disk. Your package has

your-pkg$ du -sh
134M	.

The biggest files are

your-pkg$ find -type f -size +1M|xargs ls -sh
52M ./inst/extdata/all_gene_disease_associations.txt
7.4M ./inst/extdata/FIsInGene_121514_with_annotations.txt
5.0M ./inst/extdata/imgs/demoReactomeCmp.gif
17M ./inst/extdata/ListProfData.RData
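
The same check can be run from within R. This is only a rough sketch: the helper name large_files() and its min_bytes argument are made up for illustration, and the 1 MiB cutoff only approximates what `find -size +1M` reports.

```r
# Hypothetical helper (not part of any package): list files under `path`
# that are larger than `min_bytes`, largest first.
large_files <- function(path = ".", min_bytes = 1024^2) {
    files <- list.files(path, recursive = TRUE, full.names = TRUE,
                        all.files = TRUE)
    sizes <- file.size(files)                  # size in bytes, NA if unreadable
    keep <- !is.na(sizes) & sizes > min_bytes
    out <- data.frame(file = files[keep],
                      MB = round(sizes[keep] / 1024^2, 1),
                      stringsAsFactors = FALSE)
    out[order(out$MB, decreasing = TRUE), ]
}

# Usage, from the directory containing the package source:
# large_files("your-pkg")
```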

The file that you wish to make into an annotation package is about 7 MB. 
When stored as RDS rather than text, it is only 478k. Saving as RDS (or 
RData) also means that input is fast.
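
To illustrate the text-versus-RDS comparison, here is a minimal sketch. The data.frame `fi` is simulated stand-in data, not the real Reactome file, so the sizes you see will differ from the numbers quoted above.

```r
# Simulated stand-in for an interaction table (assumption: NOT the real data).
set.seed(1)
genes <- paste0("GENE", 1:500)
n <- 20000
fi <- data.frame(
    Gene1 = sample(genes, n, replace = TRUE),
    Gene2 = sample(genes, n, replace = TRUE),
    Annotation = sample(c("predicted", "complex", "activate"), n,
                        replace = TRUE),
    Score = round(runif(n, 0.5, 1), 2),
    stringsAsFactors = FALSE
)

txt <- tempfile(fileext = ".txt")
rds <- tempfile(fileext = ".rds")
write.table(fi, txt, sep = "\t", quote = FALSE, row.names = FALSE)
saveRDS(fi, rds)            # saveRDS() compresses by default

file.size(txt)              # bytes on disk as tab-delimited text
file.size(rds)              # bytes on disk as RDS

fi2 <- readRDS(rds)         # returns the data.frame directly; no text parsing
identical(fi, fi2)
```

In a package, the RDS file would live under inst/extdata and be located with system.file() before calling readRDS().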

The largest file is used as a data.frame, and when saved as RDS it is 
only 3.5M. It might be that THIS data is a candidate for an annotation 
package, but likely the solution here is instead to develop the R script 
at http://www.disgenet.org/web/DisGeNET/menu/downloads#r into a function 
that returns web service requests as R objects for interactive use (see 
https://gist.github.com/mtmorgan/ea10d0d424bf7e414d8e064d903f026d)

The gif can be stored as png and is then 156K.

The ListProfData file seems to contain an environment with function 
definitions etc. Probably this file contains much more information than 
you intended; it is hard to know what its actual size can be.
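
One way this kind of bloat can arise (an assumption on my part; I have not traced how ListProfData was actually created) is saving an object that carries a function, because the function's enclosing environment is serialized along with it. A minimal sketch, with all names hypothetical:

```r
# Hypothetical example: a result list that also holds a closure.
make_profile <- function() {
    big <- rnorm(1e5)              # large intermediate, not meant to be saved
    list(
        summary = mean(big),
        getter = function() 42     # closure; its environment still holds `big`
    )
}
obj <- make_profile()

with_closure <- tempfile(fileext = ".rds")
data_only <- tempfile(fileext = ".rds")
saveRDS(obj, with_closure)         # serializes the closure's environment too
saveRDS(obj["summary"], data_only) # keeps only the plain data

file.size(with_closure)            # includes the hidden `big` vector
file.size(data_only)               # just the summary value
```

Saving only the plain data objects that the package needs (data.frames, vectors, lists without functions) avoids this.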

I know your package also contained MathJax, at about 33M on disk. As you 
have discovered, it is not necessary to include MathJax.

It seems that by appropriately representing the data, you will have a 
package that is close to the guidelines, and at the same time faster 
when accessing the data.


It might be argued that the file FIsInGene_121514_with_annotations.txt 
is useful in general, and for that reason it should be an annotation 
package. But it is so easy and quick to obtain

 
   url <- "http://reactomews.oicr.on.ca:8080/caBigR3WebApp2014/FIsInGene_121514_with_annotations.txt.zip"
   download.file(url, tmp <- tempfile())
   xx <- read.delim(unzip(tmp))

that it doesn't seem to justify the additional package infrastructure.


Finally, for the benefit of other package authors, we also mentioned in 
our off-list email the importance of appropriate attribution of data 
sources (clearly, in the DESCRIPTION file and / or in man pages 
describing the data) and ensuring that your use is consistent with how 
the data is licensed (via the License: field in the DESCRIPTION file, 
and / or the LICENSE file).


So please, reconsider the need for an annotation package for this data. 
Your reviewer recognized that your package was much too large; make the 
changes above and it will no longer be much too large, and so you will 
not need to make an annotation package.

Martin


On 04/07/2016 02:19 AM, Karim Mezhoud wrote:
> Dear bioC devel,
> I wrote an annotation package named reactomeFI to avoid the large files in
> the /extdata folder.
> In the end, converting the txt file to RDS format reduced the file sizes
> enough.
>
> reactomeFI provides annotation that does not exist in any other package (to
> my knowledge).
> Neither reactome.db nor PSICQUIC provides the arrow direction and the type of
> interaction.
>
> library(reactomeFI)
> dim(ld_reactomeFI(2014))
> [1] 217249      5
>> dim(ld_reactomeFI(2015))
> [1] 229300      5
>> head(ld_reactomeFI(version= 2015))
>                    Gene1  Gene2        Annotation Direction Score
> 1                16-5-5  CDC42         predicted         -  0.82
> 2                16-5-5   RHOJ         predicted         -  0.82
> 3                16-5-5   RHOQ         predicted         -  0.82
> 4 <DELTA>FAS/APO-1/CD95    BID          activate        ->  1.00
> 5 <DELTA>FAS/APO-1/CD95 CASP10           complex         -  1.00
> 6 <DELTA>FAS/APO-1/CD95   DAXX complex; reaction         -  1.00
>
>> tail(ld_reactomeFI(2015))
>          Gene1  Gene2     Annotation Direction Score
> 229295    ZP3    ZP4        complex         -  1.00
> 229296   ZPR1    ZYX      predicted         -  0.59
> 229297   ZW10 ZWILCH complex; input         -  1.00
> 229298   ZW10  ZWINT complex; input         -  1.00
> 229299 ZWILCH  ZWINT complex; input         -  1.00
> 229300   ZXDA   ZXDC      predicted         -  0.59
>
> I can add other arguments to specify the type of interaction or direction, as in
>
> ld_reactomeFI(version=2014, type=c("activated", "complex"),
> direction="arrowhead")
>
> I am ready to submit this package if you consider it to be new annotation
> information.
> Thank you,
> Karim

