[Bioc-devel] Query regarding size limit and including external datasets

Anand MT anand_mt at hotmail.com
Mon Oct 2 03:16:21 CEST 2017

Hi all,

I maintain the maftools package, which offers a range of functions for analyzing and visualizing MAF (Mutation Annotation Format) files from cancer cohorts.

In the upcoming Bioconductor release, I plan to include the MAFs from all 32 TCGA cohorts as part of the package. These TCGA MAFs will be stored in the extdata directory as MAF objects containing curated somatic mutations along with clinical information, and can be loaded via a “tcga_load” function.
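A minimal sketch of how such a loader could work is shown below. It assumes each cohort is saved as an .rds file named after its TCGA study code (for example LAML.rds). The file naming, the `dir` argument and the error message are illustrative assumptions, not the final implementation.

```r
# Hypothetical sketch: load a stored MAF object for one TCGA cohort.
# Assumes files are named "<STUDY>.rds" inside the package's extdata directory.
tcga_load <- function(study, dir = system.file("extdata", package = "maftools")) {
  study <- toupper(study)                        # accept "laml" as well as "LAML"
  path <- file.path(dir, paste0(study, ".rds"))  # assumed file layout
  if (!file.exists(path)) {
    stop("No stored MAF object found for cohort: ", study)
  }
  readRDS(path)                                  # return the serialized object
}
```

With this layout, a call such as tcga_load("LAML") would read only the requested cohort, so users never load all 32 objects at once.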

I think this will help many researchers working with TCGA mutation data by saving them the time and hassle of searching through various databases and assembling the data themselves. I believe it will also help with reproducible research.

However, the size of these MAF objects varies with cohort size and mutation burden, with LAML (leukemia) being the smallest (91 KB) and LUAD (lung adenocarcinoma) being the largest (3.7 MB). Including these MAFs also increases the package size to 46 MB (from 7 MB without these datasets).

My questions are:

  *   Is it okay for a package to be this size?
  *   I haven't tried to push these commits to the repository yet, but if git rejects my push because of a size limit, would it be possible to make an exception given the scenario?

If this can't be done in any way, or if it breaks any of the package guidelines, I don't mind dropping the idea.


