[Bioc-devel] Query regarding size limit and including external datasets

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Mon Oct 2 03:29:16 CEST 2017


I cannot speak for the core team.

You should separate the data from the software methods and provide a data
package containing the MAFs. This has the additional advantage of
separating versioning of the mutation data from your software. As a data
package this does not sound extensive; the largest dataset is 3.7 MB. There
is a potential privacy problem with sharing mutations, but I don't know at
what level the mutations are described. I assume you have considered this?

Best,
Kasper

On Sun, Oct 1, 2017 at 9:16 PM, Anand MT <anand_mt at hotmail.com> wrote:

> Hi all,
>
> I maintain the maftools package, which offers a multitude of functions to
> perform various analyses and visualizations of MAF (Mutation Annotation
> Format) files from cancer cohorts.
>
> In the upcoming Bioconductor release, I plan to include all MAFs from 32
> TCGA cohorts as part of the package. These TCGA MAFs will be stored as
> MAF objects containing curated somatic mutations along with clinical
> information in the extdata directory and can be loaded via a “tcga_load”
> function.
>
> I think this will help many researchers working with TCGA mutation data
> and save them the time and hassle of going through various databases to
> search and assemble it. I believe this also helps with reproducible research.
>
> However, the size of these MAF objects varies with cohort size and
> mutation burden, with LAML (leukemia) being the smallest (91 KB) and LUAD
> (lung adenocarcinoma) being the largest (3.7 MB). Including these MAFs
> also increases the package size to 46 MB (from 7 MB without these datasets).
>
> My questions are:
>
>   *   Is it okay for a package to be this size?
>   *   I haven't tried to push these commits to the repository yet, but in
> case git rejects my push due to the size limit, is it possible to make an
> exception, given the scenario?
>
> If this can't be done in any way, or if it breaks any of the package
> guidelines, I don't mind dropping the idea either.
>
> Thanks.
>
> -Anand.
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
