[Bioc-devel] Documentation on how to update a BioC experimental package?

Vincent Carey stvjc at channing.harvard.edu
Fri Oct 24 04:45:13 CEST 2014

There is a README.txt in the pkgs folder.

I will attach it.  I think this is accurate, but there may be something
else on the site.

On Thu, Oct 23, 2014 at 10:23 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:

> It's been a while since I worked with experimental packages.  Where
> can I find documentation on how to (Subversion) update our
> AffymetrixDataTestFiles package with additional data files?  All I
> know is that the SVN repository only contains a stub of the package
> and
> http://www.bioconductor.org/developers/package-guidelines/#package-types
> provides little information and basically only points to the devel
> mailing list.
> Thanks,
> Henrik
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
-------------- next part --------------
========================================
 BioC Experiment Data Package SVN Repos
========================================

:Date: 2006-08-10
:Author: S. Falcon
:svn URL: https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs


This svn directory contains BioC experiment data packages.  Data
packages contain potentially large binary files that do not change
often.  Most updates to these packages involve the package
infrastructure files.

Obtaining a working copy of a data package over a slow connection can
be frustrating, especially when all that is needed is the
infrastructure files and not the actual data.  We have implemented a
scheme that allows separate checkout of infrastructure files and data
files.  This document describes the scheme and provides instructions
for checkout and update of existing data packages as well as for
adding new packages.

How to Create an Infrastructure-Only Working Copy
-------------------------------------------------

You can obtain a checkout of all experiment data package
infrastructure files as follows::

    svn checkout \
        https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs

To obtain the files for a particular package, say ``ALL``::

    svn checkout \
        https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs/ALL

If you want to preview what is available, you might try the following::

    # get the top-level scripts, but don't recurse into subdirs
    svn checkout -N \
        https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs

    # see what is there
    svn ls https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs

    # get a particular package's infrastructure files
    cd pkgs
    svn up ALL
    # see next section for getting complete working copy w/ data
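
The preview step above can also be scripted. The sketch below splits ``svn ls`` output into package directories and top-level files, relying on ``svn ls`` marking directories with a trailing ``/``. The function names are hypothetical, and ``list_pkgs`` assumes an svn client is installed and that the repository URL from this README is reachable from your machine.

```python
import subprocess

# URL taken from this README; assumed reachable from your machine.
PKGS_URL = "https://hedgehog.fhcrc.org/bioc-data/trunk/experiment/pkgs"


def split_listing(ls_output):
    """Split `svn ls` output into (package dirs, top-level files).

    `svn ls` prints one entry per line and appends '/' to directories.
    """
    pkgs, files = [], []
    for entry in ls_output.splitlines():
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("/"):
            pkgs.append(entry.rstrip("/"))
        else:
            files.append(entry)
    return pkgs, files


def list_pkgs(url=PKGS_URL):
    """Run `svn ls` on the pkgs dir and split the result (needs svn)."""
    result = subprocess.run(["svn", "ls", url],
                            capture_output=True, text=True, check=True)
    return split_listing(result.stdout)
```

The package names returned this way are what you would pass to ``svn up`` inside the non-recursive ``pkgs`` checkout.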

How to Create a Complete Working Copy
-------------------------------------

First create a working copy of the infrastructure files as described
above.

Next use the helper script ``add_data.py`` (you will need Python).  It
is located here:


Here's a complete example for ``ALL``::

    svn checkout \

    python add_data.py ALL

This will add the big data directories (usually data/, but sometimes
also dirs under inst/) to your working copy.  Usually, the svn:ignore
property has been set so that you won't accidentally add these dirs
when working with the package, but please take care anyway.

A Note About Committing Changes to the Data
-------------------------------------------

If you want to modify the actual data, cd into the appropriate dir
after having run add_data.py and do your commit from there.  The
script adds a full working copy of the data inside the infrastructure
working copy, so changes to the data must be committed from that
nested working copy.
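
Deciding which directory to commit from can be expressed as a simple rule: a file under a dir listed in ``external_data_store.txt`` belongs to the nested data working copy, and every other file belongs to the infrastructure working copy. The helper below is a hypothetical illustration of that rule, not part of ``add_data.py``.

```python
from pathlib import PurePosixPath


def commit_dir(pkg, path, external_dirs):
    """Return the dir to run `svn commit` from for a changed file.

    pkg           -- package dir, e.g. "ALL"
    path          -- changed file, relative to the package dir
    external_dirs -- dirs listed in the package's external_data_store.txt
    Files under an external dir live in the nested data working copy
    created by add_data.py; all other files belong to the
    infrastructure working copy rooted at pkg.
    """
    rel = PurePosixPath(path)
    # Check longer (more specific) dirs first.
    for sub in sorted(external_dirs, key=len, reverse=True):
        sub_path = PurePosixPath(sub)
        if rel == sub_path or sub_path in rel.parents:
            return str(PurePosixPath(pkg) / sub_path)
    return pkg
```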

How to Add a New Data Package
-----------------------------

1. Add the infrastructure files under ``pkgs``.

2. Add any large data directories to ../data_store/PKGNAME/.  For
   example, if there is large data in PKGNAME/data and
   PKGNAME/inst/extdata, you would add PKGNAME/data and
   PKGNAME/inst/extdata to ../data_store.

3. Create a file ``external_data_store.txt`` listing each dir that is
   stored externally (each on a separate line).  Continuing the example
   above, the file would contain::

       data
       inst/extdata

   This should go in the top-level of the package dir.

4. Add svn ignore properties.  Continuing the example::

       cd PKGNAME
       # set svn:ignore to '*' on the data folder itself
       svn propset svn:ignore '*' ./data/
       # likewise on inst/, so extdata/ is ignored there
       svn propset svn:ignore '*' ./inst/

       # or, via your editor (this might not work anymore):
       svn propedit svn:ignore .     # add 'data' here
       svn propedit svn:ignore inst  # add 'extdata' here

5. Commit.
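
Steps 2 to 4 all follow from the list of externally stored dirs. The sketch below derives them for the ``PKGNAME`` example: the text of ``external_data_store.txt`` (assuming one path per line, relative to the package top level), the ``../data_store`` destinations as seen from ``pkgs``, and the names to add with ``svn propedit svn:ignore`` in each parent dir. The function ``new_package_plan`` is hypothetical and only prints a plan; it runs no svn commands.

```python
from pathlib import PurePosixPath


def new_package_plan(pkg, external_dirs):
    """Derive steps 2-4 from the dirs of pkg that are stored externally.

    Returns a tuple of
      listing     -- text for external_data_store.txt, one dir per line
      store_paths -- where each dir goes under ../data_store (from pkgs/)
      ignores     -- {parent dir: [names to add to its svn:ignore]}
    """
    listing = "".join(f"{d}\n" for d in external_dirs)
    store_paths = [f"../data_store/{pkg}/{d}" for d in external_dirs]
    ignores = {}
    for d in external_dirs:
        path = PurePosixPath(d)
        # str(parent) is "." for top-level dirs such as data
        ignores.setdefault(str(path.parent), []).append(path.name)
    return listing, store_paths, ignores


listing, store_paths, ignores = new_package_plan(
    "PKGNAME", ["data", "inst/extdata"])
print(listing, end="")
print(store_paths)
print(ignores)
```

Each entry of the ignore map corresponds to one ``svn propedit svn:ignore PARENT`` edit in step 4. Note that ``svn propset`` replaces any existing ignore list instead of adding to it.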

Details of Storage Scheme
-------------------------

Experiment data package infrastructure files live in
``experiment/pkgs``.  Package subdirectories that contain large files
are stored under ``experiment/data_store``.  There is no mechanism to
support separate storage of individual files.

Here is an example of how data for the ``davidTiling`` package is



The ``davidTiling`` package contains large data in the ``data/``,
``inst/celfiles/``, and ``inst/website/`` subdirectories.  As you can
see, each of these is stored separately from the package
infrastructure files.  The file ``external_data_store.txt`` lists the
location of the externally stored data.  Here are the contents for
``davidTiling``::

    data
    inst/celfiles
    inst/website

To create a complete directory containing both infrastructure and
data, one first does a checkout of the infrastructure and then does a
checkout of each individual externally stored subdir.  This can be
done inside the infrastructure working copy.  There is a helper script
to automate the required svn commands.  One option that might be worth
adding to the script is to do an export instead of checkout.
Additionally, the ``svn:ignore`` property has been set in the
infrastructure dir to help prevent folks from accidentally adding the
external data to the infrastructure dir itself.
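
The helper script's job, as described above, can be sketched as follows. This is not the actual ``add_data.py``: the function names are hypothetical, the URL layout (``data_store/PKG/DIR`` alongside ``pkgs``) is inferred from this README, and the ``export`` flag implements the export-instead-of-checkout option suggested above. By default ``add_data`` only prints the svn commands; it assumes it is run from the ``pkgs`` working copy.

```python
import subprocess
from pathlib import Path

# Layout inferred from this README: pkgs/ and data_store/ side by side.
EXPERIMENT_URL = "https://hedgehog.fhcrc.org/bioc-data/trunk/experiment"


def read_external_dirs(pkg_dir):
    """Return the dirs listed in external_data_store.txt, skipping blanks."""
    text = (Path(pkg_dir) / "external_data_store.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def data_commands(pkg, external_dirs, export=False):
    """Build one svn command per externally stored dir of pkg.

    Each dir is fetched from data_store/PKG/DIR into PKG/DIR, i.e. inside
    the infrastructure working copy.  With export=True the data arrives
    as plain files instead of a nested working copy.
    """
    action = "export" if export else "checkout"
    return [
        ["svn", action,
         f"{EXPERIMENT_URL}/data_store/{pkg}/{sub}",
         f"{pkg}/{sub}"]
        for sub in external_dirs
    ]


def add_data(pkg, export=False, dry_run=True):
    """Print, and unless dry_run also run, the commands for ./pkg."""
    for cmd in data_commands(pkg, read_external_dirs(pkg), export):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, ``add_data("ALL")`` run from the ``pkgs`` working copy would print one ``svn checkout`` line per entry in ``ALL/external_data_store.txt``.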
