[Bioc-devel] Help with creating first Bioconductor package

Martin Morgan mtmorgan at fredhutch.org
Fri Nov 14 23:28:40 CET 2014


On 11/14/2014 01:11 PM, January Weiner wrote:
> Dear all,
>
> thanks for your input, it was very helpful. I have some other specific
> questions, though:
>
> Martin
>> You'll want to develop your package on Bioc- and R-devel, as this
>> is the environment in which your package will be introduced to
>> the Bioc community.
>
> Specifically, I won't want to; I will have to. This is the last
> obstacle and I am aware of it. There is no way that I do my research
> on development version of R (not only for scientific reasons,
> unfortunately), so I need two versions running concurrently. There are
> means and ways to do it (I guess from the fact that it all runs on svn
> and that one can set up scripts setting environmental variables; there
> is no real guide on that, am I right?), but from my experience, for
> someone who is not a full time developer it will be horrible, and
> keeping it up to date -- without automata and apt-get -- will sooner
> or later lead to a disaster.

It's not horrible, no. On Linux I do

First time, R-devel:

mkdir -p ~/src/R-devel
cd ~/src/R-devel
svn co https://svn.r-project.org/R/trunk .   # check out into the current directory
tools/rsync-recommended                      # fetch the recommended packages
cd ~/

mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure && make -j

mkdir -p ~/R/x86_64-unknown-linux-gnu-library/devel/

To update:

cd ~/src/R-devel && svn up && tools/rsync-recommended
cd ~/bin/R-devel && make -j

To use:

R_LIBS_USER=~/R/x86_64-unknown-linux-gnu-library/devel ~/bin/R-devel/bin/R

The latter is usually abbreviated with an alias in ~/.bash_aliases

alias Rdev='R_LIBS_USER=/home/jweiner/R/x86_64-unknown-linux-gnu-library/devel /home/jweiner/bin/R-devel/bin/R'

so that you can say Rdev when you want to use R-devel. Always use biocLite() to
install packages, whether from CRAN or Bioconductor, and you'll be fine. Do the
same for the current release branch, in its own directories, with the svn URL
https://svn.r-project.org/R/branches/R-3-1-branch.

There are many other ways to skin this cat (perhaps that's why there is no
definitive guide); likely we'll hear about some of them...

>
> Julian:
>> If you want to check it beforehand, have a look at
>> e.g. http://win-builder.r-project.org/.
>
> I use it regularly to check my CRAN packages (pca3d, riverplot and
> tagcloud), but I assumed that it does not have org.Hs.eg.db and GO.db,
> which I need for my vignette.
>
> True, most likely there will not be any problems -- but at least once I
> have had trouble with a package that did not build correctly on
> Windows only (well, it did include C code).
>
> Tim Triche, Jr:
>> "S3 objects as far as the eye can see"
>
> Is using S3 a problem? For simple things such as overloading a few
> standard functions like plot and print? (Also, as a user, I much
> prefer limma's EList to anything that was even lying next to an
> ExpressionSet; but then, I like Perl much more than Python.)

EList is an S4 class (that's technically true, but it extends 'list', so it
doesn't benefit from, e.g., type checking).
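
To make the type-checking point concrete, here is a minimal sketch assuming only base R's methods package. The class names ListLike and Typed are hypothetical, and ListLike merely mimics the idea of extending 'list'; it is not limma's actual EList definition.

```r
library(methods)

## Hypothetical class that, like EList, extends the base 'list' class.
setClass("ListLike", contains = "list")

## S4 only checks that the data part is a list; what the list holds is
## not checked, so a character string where a matrix belongs is accepted.
x <- new("ListLike", list("not a matrix", 42))
is(x, "list")      # TRUE
class(x[[1]])      # "character"

## A class with a typed slot rejects the same value when the object is created.
setClass("Typed", slots = c(E = "matrix"))
res <- tryCatch(new("Typed", E = "not a matrix"),
                error = function(e) "rejected")
res                # "rejected"
```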

Many packages should NOT implement classes of their own, but rather re-use 
existing classes to make it easier for the user to integrate the package into 
their workflow. Many useful existing classes in Bioconductor are S4 classes, so...

If you do implement a new class, then most likely it should _extend_ an existing 
class, so see the previous point.

Likely you want to contribute your package to Bioconductor because you'd like it
to interoperate with other Bioconductor packages (else why not contribute to CRAN,
or point prospective users to GitHub, or...). Here it pays to play well with the
packages you want to work with, by using the same objects they use. In the
sequencing realm, this is almost always a GRanges-related class, e.g., GRanges,
GRangesList, SummarizedExperiment, GAlignments, VCF. For microarrays, whatever
your own prejudices, you'll likely want to support working with an ExpressionSet.

If you implement something completely novel, then the difference between
implementing it in S3 and in S4 is not that large; the 'user experience' should
be almost identical, and the advantages of a formal class system become apparent
as the complexity of the software grows. It's possible to take shortcuts in
creating both types of classes, e.g., expecting the user to access list elements
directly in S3, or to use slot access in S4, rather than providing an accessor
(often a plain-old-function, for both S3 and S4). But that undermines the S4
benefit of a reliable data representation for robust software (which you value,
judging by your reluctance to use development versions of software for
scientific work).
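
A minimal sketch of that comparison, assuming only base R's methods package: the same hypothetical 'scores' object written once in S3 and once in S4, each reached through an accessor function so that users never touch x$values or x@values themselves. All names here (scores3, Scores4, scores4, scoreValues) are made up for illustration.

```r
library(methods)

## S3 version: a tagged list. Nothing checks that 'values' and 'ids' agree.
scores3 <- function(values, ids)
    structure(list(values = values, ids = ids), class = "scores3")

## S4 version: typed slots plus a validity method that new() runs.
setClass("Scores4", slots = c(values = "numeric", ids = "character"))
setValidity("Scores4", function(object) {
    if (length(object@values) != length(object@ids))
        "'values' and 'ids' must have the same length"
    else
        TRUE
})
scores4 <- function(values, ids) new("Scores4", values = values, ids = ids)

## Accessor: one ordinary function name for users, whichever class system
## sits underneath (here an S3 generic, which also dispatches on S4 objects).
scoreValues <- function(x) UseMethod("scoreValues")
scoreValues.scores3 <- function(x) x$values
scoreValues.Scores4 <- function(x) x@values

## Usage: identical from the user's side ...
s3 <- scores3(c(1.5, 2), c("a", "b"))
s4 <- scores4(c(1.5, 2), c("a", "b"))
scoreValues(s3)
scoreValues(s4)

## ... until the input is malformed: S3 accepts it, S4 refuses it.
bad3 <- scores3(1, c("a", "b"))
bad4 <- tryCatch(scores4(1, c("a", "b")),
                 error = function(e) conditionMessage(e))
```

The design choice is the one argued above: the user-facing calls look the same, but only the S4 version guarantees that every object in circulation has a consistent representation.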

Martin

>
> Nathaniel Hayden:
>> Yihui's formatR ( http://cran.r-project.org/web/packages/formatR/index.html ) makes formatting R files so simple and painless
>
> That is precisely the package I had in mind when I said that it would
> be annoying -- I'd still need to switch (and worse, *remember to
> switch*) between the two formatting styles whenever I was to submit a
> package :-)
>
> Kind regards,
>
> j.
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


