[R] Version control and R package development (was: plot graph with error bars trouble)

Mon Oct 1 23:13:59 CEST 2007

First, a quick summary of the beginning of an interesting discussion
(not all correctly indented, but at least in the correct order):

> From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
> On 9/30/07, hadley wickham <h.wickham at gmail.com> wrote:
>> On 9/30/07, jiho <jo.irisson at gmail.com> wrote:
>>> [...]
>>> BTW, have you thought about opening ggplot2 development (provide a
>>> way to check out the dev code and have the possibility to submit
>>> patches at least) or do you prefer to keep it a personal project for
>>> now? [...]
>>
>> It's something I have thought a little bit about, but I haven't made
>> much progress. Ideally, if it's something that I do for ggplot2, I
>> should do it for all my other R packages too.  I have thought about
>> setting up google code projects for each package, which would also
>> provide a nice set of bugtracking tools.  I've cc'd Gabor on this
>> email in the hope that he might describe his experiences with this
>> approach.
>>
>>> [...]
>>
>> The one thing that google code currently lacks is a nice timeline +
>> browser interface.  I find this very useful for GGobi
>> (http://src.ggobi.org) and would like to maintain that functionality
>> somehow.  It also makes it easier to track progress of the code
>> through rss, or intermittent reading of the trac site.
>>
>> [...]
>
> If you already know svn then google code is very easy to use.  Setting
> yourself up on it is really just a few minutes of work in that case.
> I have used other similar sites but google code is by far the easiest
> one to work with of the ones I have tried.  By default everyone has
> read access and only you have write access so you still control the
> project.  You can browse through the R projects that are already in
> google code here:
> http://code.google.com/hosting/search?q=label:R

> From: "hadley wickham" <h.wickham at gmail.com>
> On 10/1/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> On 10/1/07, hadley wickham <h.wickham at gmail.com> wrote:
>>>
>>> The biggest drawback (to me) to both google code and R-forge is
>>> their failure to offer a nice interface to browse the svn repository
>>> and view the timeline of changes.  I particularly like trac (e.g.
>>> http://src.ggobi.org/) despite its many problems, and I don't think
>>> I want to do without that convenient view of my code.
>>
>> Maybe you are referring to something else but both R-Forge and
>> Google code allow you to browse the svn repository over the
>> internet from within a web browser.  In Google code click on the
>> Source tab and then the Subversion repository link.  For example,
>
> Yes, but compare with:
>
> http://src.ggobi.org/timeline for seeing what has changed recently
> and by whom
> http://src.ggobi.org/browser for easily navigating the repository and
> stepping back through revisions
>
> You can also subscribe to the RSS feed of the project timeline to keep
> track of what is changing.

On 2007-October-01, at 20:24, Gabor Grothendieck wrote:
> On 10/1/07, hadley wickham <h.wickham at gmail.com> wrote:
>>> These seem nearly identical to what you can get with R-Forge or with
>>> TortoiseSVN (and likely other svn clients too).  Since any developer
>>> is likely to have an svn client, a web interface more sophisticated
>>> than what is already available via the net has less utility than if
>>> this info were not already available anyways.  Google code can send
>>> out email alerts.  On the other hand the complexity in dealing with
>>> Trac is a significant disadvantage for projects the size of an R
>>> package.  I previously used Trac for Ryacas but currently use a
>>> WISHLIST and NEWS file (both plain text files created in a text
>>> editor) plus the svn log and find that adequate.  Clearly a lot of
>>> this is a matter of taste and of project size and there is no right
>>> answer.
>>
>> That's true.  From my perspective, using a command line svn client on
>> OS X, I certainly prefer the web interface for exploring past commits.
>> However, while any developer will have an svn client, a more casual
>> user or someone just interested in looking at the code won't, and I
>> don't think the google interface is that friendly.  (Mind you, that's
>> probably not a very common use case.)
>>
>> I'm not sure what problems you had with trac and a large repository -
>> it works very well for GGobi, with a repository that's almost 2 gig.
>> However, it did take me a long time to find a combination of web
>> server and trac setup that didn't crash every couple of days.  The
>> prevalence of trac spam has also led me to turn off the wiki and bug
>> reporting.
>>
>> I think it's good to have this discussion about package development
>> so that we can learn how others work.
>
> I think the casual user just wants to browse the HEAD revision and
> that is what google provides and makes easy to do since the interface
> is not cluttered with a bunch of links that won't be used anyways.  If
> they want more than that they are probably not a casual user but a
> developer and have an svn client.
>
> Regarding Trac I was able to use it successfully but it's a large
> complex system and it just took too much of my time investigating
> all the numerous features.  I think it's better suited to projects
> larger than an R package.
>
> By the way, there are some distributed version control systems that
> work with svn such as svk that I had intended to investigate at some
> point but so far have not found the time.

OK, lots of interesting advice there. Here are my two additional cents.

I am using svn for all my personal development, which I think amounts
to about what a decent R package can be. I started with a local
repository and file:// operations and progressively increased the
complexity of my setup, until I recently considered setting up Trac.
However, as Gabor pointed out, it is non-trivial to set up and
maintain and is probably too much for a project with a small number
of contributors such as an R package (and it was definitely too much
for my own code!). I have been using the setup described below for a
few months now and it has proved simple and stable, yet full-featured.

What I want from a version control system is:
- to provide version control on my files: revert changes, diff
  revisions, etc. (obviously)
- to provide a history of the changes I made to the files (i.e. the
  logs)
- to provide an easy way of seeing what my files looked like, say, 2
  weeks ago
- to let anyone browse the latest version easily (point people to a
  URL, et voilà)
- to be able to add contributors easily
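
A sketch illustrating the third point, not part of the setup described here: Subversion accepts a date in curly braces wherever it expects a revision number, so `svn export -r {2007-09-17} URL DIR` fetches the tree as it stood on that date. The helper below builds such a command for "two weeks ago"; the function name, repository URL and target directory are hypothetical, and the command is only assembled, not run.

```python
from datetime import date, timedelta


def svn_export_as_of(repo_url, target_dir, days_ago=14, today=None):
    """Build (but do not run) an `svn export` command fetching the
    repository as it was `days_ago` days before `today`.

    Subversion resolves a {YYYY-MM-DD} revision to the most recent
    revision as of that date.
    """
    today = today or date.today()
    when = today - timedelta(days=days_ago)
    revision = "{" + when.isoformat() + "}"
    return ["svn", "export", "-r", revision, repo_url, target_dir]


if __name__ == "__main__":
    # Hypothetical repository URL and target directory.
    cmd = svn_export_as_of("http://www.someserver.com/svn/trunk",
                           "snapshot", today=date(2007, 10, 1))
    print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming an svn command-line client is installed.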

The setup that allows me to do this without too much overhead is this
one:
- svn repositories on a web-accessible server, in user space (i.e.
  /home/some-name/)
- repositories served by Apache via a symbolic link from Apache's
  root. Apache provides simple access control rules (read for anyone,
  write with a valid login and password; passwords are stored in user
  space, encrypted): 15 minutes to set up, following the explanations
  of the svn book, chap. 6, section "Basic HTTP Authentication", p. 361.
- web interface to the repositories provided by WebSVN. This also
  installs in user space; no Apache configuration is required unless
  one wants URL rewriting (i.e. for your repository to look like
  http://www.someserver.com/svn/trunk/someFolder/someSubFolder/ in the
  address bar), in which case you must allow .htaccess files to
  override Apache's defaults. 5 minutes to set up for the simple case,
  10-15 to smooth everything out with URL rewriting.
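
The "read for anyone, write with a valid login" rule is commonly expressed in Apache by wrapping `Require valid-user` in `<LimitExcept GET PROPFIND OPTIONS REPORT>`, the pattern the svn book gives for anonymous read access; the exact configuration used here is not shown. A sketch of the decision that rule makes, assuming those four methods are the only ones an svn client needs for reading (the function name is hypothetical):

```python
# HTTP methods an svn client uses for read-only access; everything else
# (commits use PUT, MERGE and others) is treated as a write.
READ_ONLY_METHODS = frozenset({"GET", "PROPFIND", "OPTIONS", "REPORT"})


def is_allowed(method: str, authenticated: bool) -> bool:
    """Mirror the 'read for anyone, write with a valid login' rule:
    read-only methods always pass, anything else needs authentication."""
    return method.upper() in READ_ONLY_METHODS or authenticated


if __name__ == "__main__":
    for method, logged_in in [("GET", False), ("PUT", False), ("PUT", True)]:
        print(method, logged_in, is_allowed(method, logged_in))
```

Apache performs this check itself; the function only spells out the logic.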

The end result looks like this:
	http://cbetm.univ-perp.fr/irisson/svn/
To see what an inner view looks like:
	http://cbetm.univ-perp.fr/irisson/svn/distribution_data/tetiaroa/trunk/data/?rev=0&sc=1
You can play around with diffs, view logs, compare files etc.

This setup really helps me a lot in my day-to-day work. For a more
collaborative project, its main limitation is probably that it offers
no patch submission or bug tracking (which is not its purpose).

To conclude on the subject, I think that, if I were to choose a
versioning system now that I know a little more about them, I would
probably choose git rather than svn, mainly because of two advantages:
- each checkout is a full repository (so each checkout is also a
  backup!) and works as a personal branch
- the possibility of finer-grained commits. Git can be used as a
  hybrid between a distributed system and a central repository
  system: one publicly available checkout serves as the reference,
  and developers commit their changes to their private, local
  branch. These only become visible to the others when "pushed" to
  the central reference checkout. Therefore, one can commit much more
  often, without the risk of disturbing the reference version, and
  only push the changes when everything is right and tested. This
  allows the reference version to stay usable at all times while
  leaving developers freedom in their commits.
There are unfortunately far fewer tools available for git than for
Subversion, but IMHO the model itself is superior. I don't know the
finer details of how git handles the history of renamed files or
copies between repositories, though.
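
A toy model of the commit-locally-then-push workflow, as a sketch only: each repository is reduced to a Python list of commit messages, whereas real git stores a graph of commits. The class and method names are hypothetical and merely echo `git clone`, `git commit` and `git push`; the push check imitates git's default refusal of non-fast-forward pushes.

```python
class Repo:
    """Toy stand-in for a repository: an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def clone(self):
        # Like `git clone`: the copy holds the full history (hence a backup).
        return Repo(self.commits)

    def commit(self, message):
        # Like `git commit`: recorded locally only; the reference is untouched.
        self.commits.append(message)

    def push(self, reference):
        # Like `git push`: accepted only as a fast-forward, i.e. when the
        # local history already starts with everything in the reference.
        if self.commits[:len(reference.commits)] != reference.commits:
            raise ValueError("rejected: integrate the reference's new commits first")
        reference.commits = list(self.commits)


if __name__ == "__main__":
    reference = Repo(["initial import"])
    mine = reference.clone()
    mine.commit("work in progress")
    mine.commit("fix tests")
    print(len(reference.commits))  # 1: local commits are not visible yet
    mine.push(reference)
    print(len(reference.commits))  # 3: visible to everyone after the push
```

The fast-forward check is what keeps the reference usable: a developer whose clone has fallen behind must first integrate the newer reference commits locally.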

Hope that helps.

JiHO
---
http://jo.irisson.free.fr/