[R] Which version control system to learn for managing R projects?
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Oct 27 00:55:40 CEST 2010
On Tue, Oct 26, 2010 at 12:16 PM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello all,
>
> I wish to learn a version control system for managing my R (data analysis)
> projects.
>
> I know of SVN and GitHub, and wonder whether there is any reason I
> should prefer one over the other (or any other platform). An example of
> such a reason would be if one made it easier for me to later work with
> R-Forge, CRAN or any other platform for R code distribution.
>
>
> Thanks,
> Tal
There are several considerations:
1. What is everyone else using? The network effect is important since
you want people to be able to access your repository and you want to
leverage your knowledge of the version control system for other
projects' repositories. To that extent Subversion is the clear choice,
since it's used on R-Forge, by R itself and on Google Code (Google Code
also supports Mercurial).
2. Features. Git, Mercurial, Bazaar and a few others are distributed
version control systems, which represent the next generation after
centralized ones like Subversion. Git claims to be the most popular
and is the fastest of the three distributed systems, being written in C
(the others are written in Python), but it is an ugly mix of C, shell
scripts and other languages. Its underlying design is attractive
despite this somewhat messy implementation. At the time I evaluated
them, Mercurial was the only one with decent cross-platform support,
and it is the one Google chose for Google Code, although the
cross-platform situation may have changed since then. Bazaar has the
advantage of being pre-installed on many systems.
3. Repositories. The repository host you use (R-Forge, Google Code,
GitHub) is a key associated decision. R-Forge uses Subversion and has
the advantage of automatic builds, but it has an annoying delay:
changes to your home page, etc. take about 20 minutes to appear.
Google Code supports Subversion and Mercurial. Google Code is easier
to use and, in particular, it uses HTTP rather than R-Forge's SSH;
HTTP is more convenient, particularly for Windows users. Google Code
also has no such delay, and it has integrated issue tracking, a wiki
and a download area for your project. It is possible to host your
project on Google Code but still have R-Forge front-end it, and at
least one R developer has posted that he does precisely that. Google
Code is more restrictive regarding the licenses it accepts, although
it does accept most of the popular free ones such as the GPL and
Apache licenses. On the plus side, Google Code has specific support
for separate code and documentation licenses, which may be an
advantage. R-Forge does not restrict the license, and GitHub allows
non-free projects, but you have to pay for it in that case.
4. Editor/IDE. You don't really need integration between your editor
and version control system, but it may be convenient if it's available.
Subversion is integrated with the Microsoft Source Code Control
Interface (MSSCCI), so any editor or IDE that supports that standard
integrates with Subversion. There are Subversion plugins for Vim,
Emacs, Eclipse and likely many other editors and IDEs. There may be
plugins for the other version control systems too.
5. Windows Explorer. There are Windows Explorer extensions for
Subversion, Mercurial, Git and Bazaar. The Subversion and Mercurial
"Tortoise" extensions are reasonably mature; I believe the Git and
Bazaar Tortoise extensions are newer.
6. Client vs. Server and Installation. With Subversion you typically
use a Subversion repository on a remote system to host your software
and install only the client software. You can use the command line or
any of a number of alternative command-line or GUI clients. With the
distributed systems, every installation is both a client and a
repository (and potentially a server). It's harder to install a
Subversion repository, but that does not really matter since normally
you only install the client. The distributed systems are generally
easy to install, although if you need to install one from source on an
unsupported or old system, Mercurial was the easiest to get going in
my experience of having to do exactly that. Bazaar comes pre-installed
on some distributions, in which case there is zero installation.
Although I have focused on the most popular systems, there are other
version control systems too. darcs is a powerful distributed system
but may not be able to handle projects as large as the ones discussed
here. Fossil was developed by the author of SQLite and features a
particularly streamlined installation. Both of these have their
enthusiastic adherents.
I mostly use Subversion with the TortoiseSVN client, and when I want a
distributed system I use Mercurial, usually from the command line and
to a lesser extent via TortoiseHg. I have contributed an extension to
the Mercurial project (not related to R), which might bias me slightly
toward it, so caveat emptor.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
More information about the R-help mailing list