[Rd] RFC on first public draft of 'Debian R Policy'

Dirk Eddelbuettel edd at debian.org
Wed Dec 31 03:53:07 MET 2003


r-devel and debian-devel readers:

Below is a draft for a suggested policy for R packages within Debian. In the
six years that we have been maintaining R for Debian, the total number of R
related packages has grown to a full thirty -- eleven based on the main
tarball released by R Core, as well as nineteen contributed packages --
reflecting the work of five different Debian maintainers.

This draft document is concerned mostly with how we can assure the integrity
of the contributed packages. This is a timely concern: R-based archives such
as CRAN (cran.r-project.org) and Bioconductor (www.bioconductor.org) are
experiencing unprecedented growth in the number of their packages. More and
more of these may eventually be turned into Debian packages. We would like
to suggest some mechanisms to ensure consistency, similar to what Debian
has achieved with add-on packages for Perl, Python or Emacs Lisp.

We're looking forward to your comments. It may be beneficial for the flow of
the discussion to carbon-copy both mailing lists.

With best regards, and Happy New Year,

Dirk Eddelbuettel and Doug Bates




			     Debian R Policy
			  Draft Proposal - v 0.1.3

		  Dirk Eddelbuettel (edd at debian.org)
		 and Douglas Bates (bates at debian.org)

			   December 30, 2003


0. Introduction:

The `r-base' package for Debian GNU/Linux (http://www.debian.org) has been
available since December 1997 and provides R (http://www.r-project.org), a
language and environment for statistical computing and graphics.  Like
Debian, R has grown considerably since then.  The r-base package is now a
small meta package that depends on several other Debian packages, including
r-base-core which provides the essential parts of the R environment.

The R system has its own concept of packages that provide base functionality
and extensions for the language.  The Debian r-base-core package installs 15
required R packages in the directory /usr/lib/R/library.  The Debian package
r-recommended installs another 13 R packages in the same directory.

The creation of user-written R packages is explicitly encouraged by the R Core
team.  The format of R packages, as documented in the manual "Writing R
Extensions", available in the Debian r-doc-html and r-doc-pdf packages, is
loosely based on the Debian packaging format.

CRAN (http://cran.r-project.org), the Comprehensive R Archive Network
(a network of global mirrors), already contains well over 300
contributed R packages.  CRAN is patterned after archive networks such as
CPAN (http://www.cpan.org) and CTAN (http://www.ctan.org).  While CRAN
constitutes the principal source of R packages, many other R packages are
available in specialized archives such as Bioconductor
(http://www.bioconductor.org), Omegahat (http://www.omegahat.org), and
Sourceforge (http://sourceforge.net) as well as private archives.  One R
package emanating from a private archive has already been released as the
Debian package r-noncran-lindsey. Debian currently contains a total of
nineteen add-on packages for R. 

The purpose of this document is to propose standards for creating Debian
packages of R packages.

0.1 Terminology:

The term "package" may mean either an R package or a Debian package.  When
necessary we will distinguish between these by using the full names "R
package" and "Debian package".

An R library is a file system directory that contains a collection of R
packages.  A search path of libraries is maintained during an R session.  The
library search path is initialized at startup from the environment variable
`R_LIBS', which (if defined) should be a colon-separated list of directories
at which R library trees are rooted. On Debian systems, R environment
variables are typically only defined inside /usr/bin/R, but can be defined by
the user if needed, as in the case of private libraries below $HOME. Starting
with the Debian pre-releases of R 1.7.0, R_LIBS was set up for three distinct
directories: user-installed packages are installed in
/usr/local/lib/R/site-library, Debian packages will install into
/usr/lib/R/site-library and the r-base-core and r-recommended packages will
use the standard /usr/lib/R/library directory. See section 2.2 below for more
details.

As described in "Writing R Extensions", an R package should contain data sets
and/or R code, and R help files in Rd (R documentation) format.  Optionally
an R package can contain C/C++/Fortran code, regression tests, and/or
expository documents, called "vignettes", that are written in a noweb format.

Source R packages are distributed as gzip'd tar files named according to the
package, version and sub-version, e.g. car_1.0-0.tar.gz.
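
As a small illustration of this naming scheme, the following shell sketch
splits a tarball name into its package and version parts. It assumes the
package name itself contains no underscore; the variable names are ours.

```shell
#!/bin/sh
# Split an R source tarball name of the form <package>_<version>.tar.gz
# into its package and version parts using POSIX parameter expansion.
# Assumption: the package name itself contains no underscore.
tarball="car_1.0-0.tar.gz"

base="${tarball%.tar.gz}"      # strip the suffix: car_1.0-0
pkg="${base%%_*}"              # everything before the first underscore
version="${base#*_}"           # everything after the first underscore

echo "package=$pkg version=$version"
```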

In what follows, the R prompt, "> ", will be used to distinguish example
commands typed to the R interpreter from those typed to the shell.


1.0 Installing R packages

The primary executable installed by the Debian package r-base-core is the
shell script /usr/bin/R.  Invoked by itself it starts an R session.  It is
also the tool for managing R packages.  A source package would be installed
as

 ## install in the local system-wide packages library location
 ## note that the directory /usr/local/lib/R/site-library is now
 ## used automatically on Debian systems, see section 2.2 below
 R CMD INSTALL car_1.0-0.tar.gz

or

 ## install in a private library ~/Rlibs
 R CMD INSTALL -l ~/Rlibs car_1.0-0.tar.gz

When run on a computer connected to the Internet, the R system provides an
interface to CRAN.  One can install or update packages by

 > install.packages("car")
 > # update currently installed packages to the latest available 
 > # version on CRAN
 > update.packages()    

Naturally, installation of an R package in a library directory requires write
permission on the library directory.
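
One way to act on this is to check for write permission before installing.
The sketch below picks the first writable directory from a colon-separated
list. The function name first_writable_lib is hypothetical and not part of
R or Debian.

```shell
#!/bin/sh
# Print the first directory in a colon-separated list that exists and is
# writable by the current user; return non-zero if none qualifies.
# first_writable_lib is a hypothetical helper, not an R or Debian tool.
first_writable_lib() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

It could then be combined with the install command shown earlier, for
example:

 lib=$(first_writable_lib "/usr/local/lib/R/site-library:$HOME/Rlibs") &&
     R CMD INSTALL -l "$lib" car_1.0-0.tar.gz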

1.1 Why provide Debian packages of R packages?

One reason for providing a Debian package of an R package is to use Debian
package dependencies to ensure that any system libraries or include files
required to compile the code in the R package are available.  For example,
the Debian postgresql-dev package must be installed if the R package Rpgsql
is to be installed successfully.

The second reason is for convenience.  Someone who already uses Debian tools
such as apt-get to update the packages on a Debian system may find installing
or updating a Debian package to be more convenient than installing the r-base
Debian package plus learning to update R packages from within R or externally
using R CMD INSTALL.  Because R is beginning to be used more widely in fields
such as biology (e.g. Bioconductor) and social sciences, we should not
count on the typical user being an R guru.  Having R packages controlled by
apt seems worth the small amount of overhead in creating the Debian
packages. This also applies to systems maintained by (presumably non-R using)
system administrators who may already be more familiar with Debian's package
mechanism. By using this system to distribute CRAN packages, another learning
curve is avoided for those who may not actually use R but simply provide it
for others.

The third reason is quality control. The CRAN team already goes to great
lengths to ensure the individual quality and coherence of an R package.
Embedding a binary R package in the Debian package management system provides
additional control over dependencies between required components or
libraries, as well as access to a fully automated system of `build daemons'
that recompile a source package for up to ten other architectures -- which
provides a good portability and quality control test.

The fourth reason is scalability. More and more users are using several
machines, or may need to share work with co-workers. Being able to create,
distribute and install identical binary packages makes it easier to keep
machines synchronised in order to provide similar environments.

The fifth reason plays on Debian's strength as a common platform for other
'derived' systems. Examples are Knoppix and its derivatives such as
Quantian. Providing Debian packages of R packages allows others to use these
in new environments.


2.0 Proposed conventions for Debian packages of R packages.

2.1 Name and version number of the Debian package

We propose that the Debian packages be named r-<Rarchive>-<Rpackage>. An R
package from a private archive can use "noncran" for the archive name
indicating that it did not come from CRAN. For example

 r-cran-car            - the car package from CRAN
 r-bioconductor-affy   - the affy (Affymetrix) package from Bioconductor.org
 r-omegahat-rgtk       - the Gtk bindings package from Omegahat.org
 r-noncran-lindsey     - a package from Jim Lindsey's private archive.

This determines the name of the binary Debian package. The Debian source
package can in most cases retain the <Rpackage> name. E.g., for the examples
above one could use car, affy, rgtk and lindsey. This makes it consistent
with the upstream archive: CRAN mirrors will have a current tar.gz file with
sources for car, and so will Debian mirrors. As a general rule, Debian package
names have to be in lowercase, and any potential dots should be replaced with
hyphens.
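
The naming rule can be sketched in shell as follows. The function name
deb_binary_name is hypothetical, and 'My.Pkg' is a made-up package name that
only serves to show the dot replacement.

```shell
#!/bin/sh
# Build a Debian binary package name r-<archive>-<package> from an archive
# label and an upstream R package name: lower-case the name and turn dots
# into hyphens. deb_binary_name is a hypothetical helper.
deb_binary_name() {
    archive=$1
    rpkg=$2
    lc=$(printf '%s' "$rpkg" | tr '[:upper:]' '[:lower:]' | tr '.' '-')
    printf 'r-%s-%s\n' "$archive" "$lc"
}

deb_binary_name cran car
deb_binary_name cran RODBC
deb_binary_name cran My.Pkg      # My.Pkg is a made-up name
```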

If the potential for name-space collision with other packages is sufficient,
then the binary package name can instead be used as the source package name.

One new category 'other' may be introduced for packages not originating from
CRAN, Bioconductor or Omegahat.  We suggest using the scheme

 r-other-$AUTHOR-$PACKAGE

for these. For example, the package by Jim Lindsey could be re-released as a
set of packages which would comprise 'r-other-lindsey-rmutil',
'r-other-lindsey-gnlm' and so on.

Version numbers should allow for a final Debian revision to permit uploads
independent of CRAN uploads.  Given that CRAN packages use a scheme `a.b-c',
adding a Debian revision d leads to the scheme `a.b-c-d', which looks unusual
due to the two hyphens. Alternatively, the hyphen in the CRAN version can be
translated into a dot yielding `a.b.c-d'. Both formats are permitted.
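
A minimal sketch of the second format, translating the hyphen into a dot and
appending the Debian revision, might look like this. The function name
debian_version is hypothetical.

```shell
#!/bin/sh
# Translate a CRAN version a.b-c plus a Debian revision d into a.b.c-d.
# debian_version is a hypothetical helper.
debian_version() {
    upstream=$1
    revision=$2
    dotted=$(printf '%s\n' "$upstream" | sed 's/-/./g')
    printf '%s-%s\n' "$dotted" "$revision"
}

debian_version 1.0-9 1
```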

2.2 Installation directory

Only the r-base-core and r-recommended packages should install an R package
into the library /usr/lib/R/library.  Other Debian packages should install
into the library /usr/lib/R/site-library. User-installed packages should go
into /usr/local/lib/R/site-library.

Using the default setting of
R_LIBS=${R_LIBS-'/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library'}
ensures that user installations will go into /usr/local/lib/R/site-library as
this is the first directory listed in R_LIBS.  Debian packages will
explicitly set the library /usr/lib/R/site-library in their debian/rules
files (see below). Lastly, default Makefiles for R released by R Core will
continue to use /usr/lib/R/library.
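
The effect of the `${R_LIBS-...}' construct can be tried in isolation. The
sketch below only demonstrates the shell semantics; it is not an excerpt of
/usr/bin/R, and the variable names default_libs and first_lib are ours.

```shell
#!/bin/sh
# ${VAR-default} expands to the default only when VAR is unset, so a value
# the user set beforehand is respected.
default_libs='/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library'

unset R_LIBS
R_LIBS=${R_LIBS-$default_libs}
first_lib=${R_LIBS%%:*}          # first entry of the search path
echo "default install target: $first_lib"

# A user-provided value is kept as is, so a private library comes first.
R_LIBS="$HOME/Rlibs"
R_LIBS=${R_LIBS-$default_libs}
echo "with user setting: ${R_LIBS%%:*}"
```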

One advantage of using separate libraries for separate archives is that the
packages are listed according to library directories by R's library()
function.  Note that further directories should be added to R_LIBS only after
at least some informal consultation with the Debian R maintainers.

2.3 Interaction with other package managers

In the case of a Debian system, a possible conflict exists between the
standard Debian way of managing packages with apt-get and friends, and the
common R way of using 'update.packages()'.  For example, a user may install a
package via apt-get, but later call (as root) the update.packages() function
from within R.  This could invalidate the package information stored by dpkg.

As a first line of defense, we will rely on common sense: mixing
package management systems cannot be construed as a good idea.  As a
second line of defense, it is anticipated that future R versions will
add a new entry record 'Package-Manager: Debian' to the installed
DESCRIPTION file of each package.  This allows update.packages() to
at a minimum warn about an attempted upgrade of a Debian-installed
package, or to possibly even refuse to execute such an upgrade. On the
other hand, such a mechanism still allows for direct installation from
CRAN for packages that are not (yet?) available as Debian packages.
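
To make the idea concrete, a check for such a record could be as simple as
the sketch below. Note that the 'Package-Manager' field is only anticipated
by this draft, so both the field and the function name managed_by_debian are
hypothetical.

```shell
#!/bin/sh
# Return success if an installed DESCRIPTION file carries the proposed
# 'Package-Manager: Debian' record. The field is only anticipated by this
# draft; treat it, and this helper, as hypothetical.
managed_by_debian() {
    grep -q '^Package-Manager:[[:space:]]*Debian[[:space:]]*$' "$1"
}
```

A caller might then decide to leave the package alone:

 if managed_by_debian /usr/lib/R/site-library/car/DESCRIPTION; then
     echo "car is managed by dpkg; use apt-get to upgrade it"
 fi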


3.0 Template

The following section provides a template for the debian/ directory of a
Debian package created from an R package.  Medium-term, and with a modest
amount of effort on our part, the process of creating a Debian package from
an R package on CRAN could be automated to a large extent, leaving the
possibility of eventually providing a tool like dh-make-r akin to the
existing dh-make-perl which aids in creating Debian packages from CPAN
modules.  This has become a lot easier thanks to the cdbs package and its
ability to reduce a debian/rules file to a few lines. In fact, Albrecht
Gebhardt at the University of Klagenfurt has already built a mostly complete
infrastructure to do this, and we intend to eventually provide Debian packages
via an apt-get'able section of the CRAN mirror network.

Packaging for Debian has become fairly exhaustively documented, for example
under the 'Packaging' section of http://www.debian.org/devel/ where the
Debian Policy manual (http://www.debian.org/doc/debian-policy) as well as the
Developers Reference (http://www.debian.org/doc/developers-reference) can be
found.  This section does not aim to be a substitute for
these. Indeed, we recommend following the policy and reference manuals if
they conflict with this section.

3.1 Overview

Sources for an R package are normally supplied as a compressed tar
archive. Converting this into a Debian package can be as easy as adding a
directory debian/ to the main directory from the untarred archive.  At a
minimum, four files must be in the debian/ directory.

One distinction (that has to be made at the beginning) relates to whether the
package will be architecture dependent or independent. R packages that do not
contain any C, C++ or Fortran source code are architecture independent, which
is reflected in the package name ending in `_all.deb'. As such, they do not
need to be rebuilt on different machine architectures by the Debian build
system. On the other hand, packages containing source code that is to be
compiled and linked must be rebuilt for different architectures.  For R
packages, the difference between the two setups concerns mostly the file
debian/control. The file debian/rules would normally be affected
too. However, in the case of R, the implicitly architecture-neutral way of
building R packages via 'R CMD INSTALL' actually bridges between both types
of packages and permits us to rely on a single template for both cases.
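
The distinction can be guessed from an unpacked source tree. The sketch
below assumes that compiled code lives in the src/ directory of the R
package and checks a short, non-exhaustive list of file extensions; the
function name guess_architecture is hypothetical.

```shell
#!/bin/sh
# Guess the Architecture field for an unpacked R source package: 'any' if
# its src/ directory holds C, C++ or Fortran sources, 'all' otherwise.
# Assumption: the extension list below is illustrative, not exhaustive.
guess_architecture() {
    srcdir=$1/src
    if [ -d "$srcdir" ] &&
       find "$srcdir" -type f \( -name '*.c' -o -name '*.cc' \
            -o -name '*.cpp' -o -name '*.f' -o -name '*.f90' \) |
       grep -q .; then
        echo any
    else
        echo all
    fi
}
```

For instance, `guess_architecture ./car' would be expected to print `all'
for a package without a src/ directory.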

3.2 Discussion

Below, we discuss each of the required files briefly along with an example
from the Debian r-cran-car package of the car CRAN package. To illustrate an
architecture-dependent package, we will also show one file from the
r-cran-rodbc package.

3.2.1 debian/control

3.2.1.1 debian/control for an architecture-independent package

This file contains metainformation about the package and is similar in
spirit to the DESCRIPTION file of an R package.  The control file typically
contains two sections. 

The first section provides the name of the source package (i.e. name of the
upstream tarball stripped of its version numbers) as well as the maintainer
name and email. It also suggests both a section and priority for the Debian
archive, and finally supplies the minimum set of packages (beyond the
build-essential package) needed to build the package. Note that the field is
called 'Build-Depends-Indep' as it lists the build requirements for an
architecture-independent package, the car package.

The second section describes the resulting package(s). For each package, it
provides its name, the architecture (where 'all' signals that no recompilation
on other hardware platforms is needed), a textual description as well as the
dependencies -- in this case only R itself.

...........................................................................
Source: car
Section: math
Priority: optional
Maintainer: Dirk Eddelbuettel <edd at debian.org>
Build-Depends-Indep: debhelper (>> 4.1.0), r-base-dev (>> 1.7.1), cdbs
Standards-Version: 3.6.1.0

Package: r-cran-car
Architecture: all
Depends: r-base-core (>= 1.7.1)
Description: GNU R Companion to Applied Regression by John Fox
 This package accompanies J. Fox, An R and S-PLUS Companion to Applied 
 Regression, Sage, 2002. The package contains mostly functions for applied 
 regression, linear models, and generalized linear models, with an emphasis 
 on regression diagnostics, particularly graphical diagnostic methods. 
 There are also some utility functions.   
...........................................................................

3.2.1.2 debian/control for an architecture-dependent package

For an architecture-dependent package, the setup is similar. The build
dependencies are listed in a field 'Build-Depends'. This information is
essential for the automated build infrastructure which compiles Debian
packages for a variety of hardware platforms. The `Architecture: any' setting
shows that the package needs to be rebuilt. Dynamic libraries from standard
locations are automatically identified and inserted via the ${shlibs:Depends}
pragma. As R uses a private directory for its dynamic library, a dependency
on R has to be added explicitly.

...........................................................................
Source: rodbc
Section: math
Priority: optional
Maintainer: Dirk Eddelbuettel <edd at debian.org>
Build-Depends: debhelper (>>4.1.0), cdbs, r-base-dev (>> 1.7.1), unixodbc-dev
Standards-Version: 3.6.1.0

Package: r-cran-rodbc
Architecture: any
Depends: ${shlibs:Depends}, r-base-core (>= 1.7.1)
Suggests: odbc-postgresql, libmyodbc
Description: GNU R package for ODBC database access
 This CRAN package provides access to any Open DataBase Connectivity (ODBC)
 accessible database. 
 .
 The package should be platform independent and provide access to any
 database for which a driver exists.  It has been tested with MySQL
 and PostgreSQL on both Linux and Windows (and to those DBMSs on Linux
 hosts from R under Windows), Microsoft Access, SQL Server and Excel
 spreadsheets (read-only), and users have reported success with
 connections to Oracle and DBase.
 .
 Usage is covered in the R Data Import/Export manual (available via the 
 r-doc-pdf, r-doc-html and r-doc-info packages).
...........................................................................

3.2.2 debian/copyright

The copyright file typically consists of three sections. First,
information about the package, its author and purpose are briefly
stated. Second, the canonical source of the package is identified. Third,
the copyright information is stated. As Debian adheres to the Debian Free
Software Guidelines (http://www.debian.org/social_contract#guidelines),
only software that matches these criteria can be added to the Debian
archive.  As CRAN follows a similar spirit, most R packages should be
suitable but packagers of prospective R packages should be careful to
ensure that the R package is DFSG-free.

...........................................................................
This is the Debian GNU/Linux r-cran-car package of car, the Companion to
Applied Regression package for GNU R. Car was written by John Fox.

This package was created by Dirk Eddelbuettel <edd at debian.org>.
The sources were downloaded from 
	http://cran.us.r-project.org/src/contrib/

The package was renamed from its upstream name 'car' to 'r-cran-car'
to fit the pattern of CRAN (and non-CRAN) packages for R.

Car is copyright John Fox and released under the GNU General Public License
(GPL).

On a Debian GNU/Linux system, the GPL license is included in the file
/usr/share/common-licenses/GPL.

For reference, the upstream DESCRIPTION [with lines broken to 80 cols] file
is included below:

   Package: car
   Version: 1.0-5
   Date: 2003/5/26
   Title: Companion to Applied Regression
   Author: John Fox <jfox at mcmaster.ca>. I am grateful to Douglas Bates,
     David Firth, Michael Friendly, Georges Monette, Brian Ripley, and
     Sanford Weisberg for various suggestions.
   Maintainer: John Fox <jfox at mcmaster.ca>
   Depends: R (>= 1.7.0), modreg
   Description: 
     This package accompanies J. Fox, An R and S-PLUS Companion to Applied 
     Regression, Sage, 2002.
     The package contains mostly functions for applied regression, linear 
     models, and generalized linear models, with an emphasis on regression
     diagnostics, particularly graphical diagnostic methods. There are also
     some utility functions. 
     With some exceptions, I have tried not to duplicate capabilities in the
     basic distribution of R, nor in widely used packages. Some of the
     functions in car will use functions in the MASS package, if it is 
     present; the subsets function graphs objects produced by the regsubsets
     function in the leaps package. Where relevant, the functions in car are
     consistent with na.action = na.omit or na.exclude.
   License: GPL version 2 or newer
   URL: http://www.r-project.org, http:/www.socsci.mcmaster.ca/jfox/
...........................................................................

3.2.3 debian/changelog

This file details the changes made to the package. Command-line tools for
adding to it, as well as a full-featured Emacs mode are available.

...........................................................................
car (1.0.9-1) unstable; urgency=low

  * Upgraded to new upstream release
  * debian/rules: Minor update moving towards common cdbs file

 -- Dirk Eddelbuettel <edd at debian.org>  Wed, 10 Dec 2003 20:52:29 -0600

car (1.0.8-1) unstable; urgency=low

  * New upstream release
  * debian/rules: Updated moving towards common cdbs file
  * debian/control: Increased Standards-Version to 3.6.1.0

 -- Dirk Eddelbuettel <edd at debian.org>  Mon, 13 Oct 2003 22:25:00 -0500

car (1.0.7-1) unstable; urgency=low

  * New upstream release
  * debian/control: Standards-Version increased to 3.6.0.1
  * debian/rules: Rewritten build stage with 'R CMD INSTALL .'
  * debian/control: Build-Depends-Indep on r-base-dev (>> 1.7.1)

 -- Dirk Eddelbuettel <edd at debian.org>  Sat, 23 Aug 2003 12:41:04 -0500

car (1.0.5-1) unstable; urgency=low

  * Initial Debian Release

 -- Dirk Eddelbuettel <edd at debian.org>  Sat,  5 Jul 2003 13:48:44 -0500
...........................................................................
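
The first line of each entry has a fixed shape, so the source package name
and version can be pulled out with standard tools. The sketch below uses sed
on a copy of the line shown above; on a Debian system the dpkg-parsechangelog
tool is the proper way to do this. The variable names are ours.

```shell
#!/bin/sh
# Extract the source package name and version from the first line of a
# debian/changelog entry: name (version) distribution; urgency=...
changelog_head="car (1.0.9-1) unstable; urgency=low"

# Name: everything before the first space.
source_pkg=$(printf '%s\n' "$changelog_head" | sed 's/ .*//')
# Version: the text between the first pair of parentheses.
version=$(printf '%s\n' "$changelog_head" | sed 's/^[^(]*(\([^)]*\)).*/\1/')

echo "$source_pkg $version"
```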

3.2.4 debian/rules

This file provides the nuts and bolts of the actual packaging. It is
written in the GNU Make language, and often employs additional Debian tools
such as debhelper. In a nutshell, it provides a framework for the common
'configure; make; make install; make clean' cycle of installing software.
However, the installation actually happens to a subdirectory of the actual
build directory. The resulting filesystem tree is then wrapped into a
tarball which, along with control and metainformation as well as the
possible pre- and post-installation and removal scripts, is placed into an
ar archive ending in the .deb suffix.

For our purposes, only a few key lines matter as R CMD INSTALL does all
the R package building work. Moreover, as 'R CMD INSTALL' is invoked for both
architecture-dependent and architecture-independent packages, we have adapted
debian/rules accordingly.  By using the cdbs build system, we can in fact use
the same debian/rules file for a variety of packages.

...........................................................................
#!/usr/bin/make -f
# 							-*- makefile -*-
# debian/rules file for the Debian GNU/Linux r-cran-car package
# Copyright 2003 by Dirk Eddelbuettel <edd at debian.org>

include /usr/share/cdbs/1/rules/debhelper.mk
include /usr/share/cdbs/1/class/langcore.mk

## We need the CRAN (upstream) name 
cranName	:= $(shell grep '^Package:' DESCRIPTION | cut -f2 -d" ")
## and we need to build a Debian Policy-conformant lower-case package name
cranNameLC	:= $(shell echo $(cranName) | tr "[A-Z]" "[a-z]" | tr "." "-" )
## which we can use to build the package directory 
package		:= r-cran-$(cranNameLC)
## which we use for the to-be-installed-in directory
debRlib		:=$(CURDIR)/debian/$(package)/usr/lib/R/site-library

common-install-indep:: R_any_arch
common-install-arch:: R_any_arch

R_any_arch:
		dh_installdirs		usr/lib/R/site-library
		R CMD INSTALL -l $(debRlib) --clean .
		rm -vf $(debRlib)/R.css $(debRlib)/$(cranNameLC)/COPYING
...........................................................................

The 'package' variable would need to change from `r-cran-$(cranNameLC)' to
`r-omegahat-$(cranNameLC)' for a package from Omegahat.org, and similarly for
Bioconductor.

A currently open question is the desire to also run 'R CMD check' at build
time. However, as of R 1.8.1, this requires a minor upstream change in the R
tools to allow the build to proceed from directories also containing a
version number in their name. We expect to add this feature in one of the
next R releases.

3.3 Putting it all together

Invoking 'dpkg-buildpackage -rfakeroot -us -uc' from inside the top-level
directory of an expanded CRAN package will build the Debian package, along
with .dsc, .diff.gz and .changes files (see the Debian Policy manual and the
Developers Reference for more details).  The resulting .changes file can then
be used for a package check via the lintian tool.


4. Acknowledgements

Comments and suggestions by Albrecht Gebhardt, Frank Harrell, Kurt
Hornik, Rafael Laboissiere, Friedrich Leisch, Steffen Moeller and Tony
Rossini are gratefully acknowledged.

$Id: R-packages.txt,v 1.6 2003/12/31 02:30:21 edd Exp $





-- 
The relationship between the computed price and reality is as yet unknown.  
                                             -- From the pac(8) manual page


