[R] Formal definitions of R-language.

Thu Jul 17 16:08:54 CEST 2003

[Oops. Accidentally sent only to D.Trainor the first time around, even
though it was intended mainly for M.Kondrin and the list]

Douglas Trainor <trainor at transborder.org> writes:

> Uwe Ligges wrote:
> 
> > M.Kondrin wrote:
> >
> >> Hello!
> >> Some CS-guys (the type who know what Church formalism is) keep
> >> asking me questions about formal definitions of the R language that I
> >> cannot answer (or even understand). Are there some freely available
> >> papers I can throw at them that explain whether R is a
> >> functional/OOP/procedural language, whether it uses weak/strong,
> >> dynamic/static typing, whether it uses lazy or ...(do not know
> >> what) evaluation, and what sort of garbage collector it uses?
> >> Thanks.
> >
> >
> > R ships with a draft version of the manual "R Language Definition".
> > Another source is Venables & Ripley (2000): S Programming, Springer.
> 
> 
> Tell the "CS-guys" to grab the source code and chew on the LALR
> context-free grammar stuff in the file "gram.y" as in:
> 
>     R-1.7.1/src/main/gram.y

Don't. You'll get laughed out... The grammar of the language is
essentially unrelated to the kind of categorizations CS people are
looking for.

Probably, Robert Gentleman or Luke Tierney are the guys best qualified
to answer the question, but I can give it a try:

R is quite close to being Scheme plus "syntactic sugar". Lexical
scoping and function closures come directly from Scheme, and so does
the internal structure of calls and functions: they are equivalent to
lists, so that you can do things like

  e <- quote(2+2) ; lapply(e,deparse)
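
To make this concrete, here is a minimal sketch. The names `parts`, `make_counter` and `counter` are only illustrative. A quoted call can be taken apart and modified like a list, and a closure keeps its defining environment alive:

```r
## Calls are language objects that can be handled like lists
e <- quote(2 + 3)
stopifnot(is.call(e))
parts <- as.list(e)          # list(`+`, 2, 3)
e[[1]] <- as.name("*")       # replace the function symbol in the call
result <- eval(e)            # now evaluates 2 * 3, i.e. 6

## Lexical scoping: the inner function closes over `count`
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1      # assigns in the enclosing (defining) environment
    count
  }
}
counter <- make_counter()
counter()
n <- counter()               # second call sees the updated `count`, i.e. 2
```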

R is a functional language, with lazy evaluation and weak dynamic
typing (a variable can change type at will: a <- 1 ; a <- "a" is
allowed). Semantically, everything is copy-on-modify, although some
optimization tricks are used in the implementation to avoid the worst
inefficiencies.
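
A small base-R sketch of those three points. `f`, `g` and `original` are illustrative names only:

```r
## Dynamic typing: the same variable can hold values of different types
a <- 1
a <- "a"

## Lazy evaluation: an argument that is never used is never evaluated
f <- function(x, y) x * 2
lazy_ok <- f(21, stop("never evaluated"))   # 42, and no error is raised

## Copy-on-modify: changing `v` inside g() does not affect the caller's copy
g <- function(v) {
  v[1] <- 100
  v
}
original <- c(1, 2, 3)
modified <- g(original)
```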

Parameter passing follows the "pass-by-value illusion", i.e. what
really gets passed down to a function is a "promise", which embodies
the expression used in the call (together with the environment it is
to be evaluated in). This is at the core of the lazy evaluation
mechanism: the result of the expression is not computed until needed.
It also allows a function to get hold of the expression itself: this
is useful for labeling plots, but it also allows some variants of
"pass-by-name"-like semantics via evaluation in the environment of
the caller.
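
Here is a rough illustration using the hypothetical helpers `label_of`, `twice` and `once`. substitute() reads the expression behind the promise without forcing it. eval() in parent.frame() gives the pass-by-name-like behaviour. An ordinary argument, by contrast, is forced at most once:

```r
## substitute() returns the unevaluated expression; the promise is not forced,
## so the symbol need not even exist
label_of <- function(x) deparse(substitute(x))
lbl <- label_of(undefined_thing)            # "undefined_thing"

## Pass-by-name-like: re-evaluate the caller's expression in the caller's frame
twice <- function(expr) {
  e <- substitute(expr)
  env <- parent.frame()
  eval(e, env)
  eval(e, env)
}
k <- 0
twice(k <- k + 1)                           # k is incremented twice

## An ordinary promise is evaluated once and its value is then cached
calls <- 0
once <- function(x) { x; x; x }
once({calls <- calls + 1; calls})           # calls is incremented only once
```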

R is not object oriented in the same sense as Java or C++. We don't
(generally) have methods that are semantically part of objects.
However, we do have class-based function dispatch and generic
functions, so that a function can do different things for different
kinds of objects and, with the S4 class system, also for combinations
of objects.
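
For example, consider the following sketch. The generics `describe` and `pair_kind` and the class "dog" are made up for illustration. S3 dispatches on the class of the first argument, while an S4 generic can dispatch on the classes of several arguments:

```r
## S3: class-based dispatch on the first argument
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something"
describe.dog <- function(x, ...) "a dog"
d <- structure(list(name = "Rex"), class = "dog")
s3_result <- c(describe(d), describe(42))   # "a dog", "something"

## S4: dispatch on the combination of argument classes
library(methods)
setGeneric("pair_kind", function(x, y) standardGeneric("pair_kind"))
setMethod("pair_kind", signature("numeric", "character"),
          function(x, y) "number + text")
setMethod("pair_kind", signature("character", "numeric"),
          function(x, y) "text + number")
s4_result <- c(pair_kind(1, "a"), pair_kind("a", 1))
```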

Other distinctive features are that basic operations are vectorized,
and that (somewhat perversely to LISP programmers) the traditional
"dotted-pair" list lives quite deeply hidden from users, whereas the
thing called "list" in R is really a generic vector where each element
can be of a different type. [This is not the only place where we have
some rather unfortunate clashes in terminology between the S language
(which is historically of APL descent) and the Scheme-like engine
underneath.]
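
A quick sketch in base R, with illustrative variable names. Arithmetic works elementwise on whole vectors, list() builds a generic vector, and the pairlist type still surfaces in a few places such as formals():

```r
## Vectorized arithmetic: no explicit loop needed
x <- 1:5
sq <- x^2                       # c(1, 4, 9, 16, 25)

## An R "list" is a generic vector whose elements may have different types
l <- list(1, "two", TRUE)
typeof(l)                       # "list" (VECSXP internally)

## The dotted-pair list (pairlist) is mostly hidden, but formals() returns one
f <- function(a, b = 2) NULL
fm <- formals(f)
typeof(fm)                      # "pairlist"
```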

Some implementation stuff is documented in sketches on
developer.r-project.org (do read the opening paragraph!) and the
intention is to have it documented in the R Language Definition,
although that is still rather incomplete (and as Thomas Lumley once
put it, "it would be desirable if one could truthfully say that it is
a work in progress"). Also, of course, there is the paper by Ross
Ihaka and Robert Gentleman in JCGS (1996).

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907