[Rd] Bridging R to OpenOffice
Erich Neuwirth
erich.neuwirth at univie.ac.at
Thu Mar 29 14:55:11 CEST 2007
Since I implemented the RExcel interface I also would like this
discussion to be continued on r-devel or some other list where
I can follow it.
Let me add some thoughts:
As Leonhard Mada suggested, probably the most needed connection
between R and a spreadsheet program is a way to transfer
dataframes easily from the spreadsheet to R and analysis
results back from R to Excel. To do this, one needs to have a
mechanism to transfer large amounts of data of different types.
An additional complication for the way back fro R to the
spreadsheet is that R results quite often have data types not
supported by the spreadsheet program (complex numbers for
example).
The convenience tool that people really want in the first place
is an item on the menu which allows to transfer a range with
data from the spreadsheet to R.
The question then is: should the users see the R command line?
There has to be a way of telling R what kind of analysis to
perform. Using a menu like the one supplied by RCommander is a
sensible option. Using this also would reuse all the work
invested in designing a good menu structure for end users.
Getting results back into the spreadsheet is more difficult.
Not technically, but from the design point of view. Analysis
results in R usually are not arrays, but lists, i.e. compounds
of compounds of data of different basic types, and of different
sizes. There is no clear general rule how to put R results into
spreadsheet ranges. The basic compound data type in
spreadsheets are arrays, and the data types in R are much more
complicated, and the conceptual mapping of result lists to
spreadsheet ranges has to be designed differently for different
types of analyses and results. Of course, a brute force method
(implemented for example by the connection mechanism between
NAG and Excel) would be to "just print" the results into the
spreadsheet as strings. This way, spreadsheet rows become
printed lines without further structure, and numbers in the
results are not easily accessible for further computations on
the spreadsheet.
Such a "transfer data frame and get results" connection,
however, is not really using the spreadsheet program as a
spreadsheet program, but as a data grid and output formatting
machinery, since it is completely independent of the
spreadsheet program's most important feature, automatic
recalculation triggered by changes of cell values.
A really tight integration of R and a spreadsheet can extend
the spreadsheet program's computational engine by the complete
R engine. It could allow spreadsheet formulas like
RApply("pchisq",A1,A2)
which would have R compute the value of the chi-squared
distribution with arguments in cells A1 and A2 of the
spreadsheet. Changing the value in A1 would trigger R to
recalculate the chi-square value.
In this case, the connection between R and the spreadsheet
program has to be very fast, since the spreadsheet program
essentially is using R as a dynamically linked library. The
problem of incompatible data types also becomes much harder to
deal with. The results of R computations are directly put into
spreadsheet ranges, so having R results consisting of lists
makes things really difficult.
Thomas Baier and I recently published a paper in Computational
Statistics which discusses different models of integration
between R and spreadsheets. Excel is used as an example, but
the concepts are independent from the concrete implementation.
It is accessible at http://dx.doi.org/10.1007/s00180-007-0023-6
If you cannot access it, write to me, I will send you a copy.
Currently, we are working on a cross-platform alternative to using
COM to connect the spread-sheet to R. The platforms in mind are
(at least) Windows, Linux and MacOS (X). The spreadsheet program
of choice for our next integration will be Gnumeric, where the
integration is already worked on by students.
--
Erich Neuwirth, Didactic Center for Computer Science
University of Vienna
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-9394
More information about the R-devel
mailing list