[Rd] Re: [Omega-devel] StatDataML

David James <dj@research.bell-labs.com>
Fri, 3 Mar 2000 16:36:36 -0500 (EST)


I just had a very quick look at the StatDataML proposal --- nice
work!   At the risk of showing my ignorance, I want to mention 
my first impressions.

My first impression is that defining datasets in terms of
arrays and lists is a bit too high-level.  What about
simpler vectors and scalars? (I know that R/S don't have scalars,
but other systems/applications do.)  Can we think of a core
set of "basic" data types (factors, strings, integers, etc.)
on which to build other, possibly recursive, types (perhaps
similar to CORBA's IDL basic data types or S's datadump)?
Would it make sense to imagine, say, xlispstat/python/java applications
reading and interpreting a StatDataML document without serious difficulties?
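
To make the question a little more concrete, here is a rough Python
sketch of what building on a small core of basic types might look like.
It is only an illustration under stated assumptions: the element names
("scalar", "vector", "list", "e") and the type names are made up for
this example and are not taken from the StatDataML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical core of basic types; NOT taken from the StatDataML DTD.
BASIC_TYPES = {bool: "logical", int: "integer", float: "real", str: "string"}


def to_element(value):
    """Encode a scalar, a homogeneous vector, or a nested list as XML."""
    # Look up by exact type: bool is a subclass of int in Python,
    # so isinstance() would confuse logicals with integers.
    kind = BASIC_TYPES.get(type(value))
    if kind is not None:
        elem = ET.Element("scalar", type=kind)
        elem.text = str(value)
        return elem
    if isinstance(value, (list, tuple)):
        kinds = {BASIC_TYPES.get(type(v)) for v in value}
        if len(kinds) == 1 and None not in kinds:
            # Every entry has the same basic type: a simple vector.
            elem = ET.Element("vector", type=kinds.pop(), length=str(len(value)))
            for v in value:
                ET.SubElement(elem, "e").text = str(v)
            return elem
        # Mixed or nested content: a recursive list of further elements.
        elem = ET.Element("list")
        for v in value:
            elem.append(to_element(v))
        return elem
    raise TypeError(f"unsupported type: {type(value).__name__}")


# A list holding a scalar, an integer vector, and a string vector.
doc = to_element([3.5, [1, 2, 3], ["a", "b"]])
xml_text = ET.tostring(doc, encoding="unicode")
```

A reader in another language would then only need to understand the
handful of basic types plus the one recursive container.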

My gut feeling (which is often wrong) is that the DTD should make
the data self-describing:  e.g., the factor "machineId" has
levels (or defining set) "Stepper1", "Stepper2", ... "Stepper20",
even though the particular dataset at hand contains only a
subset of those.  Similarly, perhaps units and classes could
be included in the dataset (currency, for example, is certainly
a number, perhaps single precision, perhaps not, with specific units:
dollars, euros, pesos, etc.).
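
As a sketch of the self-describing idea, the Python fragment below
writes a factor together with its full level set, and a numeric
variable together with its units.  The element and attribute names
("factor", "levels", "codes", "numeric", "units") are assumptions made
for this example only, not part of the proposed DTD.

```python
import xml.etree.ElementTree as ET


def factor_element(name, levels, values):
    """Encode a factor with its complete level set, observed or not."""
    index = {level: i + 1 for i, level in enumerate(levels)}
    elem = ET.Element("factor", name=name)
    levels_elem = ET.SubElement(elem, "levels")
    for level in levels:
        ET.SubElement(levels_elem, "level").text = level
    # Observations are stored as 1-based codes into the level list;
    # a value outside the declared levels raises KeyError.
    codes = ET.SubElement(elem, "codes")
    codes.text = " ".join(str(index[v]) for v in values)
    return elem


def numeric_element(name, values, units):
    """Encode a numeric variable and record its units as an attribute."""
    elem = ET.Element("numeric", name=name, units=units)
    elem.text = " ".join(repr(float(v)) for v in values)
    return elem


# All twenty steppers are declared, although only two are observed.
all_levels = [f"Stepper{i}" for i in range(1, 21)]
machine = factor_element("machineId", all_levels,
                         ["Stepper3", "Stepper7", "Stepper3"])
cost = numeric_element("cost", [12.5, 8.0], units="dollars")
```

The point is that a reader of the file can recover the defining set of
the factor without having seen every level in the data.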

More long-term, how about application-defined data?  An application may have
its own set of data objects that fully exploit contextual
information that could be extremely useful to capture and
communicate.  Also, do the data have to be in ASCII format?  What about
(possibly MIME-encoded) images?  Sound?
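
One way binary data could ride along in an ASCII document is base64,
the encoding MIME uses.  The Python sketch below is only an
illustration: the "binary" element and its "mediatype" and "encoding"
attributes are hypothetical names, not something the proposal defines.

```python
import base64
import xml.etree.ElementTree as ET


def binary_element(name, payload, media_type):
    """Wrap raw bytes as base64 text inside a (hypothetical) element."""
    elem = ET.Element("binary", name=name, mediatype=media_type,
                      encoding="base64")
    elem.text = base64.b64encode(payload).decode("ascii")
    return elem


def read_binary(elem):
    """Recover the raw bytes from an element written by binary_element."""
    if elem.get("encoding") != "base64":
        raise ValueError("unknown encoding")
    return base64.b64decode(elem.text)


# A few arbitrary bytes standing in for an image or a sound clip.
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
elem = binary_element("logo", raw, "image/png")

# Round trip through serialized XML to check nothing is lost.
restored = read_binary(ET.fromstring(ET.tostring(elem)))
```

The cost is size: base64 text is about a third larger than the bytes
it encodes.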

As I mentioned, these questions come from my lack of experience
with XML, but they may be worth raising now rather than later :-)

David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636

> From: Friedrich Leisch <Friedrich.Leisch@ci.tuwien.ac.at>
> MIME-Version: 1.0
> Content-Transfer-Encoding: 8bit
> Date: Fri, 3 Mar 2000 17:07:37 +0100 (CET)
> To: omega-devel@omegahat.org, r-devel@R-project.org,
>     Erich.Neuwirth@univie.ac.at, hothorn@ci.tuwien.ac.at, baier@ci.tuwien.ac.at
> Subject: [Omega-devel] StatDataML
> X-Mailman-Version: 1.0rc2
> List-Id: Developers of Omega <omega-devel.www.omegahat.org>
> X-BeenThere: omega-devel@www.omegahat.org
> Hi,
> we have a first draft of R functions reading/writing data to XML files
> including a rather general DTD ... which borrows heavily from the data
> types of a certain programming language :-)
> The basic idea is to create an XML standard for data exchange,
> together with import/export functions for as many applications as
> possible. We here will need R, Matlab & Octave for our research
> program, but the idea is of course to create a general standard.
> After looking in several other applications we think that all the data
> types there can easily be represented using S constructs (i.e., arrays
> and lists together with attributes) ... so why make life complicated
> and invent something new.
> Of course this only applies to the low-level representation ... the
> real thing will come next when one starts defining higher level
> classes, this step we have avoided so far because one needs the
> low-level things first to have something to play with.
> A short description of the DTD and an R package with import/export
> functions can be found at
> 	http://www.ci.tuwien.ac.at/~leisch/R
> (Modulo some bugs) R data objects can be saved/restored without loss
> of information. We don't intend to cover functions or models yet.
> All comments and ideas are appreciated! This is just a proposal and
> anything can still be changed ... 
> Best,
> Fritz
> PS: Almost all the work has been done by Torsten Hothorn, I'm just
> writing the email ;-)
> -- 
> -------------------------------------------------------------------
>                         Friedrich  Leisch 
> Institut für Statistik                     Tel: (+43 1) 58801 10715
> Technische Universität Wien                Fax: (+43 1) 58801 10798
> Wiedner Hauptstraße 8-10/1071      Friedrich.Leisch@ci.tuwien.ac.at
> A-1040 Wien, Austria             http://www.ci.tuwien.ac.at/~leisch
>      PGP public key http://www.ci.tuwien.ac.at/~leisch/pgp.key
> -------------------------------------------------------------------
> _______________________________________________
> Omega-devel maillist  -  Omega-devel@www.omegahat.org
> http://www.omegahat.org/mailman/listinfo/omega-devel

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch