[Rd] Re: [Omega-devel] StatDataML
Friedrich Leisch
Friedrich.Leisch@ci.tuwien.ac.at
Mon, 6 Mar 2000 12:27:25 +0100 (CET)
>>>>> On Fri, 3 Mar 2000 16:36:36 -0500 (EST),
>>>>> David James (DJ) wrote:
DJ> Hi,
DJ> I just had a very quick look at the StatDataML proposal --- nice
DJ> work! At the risk of showing my ignorance, I want to mention
DJ> my first impressions.
DJ> My first impression is that defining datasets in terms of
DJ> arrays and lists is a bit too high a level. What about
DJ> simpler vectors, scalars? (I know that R/S don't have scalars,
DJ> but other systems/applications do.) Can we think of a core
DJ> set of "basic" data types (factors, strings, integers, etc.)
DJ> from which to build on other, possibly recursive types (perhaps
DJ> similar to corba's IDL basic data types or S's datadump?).
Hmm, basically we have that ... just that I don't see why it's
necessary to differentiate between a vector (=1-dimensional array) and
higher dimensions, i.e., introduce different tags for it. But if many
others feel like this is necessary: I don't have a strong opinion about
it, we just wanted to keep the thing as simple as possible.
Regarding data types: Torsten and I just discussed that we want to
keep the mode of an array as abstract as possible such that
applications can use the internal representation that fits the data
best.
IMO the following modes will be necessary to represent statistical
data:
logical, nominal, ordinal, integer, real, complex
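
To make this concrete, here is a rough sketch (in Python, purely for
illustration; the names Mode and SDArray are hypothetical and not part
of the proposal) of one array type that carries an abstract mode and a
dimension vector, so that a vector is simply the 1-dimensional case:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Tuple


class Mode(Enum):
    """The six modes listed above; the names are taken from the mail."""
    LOGICAL = "logical"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTEGER = "integer"
    REAL = "real"
    COMPLEX = "complex"


@dataclass
class SDArray:
    """Hypothetical container: one type for vectors and higher dimensions."""
    mode: Mode
    dim: Tuple[int, ...]
    values: List[Any]

    def __post_init__(self):
        # The number of values must match the product of the dimensions.
        expected = 1
        for d in self.dim:
            expected *= d
        if expected != len(self.values):
            raise ValueError(
                f"dim {self.dim} needs {expected} values, got {len(self.values)}"
            )


# A vector is just an array with a single dimension; no separate tag needed.
vector = SDArray(Mode.REAL, (3,), [1.0, 2.5, 4.0])
matrix = SDArray(Mode.INTEGER, (2, 2), [1, 2, 3, 4])
```

How each application stores the values internally is left open on
purpose; only the mode name and the dimensions are fixed here.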
DJ> Would it make sense to imagine, say xlispstat/python/java applications
DJ> reading and interpreting a StatDataML document without serious difficulties?
Sure! What's the difference?
DJ> My gut feeling (which is often wrong) is that the DTD should make
DJ> the data self-describing: e.g., the factor "machineId" has
DJ> levels (or defining set) "Stepper1", "Stepper2", ... "Stepper20",
DJ> even though the particular dataset at hand has only a
DJ> subset of those. Similarly, perhaps allowing units and classes
DJ> to be included in the dataset (in the case of currency, it is certainly
DJ> a number, perhaps single precision, perhaps not, with specific units
DJ> dollars, euros, pesos, etc.)
DJ> More long-term, how about application-defined data? An application may have
DJ> its own set of data objects that fully exploits contextual
DJ> information that could be extremely useful to capture and
DJ> communicate.
We definitely need (and want) any user to be able to extend
StatDataML, i.e., define new classes. There should be a set of
standard classes (like dataframe or time series), but also interfaces
for defining new classes.
The current idea (in R) is to have the following: If the SDML object
has a class and there exists a conversion function for that particular
class then use it, otherwise do the default thing.
The conversion function shouldn't do too much, probably mostly renaming
some slots and re-organizing the structure (as classes on different
systems will probably have different structures).
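
A minimal sketch of that lookup-then-fallback idea (in Python rather
than R, and with every name hypothetical: CONVERTERS, register,
from_sdml, and the "timeseries" slot names are assumptions made for
illustration only):

```python
# Registry mapping an SDML class name to its conversion function.
CONVERTERS = {}


def register(class_name):
    """Decorator that records a conversion function for one class name."""
    def decorator(fn):
        CONVERTERS[class_name] = fn
        return fn
    return decorator


def default_convert(obj):
    """The 'default thing': return a plain copy without restructuring."""
    return dict(obj)


@register("timeseries")
def convert_timeseries(obj):
    # A converter should stay small: here it only renames one slot.
    out = dict(obj)
    out["frequency"] = out.pop("freq")
    return out


def from_sdml(obj):
    """Use the class-specific converter if one exists, else the default."""
    fn = CONVERTERS.get(obj.get("class"), default_convert)
    return fn(obj)


ts = from_sdml({"class": "timeseries", "freq": 12, "values": [1, 2, 3]})
other = from_sdml({"class": "unknownclass", "values": [4, 5]})
```

Here ts comes back with the slot renamed to "frequency", while other
has no registered converter and is passed through unchanged.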
DJ> Also, do the data have to be in ASCII format? What about
DJ> (possibly mime-encoded) images? sound?
Hmm, haven't thought about that yet.
DJ> As I mentioned, these are questions coming from my lack of experience
DJ> with XML, but may be worth raising now better than later :-)
YES!!! That's why we called it ``proposal'' rather than
``StatDataML version 1.0'' :-)
Best,
Fritz
PS: We are also no XML experts!
--
-------------------------------------------------------------------
Friedrich Leisch
Institut für Statistik Tel: (+43 1) 58801 10715
Technische Universität Wien Fax: (+43 1) 58801 10798
Wiedner Hauptstraße 8-10/1071 Friedrich.Leisch@ci.tuwien.ac.at
A-1040 Wien, Austria http://www.ci.tuwien.ac.at/~leisch
PGP public key http://www.ci.tuwien.ac.at/~leisch/pgp.key
-------------------------------------------------------------------