[Rd] Suggestion: Dimension-sensitive attributes
Bengoechea Bartolomé Enrique (SIES 73)
enrique.bengoechea at credit-suisse.com
Thu Jul 9 16:14:32 CEST 2009
Very good points. They closely match the current prototype I have written...
> Starting by working on an interface for such object(s) is probably the first step toward a unified solution
Agree. Getting a good API is always the most important step.
> Dimension-level is what seems to be the most needed...
True, and that was Henrik's original suggestion. I find that all three levels are closely related to the same topic (metadata) and as such deserve to be worked out together, but if most people think otherwise, the direction is clear.
> - Object-level, if not linked to any dimension-attribute, is just saying that one wants to attach anything to any object. That's what attr() is already doing.
Except that plain attributes are dropped when subsetting. I've found myself dozens of times creating classes just to write a `[` method for them that preserves some attributes. This looks like such a common situation that a mechanism that spares users from programming the same thing again and again would be handy.
> - Cell-level is maybe out of scope for a first trial (but maybe I missed the use cases for it)
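
To illustrate the boilerplate in question, here is a minimal sketch of such a `[` method. The class name "keepattr" and the "units" attribute are made up for the example; this is not code from any existing package.

```r
# Hypothetical class "keepattr": a `[` method that restores selected
# attributes after the default subsetting has dropped them.
`[.keepattr` <- function(x, ...) {
  out <- NextMethod()                  # default `[` keeps only names/dim/dimnames
  for (nm in c("units")) {             # attributes we choose to preserve
    attr(out, nm) <- attr(x, nm, exact = TRUE)
  }
  class(out) <- oldClass(x)            # the class attribute is dropped as well
  out
}

x <- structure(1:10, units = "cm", class = "keepattr")
kept    <- attr(x[2:4], "units")           # preserved by the method
dropped <- attr(unclass(x)[2:4], "units")  # lost with plain subsetting
```

Every class that wants this behaviour has to repeat the same few lines, which is the repetition a shared mechanism would remove.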
Although I agree that cell-level is far less common, here are a couple of use cases I've hit recently:
1) The array represents time series in columns. The original data comes at a different frequency for each column, with some data missing. After aligning to a common frequency and interpolating missing values, I needed a factor array of the same dimension as the data array identifying whether each observation came from the actual original series or had been interpolated, and whether interpolation was due to missing data or to frequency alignment. Of course, I needed the factor array to be subsetted together with the data array.
2) The array is a table representing data to be formatted by a reporting system (Sweave, R2HTML, etc), similar to the 'xtable' class. So I needed to associate formatting information with each individual "cell" (font, color, borders...), as well as with each dimension and with the whole table.
Anyway, it's far easier to add "cell-level" metadata on top of the other features with a new class: for `[` subscripting just call NextMethod() and then apply the same indexes to the object storing the cell-level metadata. But I still think it's useful to work out a data object's metadata at all possible levels with a unified interface.
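
A rough sketch of that NextMethod() approach follows. The class name "cellmeta" and the attribute name "cellinfo" are hypothetical, and a character matrix stands in for the factor array of use case 1.

```r
# Hypothetical class "cellmeta": the data array carries a same-shaped
# array of per-cell flags in the (made-up) attribute "cellinfo".
`[.cellmeta` <- function(x, ...) {
  info <- attr(x, "cellinfo", exact = TRUE)
  out <- NextMethod()                  # subset the data like a plain array
  attr(out, "cellinfo") <- info[...]   # apply the same indexes to the flags
  class(out) <- oldClass(x)
  out
}

flags <- matrix(c("orig", "interp", "orig", "orig", "interp", "orig"), nrow = 2)
x <- structure(matrix(1:6, nrow = 2), cellinfo = flags, class = "cellmeta")
s <- x[, 2:3]
attr(s, "cellinfo")   # the flags for columns 2 and 3 only
```

Because the indexes are simply forwarded, the flags follow the data for any index form that `[` accepts on both arrays.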
About the subscripting `[` methods, I don't see the need to modify `[<-` for arrays, as out-of-bound indexes generate errors with arrays (unlike vectors or data frames), so `[<-` would only replace data and leave metadata untouched. Am I missing something?
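
A small check of the behaviour referred to above, using base R only (matrix-style indexing is assumed):

```r
# Vectors grow silently on out-of-bound assignment ...
v <- 1:3
v[5] <- 9L            # v now has length 5, with an NA at position 4

# ... whereas matrix-style indexing beyond the dimensions is an error,
# so the shape (and any dimension metadata) cannot change.
m <- matrix(1:4, nrow = 2)
res <- tryCatch({ m[3, 1] <- 0L; "assigned" },
                error = function(e) conditionMessage(e))
```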
> may be a function called "dimmeta()" (for consistency with "dimnames()") ?
I'm using 'dimdata' in my current prototype, and Henrik suggested 'dimattr', but I really like your proposal more.
Wrappers for the first two elements of 'dimmeta' for 2-dim arrays could be added in the same vein as 'rownames' and 'colnames': 'rowmeta' and 'colmeta'.
> The signature could be dimmeta(x, i), with x the object,
For consistency with 'dimnames', the 'i' argument could be dropped, using dimmeta(x)[[i]] instead...
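
As a sketch of how this dimnames-like variant might look: none of these functions exist in R, and storing the metadata in an ordinary attribute named "dimmeta", as a list with one data.frame per dimension, is only an assumption for illustration.

```r
# Hypothetical accessors modelled on dimnames()/rownames()/colnames().
dimmeta <- function(x) attr(x, "dimmeta", exact = TRUE)

`dimmeta<-` <- function(x, value) {
  if (!is.null(value)) {
    # one entry per dimension, as with dimnames
    stopifnot(is.list(value), length(value) == length(dim(x)))
  }
  attr(x, "dimmeta") <- value
  x
}

rowmeta <- function(x) dimmeta(x)[[1L]]
colmeta <- function(x) dimmeta(x)[[2L]]

m <- matrix(1:6, nrow = 2)
dimmeta(m) <- list(
  data.frame(site = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(unit = c("kg", "g", "mg"), stringsAsFactors = FALSE)
)
colmeta(m)$unit      # same as dimmeta(m)[[2]]$unit
```

With this shape, dimmeta(m)[[i]] <- value also works through R's usual nested replacement, without a separate 'i' argument.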
Other standard generics to be affected would be:
* rbind & cbind for 2-dim arrays/matrices: they should combine the metadata. For dimension-sensitive metadata this can be modelled on what is done with dimnames: in cbind (rbind), use the rowmeta (colmeta) of the first object that has it, and combine the colmeta (rowmeta) of all objects that have it, filling with NAs/NULLs/... for objects being combined that carry no metadata. An issue of coercing dimmeta of different classes may arise.
* `dim<-`, but this may raise the same problem of coercing dimmeta of different classes.
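
A minimal sketch of the cbind case, written as a standalone helper rather than a cbind method. The name cbind_meta and the "dimmeta" attribute layout (a list with one data.frame per dimension) are assumptions, and both inputs are assumed to carry column metadata with identical columns, which sidesteps the coercion issue.

```r
# Hypothetical helper: column-bind two matrices and their metadata.
cbind_meta <- function(a, b) {
  ma <- attr(a, "dimmeta", exact = TRUE)
  mb <- attr(b, "dimmeta", exact = TRUE)
  out <- cbind(a, b)                    # cbind itself drops the extra attribute
  attr(out, "dimmeta") <- list(
    ma[[1L]],                           # row metadata: taken from the first object
    rbind(ma[[2L]], mb[[2L]])           # column metadata: stacked in column order
  )
  out
}

rows <- data.frame(id = c("r1", "r2"), stringsAsFactors = FALSE)
a <- matrix(1:4, nrow = 2)
attr(a, "dimmeta") <- list(rows, data.frame(unit = c("kg", "g"), stringsAsFactors = FALSE))
b <- matrix(5:6, nrow = 2)
attr(b, "dimmeta") <- list(rows, data.frame(unit = "mg", stringsAsFactors = FALSE))

ab <- cbind_meta(a, b)
attr(ab, "dimmeta")[[2]]$unit   # units for all three columns
```

Handling inputs without metadata, or with differently structured metadata, is exactly where the filling and coercion questions above would have to be settled.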
...and I agree with the rest of your comments.
Best,
Enrique
-----Original Message-----
From: Laurent Gautier [mailto:lgautier at gmail.com]
Sent: jueves, 09 de julio de 2009 14:15
Cc: Heinz Tuechler; Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; Henrik Bengtsson; r-devel at r-project.org
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes
Starting by working on an interface for such object(s) is probably the first step toward a unified solution, and this before worrying about if and how R attributes are used.
It would also help to ensure a smooth transition from the existing classes implementing a similar solution (first the interface is added to those classes, then after a grace period the classes are eventually refactored).
Dimension-level is what seems to be the most needed... but I am not convinced of the practicality of the object-level and cell-level schemes proposed:
- Object-level, if not linked to any dimension-attribute, is just saying that one wants to attach anything to any object. That's what attr() is already doing.
- Cell-level is maybe out of scope for a first trial (but maybe I missed the use cases for it)
If starting with behaviour, it seems to boil down to having "["/"[<-" and
"dimmeta()"/"dimmeta<-()":
- extract "[" / replace "[<-" :
* keeps working the way it already does
* extracts a subset of the object as well as a subset of the
dimension-associated metadata.
* departing too much from the way "[" is working and adding
behind-the-curtain name matching will only compromise the chances of
adoption.
* forget about the bit about which metadata is kept and which one
isn't when using "[". Make a function "unmeta()" (similar behavior to
"unname()") to drop them all, or work it out with something like
> dimmeta(x, 1) <- NULL # drop the metadata associated with dimension 1
- access the dimension-associated metadata:
* may be a function called "dimmeta()" (for consistency with
"dimnames()") ? The signature could be dimmeta(x, i), with x the object,
and i the dimension requested. A replace function "dimmeta<-"(x, i,
value) would be provided.
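
A sketch of the behaviour outlined above, under several assumptions: the class name "metamatrix" is made up, the metadata lives in an attribute called "dimmeta" holding one data.frame per dimension, and only matrix-style x[i, j] indexing with numeric or logical indexes is handled.

```r
# Hypothetical interface: dimmeta(x, i), "dimmeta<-"(x, i, value), unmeta(x).
dimmeta <- function(x, i) attr(x, "dimmeta", exact = TRUE)[[i]]

`dimmeta<-` <- function(x, i, value) {
  meta <- attr(x, "dimmeta", exact = TRUE)
  if (is.null(meta)) meta <- vector("list", length(dim(x)))
  meta[i] <- list(value)              # list(NULL) clears slot i but keeps the slot
  attr(x, "dimmeta") <- meta
  x
}

unmeta <- function(x) { attr(x, "dimmeta") <- NULL; x }

# "[" extracts a subset of the data and of the dimension-associated metadata.
`[.metamatrix` <- function(x, i, j, ...) {
  out <- unclass(x)[i, j, drop = FALSE]
  rows <- dimmeta(x, 1L)
  cols <- dimmeta(x, 2L)
  if (!missing(i) && !is.null(rows)) rows <- rows[i, , drop = FALSE]
  if (!missing(j) && !is.null(cols)) cols <- cols[j, , drop = FALSE]
  attr(out, "dimmeta") <- list(rows, cols)
  class(out) <- oldClass(x)
  out
}

x <- structure(matrix(1:6, nrow = 2), class = "metamatrix")
dimmeta(x, 2) <- data.frame(unit = c("kg", "g", "mg"), stringsAsFactors = FALSE)
y <- x[, c(3, 1)]
dimmeta(y, 2)$unit      # metadata reordered together with the columns
z <- y
dimmeta(z, 2) <- NULL   # drop the metadata associated with dimension 2
```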
In the abstract, the "names" associated with a given dimension are just
one kind of possible metadata, but I'd keep away from meddling with them
for a start.
It would seem natural that the metadata associated with one dimension
would be a table-like object (data.frame seems natural in R, and
unfortunately there is no data.frame-like structure in R).
L.