[R] Best method to add unit information to dataframe ?
Marc Schwartz
marc_schwartz at me.com
Mon Oct 3 17:08:42 CEST 2011
On Oct 3, 2011, at 9:35 AM, bruno Piguet wrote:
> Dear all,
>
> I'd like to have a dataframe store information about the units of
> the data it contains.
>
> You'll find below a minimal example of what I do so far. I add a
> "units" attribute to the dataframe. But I don't like the long syntax
> needed to access the unit of a given variable, namely something
> like:
> var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame,
> "names"))]]
>
> Can anybody point me to a better solution ?
>
> Thanks in advance,
>
> Bruno.
>
>
> # Dataframe creation
> x <- c(1:10)
> y <- c(11:20)
> z <- c(101:110)
> my_frame <- data.frame(x, y, z)
> attr(my_frame, "units") <- c("x_unit", "y_unit")
>
> #
> # later on, using dataframe
> for (var_name in c("x", "y")) {
>   idx <- match(var_name, attr(my_frame, "names"))
>   var_unit <- attr(my_frame, "units")[[idx]]
>   print(paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
> }
The problem is that there are operations on data frames (e.g. subset()) that will end up stripping your attributes.
> str(my_frame)
'data.frame': 10 obs. of 3 variables:
$ x: int 1 2 3 4 5 6 7 8 9 10
$ y: int 11 12 13 14 15 16 17 18 19 20
$ z: int 101 102 103 104 105 106 107 108 109 110
- attr(*, "units")= chr "x_unit" "y_unit"
> newDF <- subset(my_frame, x <= 5)
> str(newDF)
'data.frame': 5 obs. of 3 variables:
$ x: int 1 2 3 4 5
$ y: int 11 12 13 14 15
$ z: int 101 102 103 104 105
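One way to shorten the lookup and to cope with the lost attribute is sketched below. It is only an illustration, assuming you keep the data-frame-level "units" attribute but make it a named character vector. The helpers get_unit() and restore_units() and the "z_unit" string are hypothetical names made up for this example, not existing R functions.

```r
# Sketch: store units in a *named* character vector so lookup is by column name.
x <- 1:10
y <- 11:20
z <- 101:110
my_frame <- data.frame(x, y, z)

# Hypothetical unit strings, one per column
attr(my_frame, "units") <- c(x = "x_unit", y = "y_unit", z = "z_unit")

# Hypothetical helper: return the unit of a single column
get_unit <- function(df, var_name) {
  attr(df, "units")[[var_name]]
}

# Hypothetical helper: copy the "units" attribute from old_df to new_df,
# keeping only the entries for columns still present in new_df
restore_units <- function(new_df, old_df) {
  u <- attr(old_df, "units")
  attr(new_df, "units") <- u[intersect(names(new_df), names(u))]
  new_df
}

# Re-attach the units after an operation that drops them
newDF <- restore_units(subset(my_frame, x <= 5), my_frame)
get_unit(newDF, "y")
```

The drawback is that you have to remember to call the restore step after each operation that drops the attribute.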
You might want to look at either ?comment in base R or the label() function (?label) in Frank Harrell's Hmisc package on CRAN, either to use directly or as example code for how he handles this.
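As a rough sketch of the ?comment route, assuming you attach the unit to each column rather than to the data frame as a whole (the unit strings are placeholders):

```r
# Sketch: attach a unit string to each column with base R's comment()
my_frame <- data.frame(x = 1:10, y = 11:20, z = 101:110)

# Placeholder unit strings
comment(my_frame$x) <- "x_unit"
comment(my_frame$y) <- "y_unit"

# Later on, the unit is read back directly from the column
for (var_name in c("x", "y")) {
  var_unit <- comment(my_frame[[var_name]])
  print(paste("max", var_name, ":", max(my_frame[[var_name]]), var_unit))
}
```

Keep in mind that subsetting a plain vector with [ drops most attributes, so a comment set this way may still not survive row subsetting. Check with str() after each such step.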
HTH,
Marc Schwartz