[R] function table
Gabor Grothendieck
ggrothendieck at myway.com
Fri Feb 11 19:28:01 CET 2005
Carsten Steinhoff <carsten.steinhoff <at> stud.uni-goettingen.de> writes:
:
: Hi,
:
: my problem is the following:
:
: I have a large database of insurance-damage data and want to model the
: frequency of these events. So to fit a distribution on my frequency-data I
: want to count the number of events in each month via the date of occurrence.
: Therefore I use this command which works very well:
:
: count_table <- table(months(date_occ),years(date_occ))
:
: But there is another column in my database called "weight". So I don't want
: to count EACH event by "1", some are e.g. only counted by "half an event".
: How could I modify my table function, that the output is not a simple
: counting but a counting of the weights?
I assume you are using chron, since months and years are defined in
that package, so let's set up some dummy data:
library(chron)
set.seed(1)
x <- chron(1:1000)
w <- runif(1000)
We could do this:
result.matrix <- tapply(w, list(month = months(x), year = years(x)), sum)
# This gives a matrix but you could coerce it into a table
# if you like:
result.table <- as.table(result.matrix)
# and that could be coerced to a data frame
result.df <- as.data.frame(result.table)
Note that you cannot coerce the matrix directly to a data frame,
since that gives one column per year rather than one row per
month/year combination.
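The difference between the two coercions is easiest to see from the shapes of the results. A minimal sketch, assuming base R Date objects in place of chron (only so that it runs without extra packages) and two-digit month labels from format():

```r
set.seed(1)
d <- as.Date("1970-01-01") + 1:1000   # assumption: base Dates stand in for chron(1:1000)
w <- runif(1000)

# Weighted "counts": sum of the weights in each month/year cell
m <- tapply(w, list(month = format(d, "%m"), year = format(d, "%Y")), sum)

wide <- as.data.frame(m)              # one column per year
long <- as.data.frame(as.table(m))    # one row per cell: month, year, Freq

dim(wide)
dim(long)
names(long)
```

The long form is the one that subset() and similar data frame tools can work with.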
:
: And - another question how can I directly access the columns of my table? I
: tried e.g. count_table$2004 (because one column is named 2004), or
: count_table[2004] ... nothing worked. The access via column - number
: (count_table[,9]) works, but will be too inflexible for the future.
A column can be picked out by its label if you quote it, e.g.
count_table[, "2004"], but the dimensions of a table cannot be indexed
by variable name. I think this is a significant problem for users, since
margining and conditioning a table often changes the positions of the
dimensions, so it's very confusing to access them: one has to manually
keep track of the changing positions of the variables. Some functions
do allow one to specify the variable name, but that is a significant
limitation, since it means that every function that has to access a
table has to reinvent the mapping.
You could do it yourself by redefining [.table or, slightly easier but
not as slick, by defining a function which does the mapping
for you. You might design it so it can be used like this:
count_table[,m(2004)] # m is specific to count_table
or maybe like this:
m(count_table,,2004) # more general m possible
or define a table-like S3 or S4 class which does it.
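One possible shape for such a mapping function is sketched below. The name by_name and its interface are hypothetical (they are not the m() above and not part of any package), and base R Dates are assumed in place of chron so the example runs on its own. It selects by dimension name, so it does not matter where that dimension currently sits.

```r
# Assumption: base Dates stand in for the chron dates.
d <- as.Date("1970-01-01") + 1:1000
tab <- table(month = format(d, "%m"), year = format(d, "%Y"))

# Labels within a dimension work as character subscripts:
tab[, "1971"]

# Hypothetical helper: subset by dimension *name*, whatever its position.
by_name <- function(tab, ...) {
  sel <- list(...)
  dims <- names(dimnames(tab))
  stopifnot(length(sel) > 0, !is.null(names(sel)), all(names(sel) %in% dims))
  idx <- rep(list(TRUE), length(dims))    # TRUE keeps a whole dimension
  names(idx) <- dims
  idx[names(sel)] <- lapply(sel, as.character)
  do.call(`[`, c(list(tab), unname(idx)))
}

by_name(tab, year = 1971)           # same values as tab[, "1971"]
by_name(aperm(tab), year = 1971)    # still works after transposing
```

This relies on the table having named dimnames, which table() and tapply() supply when the arguments are named as above.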
Another possibility is to use the data frame form. For example,
subset(result.df, subset = year == 1970)
If you are going to move back and forth between tables and
data frames then you should check out ?xtabs.
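As a rough illustration of that round trip: xtabs with a left-hand side in the formula sums that variable within each cell, so it can also produce the weighted counts directly. The sketch assumes base R Dates instead of chron, and the data frame name dat is made up for the example.

```r
set.seed(1)
d <- as.Date("1970-01-01") + 1:1000   # assumption: base Dates stand in for chron
dat <- data.frame(month = format(d, "%m"),
                  year  = format(d, "%Y"),
                  w     = runif(1000))

wt   <- xtabs(w ~ month + year, data = dat)       # table of summed weights
long <- as.data.frame(wt)                         # columns: month, year, Freq
wt2  <- xtabs(Freq ~ month + year, data = long)   # and back to a table
```

Unlike the tapply result, month/year cells with no events come out as 0 here rather than NA.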