[R] Question about computing offsets automatically

Marc Schwartz MSchwartz at medanalytics.com
Thu Nov 6 21:31:49 CET 2003


On Thu, 2003-11-06 at 13:33, Louisell, Paul T. wrote:
> Hi,
> 
> I'm using R version 1.8.0 on Windows NT. When fitting a glm with Poisson
> random component and a log link, I frequently need to include an offset.
> Typically I use xtabs or table to get the counts for the contingency table,
> and then I use as.data.frame.table to create a data frame that I can use in
> the glm function. I have not found an option that allows me to total the
> offset variable to obtain offsets for cells in the contingency table. 
> 
> For example, suppose I have the following data frame named Data:
> 
>   F1 F2 Off
> 1  A  C   4
> 2  A  C   3
> 3  A  C   2
> 4  B  C   3
> 5  A  D   2
> 6  A  D   4
> 7  B  D   1
> 
> xtabs(~F1+F2, data=Data) produces the contingency table:
> 
>    F2
> F1  C D
>   A 3 2
>   B 1 1
> 
> And as.data.frame.table(xtabs(~F1+F2, data=Data)) changes the contingency
> table to a data frame suitable for use in the glm function:
> 
>   F1 F2 Freq
> 1  A  C    3
> 2  B  C    1
> 3  A  D    2
> 4  B  D    1
> 
> What I'm looking for is some option that would add a 4th column to the
> output of as.data.frame.table which contains the offsets for each cell in
> the contingency table:
> 
>   F1 F2 Freq  Off
> 1  A  C    3    9
> 2  B  C    1    3
> 3  A  D    2    6
> 4  B  D    1    1
> 
> Does such an option exist somewhere in R (I wasn't able to find it in the
> documentation for the table, xtabs, as.data.frame.table, or glm functions)?
> I can obtain the Off column easily enough in a simple loop, but I thought
> there might be an option for this somewhere.


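I don't know of a built-in option for this, but you can use aggregate()
to get the per-cell sums and then cbind() them on as the fourth column.
Note that this relies on aggregate() returning the cells in the same
order as as.data.frame.table(), and on every cell of the table being
non-empty, both of which hold for your example: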

> aggregate(Data$Off, list(F1 = Data$F1, F2 = Data$F2), sum)
  F1 F2 x
1  A  C 9
2  B  C 3
3  A  D 6
4  B  D 1
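
For anyone who wants to run the call above, here is a minimal sketch that rebuilds the example data frame from the values shown in the question. The data.frame() construction is my own assumption about how Data was created (in particular, that F1 and F2 are factors); only the values come from the original post.

```r
# Rebuild the example data frame from the question.
# Assumption: F1 and F2 are factors and Off is numeric.
Data <- data.frame(
  F1  = factor(c("A", "A", "A", "B", "A", "A", "B")),
  F2  = factor(c("C", "C", "C", "C", "D", "D", "D")),
  Off = c(4, 3, 2, 3, 2, 4, 1)
)

# Sum the offset variable within each F1 x F2 combination.
# The summed column is named "x" by default.
sums <- aggregate(Data$Off, list(F1 = Data$F1, F2 = Data$F2), sum)
print(sums)
```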

So:

> df <- as.data.frame.table(xtabs(~F1+F2, data = Data))
> df
  F1 F2 Freq
1  A  C    3
2  B  C    1
3  A  D    2
4  B  D    1

> Off <- aggregate(Data$Off, list(F1 = Data$F1, F2 = Data$F2), sum)$x
> Off
[1] 9 3 6 1

> cbind(df, Off)
  F1 F2 Freq Off
1  A  C    3   9
2  B  C    1   3
3  A  D    2   6
4  B  D    1   1
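
One caveat with cbind(): it pairs rows purely by position. aggregate() drops factor combinations that never occur in the data, while xtabs() keeps them with a count of 0, so the two results can differ in length once a cell is empty. A possible alternative, sketched below, is to let xtabs() sum the offset itself by putting Off on the left-hand side of the formula, and then to join the two tables on F1 and F2 with merge(). The object names (counts, offsets, cells, fit) are placeholders of mine, and the final glm() call assumes the offset should enter on the log scale, as is usual with a log link; adjust if your Off is already a log exposure.

```r
# Example data from the question (assumption: F1 and F2 are factors).
Data <- data.frame(
  F1  = factor(c("A", "A", "A", "B", "A", "A", "B")),
  F2  = factor(c("C", "C", "C", "C", "D", "D", "D")),
  Off = c(4, 3, 2, 3, 2, 4, 1)
)

# Cell counts: one row per F1 x F2 combination, counts in "Freq".
counts <- as.data.frame.table(xtabs(~ F1 + F2, data = Data))

# Cell totals of Off: with a left-hand side, xtabs() sums that
# variable within each cell instead of counting rows.
offsets <- as.data.frame.table(xtabs(Off ~ F1 + F2, data = Data),
                               responseName = "Off")

# Join on the factor columns rather than relying on row order.
cells <- merge(counts, offsets, by = c("F1", "F2"))
print(cells)

# Poisson fit with the summed offset on the log scale.
# Assumption: every cell has Off > 0, otherwise log(Off) is -Inf.
fit <- glm(Freq ~ F1 + F2 + offset(log(Off)),
           family = poisson, data = cells)
```

Since the join is by key, the row order of cells may differ from the cbind() result above, but each row carries the correct Freq and Off for its cell.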
 

HTH,

Marc Schwartz



