[R] Strange behavior with Poisson and glm

Noah Silverman noah at smartmediacorp.com
Tue Mar 2 09:58:52 CET 2010


Ted,

Brilliant explanation (as usual).

I'm back in school, just starting a post-graduate degree in stats, so 
the help is really appreciated.

Now, I have a slightly trickier question about the same model.

I've seen more than one way to get "values" out of the glm model.

i.e., if we're looking at the 100th item in the data set
(note: "m" is the model):

fitted(m)[100]
predict(m, data[100,])

give me different results.  From my data, I get the following actual results:
 > predict(m,data[100,])
      100
7.727999
 > fitted(m)[100]
      179
3956.637

From my understanding, the exp() of the prediction should equal the 
fitted value.  Here it does not, and I don't understand why.  Any insight?
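
(For concreteness, a minimal sketch of the two scales involved, 
assuming a Poisson GLM with the default log link; "m" and "data" 
are as above, and the point about dropped rows is only a guess:)

# predict() on new data returns the linear predictor by default:
eta <- predict(m, data[100, ])                     # type = "link" is the default
mu  <- predict(m, data[100, ], type = "response")  # equals exp(eta) for a log link

# fitted() is on the response scale and is indexed by the rows that
# were actually used in the fit -- the name "179" printed above hints
# (a guess) that some rows were dropped, e.g. because of NAs:
fitted(m)[100]                       # 100th *used* row, here named "179"
predict(m, type = "response")[100]   # identical to fitted(m)[100]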

-N



On 3/2/10 12:47 AM, (Ted Harding) wrote:
> On 02-Mar-10 08:02:27, Noah Silverman wrote:
>    
>> Hi,
>> I'm just learning about Poisson links for the glm function.
>>
>> One of the data sets I'm playing with has several of the
>> variables as factors (e.g. month, group).
>>
>> When I call the glm function with a formula that has a factor
>> variable, R automatically converts the variable to a series of
>> variables with unique names and binary values.
>>
>> For example, with this pseudo data:
>>
>> y      v1     month
>> 2      1      january
>> 3      1.4    february
>> 1.5    6.3    february
>> 1.2    4.5    january
>> 5.5    4.0    march
>>
>> I use this call:
>>
>> m <- glm(y ~ v1 + month, family = "poisson")
>>
>> R gives me back a model with variables of
>> Intercept
>> v1
>> monthJanuary
>> monthFebruary
>> monthMarch
>>
>> I'm concerned that this might be doing some strange things
>> to my model.
>> Can anyone offer some enlightenment?
>> Thanks!
>>      
> The creation of auxiliary variables is the way to incorporate
> a factor variable into a model. These are usually called
> "dummy variables", and are essentially indicator variables.
>
> Your data above would correspond to variables I (for Intercept),
> J (for January), F (for February) and M (for March) in addition
> to the other variables y and v1 as below:
>
>    y      v1    I   J   F   M   #   month
>    2      1     1   1   0   0   #  january
>    3      1.4   1   0   1   0   #  february
>    1.5    6.3   1   0   1   0   #  february
>    1.2    4.5   1   1   0   0   #  january
>    5.5    4.0   1   0   0   1   #  march
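>
> (A quick way to see this expansion for yourself is model.matrix();
> a small sketch using the toy data above, with made-up object names:)
>
> d <- data.frame(y = c(2, 3, 1.5, 1.2, 5.5),
>                 v1 = c(1, 1.4, 6.3, 4.5, 4.0),
>                 month = factor(c("january", "february", "february",
>                                  "january", "march")))
> model.matrix(~ v1 + month, data = d)      # default coding: one level absorbed
> model.matrix(~ v1 + month - 1, data = d)  # no intercept: one column per month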
>
> The linear predictor L in the model for y would then be
>
>    L = a*I + b*v1 + c1*J + c2*F + c3*M
>
> evaluated arithmetically; e.g. for row 2 of the data it is
>
>    a + b*1.4 + c2
>
> However, as given, J + F + M = I, so there is redundancy in
> the variables: only three of the four are independent. (This
> would not be so if you excluded the Intercept, using a model
> formula y ~ v1 + month - 1.) R will therefore provide estimates
> which are computed in terms of some pattern of differences
> between these four variables, called contrasts. Different
> patterns of difference present different representations
> of the three independent aspects.
>
> There are many different kinds of contrasts available.
> One of these will be chosen as default by R (depending in
> particular on whether the factor variable is being used
> as an ordered factor or an unordered factor). See ?contrasts
> for an outline of what is there, and look at the help pages for
> particular contrasts such as ?contr.helmert, ?contr.poly,
> ?contr.sum and ?contr.treatment.
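>
> (To see a coding matrix concretely; a small sketch, with an
> illustrative factor:)
>
> month <- factor(c("january", "february", "march"))
> contrasts(month)                  # default for unordered: contr.treatment
> contrasts(month) <- contr.sum(3)  # switch to sum-to-zero contrasts
> contrasts(month)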
>
> After all that: No, R is not doing strange things to your model!
>
> ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding)<Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 02-Mar-10                                       Time: 08:47:11
> ------------------------------ XFMail ------------------------------
>


