[R] Strange behavior with poisson and glm

Tue Mar 2 16:05:24 CET 2010

On Tue, 2010-03-02 at 00:58 -0800, Noah Silverman wrote:
> Ted,
> 
> Brilliant explanation (as usual)
> 
> I'm back in school, just starting on a post-graduate degree in stats so 
> the help is really appreciated.
> 
> Now, I have a slightly trickier question about the same model.
> 
> I've seen more than one way to get "values" out of the glm model.
> 
> i.e.  If we're looking at the 10th item in the dataset:
> note: "m" is the model
> 
> fitted(m)[10]
> predict(m,dataset[10,])
> 
> Give me different results.  From my data, I get the following real results:
>  > predict(m,data[100,])
>       100
> 7.727999
>  > fitted(m)[100]
>       179
> 3956.637

I find that unlikely. Why is one labelled 100 and the other 179? Perhaps
something is wrong here.

However, that said, those two calls *will* give you different results
because predict() can return several types of prediction. See
?predict.glm and note that the default is type = "link", i.e. it
produces predictions on the scale of the linear predictor (for a Poisson
model with the default log link, the log of the expected count), to which
the inverse of the link function must then be applied.
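
To illustrate the point about prediction types, here is a minimal sketch
on simulated data. The data frame d, its columns and the coefficients
used to simulate it are made up for illustration and are not the
poster's data.

```r
# Simulated Poisson data (hypothetical, not from the thread)
set.seed(1)
d <- data.frame(v1 = runif(50, 0, 2))
d$y <- rpois(50, lambda = exp(0.5 + 0.8 * d$v1))

m <- glm(y ~ v1, family = poisson, data = d)

# Default type = "link": predictions on the log scale
eta <- predict(m, newdata = d)
# type = "response": predictions on the scale of y
mu <- predict(m, newdata = d, type = "response")

# With the log link, exp() is the inverse link function
all.equal(exp(eta), mu)
# Response-scale predictions for the fitting data match fitted()
all.equal(unname(mu), unname(fitted(m)))
```

Both comparisons should agree up to floating-point tolerance, whereas
eta and mu themselves differ.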

What does

predict(m, data, type = "response")[100]

and 

fitted(m)[100]

yield?

Do you have missing values etc in your data?
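
The reason for asking: with the default na.action, glm() drops rows
containing NA, so the 100th fitted value need not belong to row 100 of
the data. A minimal sketch on simulated data (the data frame d and the
pattern of missing values are invented for illustration):

```r
# Simulated data with every third v1 (rows 2, 5, 8, ...) set to NA
set.seed(2)
d <- data.frame(v1 = runif(200, 0, 2))
d$y <- rpois(200, lambda = exp(0.5 + 0.8 * d$v1))
d$v1[seq(2, 200, by = 3)] <- NA

m <- glm(y ~ v1, family = poisson, data = d)

length(fitted(m))        # fewer than 200: incomplete rows were dropped
names(fitted(m))[100]    # row name of the 100th *retained* row

# Index by row name, not position, to compare like with like
all.equal(predict(m, d[100, ], type = "response"), fitted(m)["100"])

# na.exclude pads fitted values with NA so positions match the data
m2 <- glm(y ~ v1, family = poisson, data = d, na.action = na.exclude)
length(fitted(m2))
```

If the label printed by fitted(m)[100] is not "100", as in the output
quoted above, positional indexing is picking up a different
observation.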

G

> 
>  From my understanding, the exp of the prediction should be equal to the 
> fitted value.  Here it is not.  I don't understand why.  Any insight?
> 
> -N
> 
> 
> 
> On 3/2/10 12:47 AM, (Ted Harding) wrote:
> > On 02-Mar-10 08:02:27, Noah Silverman wrote:
> >    
> >> Hi,
> >> I'm just learning about Poisson links for the glm function.
> >>
> >> One of the data sets I'm playing with has several of the
> >> variables as factors (i.e. month, group, etc.)
> >>
> >> When I call the glm function with a formula that has a factor
> >> variable, R automatically converts the variable to a series of
> >> variables with unique names and binary values.
> >>
> >> For example, with this pseudo data:
> >>
> >> y        v1        month
> >> 2        1            january
> >> 3        1.4        february
> >> 1.5    6.3        february
> >> 1.2    4.5        january
> >> 5.5    4.0        march
> >>
> >> I use this call:
> >>
> >> m<- glm(y ~ v1 + month, family="poisson")
> >>
> >> R gives me back a model with variables of
> >> Intercept
> >> v1
> >> monthJanuary
> >> monthFebruary
> >> monthMarch
> >>
> >> I'm concerned that this might be doing some strange things
> >> to my model.
> >> Can anyone offer some enlightenment?
> >> Thanks!
> >>      
> > The creation of auxiliary variables is the way to incorporate
> > a factor variable into a model. These are usually called
> > "dummy variables", and are essentially indicator variables.
> >
> > Your data above would correspond to variables I (for Intercept),
> > J (for January), F (for February) and M (for March) in addition
> > to the other variables y and v1 as below:
> >
> >    y      v1    I   J   F   M   #   month
> >    2      1     1   1   0   0   #  january
> >    3      1.4   1   0   1   0   #  february
> >    1.5    6.3   1   0   1   0   #  february
> >    1.2    4.5   1   1   0   0   #  january
> >    5.5    4.0   1   0   0   1   #  march
> >
> > The linear predictor L in the model for y would then be
> >
> >    L = a*I + b*v1 + c1*J + c2*F + c3*M
> >
> > evaluated arithmetically; e.g. for row 2 of the data it is
> >
> >    a + b*1.4 + c2
> >
> > However, as given, J + F + M = I, so there is redundancy in
> > the variables, since there are only three independent values
> > there  (not so if you exclude the Intercept using a model
> > formula y ~ v1 + month - 1), so R will provide estimates
> > which are computed in terms of some pattern of differences
> > between these four variables called contrasts. Different
> > patterns of difference present different representations
> > of the three independent aspects.
> >
> > There are many different kinds of contrasts available.
> > One of these will be chosen as default by R (depending in
> > particular on whether the factor variable is being used
> > as an ordered factor or an unordered factor). See ?contrasts
> > for an outline of what is there, ?contrast for more detail,
> > and look at the help for particular contrasts such as
> > ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.
> >
> > After all that: No, R is not doing strange things to your model!
> >
> > ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding)<Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 02-Mar-10                                       Time: 08:47:11
> > ------------------------------ XFMail ------------------------------
> >
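
On the dummy-variable coding described in the quoted text: one way to
see what R actually builds is to inspect the design matrix with
model.matrix(). The sketch below uses the five rows of pseudo data from
the thread and assumes the default contrasts (contr.treatment for
unordered factors) have not been changed via options().

```r
# The pseudo data from the thread, with month as a factor
d <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "february", "february", "january", "march"))
)

# With an intercept: the first level (february, alphabetically) is the
# baseline and gets no column of its own
X1 <- model.matrix(~ v1 + month, data = d)
colnames(X1)

# Without an intercept: one indicator column per month
X2 <- model.matrix(~ v1 + month - 1, data = d)
colnames(X2)

# The month indicators in X2 sum to the intercept column of X1
all(rowSums(X2[, -1]) == X1[, "(Intercept)"])
```

The last line checks the redundancy J + F + M = I noted in the quoted
explanation, which is why one level is dropped when an intercept is
present.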

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%