[R] Strange behavior with poisson and glm
Gavin Simpson
gavin.simpson at ucl.ac.uk
Tue Mar 2 16:05:24 CET 2010
On Tue, 2010-03-02 at 00:58 -0800, Noah Silverman wrote:
> Ted,
>
> Brilliant explanation (as usual)
>
> I'm back in school, just starting on a post-graduate degree in stats so
> the help is really appreciated.
>
> Now, I have a slightly trickier question about the same model.
>
> I've seen more than one way to get "values" out of the glm model.
>
> e.g. if we're looking at the 10th item in the dataset:
> note: "m" is the model
>
> fitted(m)[10]
> predict(m,dataset[10,])
>
> These give me different results. From my data, I get the following:
> > predict(m,data[100,])
> 100
> 7.727999
> > fitted(m)[100]
> 179
> 3956.637
I find that unlikely: why is one labelled 100 and the other 179?
Perhaps something is wrong here.
That said, those two calls *will* give you different results, because
predict() can produce several types of prediction. See ?predict.glm and
note that the default is type = "link", i.e. predictions on the scale of
the linear predictor (the link function), which then need the inverse of
the link function applied to them.
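A minimal sketch of that point, assuming a small simulated Poisson data set (the names `dat`, `x`, `row10` and the true coefficients are hypothetical, not Noah's data): the default `type = "link"` returns the log-scale linear predictor, while `type = "response"`, or the inverse link applied by hand, returns the fitted mean.

```r
## Hypothetical simulated data; not the poster's data set
set.seed(1)
dat <- data.frame(x = runif(50))
dat$y <- rpois(50, lambda = exp(0.5 + 1.2 * dat$x))

m <- glm(y ~ x, family = poisson, data = dat)

row10 <- dat[10, , drop = FALSE]                       # 10th row, kept as a data frame
eta   <- predict(m, newdata = row10)                   # default type = "link": log scale
mu    <- predict(m, newdata = row10, type = "response")  # inverse link already applied
mu2   <- family(m)$linkinv(eta)                        # inverse link by hand (exp for log)

all.equal(unname(mu), unname(exp(eta)))                # TRUE
all.equal(unname(mu), unname(mu2))                     # TRUE
all.equal(unname(mu), unname(fitted(m)[10]))           # TRUE: no rows were dropped here
```

Here the comparison with `fitted(m)[10]` only works because every row of `dat` was used in the fit.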
What does
predict(m, data, type = "response")[100]
and
fitted(m)[100]
yield?
Do you have missing values etc. in your data?
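Mismatched labels such as 100 versus 179 are what you would expect if glm() dropped incomplete rows: in a default session `na.action = na.omit` removes them, so `fitted(m)[100]` is the 100th *retained* observation rather than row 100 of the data. A minimal sketch with hypothetical simulated data and two artificial NAs; `na.exclude` pads the fitted values with NA so positions line up with the rows again.

```r
## Hypothetical data: 20 rows, with x missing in rows 3 and 7
set.seed(2)
dat <- data.frame(x = runif(20))
dat$y <- rpois(20, lambda = exp(1 + dat$x))
dat$x[c(3, 7)] <- NA

m <- glm(y ~ x, family = poisson, data = dat)  # default na.omit drops rows 3 and 7
length(fitted(m))        # 18
names(fitted(m))[10]     # "12": the 10th fitted value belongs to row 12

## Predicting row 12 on the response scale matches that fitted value
p12 <- predict(m, newdata = dat[12, ], type = "response")
all.equal(unname(p12), unname(fitted(m)["12"]))   # TRUE

## na.exclude keeps the NA rows as NA, so fitted(m2)[i] refers to row i of dat
m2 <- glm(y ~ x, family = poisson, data = dat, na.action = na.exclude)
length(fitted(m2))       # 20
```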
G
>
> From my understanding, the exp of the prediction should be equal to the
> fitted value. Here it is not. I don't understand why. Any insight?
>
> -N
>
>
>
> On 3/2/10 12:47 AM, (Ted Harding) wrote:
> > On 02-Mar-10 08:02:27, Noah Silverman wrote:
> >
> >> Hi,
> >> I'm just learning about the Poisson family for the glm function.
> >>
> >> One of the data sets I'm playing with has several of the
> >> variables as factors (e.g. month, group).
> >>
> >> When I call the glm function with a formula that has a factor
> >> variable, R automatically converts the variable to a series of
> >> variables with unique names and binary values.
> >>
> >> For example, with this pseudo data:
> >>
> >> y v1 month
> >> 2 1 january
> >> 3 1.4 february
> >> 1.5 6.3 february
> >> 1.2 4.5 january
> >> 5.5 4.0 march
> >>
> >> I use this call:
> >>
> >> m <- glm(y ~ v1 + month, family = "poisson")
> >>
> >> R gives me back a model with variables of
> >> Intercept
> >> v1
> >> monthJanuary
> >> monthFebruary
> >> monthMarch
> >>
> >> I'm concerned that this might be doing some strange things
> >> to my model.
> >> Can anyone offer some enlightenment?
> >> Thanks!
> >>
> > The creation of auxiliary variables is the way to incorporate
> > a factor variable into a model. These are usually called
> > "dummy variables", and are essentially indicator variables.
> >
> > Your data above would correspond to variables I (for Intercept),
> > J (for January), F (for February) and M (for March) in addition
> > to the other variables y and v1 as below:
> >
> > y v1 I J F M # month
> > 2 1 1 1 0 0 # january
> > 3 1.4 1 0 1 0 # february
> > 1.5 6.3 1 0 1 0 # february
> > 1.2 4.5 1 1 0 0 # january
> > 5.5 4.0 1 0 0 1 # march
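R builds these indicator columns with model.matrix(). A minimal sketch using the pseudo data from the question (the data frame name `pd` is hypothetical, and "february" is spelled consistently so it forms a single level). With an intercept and R's default treatment contrasts only two month columns appear, because the first level ("february", alphabetically) is absorbed into the intercept; dropping the intercept gives one indicator column per month, like J, F and M above.

```r
## Pseudo data from the question (no response needed to build the design matrix)
pd <- data.frame(
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "february", "february", "january", "march"))
)

## With an intercept: treatment contrasts drop the first level
X1 <- model.matrix(~ v1 + month, data = pd)
colnames(X1)   # "(Intercept)" "v1" "monthjanuary" "monthmarch"

## Without an intercept: one 0/1 column per month
X0 <- model.matrix(~ v1 + month - 1, data = pd)
colnames(X0)   # "v1" "monthfebruary" "monthjanuary" "monthmarch"
X0
```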
> >
> > The linear predictor L in the model for y would then be
> >
> > L = a*I + b*v1 + c1*J + c2*F + c3*M
> >
> > evaluated arithmetically; e.g. for row 2 of the data it is
> >
> > a + b*1.4 + c2
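A minimal sketch, with hypothetical simulated data (`sim`, `cf`, `L11` and the true coefficients are invented for illustration), checking that the linear predictor is the model matrix times the coefficient vector and that, with the log link, the fitted values are its exponential. Because R uses treatment contrasts here, "february" is the baseline level, so for a February row L reduces to intercept + b*v1.

```r
## Hypothetical simulated data: 30 counts, 10 per month
set.seed(3)
sim <- data.frame(
  v1    = runif(30, 0, 5),
  month = factor(rep(c("january", "february", "march"), each = 10))
)
sim$y <- rpois(30, lambda = exp(0.2 + 0.3 * sim$v1))

m  <- glm(y ~ v1 + month, family = poisson, data = sim)
cf <- coef(m)   # "(Intercept)", "v1", "monthjanuary", "monthmarch"

## L = X %*% beta for every row
X <- model.matrix(m)
L <- drop(X %*% cf)
all.equal(L, predict(m))        # TRUE: predict() defaults to type = "link"
all.equal(exp(L), fitted(m))    # TRUE: log link, so fitted values are exp(L)

## Row 11 is the first February row; February is the baseline level
L11 <- cf[["(Intercept)"]] + cf[["v1"]] * sim$v1[11]
all.equal(L11, unname(L[11]))   # TRUE
```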
> >
> > However, as given, J + F + M = I, so the variables are redundant:
> > only three independent values are present among I, J, F and M
> > (this is not so if you exclude the intercept using the model
> > formula y ~ v1 + month - 1). R will therefore provide estimates
> > computed in terms of some pattern of differences between these
> > four variables, called contrasts. Different patterns of difference
> > give different representations of the three independent aspects.
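The redundancy can be checked numerically. A small sketch using the columns above for the five pseudo rows (the R variable names are hypothetical; `Feb` is used rather than `F`, since `F` is R's shorthand for FALSE): the five columns I, v1, J, F, M have rank 4, and removing the intercept column, as `- 1` does, leaves four linearly independent columns.

```r
## Indicator columns for the five pseudo rows
Int <- rep(1, 5)
v1  <- c(1, 1.4, 6.3, 4.5, 4.0)
Jan <- c(1, 0, 0, 1, 0)
Feb <- c(0, 1, 1, 0, 0)
Mar <- c(0, 0, 0, 0, 1)

X <- cbind(Int, v1, Jan, Feb, Mar)
all(Jan + Feb + Mar == Int)   # TRUE: J + F + M = I
qr(X)$rank                    # 4, one less than ncol(X)
qr(X[, -1])$rank              # 4 = ncol(X[, -1]): no redundancy without the intercept
```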
> >
> > There are many different kinds of contrasts available.
> > One of these will be chosen as the default by R, depending in
> > particular on whether the factor is unordered or ordered
> > (contr.treatment and contr.poly respectively, as set by
> > options("contrasts")). See ?contrasts
> > for an outline of what is there, ?contrast for more detail,
> > and look at the help for particular contrasts such as
> > ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.
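For a three-level factor such as month, the contrast matrices can be printed directly: each row is a level and each column corresponds to a coefficient R will estimate. A minimal sketch; the defaults reported by `getOption("contrasts")` are those of a fresh R session and may differ if they have been changed.

```r
getOption("contrasts")   # unordered: "contr.treatment", ordered: "contr.poly"

contr.treatment(3)   # level 1 is the baseline; coefficients are differences from it
contr.sum(3)         # last level coded -1; coefficients are deviations from the average level
contr.helmert(3)     # level 2 vs level 1, then level 3 vs the mean of levels 1 and 2
contr.poly(3)        # orthogonal linear and quadratic trends, for ordered factors
```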
> >
> > After all that: No, R is not doing strange things to your model!
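One way to see that the choice of contrasts is only a reparameterisation: fit the same Poisson model under two coding schemes and compare. A minimal sketch with hypothetical simulated data (`sim`, `m_trt`, `m_sum` are invented names); the intercept and month coefficients change, but the fitted values, the deviance and the v1 slope do not.

```r
## Hypothetical simulated data: 30 counts, 10 per month
set.seed(4)
sim <- data.frame(
  v1    = runif(30, 0, 5),
  month = factor(rep(c("january", "february", "march"), each = 10))
)
sim$y <- rpois(30, lambda = exp(0.2 + 0.3 * sim$v1))

m_trt <- glm(y ~ v1 + month, family = poisson, data = sim)   # default contr.treatment
m_sum <- glm(y ~ v1 + month, family = poisson, data = sim,
             contrasts = list(month = "contr.sum"))          # sum-to-zero coding

coef(m_trt)
coef(m_sum)                                            # different month coefficients
all.equal(fitted(m_trt), fitted(m_sum))                # TRUE: same fitted values
all.equal(deviance(m_trt), deviance(m_sum))            # TRUE: same deviance
all.equal(coef(m_trt)[["v1"]], coef(m_sum)[["v1"]])    # TRUE: same slope for v1
```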
> >
> > ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding)<Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 02-Mar-10 Time: 08:47:11
> > ------------------------------ XFMail ------------------------------
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%