[R] reference category for factor in regression

Stephan Kolassa Stephan.Kolassa at gmx.de
Mon Jan 19 17:47:51 CET 2009


Hi Jos,

you can force R to set contrasts for factors the way you like them with 
contrasts(). You seem to be thinking of treatment contrasts, which are 
most easily interpreted, but there are also others.
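A minimal sketch of choosing the reference level. It uses a hypothetical five-band age factor; the labels are made up for illustration and are not Jos's data. relevel() moves the chosen level to the front. contrasts() with contr.treatment(..., base = ) keeps the level order and changes only the baseline.

```r
# Hypothetical age bands (illustrative labels, not from the original data)
age_levels <- c("18-29", "30-39", "40-49", "50-64", "65+")
age <- factor(rep(age_levels, times = 4), levels = age_levels)

# Default treatment contrasts: the first level, "18-29", is the reference
print(contrasts(age))

# Option 1: reorder the levels so that "40-49" comes first
age_rel <- relevel(age, ref = "40-49")

# Option 2: keep the level order, but make the 3rd level ("40-49") the baseline
age_con <- age
contrasts(age_con) <- contr.treatment(levels(age_con), base = 3)

# In both cases the design matrix has no column for "40-49"
print(colnames(model.matrix(~ age_rel)))
print(colnames(model.matrix(~ age_con)))
```

relevel() is usually the simpler choice. Setting contrasts() directly is useful when other contrast codings (sum, Helmert, ...) are wanted.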

However: are you sure you want to bin an age variable into categories? 
You will lose power, along with a lot of other unpleasant things:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

With five categories, you are giving up 4 df. I'd recommend looking into 
splines, where you should be able to get more bang for the buck. Look at 
rcs() in the Design package, and at Frank Harrell's excellent book 
"Regression Modeling Strategies".
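Here is a rough sketch of the binning-versus-spline trade-off on simulated data. The age range, sample size and true curve are assumptions. rcs() lives in the Design package (since superseded by rms), which may not be installed, so this sketch swaps in ns() from the splines package that ships with R. A natural cubic spline is closely related to a restricted cubic spline (both are linear beyond the outer knots), although the knot placement differs.

```r
library(splines)  # ships with base R

set.seed(1)
n <- 1000
age <- runif(n, 18, 85)  # simulated continuous age
# Assumed true relationship: smooth and non-linear in age
p <- plogis(-0.5 + 0.04 * (age - 50) - 0.0015 * (age - 50)^2)
vote <- rbinom(n, 1, p)

# Binned version: five categories, i.e. four df spent on age
age_cat <- cut(age, breaks = c(18, 30, 40, 50, 65, 85), include.lowest = TRUE)
fit_cat <- glm(vote ~ age_cat, family = binomial)

# Spline version: the same four df, but a smooth curve in age
fit_ns <- glm(vote ~ ns(age, df = 4), family = binomial)

# Compare the two fits on the same data (lower AIC is better)
print(AIC(fit_cat, fit_ns))
```

Both models spend the same number of parameters on age. The spline avoids the jumps at arbitrary cut points.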

Of course, if you only have the binned data, all this is irrelevant...

HTH,
Stephan


Jos Elkink wrote:
> Hi Thierry,
> 
> Thanks for your quick answer. The problem is not so much the LABOUR
> variable, however, as the AGE variable, which has about 5
> categories and for which I indeed do not create separate dummy variables.
> But R does not behave as expected when deciding which dummy to use
> as the reference category ...
> 
> Jos
> 
> On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry
> <Thierry.ONKELINX at inbo.be> wrote:
>> Dear Jos,
>>
>> In R you don't need to create your own dummy variables. Just create a
>> factor variable LABOUR (with two levels) and rerun your model. Then you
>> should be able to calculate all coefficients.
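For example, on hypothetical 0/1 data, turning the indicator into a factor lets model.matrix() build the dummy column itself. With an intercept in the model, the first level is the reference.

```r
# Hypothetical 0/1 indicator, coded as in the original model
labour01 <- c(1, 0, 0, 1, 1, 0)

# Convert to a two-level factor; the first level ("Other") is the reference
LABOUR <- factor(labour01, levels = c(0, 1), labels = c("Other", "Labour"))

# R builds the dummy column itself: "(Intercept)" and "LABOURLabour"
print(colnames(model.matrix(~ LABOUR)))
```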
>>
>> HTH,
>>
>> Thierry
>>
>> ----------------------------------------------------------------------------
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>> and Forest
>> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
>> methodology and quality assurance
>> Gaverstraat 4
>> 9500 Geraardsbergen
>> Belgium
>> tel. + 32 54/436 185
>> Thierry.Onkelinx at inbo.be
>> www.inbo.be
>>
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to
>> say what the experiment died of.
>> ~ Sir Ronald Aylmer Fisher
>>
>> The plural of anecdote is not data.
>> ~ Roger Brinner
>>
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of
>> data.
>> ~ John Tukey
>>
>> -----Original message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On behalf of Jos Elkink
>> Sent: Monday 19 January 2009 15:16
>> To: r-help at r-project.org
>> Subject: [R] reference category for factor in regression
>>
>> Hi all,
>>
>> I am struggling with a strange issue in R that I have not encountered
>> before, and I am not sure how to resolve it.
>>
>> The model looks like this, with all irrelevant variables left out:
>>
>> LABOUR - a dummy variable
>> NONLABOUR = 1 - LABOUR
>> AGE - a categorical variable / factor
>> VOTE - a dummy variable
>>
>> glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
>> family=binomial(link="logit"))
>>
>> In other words, a standard interaction model, but I want to know the
>> intercepts and coefficients for each of the two cases (LABOUR and
>> NONLABOUR), instead of getting coefficients for the differences as in
>> a normal interaction model.
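A common way to get a separate intercept and a separate set of AGE effects for each group is to make the group a factor and nest AGE within it. Here is a sketch on simulated data; the names party, age and vote and the level labels are hypothetical. With no overall intercept, 0 + party gives one intercept per group, and party:age gives each group its own AGE coefficients, all measured against the same reference level.

```r
set.seed(42)
n <- 2000
age_levels <- c("18-29", "30-39", "40-49", "50-64", "65+")

# Simulated data standing in for the real survey
dat <- data.frame(
  party = factor(sample(c("Other", "Labour"), n, replace = TRUE)),
  age   = factor(sample(age_levels, n, replace = TRUE), levels = age_levels)
)
dat$vote <- rbinom(n, 1, 0.5)

# One intercept per party, plus within-party age effects relative to "18-29"
fit <- glm(vote ~ 0 + party + party:age,
           family = binomial(link = "logit"), data = dat)

# 2 intercepts + 2 x 4 age coefficients = 10 estimates
print(coef(fit))
```

The formula vote ~ 0 + party/age expands to the same terms. To change the reference age band for both groups at once, relevel dat$age before fitting.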
>>
>> But the strange thing is that, for the two occurrences of the AGE variable,
>> R makes a different choice as to which AGE category to leave out of
>> the regression. The cross-table of AGE with LABOUR does not have empty
>> cells.
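Here is a sketch for checking what R actually did, using simulated stand-in data with the variable names from the post (the data themselves are made up). The model-matrix column names show which AGE level each interaction dropped. NA coefficients flag columns that glm() could not estimate because they are collinear with others. Without an intercept, R's rules for coding factors in interactions may give AGE a full set of dummies in one term and contrasts in the other. That would produce exactly this kind of asymmetry.

```r
set.seed(7)
n <- 2000
age_levels <- c("18-29", "30-39", "40-49", "50-64", "65+")

# Simulated stand-in data using the variable names from the original post
AGE <- factor(sample(age_levels, n, replace = TRUE), levels = age_levels)
LABOUR <- rbinom(n, 1, 0.4)
NONLABOUR <- 1 - LABOUR
VOTE <- rbinom(n, 1, 0.5)

# 1. Confirm there are no empty cells in the AGE x LABOUR table
tab <- table(AGE, LABOUR)
print(tab)

# 2. List the columns R builds for each interaction term
X <- model.matrix(~ 0 + LABOUR + NONLABOUR + LABOUR:AGE + NONLABOUR:AGE)
print(colnames(X))

# 3. Fit the original model and list any aliased (NA) coefficients
fit <- glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR:AGE + NONLABOUR:AGE,
           family = binomial(link = "logit"))
print(names(coef(fit))[is.na(coef(fit))])
```

Whatever coding R picks, only 10 parameters (5 age bands x 2 groups) are estimable, so any extra column shows up as NA.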
>>
>> Does anyone have any idea what might be going wrong, or what I could do
>> about it?
>>
>> Thanks in advance for any help!
>>
>> Regards,
>>
>> Jos
>>
>> --
>> Johan A. Elkink
>> Lecturer
>> School of Politics and International Relations & CHS Graduate School
>> University College Dublin
>> Ph. +353 1 716 7026  |  Library Building, Rm 512
>> http://jaeweb.cantr.net
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> The views expressed in this message
>> and any annex are purely those of the writer and may not be regarded as stating
>> an official position of INBO, as long as the message is not confirmed by a duly
>> signed document.
>>