[R-sig-ME] factor / integer

Daniel Ezra Johnson danielezrajohnson at gmail.com
Tue Aug 3 15:50:35 CEST 2010


If a predictor is numeric/integer, it will usually be assigned one
parameter in the model: a slope (change in response per unit change
in the predictor).

If the predictor is a factor with k levels, there will usually be
(k-1) parameters for that predictor in the model.
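A minimal R sketch of the parameter count, using hypothetical data rather than Sam's spreadsheet. model.matrix() builds the fixed-effects design matrix; with R's default treatment contrasts, a 4-level factor contributes k-1 = 3 dummy columns, where the numeric version contributes a single slope column.

```r
# Hypothetical example data: an integer-coded predictor taking values 0-3
d <- data.frame(x = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L))

# Numeric coding: intercept + one slope column
X_num <- model.matrix(~ x, data = d)

# Factor coding: intercept + (k - 1) = 3 treatment-contrast dummy columns
X_fac <- model.matrix(~ factor(x), data = d)

ncol(X_num)      # 2
ncol(X_fac)      # 4
colnames(X_fac)  # "(Intercept)" "factor(x)1" "factor(x)2" "factor(x)3"
```

The level "0" is absorbed into the intercept as the reference level, which is why there are k-1 rather than k factor columns.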

In practice I expect your model will fit better with the predictors
as factors but, as noted above, it will also be more complex (more
parameters). AIC is one way of assessing that tradeoff.
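To illustrate the tradeoff, here is a sketch on simulated data. It assumes a binary response and uses a plain fixed-effects glm() instead of a GLMM (e.g. lme4::glmer()) so that it is self-contained; the variable names and simulated effect sizes are hypothetical. AIC is -2 * log-likelihood + 2 * (number of parameters), so the factor model's better (or equal) fit is penalised for its extra parameters.

```r
set.seed(1)

# Hypothetical predictor: 5 integer levels, 80 observations each
x <- rep(0:4, each = 80)

# Simulated binary response whose log-odds are linear in x (an assumption of this sketch)
y <- rbinom(length(x), size = 1, prob = plogis(-1 + 0.5 * x))
d <- data.frame(x = x, y = y)

m_num <- glm(y ~ x,         family = binomial, data = d)  # 2 parameters
m_fac <- glm(y ~ factor(x), family = binomial, data = d)  # 1 + (5 - 1) = 5 parameters

# The factor model nests the linear one, so its log-likelihood is at least as high
as.numeric(logLik(m_fac)) >= as.numeric(logLik(m_num))

# AIC trades that extra fit off against the extra parameters
AIC(m_num, m_fac)
```

Because the simulated log-odds really are linear in x here, the numeric model will often have the lower AIC, but that is not guaranteed for any particular sample.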

Without knowing what the integers represent, I can't really say which
approach is better. If you think there is a linear relationship
between the predictor(s) and the (log-odds of the) response, then you
may well be justified (perhaps others will disagree) in entering the
predictors as numeric even though they happen to take only integer
values...

Dan

On Tue, Aug 3, 2010 at 9:43 AM, Sam <Sam_Smith at me.com> wrote:
> Dear Dan,
>
> Thanks for this,
>
> I was not working back from the AIC; I was just unsure why they are different. In what way are they different models?
>
> If I have categorical predictors, I should code them as factors in a GLMM, correct?
>
> Thanks
>
> Sam
> On 3 Aug 2010, at 14:40, Daniel Ezra Johnson wrote:
>
> Dear Sam,
>
> When the factor levels are numbers, you have to do:
>
>> A <- as.factor(as.character(A))
>
> Regarding your other question, it's an entirely different model
> depending on whether you treat the predictors as linear/numeric or as
> factors. You should choose based on what the predictor(s) is/are,
> probably not by working backwards from the AIC.
>
> Dan
>
> On Tue, Aug 3, 2010 at 9:34 AM, Sam <Sam_Smith at me.com> wrote:
>> Dear List
>>
>> I have an Excel spreadsheet with 5 columns that contain categorical data. I have recoded them to numbers:
>>
>> A       B       C
>> 0       0       0
>> 1       1       1
>> 2       2       2
>> 3       3
>> 4
>> 5
>>
>> etc
>>
>> When I read it into R and do str(dataframe), I get:
>>
>>  $ A: int  1 1 1 1 1 1 1 1 1 1 ...
>>  $ B: int  1 1 1 1 1 1 1 1 1 1 ...
>>  $ C: int  0 0 0 0 0 0 0 0 0 0 ...
>>  $ D: int  0 0 0 0 0 0 0 0 0 0 ...
>>  $ E: int  0 0 0 0 0 0 0 0 0 0 ...
>>
>> I then realised they should probably be factors instead of integers, so I used as.factor to convert them:
>>
>> A <- as.factor(A)
>>
>> Now when I run the GLMM, the AIC values are different from when they were integers. I have 2 questions:
>>
>> 1. Should I not have converted the categories to numbers in the Excel spreadsheet before import?
>>
>> 2. Why are the AIC values different when I use as.factor as opposed to keeping them as integers, and which approach is recommended?
>>
>> Thanks
>>
>> Sam
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>



