[R-sig-ME] Managing person identifier variable

Wed Oct 5 21:28:31 CEST 2016

> On Oct 5, 2016, at 2:21 PM, Theodore Lytras <thlytras at gmail.com> wrote:
> 
> On Wednesday, 5 October 2016 6:59:30 PM EEST, MACDOUGALL Margaret wrote:
>> I would be most grateful for some advice in relation to the interpretation
>> of a person identifier variable (persID, say),  in R. I would like to
>> represent persons, as an independent variable, by a random effect. However,
>> there are over 200 such persons. Each person is allocated a random
>> numerical code as a unique identifier.  Currently, R is reading the
>> identifier variable as a numeric variable. Is there a quick way of
>> addressing this problem by recoding the variable?  (I do not wish to bin
>> the values into category ranges; rather, I wish to avoid the numerical
>> codes being interpreted literally.)
> 
> Just recode it as a factor, i.e. factor(persID).
> 
> By the way, lme4 does that implicitly if you specify a numeric variable as a 
> random effect in a model formula, i.e. you can just say: y ~ x + (1|persID) 
> instead of: y ~ x + (1|factor(persID))

A quick pointer here: if the persID values contain leading zeros that are a material part of the unique IDs, such as:

  01234
  001234

then coercing them to a factor after they have already been coerced to numeric values will collapse both of the above into the single level 1234:

> factor(as.numeric("01234"))
[1] 1234
Levels: 1234

> factor(as.numeric("001234"))
[1] 1234
Levels: 1234
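
One way around this, sketched here with made-up data, is to keep the IDs as character strings from the moment they are read in and only then convert them to a factor. The column names and values below are hypothetical, and read.csv(text = ...) merely stands in for reading a real file:

```r
# Hypothetical data: two distinct IDs that differ only in their leading zeros.
csv_text <- "persID,y\n01234,1.2\n001234,3.4\n01234,2.1"

# Force persID to be read as character so the leading zeros survive,
# then convert to a factor.
dat <- read.csv(text = csv_text, colClasses = c(persID = "character"))
dat$persID <- factor(dat$persID)
nlevels(dat$persID)               # 2 levels: "001234" and "01234"

# With the defaults, persID is read as an integer and the two IDs merge.
dat_num <- read.csv(text = csv_text)
nlevels(factor(dat_num$persID))   # 1 level: "1234"
```

If the IDs have already been stored as numbers, the leading zeros are gone and cannot be recovered by recoding, so it would be worth checking the raw file first.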

Food for thought...

Regards,

Marc Schwartz
