[R] coding for categorical variables with unequal observations

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Thu Apr 3 23:45:25 CEST 2008

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tanya Yatsunenko
> Sent: Thursday, April 03, 2008 1:55 PM
> To: r-help at r-project.org
> Subject: [R] coding for categorical variables with unequal 
> observations
> Hi,
> I am doing multiple regression, and have several X variables that are
> categorical.
> I read that I can use dummy or contrast codes for that, but are there
> any special rules when there are unequal numbers of observations in
> each group (4 females vs 7 males in a "gender" variable)?
> Also, can R generate these codes for me?
> Thanks.

You don't need to do anything special, and yes, you can just let SAS do it for you.  For most of the regression PROCs, you can put your categorical variables in a CLASS statement.  Depending on which procedure you are using, you may be able to specify whether you want effects or dummy coding, and which level of the categorical variable should be the reference ("comparison") level.  It is also possible to use PROC GLMMOD to create design variables that can be fed into other PROCs.  Other approaches are possible as well.
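
Since the original question asks about R, here is a minimal sketch of the analogous approach there.  It assumes the categorical variable is stored as a factor; the data frame `d`, its columns, and the simulated values are made up for illustration and are not from the poster's data.  lm() builds the coding from the factor automatically, and the unequal group sizes (4 vs 7) need no special handling.

```r
# Hypothetical data: 4 females and 7 males, as in the question.
set.seed(1)
d <- data.frame(
  gender = factor(c(rep("female", 4), rep("male", 7))),
  x      = rnorm(11)
)
d$y <- 2 + 0.5 * d$x + 1.5 * (d$gender == "male") + rnorm(11, sd = 0.1)

# Default: treatment (dummy) coding, with the first factor level
# ("female") as the reference level.
fit_dummy  <- lm(y ~ x + gender, data = d)
dummy_cols <- colnames(model.matrix(fit_dummy))

# To use "male" as the reference level instead, relevel the factor.
d$gender_m <- relevel(d$gender, ref = "male")
fit_relev  <- lm(y ~ x + gender_m, data = d)

# Effects (sum-to-zero) coding, requested for this one model only.
fit_sum  <- lm(y ~ x + gender, data = d,
               contrasts = list(gender = "contr.sum"))
sum_cols <- colnames(model.matrix(fit_sum))

# Inspect the design columns that were generated.
head(model.matrix(fit_dummy))
head(model.matrix(fit_sum))
```

All three fits give the same fitted values; only the interpretation of the intercept and the gender coefficient changes with the coding.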

If you provide more detail on what analyses you plan to undertake, someone may be able to provide more specific advice.

Hope this is helpful,


Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA  98504-5204
