[R] Recoding categorical gender variable into numeric factors

Ista Zahn istazahn at gmail.com
Wed Sep 5 22:14:02 CEST 2012


Hi Conrad,

On Wed, Sep 5, 2012 at 3:14 PM, Conradsb <csbaldne at vt.edu> wrote:
> I currently have a data set in which gender is entered as "Male" and "Female",
> and I'm trying to convert this into "1" and "0".

This is usually not necessary, and makes things more confusing. "Male"
and "Female" are clear and self-explanatory; "0" and "1" are not.
>
> I found a website which recommended using two commands:
>
> data$scode[data$sex=="M"] <- "1"
> data$scode[data$sex=="F"] <- "2"

Nope, "1" is the character 1, not the number 1 in R. Also, you said
the values were "Male" and "Female", not "F" and "M". To convert
"Male" to 1 and "Female" to 2 you can use

data$scode[data$sex=="Male"] <- 1
data$scode[data$sex=="Female"] <- 2

Notice "Male" and "Female" instead of "M" and "F", and 1 and 2
instead of "1" and "2".
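
As a rough sketch of the difference, here is the same recode on a small
made-up data frame (the values are only an illustration, not your data).
Creating scode as an all-NA numeric column first is my own assumption about
a safe starting point, so that any row matching neither value stays NA:

```r
# Toy data frame (hypothetical; stands in for the real data)
data <- data.frame(sex = c("Male", "Female", "Female", "Male"),
                   stringsAsFactors = FALSE)

# Start from an all-NA numeric column so unmatched rows stay NA
data$scode <- NA_real_
data$scode[data$sex == "Male"] <- 1
data$scode[data$sex == "Female"] <- 2

class("1")         # "character": quoted, so it is text
class(1)           # "numeric":   unquoted, so it is a number
class(data$scode)  # "numeric"
```

If scode comes out as "character" instead, one of the assigned values was
quoted.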


>
> to convert to numbers, and:
>
> data$scode <- factor(data$scode)
>
> to convert this variable to a factor.

No need to convert it to numbers first. Just use

data$sex <- factor(data$sex)
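
To illustrate what factor() does with character values, here is a minimal
sketch on a made-up data frame (again only an illustration, not your data).
The labels stay readable, and R keeps integer codes underneath:

```r
# Toy data frame (hypothetical)
data <- data.frame(sex = c("Male", "Female", "Female", "Male"),
                   stringsAsFactors = FALSE)

data$sex <- factor(data$sex)

levels(data$sex)      # "Female" "Male"  (sorted alphabetically by default)
as.integer(data$sex)  # 2 1 1 2          (the underlying integer codes)
table(data$sex)       # counts per level, with the labels intact
```

Modelling functions such as lm() accept a factor directly, so the codes
rarely need to be handled by hand.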

>
>
>
> My issue is that, after I use the first command, *only* the female values
> get converted to a number. I am left with a column filled with 2's and blank
> spaces.

Strange, especially if sex is actually "Male" and "Female", in which
case scode should be all NA. If you want to follow up on this, please
post the result of

dput(data["sex"])

> Instead of typing both lines of the first command, I copied and pasted
> the first line and changed the letter representing gender. I also made sure
> that both letters were exactly as they appear in the dataset.
>
> My questions are: is there any visible issue with my syntax, and are there
> any other methods to accomplish this?

In this case you don't actually need to convert to numeric. Just use

data$scode <- factor(data$sex)

If you really need to convert characters to numbers, it is often
convenient to use factors as intermediate steps, like this:

dat <- data.frame(sex=sample(c("Male", "Female"), 10, replace=TRUE))

dat$sex.n <- as.numeric(
  as.character(
    factor(
      dat$sex,
      levels = c("Female", "Male"),
      labels = c("0", "1"))))
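
The as.character() step matters. As a small sketch with a fixed, made-up
vector (so the result is predictable), calling as.numeric() directly on the
factor returns its internal codes rather than the labels:

```r
# Fixed toy vector (hypothetical) instead of a random sample
sex <- c("Male", "Female", "Female", "Male")

f <- factor(sex, levels = c("Female", "Male"), labels = c("0", "1"))

as.numeric(f)                # 2 1 1 2  internal codes, not what we want
as.numeric(as.character(f))  # 1 0 0 1  the labels converted to numbers
```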

Best,
Ista
>
> I'm also very new to R, so complex syntax is beyond me.
>
> Conrad Baldner



