[R] Recoding multiple columns consistently

Ron Crump
Wed Aug 29 02:01:18 CEST 2007


I have a data frame that contains pedigree information;
that is, individual, sire and dam identities as separate
columns. It also has date of birth.

These identifiers are either not numeric or not sequential.

Obviously, an identifier can appear in one or two columns,
depending on whether it was a parent or not. These should
be consistent.

Not all identifiers appear in the individual column - it
is possible for a parent not to have its own record if its
parents were not known.

Missing parental (sire and/or dam) identifiers can occur.

I need to export the data for use in another program that
requires the pedigree to be coded as integers, increasing
with date of birth (therefore sire and dam always have
lower identifiers than their offspring) and with missing
values coded as 0.

How would I go about doing this?
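
One possible approach, sketched on made-up data: build a single lookup
key and recode all three columns against it with match(), so the codes
stay consistent across columns. The column names (id, sire, dam, dob)
and the example values are hypothetical. The sketch assumes that missing
parents are stored as NA, that every recorded animal has a date of birth
later than that of its parents, and that parents without their own
record can simply be given the lowest codes.

```r
# Hypothetical pedigree; names and values are made up for illustration
ped <- data.frame(
  id   = c("K9", "A1", "M4", "C7"),
  sire = c("A1", NA,   "A1", "K9"),
  dam  = c("Q2", NA,   "Q2", "M4"),
  dob  = as.Date(c("2003-05-01", "2001-03-10", "2004-02-20", "2006-07-15")),
  stringsAsFactors = FALSE
)

# Parents that have no record of their own (assumed to be the oldest animals)
parents  <- unique(c(ped$sire, ped$dam))
parents  <- parents[!is.na(parents)]
founders <- parents[!parents %in% ped$id]

# Single lookup key: founders first, then recorded animals by date of birth
key <- c(founders, ped$id[order(ped$dob)])

# Recode every column against the same key; NA parents become 0
coded <- data.frame(
  id   = match(ped$id,   key),
  sire = match(ped$sire, key, nomatch = 0L),
  dam  = match(ped$dam,  key, nomatch = 0L)
)

# Sort the recoded pedigree by the new identifier
coded <- coded[order(coded$id), ]

# Sanity check: parents must have lower codes than their offspring
stopifnot(all(coded$sire < coded$id), all(coded$dam < coded$id))
```

Because the position in key is the new code, the same identifier gets
the same integer wherever it appears. If the date-of-birth assumption
does not hold for some records, the final check fails rather than
silently producing an invalid pedigree.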

And a second, simpler, related question: if I have a column with
n different values (which may be strings or non-sequential integers)
identifying levels (possibly with repeated occurrences), how
can I recode them to be sequential from 1 to n?
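
For illustration, two ways this might be done without a loop, shown on a
made-up vector x. Which one fits depends on whether the codes should
follow order of first appearance or the sorted order of the values.

```r
# Hypothetical column with repeated, non-sequential labels
x <- c("B17", "A03", "B17", "Z99", "A03")

# Codes 1..n in order of first appearance
codes_first <- match(x, unique(x))

# Codes 1..n in sorted order of the distinct values
codes_sorted <- as.integer(factor(x))

# Both use exactly n distinct codes
n <- length(unique(x))
stopifnot(setequal(codes_first, seq_len(n)), setequal(codes_sorted, seq_len(n)))
```

Note that factor() sorts its levels, and for character data that sort
order can depend on the locale.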

I can solve both problems in Fortran, so could use loops to
do it in R, but feel there should be a quicker, more elegant,
"more R" solution.

Thanks for your help.


