[R] Recoding multiple columns consistently
Uwe Ligges
ligges at statistik.uni-dortmund.de
Wed Aug 29 09:59:18 CEST 2007
Ron Crump wrote:
> Hi,
>
> I have a dataframe that contains pedigree information;
> that is individual, sire and dam identities as separate
> columns. It also has date of birth.
>
> These identifiers are not numeric, or not sequential.
>
> Obviously, an identifier can appear in one or two columns,
> depending on whether it was a parent or not. These should
> be consistent.
>
> Not all identifiers appear in the individual column - it
> is possible for a parent not to have its own record if its
> parents were not known.
>
> Missing parental (sire and/or dam) identifiers can occur.
>
> I need to export the data for use in another program that
> requires the pedigree to be coded as integers, increasing
> with date of birth (therefore sire and dam always have
> lower identifiers than their offspring) and with missing
> values coded as 0.
>
> How would I go about doing this?
>
> And a second, simpler related question, if I have a column with
> n different values (may be strings or non-sequential integers)
> identifying levels (possibly with repeated occurrences), how
> can I recode them to be sequential from 1 to n?
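as.integer(factor(x))
(rank(x, ties.method="first") only works if no value is repeated: it gives
repeated values different codes.)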
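A minimal sketch for this simpler question, comparing the candidates side by side. The vector x and its values are made up for illustration:

```r
# Hypothetical identifiers with repeated occurrences (n = 3 distinct values).
x <- c("b17", "a03", "b17", "z99", "a03")

# Codes 1..n following the sorted order of the distinct values.
codes_sorted <- as.integer(factor(x))

# Codes 1..n following the order of first appearance.
codes_first <- match(x, unique(x))

# For comparison: rank() with ties.method = "first" breaks ties by position,
# so it returns a permutation of 1..length(x), not 1..n.
ranks <- rank(x, ties.method = "first")

print(data.frame(x, codes_sorted, codes_first, ranks))
```

Which of the two recodings is preferable depends on whether the codes should follow the sort order of the values or their order in the data.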
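For the first question you could, for example, proceed as follows:
order() the identifiers by date of birth, make them unique() and assign the
result to a new "levels" object. Then turn each identifier column into an
ordered factor with those levels:
factor(some_column, levels=levels, ordered = TRUE)
as.numeric(factor_object) then gives the integer codes. Missing identifiers
come out as NA and still have to be set to 0, and parents without a record
of their own have to be included in "levels" (before their offspring),
otherwise they become NA as well.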
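The steps above can be sketched as follows. The data frame ped, its column names (id, sire, dam, dob) and its values are hypothetical. Placing parents that have no record of their own ahead of all dated individuals is an assumption made here, not something stated in the reply:

```r
# Hypothetical pedigree; column names and values are made up for illustration.
ped <- data.frame(
  id   = c("K7", "A2", "M9", "B5"),
  sire = c("A2", NA,   "K7", "X1"),
  dam  = c("B5", NA,   "Q4", NA),
  dob  = as.Date(c("2003-04-01", "2001-02-10", "2005-06-15", "2001-03-20")),
  stringsAsFactors = FALSE
)

# Individuals ordered by date of birth, made unique.
by_dob <- unique(ped$id[order(ped$dob)])

# Assumption: parents without their own record have no date of birth,
# so they are put first and thereby get the lowest codes.
parents  <- unique(c(ped$sire, ped$dam))
parents  <- parents[!is.na(parents)]
founders <- setdiff(parents, by_dob)
lev <- c(founders, by_dob)

# Recode one column against the common levels; missing values become 0.
recode <- function(column) {
  code <- as.integer(factor(column, levels = lev, ordered = TRUE))
  code[is.na(code)] <- 0L
  code
}

out <- data.frame(id   = recode(ped$id),
                  sire = recode(ped$sire),
                  dam  = recode(ped$dam))
out <- out[order(out$id), ]   # sort records by the new identifier
print(out)
```

Because all three columns are recoded against the same levels, an identifier gets the same integer wherever it appears.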
Uwe Ligges
> I can solve both problems in Fortran, so could use loops to
> do it in R, but feel there should be a quicker, more elegant,
> "more R" solution.
>
> Thanks for your help.
>
> Ron.