[R] Warning: as.numeric reorders factor data
Frank E Harrell Jr
fharrell at virginia.edu
Sun Dec 8 16:07:03 CET 2002
On Sun, 08 Dec 2002 10:03:54 -0500
Bud Gibson <fpgibson at umich.edu> wrote:
> Recently, I was using aggregate() to develop averages by trial for an
> experiment I was running. Trials were indicated as ordinal numbers for
> each subject. aggregate() turned trial into factors during the
> aggregation process. I then wanted to create a scatter plot of subject
> performance by trial, so I applied as.numeric to the (now) factor
> variable trial. as.numeric reordered the trial indicator creating some
> (at first) incomprehensible results.
>
> Investigation revealed that aggregate must first be interpreting trial
> as a character and then turning it into a factor. The behavior I
> observed is reproducible from the following transcript using R 1.6.1 on
> Red Hat Linux 7.3.
>
> > test <- as.factor(as.character(c(1,2,3,4,5,6,7,8,9,10,11)))
> > test
> [1] 1 2 3 4 5 6 7 8 9 10 11
> Levels: 1 10 11 2 3 4 5 6 7 8 9
> > as.numeric(test)
> [1] 1 4 5 6 7 8 9 10 11 2 3
>
> It strikes me that as.numeric should *never* reorder the vector it is
> working on. There is this workaround for the problem:
>
> > as.numeric(as.character(test))
> [1] 1 2 3 4 5 6 7 8 9 10 11
>
> However, I should not have to know about the internals of aggregate to
> be able to use its results.
>
> Bud Gibson
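The transcript above comes down to two facts about factors. First, `as.numeric()` applied to a factor returns its internal integer codes, meaning each element's position in `levels()`, not the values that are printed. Second, a factor built from character strings has its levels sorted as strings, so "10" and "11" come before "2". The following minimal R sketch reproduces the example, with `1:11` standing in for `c(1,...,11)`. It does not call `aggregate()`; it only imitates the factor that the message reports `aggregate()` produced in R 1.6.1. The second recovery idiom, `as.numeric(levels(f))[f]`, comes from the R FAQ and does not appear in this thread.

```r
# Imitate the grouping variable after aggregate(): numbers -> strings -> factor.
trial <- factor(as.character(1:11))

# Levels are sorted as character strings:
levels(trial)                     # "1" "10" "11" "2" "3" ... "9"

# as.numeric() on a factor gives the integer codes (positions in levels()),
# not the printed values, so the result looks "reordered":
as.numeric(trial)                 # 1 4 5 6 7 8 9 10 11 2 3

# Recovering the original numbers:
as.numeric(as.character(trial))   # workaround from the message: 1 2 ... 11
as.numeric(levels(trial))[trial]  # R FAQ idiom: convert each level once,
                                  # then index by the codes: 1 2 ... 11
```

The FAQ idiom converts each distinct level only once and then indexes by the codes. `as.character()` instead converts every element, so the FAQ version can be cheaper when there are many elements but few levels.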
One reason the summarize function in the Hmisc library (http://hesweb1.med.virginia.edu/biostat/s/Hmisc.html) exists is that it preserves the nature of the stratification variables (a numeric trial variable, for example, stays numeric). summarize produces data frames that look like the original data, except that the response variables are replaced by scalar or vector statistical summaries.
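The idea of keeping the stratification variable's type can be shown without Hmisc. `mean_by` below is a hypothetical base-R helper. It is not Hmisc's `summarize` and does not copy its interface. It computes the mean response within each stratum and returns the strata with their original numeric type and numeric order, so the result can be plotted without any `as.numeric()` conversion. The helper assumes a numeric stratification variable with no missing values, and the data in the usage lines are illustrative only.

```r
# Hypothetical helper (not Hmisc::summarize): mean of y within each stratum
# of a numeric stratification variable `by`, assumed to have no NAs.
mean_by <- function(y, by) {
  strata <- sort(unique(by))   # stays numeric and sorts numerically, 2 before 10
  data.frame(
    by = strata,
    y  = vapply(strata, function(s) mean(y[by == s]), numeric(1))
  )
}

# Illustrative data only: two subjects, trials 1 to 11.
trial <- rep(1:11, times = 2)
score <- c(1:11, 11:1)
perf  <- mean_by(score, trial)
perf$by        # 1 2 ... 11, still numeric and in trial order
# plot(perf$by, perf$y) would then put trials on a numeric axis.
```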
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat