[R] Weird problem with median on a factor
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sun Nov 2 12:50:37 CET 2003
Christoph Bier <christoph.bier at web.de> writes:
> Dave Cacela wrote:
> > Christoph,
> > I concur with the other respondents who questioned why someone would
> > wish to calculate the median of a factor. However, with regard to your
> > actual question, I suspect that median() is giving different answers
> > because the two vectors are not both factors, i.e., that one of them is
> > a character. Did you test that?
>
> Yes, I did, and I found the same as Tony Plate:
>
> > is.factor(fbhint.spss1$V15.SPS) # = column 264
> [1] TRUE
> > mode(fbhint.spss1$V15.SPS)
> [1] "numeric"
> > is.factor(fbhint.spss1$V15.SP1) # = column 566
> [1] TRUE
> > mode(fbhint.spss1$V15.SP1)
> [1] "numeric"
>
> > Using S, I have seen quirks in this regard that relate to import procedure
> > and the value of the first element in the vector. In your case, the first
> > elements differ in that one is NA while the other is "teils/teils".
>
> It also occurs if the first element is the same. For example "wichtig"
> in columns 263 and 565.
I suspect the solution to the riddle was given earlier (I forget by
whom): If there's an odd number N of non-NA observations, the median is
the middle obs., sort(x)[(N+1)/2]; if there is an even number, you
take the average of the two middle obs., sum(sort(x)[c(N/2,N/2+1)])/2.
Only the latter involves arithmetic on factors, which is Verboten.
(Arguably, sorting an unordered factor ought to be Verboten as well,
though!)
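
A minimal sketch of the two cases, for illustration only. The data are
made up (the level names are borrowed from the thread, their ordering is
an assumption), and the two branches are written out by hand rather than
calling median(), whose handling of factors may differ between R
versions:

```r
# Hypothetical data: level names taken from the thread, order assumed.
lv <- c("unwichtig", "teils/teils", "wichtig")
x_odd  <- factor(c("wichtig", "teils/teils", NA, "unwichtig"),
                 levels = lv)   # 3 non-NA observations
x_even <- factor(c("wichtig", "teils/teils", "wichtig", "unwichtig"),
                 levels = lv)   # 4 non-NA observations

# Odd N: only sorting and indexing, no arithmetic on the factor.
s <- sort(x_odd)                # drops the NA, sorts by level order
N <- length(s)
mid_odd <- s[(N + 1) / 2]       # still a factor: teils/teils

# Even N: averaging the two middle obs. needs sum() on a factor,
# which R refuses; catch the error (or warning) and record NA.
s <- sort(x_even)
N <- length(s)
mid_even <- tryCatch(sum(s[c(N / 2, N / 2 + 1)]) / 2,
                     error   = function(e) NA,
                     warning = function(w) NA)

print(mid_odd)
print(mid_even)
```

So two columns of the same class can behave differently merely because
one has an odd and the other an even number of non-NA values.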
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907