[R] Weird problem with median on a factor

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sun Nov 2 12:50:37 CET 2003


Christoph Bier <christoph.bier at web.de> writes:

> Dave Cacela wrote:
> > Christoph,
> > I concur with the other respondents who questioned why someone would
> > wish to calculate the median of a factor. However, with regard to your
> > actual question, I suspect that median() is giving different answers
> > because the two vectors are not both factors, i.e., one of them is a
> > character vector. Did you test that?
> 
> Yes, I did, and I found the same as Tony Plate:
> 
>  > is.factor(fbhint.spss1$V15.SPS) # = column 264
> [1] TRUE
>  > mode(fbhint.spss1$V15.SPS)
> [1] "numeric"
>  > is.factor(fbhint.spss1$V15.SP1) # = column 566
> [1] TRUE
>  > mode(fbhint.spss1$V15.SP1)
> [1] "numeric"
> 
> > Using S, I have seen quirks in this regard that relate to the import
> > procedure and the value of the first element in the vector. In your case,
> > the first elements differ in that one is NA while the other is "teils/teils".
> 
> It also occurs if the first element is the same. For example "wichtig"
> in columns 263 and 565.

I suspect the solution to the riddle was given earlier (I forget by
whom): if there is an odd number N of non-NA observations, the median is
the middle observation, sort(x)[(N+1)/2]; if there is an even number, you
take the average of the two middle observations,
sum(sort(x)[c(N/2,N/2+1)])/2. Only the latter involves arithmetic on
factors, which is Verboten. (Arguably, sorting an unordered factor ought
to be Verboten as well, though!)
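
The odd/even distinction can be checked by spelling out the two formulas by hand. What follows is only a rough sketch: the factor levels and the variable names (x_odd, x_even, s_odd, s_even, mid_odd, res_even) are invented for illustration and are not Christoph's data. It deliberately does not call median() itself, since recent versions of R may reject a factor there before either branch is reached.

```r
# Hypothetical data: the level names are invented for this illustration.
x_odd  <- factor(c("b", "a", "c", NA, "b", "c"))  # 5 non-NA observations
x_even <- factor(c("b", "a", "c", "b"))           # 4 non-NA observations

# Odd case: sort() drops NAs by default, and picking the middle element
# is pure indexing, so the result is still a factor value.
s_odd <- sort(x_odd)
N <- length(s_odd)
mid_odd <- s_odd[(N + 1) / 2]
print(mid_odd)

# Even case: averaging the two middle elements needs sum() on a factor,
# which R refuses; capture the condition message instead of stopping.
s_even <- sort(x_even)
N <- length(s_even)
res_even <- tryCatch(
  sum(s_even[c(N / 2, N / 2 + 1)]) / 2,
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(res_even)
```

If the levels really are ordinal, converting to the integer codes with as.integer() before averaging is one way out, but then the result is a number between codes rather than a level.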

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



