[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)

Wed, 10 Oct 2001 08:52:33 -0700 (PDT)

On Wed, 10 Oct 2001 jens.lund@nordea.com wrote:

<stuff about actual bug in bartlett.test() snipped>
>
> This might lead to a discussion on whether nlevels should return the number of
> attained factor levels or the number of "theoretical" levels:
> nlevels(f)       # Returns 10, fine!
> nlevels(f[1:25]) # Returns 10, might not be fine; 5 might be the correct
> answer.
> The same question might be asked for levels, levels(f) versus levels(f[1:25]).
> At least the documentation should be clear on this point, which I do not think
> it is now. (Perhaps the levels attribute on the factor should be changed for a
> subset of the data, but this is probably not an efficient way of doing
> things...)
>
> A similar problem also comes up with interactions between factors in unbalanced
> designs:
> g <- gl(2,40,50)
> nlevels(f:g)         # Returns 20
> length(unique(f:g))  # Returns 10
> table(f,g)           # See the design

I strongly believe that levels() and nlevels() are doing the right thing
here.  If a variable is a factor with two levels (e.g. Male & Female) then
it shouldn't stop being a factor with two levels just because you have a
small sample.  In particular, when you have a single observation g[i] it
is still important to know the set of levels it comes from, rather than
just the category it is in.
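
To make the distinction concrete, here is a small sketch. The definition of f
was snipped from the quoted message, so f <- gl(10, 5) below is only an
assumption chosen to be consistent with the quoted results (10 levels, 5 of
them attained in f[1:25]); g is taken from the quoted message. The sketch shows
that subsetting keeps the full level set, and that unused levels can be dropped
explicitly when the attained levels are what is wanted.

```r
# ASSUMPTION: f is not defined in the quoted text; gl(10, 5) reproduces
# the quoted behaviour (10 levels, 5 observations per level, length 50).
f <- gl(10, 5)
g <- gl(2, 40, 50)   # as in the quoted message

sub <- f[1:25]

# Subsetting keeps the "theoretical" level set.
n_theoretical <- nlevels(sub)                     # 10

# Dropping unused levels has to be requested explicitly.
n_attained_refactor <- nlevels(factor(sub))       # 5
n_attained_drop <- nlevels(f[1:25, drop = TRUE])  # 5

# The same holds for the interaction in the unbalanced design.
n_inter_all <- nlevels(f:g)                       # 20
n_inter_attained <- nlevels(factor(f:g))          # 10

# Unattained levels appear as zero counts rather than disappearing.
counts <- table(sub)
print(counts)
```

Keeping the full level set means that table(sub) reports zero counts for the
unattained levels, and code that needs only the attained levels can re-factor
the subset as above.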

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._