[Rd] Possible bug in factor (PR#531)
Murray Smith
mh.smith@auckland.ac.nz
Thu, 04 May 2000 15:27:32 +1200
Hi all,

The problem arose when I had a data frame in which all the factor variables
were declared to be ordered (although some of them clearly should not have been
ordered). For later analyses using the frame I wanted to correct the classes
of those factors, because I wished to use Helmert contrasts rather than
polynomial contrasts. I decided to correct the class of these factors by using
> x<-factor(x)
Although I didn't say so in my bug report, I got into trouble by assuming,
on the strength of the documentation (as Peter Dalgaard pointed out), that the
default value of the argument ordered in factor() was FALSE. I reassembled
the data frame and still found that polynomial contrasts were being used for
all factors, because factor(x) had left every ordered factor ordered.
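A minimal sketch of this behaviour, assuming a current version of R (the R release of May 2000 is not checked here). The level names are made up, and setting options(contrasts = ...) to Helmert contrasts for unordered factors is an assumption about how the analyses were configured. In current R the ordered argument of factor() defaults to is.ordered(x), so only an explicit ordered = FALSE drops the ordering.

```r
# Made-up ordered factor standing in for one column of the data frame
x <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"), ordered = TRUE)

# Assumption: Helmert contrasts for unordered factors, polynomial for ordered ones
op <- options(contrasts = c("contr.helmert", "contr.poly"))

y <- factor(x)                   # 'ordered' defaults to is.ordered(x), TRUE here
z <- factor(x, ordered = FALSE)  # explicitly request an unordered factor

is.ordered(y)  # TRUE: still ordered, so polynomial contrasts are used
is.ordered(z)  # FALSE: now unordered
levels(z)      # level order is kept: "low" "medium" "high"

# contrasts() picks the contrast function from options("contrasts")
# according to whether the factor is ordered
cy <- contrasts(y)  # polynomial contrasts, columns .L and .Q
cz <- contrasts(z)  # Helmert contrasts

options(op)  # restore the previous contrasts setting
```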

By the way, as.factor() behaves in the same way as factor().
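A quick check of this in current R (an assumption; older releases may differ), using toy data. Here as.factor() returns an existing factor unchanged, ordered or not, and it takes no ordered argument, so in current R the ordering can only be dropped through factor(x, ordered = FALSE) or a similar explicit step.

```r
# Toy ordered factor
x <- factor(c("b", "a", "c"), levels = c("c", "b", "a"), ordered = TRUE)

# An existing factor is passed through untouched, ordering included
f <- as.factor(x)
is.ordered(f)    # TRUE
identical(f, x)  # TRUE

# A plain character vector becomes an unordered factor as usual
g <- as.factor(c("b", "a", "c"))
is.ordered(g)    # FALSE
```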

The issue comes down to whether the class "factor" and the class
c("ordered","factor") are nested or disjoint as far as the functions
factor(), as.factor() and class() are concerned.

In R we have two kinds of factor, unordered and ordered. A vector is made an
unordered factor by the function factor() and an ordered factor by the function
ordered(), unless the vector is already an ordered factor: then factor() does
not create an unordered factor. Why is it so much more difficult to convert an
ordered factor to an unordered factor than vice versa? There is no good reason.
The semantics of representing the two types of factor by the class "factor"
and the class c("ordered","factor") have got in the way. We certainly want
both types to be of class "factor", so that is.factor() is TRUE for both, but
it is wrong to think of ordered factors as somehow nested in unordered factors
just because of the way the classes are represented in R.

I suggest that it is natural for the argument ordered of factor() and
as.factor() to default to FALSE, and I would plead for that to be
reimplemented (since the current manual entry for factor() probably
describes what the situation once was).
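Until such a change, each ordered factor that should not be ordered has to be converted with an explicit ordered = FALSE. A minimal sketch for the data-frame situation described at the start, assuming current R; the helper unorder_columns() is hypothetical (not part of R) and the column names are made up.

```r
# Hypothetical helper: make the named columns unordered factors,
# keeping their existing level order; other columns are left alone
unorder_columns <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor, ordered = FALSE)
  df
}

d <- data.frame(
  grade = factor(c("lo", "hi", "mid"), levels = c("lo", "mid", "hi"),
                 ordered = TRUE),                      # genuinely ordered
  site  = factor(c("A", "B", "A"), ordered = TRUE),    # should not be ordered
  y     = c(1.2, 3.4, 2.2)
)

d2 <- unorder_columns(d, "site")
sapply(d2, is.ordered)  # grade TRUE, site FALSE, y FALSE
```

Converting only the named columns, rather than every ordered column, matches the case where some factors are genuinely ordinal and should keep polynomial contrasts.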
Murray
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Murray H. Smith, Senior Lecturer in Engineering Statistics
Engineering Science Department
The University of Auckland
Private Bag 92019
Auckland
New Zealand
Phone: +64 9 373 7599 x4517, Fax: +64 9 373 7468
email: mh.smith@auckland.ac.nz
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
r-devel mailing list