merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)

Prof Brian D Ripley ripley@stats.ox.ac.uk
Wed, 29 May 2002 14:42:15 +0100 (BST)


In this case I think I know how to solve the idiosyncrasy of the subsetting
methods.  As far as I can tell (and similar things have come up before)
the dataframe.R code is taken from an early version of S (as David James
alludes to) and a number of things have been changed since in S but the
problems remain in R.

On the other hand, I don't make such changes without ample time to think
them through ....

B

On Wed, 29 May 2002, David James wrote:

> I do use data frames for storing character data, fully aware that I'm
> stretching their intended use.  Data frames came about in the context
> of modeling software (see "Statistical Models in S," the "White book"
> by Chambers and Hastie, eds).  Originally, the primary use of data
> frames was for holding the data given to the model fitting functions,
> and thus the classes of objects that LM, GLM, GAM, Tree, etc.,
> required are simple ones (numeric and factors -- note that character
> vectors are not well suited for fitting models).  Very soon after,
> people began to include other types of objects (Terry Therneau's
> censored/survival classes, among others, come to my mind).  So the
> behaviour of the data.frame class has evolved into what we are
> currently using, and some of its apparent "idiosyncrasies" make
> perfect sense in light of its original intended purpose.
>
> It has been argued before that we may need other more general
> container classes to hold other "tabular" data (e.g., contingency
> tables, data from relational databases) that don't require the
> restriction that data frames have traditionally imposed.  Of course,
> it is not obvious to me that introducing yet another set of classes
> is necessarily a good thing --- a lot of care and thought would have
> to be put into the effort to ensure that any new container classes (or
> any other type, for that matter) are well designed and with a clear
> purpose, just like data frames were well-designed for the purpose
> of holding data for fitting models.
>
> David Kane wrote:
> > Prof Brian D Ripley writes:
> >  > I have already patiently explained it to you.  It is a side issue of
> >  > subscripting of data frames converting character columns to factor.
> >  > I have also given you a workaround.
> >
> > Yes, and many thanks for the patient explanations! Indeed, thanks to you and
> > the rest of R core for a simply amazing piece of software. Even though our
> > budget would allow us to use most any statistics package around, we use R
> > because it seems to us to be the best.
> >
> > My only comment on the workaround (using I() to create vectors of class AsIs)
> > is that it is largely undocumented. I was concerned that, in this case,
> > undocumented meant "discouraged from use and possibly not present in future
> > versions." In any event, we will now do as you suggested.
> >
> >  > As I said before, this is a consequence of the general rules.  Data frames
> >  > are not designed to have character columns, and those who insist on using
> >  > them must make themselves aware of the consequences.
> >
> > Ahh! It had never been clear to me that data frames are not "designed to have
> > character columns." Of course, now that I carefully (re)read the documentation,
> > I see that this is the case. My comment here is that R certainly provides
> > enough rope (colClasses as "character" in read.table, for example) for
> > unsuspecting users like me to hang themselves on this point. Indeed, I would
> > wager that the vast majority of users (even some members of R core?) have
> > dataframes with character variables in them. My question is: Why aren't
> > dataframes "designed" to have character columns? This would seem to be a
> > desirable feature . . . but perhaps I am misunderstanding what a dataframe
> > really "is". Or, perhaps the answer is: "Patches are accepted." ;-)
> >
> > Thanks again,
> >
> > Dave Kane
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
> --
> David A. James
> Statistics Research, Room 2C-253            Phone:  (908) 582-3082
> Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
> Murray Hill, NJ 09794-0636
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
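
For readers following the thread, the I() workaround discussed above can be
sketched roughly as follows.  This is only an illustration, not code from the
original report: the data and the names (ids, df, sub) are made up, and the
exact coercion behaviour of unprotected character columns depends on the R
version in use, so the sketch only shows what happens to a column wrapped
in I().

```r
# Hypothetical example data; names and values are illustrative only.
ids <- c("a1", "b2", "c3")

# I() gives the vector class "AsIs", so data.frame() stores it unchanged
# instead of applying its usual character handling.
df <- data.frame(id = I(ids), value = c(1.5, 2.5, 3.5))

# The column is still character data underneath the "AsIs" class.
is_char <- is.character(df$id)
is_asis <- inherits(df$id, "AsIs")

# Row subsetting keeps the protected column as character, not factor.
sub <- df[c(1, 3), ]
sub_is_char <- is.character(sub$id)
sub_is_factor <- is.factor(sub$id)
```

The same check (is.character() and is.factor() on the column after an
operation) can be applied to the result of merge() to see whether a given
R version preserves the column type there as well.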
