[R] two questions for R beginners
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Mar 2 18:55:21 CET 2010
On 02/03/2010 11:53 AM, William Dunlap wrote:
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of John Sorkin
> > Sent: Tuesday, March 02, 2010 3:46 AM
> > To: Karl Ove Hufthammer; r-help at stat.math.ethz.ch
> > Subject: Re: [R] two questions for R beginners
> >
> > Please take what follows not as an ad hominem statement, but
> > rather as an attempt to improve what is already an excellent
> > program, that has been built as a result of many, many hours
> > of dedicated work by many, many unpaid, unsung volunteers.
> >
> > It troubles me a bit that when a confusing aspect of R is
> > pointed out the response is not to try to improve the
> > language so as to avoid the confusion, but rather to state
> > that the confusion is inherent in the language. I understand
> > that to make changes that would avoid the confusing aspect of
> > the language that has been discussed in this thread would
> > take time and effort by an R wizard (which I am not), time
> > and effort that would not be compensated in the traditional
> > sense. This does not mean that we should not acknowledge the
> > confusion. If we want R to be the de facto lingua franca of
> > statistical analysis doesn't it make sense to strive for
> > syntax that is as straightforward and consistent as possible?
>
> Whenever one changes the language that way old code
> will break.
I think in this case not much code would break. Mostly when people have
a matrix M and ask for M$column they'll get an error; the proposal is
that they'll get the requested column. (It is possible to have a list
with names that is also a matrix with dimnames, but I think that is a
pretty unusual construction.) But I haven't been convinced that the
proposal is a net improvement to the language.
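
A minimal sketch of the behaviour under discussion, assuming a current R
installation. The object names (M, D, L) are made up for illustration, and
the list-matrix at the end is only one way to build the unusual construction
mentioned in the parenthesis.

```r
# Atomic matrix with column names: `$` is not defined for it.
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m_dollar  <- tryCatch(M$a, error = function(e) conditionMessage(e))
m_bracket <- M[, "a"]            # the supported way to get a matrix column

# Data frame with the same contents: `$` works because it is a list.
D <- as.data.frame(M)
d_dollar <- D$a

# The unusual case: a list that has both a dim attribute and names.
L <- list(1, "x", TRUE, 2.5)
dim(L) <- c(2, 2)
dimnames(L) <- list(NULL, c("a", "b"))
names(L) <- c("a", "b", "c", "d")  # element names, separate from dimnames
l_dollar <- L$a                  # the element named "a", not column "a"
l_column <- L[, "a"]             # column "a": a list of two elements
```

Here m_dollar holds an error message rather than a column, while L$a and
L[, "a"] already mean different things, which is the case where the proposal
would change existing behaviour.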
Duncan Murdoch
> The developers can, with a lot of effort,
> fix their own code, and perhaps even user-written code
> on CRAN, but code that thousands of users have written
> will break. There is a lot of code out there that was
> written by trial and error and by folks who no longer
> work at an institution: the code works but no one knows
> exactly why it works. Telling folks they need to change
> that code because we have a cleaner but different syntax
> now is not good. Why would one spend time writing a
> package that might stop working when R is "upgraded"?
>
> I think the solution is not to change current semantics
> but to write functions that behave better and encourage
> users to use them, gradually abandoning the old constructs.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> >
> > Again, please understand that my comment is made with deepest
> > respect for the many people who have unselfishly contributed
> > to the R project. Many thanks to each and every one of you.
> >
> > John
> >
> >
> > >>> Karl Ove Hufthammer <karl at huftis.org> 3/2/2010 4:00 AM >>>
> > On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch
> > <murdoch at stats.uwo.ca>
> > wrote:
> > > Suppose X is a dataframe or a matrix. What would you
> > expect to get from
> > > X[1]? What about as.vector(X), or as.numeric(X)?
> >
> > All this of course depends on the type of object one is speaking
> > of. There
> > are plenty of surprises available, and it's best to use the
> > most logical
> > way of extracting. E.g., to extract the top-left element of a 2D
> > structure (data frame or matrix), use 'X[1,1]'.
> >
> > Luckily, R provides some shortcuts. For example, you can
> > write 'X[2,3]'
> > on a data frame, just as if it was a matrix, even though the
> > underlying
> > structure is completely different. (This doesn't work on a
> > normal list;
> > there you have to type the whole 'X[[2]][3]'.)
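
A small sketch of the indexing forms mentioned above, with made-up object
names. It assumes the plain list is a list of columns, in which case the
element at row 2, column 3 is reached as X_list[[3]][2].

```r
X_mat  <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns
X_df   <- as.data.frame(X_mat)     # same values as a data frame
X_list <- as.list(X_df)            # plain list of columns

el_mat  <- X_mat[2, 3]             # matrix indexing
el_df   <- X_df[2, 3]              # same syntax works on a data frame
el_list <- X_list[[3]][2]          # list: pick the column, then the row

# Two-index notation is not available on a plain list.
list_2d <- tryCatch(X_list[2, 3], error = function(e) conditionMessage(e))
```

All three el_* values are the same number; list_2d holds the error message
from the failed two-index attempt.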
> >
> > The behaviour of the 'as.' functions may sometimes be surprising, at
> > least for me. For example, 'as.data.frame' on a named vector gives a
> > single-column data frame, instead of a single-row data frame.
> >
> > (I'm not sure what the recommended way of converting a
> > named vector to
> > a single-row data frame is, but 'as.data.frame(t(X))' works, even
> > though both 'X' and 't(X)' look like a row of numbers.)
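
A sketch of the two conversions, using a made-up named vector v. The
as.list() variant is an alternative not mentioned in the message, shown only
as one more option.

```r
v <- c(a = 1, b = 2, c = 3)

col_df  <- as.data.frame(v)            # three rows, one column
row_df  <- as.data.frame(t(v))         # one row, columns a, b, c
row_df2 <- as.data.frame(as.list(v))   # alternative: one row, columns a, b, c

dim(col_df)
dim(row_df)
names(row_df)
```

t(v) is a 1 x 3 matrix, which is why as.data.frame() turns it into a row even
though it prints much like v itself.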
> >
> > > The point is that a dataframe is a list, and a matrix
> > isn't. If users
> > > don't understand that, then they'll be confused somewhere. Making
> > > matrices more list-like in one respect will just move the confusion
> > > elsewhere. The solution is to understand the difference.
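
A short sketch of that difference, with made-up object names: the same
single-index and coercion calls applied to a matrix and to a data frame built
from it.

```r
X_mat <- matrix(1:6, nrow = 2)     # atomic vector with a dim attribute
X_df  <- as.data.frame(X_mat)      # list of three columns

mat_is_list <- is.list(X_mat)      # FALSE
df_is_list  <- is.list(X_df)       # TRUE

mat_first <- X_mat[1]              # first element of the matrix
df_first  <- X_df[1]               # one-column data frame (list subsetting)

mat_len <- length(X_mat)           # number of elements
df_len  <- length(X_df)            # number of columns

mat_num <- as.numeric(X_mat)       # all six values as a numeric vector
# as.numeric() on this data frame fails because it is a list of columns.
df_num  <- tryCatch(as.numeric(X_df), error = function(e) conditionMessage(e))
df_flat <- unname(unlist(X_df))    # explicit flattening of the columns
```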
> >
> > My main problem is not understanding the difference, which is
> > easy, but
> > knowing which type of object I have when I get the output of a function in a
> > package. If I know the object is a named vector or a matrix
> > with column
> > names, it's easy enough to type 'X[,"colname"]', and if it's a data
> > frame one may use the shortcut 'X$colname'.
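
One way to avoid guessing is to check the type before extracting. The helper
below is hypothetical (get_column is not a base R function) and is only a
sketch of how the two cases could be handled in one place.

```r
# Hypothetical helper: extract a named column from either a data frame
# or a matrix with column names.
get_column <- function(X, name) {
  if (is.data.frame(X)) {
    X[[name]]                      # list-style extraction, exact matching
  } else if (is.matrix(X)) {
    X[, name]                      # matrix-style extraction by column name
  } else {
    stop("X must be a data frame or a matrix")
  }
}

M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
D <- as.data.frame(M)

from_matrix <- get_column(M, "b")
from_df     <- get_column(D, "b")
```

Both calls return the same values, so code that receives output from a
package function does not need to know in advance which type it got.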
> >
> > Usually, it *is* documented what the return value of a
> > function is, but
> > just looking at the output is much faster, and *usually* gives the
> > correct answer.
> >
> > For example, 'mean' applied to a data frame gives a named
> > vector, not a
> > data frame, which is somewhat surprising (given that the columns of a
> > data frame may be of different types, while the elements of a
> > vector may
> > not). (And yes, I know that it's *documented* that it returns a named
> > vector.) On the other hand, perhaps it is surprising that
> > 'mean' works
> > on data frames at all. :-)
> >
> > --
> > Karl Ove Hufthammer
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >