[R] two questions for R beginners

Tue Mar 2 19:05:20 CET 2010

William,
I agree that changing syntax can lead to problems. I don't, however, think extending the language will break existing code. Providing a common syntax for accessing matrices and data frames will not change the way things have been done to date, but rather how things will be done in the future.
John  
John Sorkin
JSorkin at grecc.umaryland.edu 
-----Original Message-----
From: "William Dunlap" <wdunlap at tibco.com>
To: John Sorkin <jsorkin at grecc.umaryland.edu>
To: Karl Ove Hufthammer <karl at huftis.org>
To:  <r-help at stat.math.ethz.ch>

Sent: 3/2/2010 11:53:45 AM
Subject: RE: [R] two questions for R beginners

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of John Sorkin
> Sent: Tuesday, March 02, 2010 3:46 AM
> To: Karl Ove Hufthammer; r-help at stat.math.ethz.ch
> Subject: Re: [R] two questions for R beginners
> 
> Please take what follows not as an ad hominem statement, but 
> rather as an attempt to improve what is already an excellent 
> program, that has been built as a result of many, many hours 
> of dedicated work by many, many unpaid, unsung volunteers.
> 
> It troubles me a bit that when a confusing aspect of R is 
> pointed out the response is not to try to improve the 
> language so as to avoid the confusion, but rather to state 
> that the confusion is inherent in the language. I understand 
> that to make changes that would avoid the confusing aspect of 
> the language that has been discussed in this thread would 
> take time and effort by an R wizard (which I am not), time 
> and effort that would not be compensated in the traditional 
> sense. This does not mean that we should not acknowledge the 
> confusion. If we want R to be the de facto lingua franca of 
> statistical analysis, doesn't it make sense to strive for 
> syntax that is as straightforward and consistent as possible? 

Whenever one changes the language that way, old code
will break.  The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break.  There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works.  Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good.  Why would one spend time writing a
package that might stop working when R is "upgraded"?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

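As a rough illustration of that suggestion, here is a minimal sketch of a wrapper that extracts a named column the same way from a matrix or a data frame. The name get_col is hypothetical (it is not an existing R function), and the example data are made up.

```r
# Hypothetical helper (not part of base R): extract one named column
# from either a matrix or a data frame and return it as a plain vector.
get_col <- function(X, name) {
  if (is.data.frame(X)) {
    return(X[[name]])               # list-style access for data frames
  }
  if (is.matrix(X)) {
    return(X[, name, drop = TRUE])  # matrix-style access
  }
  stop("X must be a data frame or a matrix")
}

# Made-up example data: the same values in both structures.
m  <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("u", "v")))
df <- as.data.frame(m)

get_col(m, "v")
get_col(df, "v")
```

Existing code that uses X[, "v"] or X$v keeps working unchanged; only new code would need to call the wrapper.
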
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Again, please understand that my comment is made with deepest 
> respect for the many people who have unselfishly contributed 
> to the R project. Many thanks to each and every one of you.
> 
> John
> 
> 
> >>> Karl Ove Hufthammer <karl at huftis.org> 3/2/2010 4:00 AM >>>
> On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch 
> <murdoch at stats.uwo.ca> 
> wrote:
> > Suppose X is a dataframe or a matrix.  What would you 
> expect to get from 
> > X[1]?  What about as.vector(X), or as.numeric(X)?
> 
> All this of course depends on type of object one is speaking 
> of. There 
> are plenty of surprises available, and it's best to use the 
> most logical 
> way of extracting. E.g., to extract the top-left element of a 2D 
> structure (data frame or matrix), use 'X[1,1]'.
> 
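A short sketch of the question above, using made-up data: the same values stored as a matrix and as a data frame respond differently to a single index, while the two-index form agrees.

```r
# Made-up example data: identical values as a matrix and as a data frame.
m  <- matrix(1:6, nrow = 2)
df <- as.data.frame(m)

m[1]     # single index on a matrix: its first element
df[1]    # single index on a data frame: a one-column data frame

m[1, 1]  # two indices: the top-left element
df[1, 1] # same top-left element

length(m)   # number of elements in the matrix
length(df)  # number of columns (list components) in the data frame

is.list(m)   # a matrix is not a list
is.list(df)  # a data frame is a list
```
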
> Luckily, R provides some shortcuts. For example, you can 
> write 'X[2,3]' 
> on a data frame, just as if it was a matrix, even though the 
> underlying 
> structure is completely different. (This doesn't work on a 
> normal list; 
> there you have to type the whole 'X[[2]][3]'.)
> 
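A minimal sketch of the shortcut described above, with made-up data. Note that in the matrix-style form the row comes first, so df[2, 3] picks the same value as df[[3]][2] in list terms.

```r
# Made-up example data: a data frame and a plain list with the same components.
df  <- data.frame(x = c(1.5, 2.5), y = c("a", "b"), z = c(10L, 20L))
lst <- as.list(df)

df[2, 3]     # matrix-style: row 2, column 3
df[[3]][2]   # list-style: component 3, element 2 (same value)
lst[[3]][2]  # the list-style form also works on the plain list

# The matrix-style form is an error on a plain list, so catch it here.
res <- tryCatch(lst[2, 3], error = function(e) "error")
res
```
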
> The behaviour of the 'as.' functions may sometimes be surprising, at 
> least for me. For example, 'as.data.frame' on a named vector gives a 
> single-column data frame, instead of a single-row data frame.
> 
> (I'm not sure what's the recommended way of converting a 
> named vector to a 
> row data frame, but 'as.data.frame(t(X))' works, even though both 'X' 
> and 't(X)' look like a row of numbers.)
> 
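The two conversions mentioned above can be compared directly. This is a small sketch with a made-up named vector.

```r
# Made-up named vector.
x <- c(a = 1, b = 2, c = 3)

col_df <- as.data.frame(x)     # one column; the names become row names
row_df <- as.data.frame(t(x))  # t(x) is a 1 x 3 matrix, so one row results

dim(col_df)
dim(row_df)
rownames(col_df)
names(row_df)
```
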
> > The point is that a dataframe is a list, and a matrix 
> isn't.  If users 
> > don't understand that, then they'll be confused somewhere.  Making 
> > matrices more list-like in one respect will just move the confusion 
> > elsewhere.  The solution is to understand the difference.
> 
> My main problem is not understanding the difference, which is 
> easy, but 
> knowing which type of object I have when I get the output of a function in a 
> package. If I know the object is a named vector or a matrix 
> with column 
> names, it's easy enough to type 'X[,"colname"]', and if it's a data 
> frame one may use the shortcut 'X$colname'.
> 
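One way to find out which kind of object a function returned, sketched with made-up data: ask R directly instead of judging by the printed output, then use the matching extraction form.

```r
# Made-up example data: printed output of these two looks very similar.
m  <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("u", "v")))
df <- as.data.frame(m)

is.matrix(m)       # TRUE
is.data.frame(df)  # TRUE
str(df)            # compact description of the structure

m[, "u"]   # column by name works on a matrix ...
df[, "u"]  # ... and on a data frame
df$u       # the $ shortcut works on the data frame

# On the matrix the $ shortcut is an error, so catch it here.
res <- tryCatch(m$u, error = function(e) "error")
res
```
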
> Usually, it *is* documented what the return value of a 
> function is, but 
> just looking at the output is much faster, and *usually* gives the 
> correct answer.
> 
> For example, 'mean' applied on a data frame gives a named 
> vector, not a 
> data frame, which is somewhat surprising (given that the columns of a 
> data frame may be of different types, while the elements of a 
> vector may 
> not). (And yes, I know that it's *documented* that it returns a named 
> vector.) On the other hand, perhaps it is surprising that 
> 'mean' works 
> on data frames at all. :-)
> 
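A sketch of per-column means that does not rely on how 'mean' itself treats a data frame, since that may differ between R versions. The data are made up and all columns are numeric.

```r
# Made-up, all-numeric example data.
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

by_sapply   <- sapply(df, mean)  # apply mean to each column in turn
by_colMeans <- colMeans(df)      # dedicated column-mean function

by_sapply
by_colMeans
```

Both calls return a named numeric vector with one entry per column.
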
> -- 
> Karl Ove Hufthammer
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code.
> 