[R] two questions for R beginners

Patrick Burns pburns at pburns.seanet.com
Wed Mar 3 11:44:19 CET 2010


I think Duncan's example of a list that is
a matrix is a compelling argument not to do
the change.

A matrix that is a list with both names and
dimnames *is* probably rare (but certainly
imaginable).  A matrix that is a list is not
so rare, and the proposed double meaning of
'$' would certainly be confusing in that case.
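
A minimal sketch of such an object follows. The object name and the
dimension names (L, r1, x and so on) are made up for illustration, and
the "proposed" reading of '$' is only mimicked here with '[':

```r
# A list becomes a matrix as soon as it is given a 'dim' attribute.
L <- list(1, "a", TRUE, 2:3)
dim(L) <- c(2, 2)
dimnames(L) <- list(c("r1", "r2"), c("x", "y"))
is.list(L)     # TRUE
is.matrix(L)   # TRUE

# With no 'names' attribute, '$' on this list finds nothing.
L$x            # NULL

# Add 'names' as well (after 'dim', because 'dim<-' drops names).
names(L) <- c("x", "b", "c", "d")
by_name <- L$x        # current meaning: the single element named "x"
by_col  <- L[, "x"]   # column "x": what the proposal would have '$' return
```

Here by_name is one element while by_col is a two-element list, so the
same expression would have two plausible answers.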

Pat


On 02/03/2010 17:55, Duncan Murdoch wrote:
> On 02/03/2010 11:53 AM, William Dunlap wrote:
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of John Sorkin
>> > Sent: Tuesday, March 02, 2010 3:46 AM
>> > To: Karl Ove Hufthammer; r-help at stat.math.ethz.ch
>> > Subject: Re: [R] two questions for R beginners
>> >
>> > Please take what follows not as an ad hominem statement, but
>> > rather as an attempt to improve what is already an excellent
>> > program, that has been built as a result of many, many hours of
>> > dedicated work by many, many unpaid, unsung volunteers.
>> >
>> > It troubles me a bit that when a confusing aspect of R is
>> > pointed out the response is not to try to improve the
>> > language so as to avoid the confusion, but rather to state
>> > that the confusion is inherent in the language. I understand
>> > that to make changes that would avoid the confusing aspect of
>> > the language that has been discussed in this thread would
>> > take time and effort by an R wizard (which I am not), time
>> > and effort that would not be compensated in the traditional
>> > sense. This does not mean that we should not acknowledge the
>> > confusion. If we want R to be the de facto lingua franca of
>> > statistical analysis doesn't it make sense to strive for
>> > syntax that is as straightforward and consistent as possible?
>>
>> Whenever one changes the language that way old code
>> will break.
> I think in this case not much code would break. Mostly when people have
> a matrix M and ask for M$column they'll get an error; the proposal is
> that they'll get the requested column. (It is possible to have a list
> with names that is also a matrix with dimnames, but I think that is a
> pretty unusual construction.) But I haven't been convinced that the
> proposal is a net improvement to the language.
> Duncan Murdoch
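
For reference, a short sketch of the current behaviour described above,
using a made-up matrix M with hypothetical column names: '$' on an
ordinary (atomic) matrix is an error, and the column is fetched by name
with '['.

```r
# An ordinary numeric matrix with column names.
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# '$' is not defined for atomic vectors, so this signals an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(M$a, error = function(e) conditionMessage(e))

# The existing way to get the column by name.
col_a <- M[, "a"]
```
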
>
>> The developers can, with a lot of effort,
>> fix their own code, and perhaps even user-written code
>> on CRAN, but code that thousands of users have written
>> will break. There is a lot of code out there that was
>> written by trial and error and by folks who no longer
>> work at an institution: the code works but no one knows
>> exactly why it works. Telling folks they need to change
>> that code because we have a cleaner but different syntax
>> now is not good. Why would one spend time writing a
>> package that might stop working when R is "upgraded"?
>>
>> I think the solution is not to change current semantics
>> but to write functions that behave better and encourage
>> users to use them, gradually abandoning the old constructs.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>> >
>> > Again, please understand that my comment is made with deepest
>> > respect for the many people who have unselfishly contributed to the
>> > R project. Many thanks to each and every one of you.
>> >
>> > John
>> >
>> > >>> Karl Ove Hufthammer <karl at huftis.org> 3/2/2010 4:00 AM >>>
>> > On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch
>> > <murdoch at stats.uwo.ca> wrote:
>> > > Suppose X is a dataframe or a matrix. What would you expect to
>> > > get from X[1]? What about as.vector(X), or as.numeric(X)?
>> >
>> > All this of course depends on type of object one is speaking of.
>> > There are plenty of surprises available, and it's best to use the
>> > most logical way of extracting. E.g., to extract the top-left
>> > element of a 2D structure (data frame or matrix), use 'X[1,1]'.
>> >
>> > Luckily, R provides some shortcuts. For example, you can write
>> > 'X[2,3]' on a data frame, just as if it was a matrix, even though
>> > the underlying structure is completely different. (This doesn't
>> > work on a normal list; there you have to type the whole 'X[[2]][3]'.)
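
A small sketch of that shortcut, using a made-up data frame (the names
df, u, v and w are hypothetical). One detail worth noting: in a plain
list of columns the column index comes first, so the element at row 2,
column 3 is reached as lst[[3]][2].

```r
df  <- data.frame(u = 1:3, v = 4:6, w = 7:9)
m   <- as.matrix(df)   # same values stored as a matrix
lst <- as.list(df)     # same values stored as a plain list of columns

from_df   <- df[2, 3]      # row 2, column 3 of the data frame
from_mat  <- m[2, 3]       # same indexing on the matrix
from_list <- lst[[3]][2]   # third component, then its second element
```
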
>> >
>> > The behaviour of the 'as.' functions may sometimes be surprising,
>> > at least for me. For example, 'as.data.frame' on a named vector
>> > gives a single-column data frame, instead of a single-row data frame.
>> >
>> > (I'm not sure what's the recommended way of converting a named
>> > vector to row data frame, but 'as.data.frame(t(X))' works, even
>> > though both 'X' and 't(X)' looks like a row of numbers.)
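
The two conversions can be compared directly. This is only a sketch on
a made-up named vector x; it does not claim that the t() route is the
recommended one.

```r
x <- c(a = 1, b = 2, c = 3)

col_df <- as.data.frame(x)      # one column called "x", one row per element
row_df <- as.data.frame(t(x))   # t(x) is a 1 x 3 matrix, so one row, three columns

dim(col_df)    # 3 1
dim(row_df)    # 1 3
names(row_df)  # "a" "b" "c"
```
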
>> >
>> > > The point is that a dataframe is a list, and a matrix isn't.
>> > > If users don't understand that, then they'll be confused
>> > > somewhere. Making matrices more list-like in one respect will just
>> > > move the confusion elsewhere. The solution is to understand the
>> > > difference.
>> >
>> > My main problem is not understanding the difference, which is
>> > easy, but knowing which type of object I have when I get the output
>> > of a function in a package. If I know the object is a named vector
>> > or a matrix with column names, it's easy enough to type
>> > 'X[,"colname"]', and if it's a data frame one may use the shortcut
>> > 'X$colname'.
>> >
>> > Usually, it *is* documented what the return value of a function
>> > is, but just looking at the output is much faster, and *usually*
>> > gives the correct answer.
>> >
>> > For example, 'mean' applied on a data frame gives a named
>> > vector, not a data frame, which is somewhat surprising (given that
>> > the columns of a data frame may be of different types, while the
>> > elements of a vector may not). (And yes, I know that it's
>> > *documented* that it returns a named vector.) On the other hand,
>> > perhaps it is surprising that 'mean' works on data frames at all. :-)
>> >
>> > --
>> > Karl Ove Hufthammer
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

-- 
Patrick Burns
pburns at pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')


