[R] two questions for R beginners
kMan
kchamberln at gmail.com
Thu Mar 4 05:04:21 CET 2010
John,
I felt a short, somewhat strong reply was in order. One of the inherent
aspects of the language is that R demands more of an understanding from
users about what is taking place. Model formulae, for example, are close to
what one would use if they were to write the model on paper. I consider this
a strong feature. The confusing aspects that you point out are not the
result of syntax. Syntax in R is well specified and, I believe, far easier
to work with than many programming languages.
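To make the point about model formulae concrete, here is a minimal sketch. It assumes the built-in mtcars dataset purely for illustration; the choice of variables is arbitrary and not part of the original discussion. On paper the model would read mpg = b0 + b1*wt + b2*hp + error, and the R formula mirrors that closely.

```r
# Illustrative only: mtcars ships with R; the variables are an arbitrary choice.
# Paper form: mpg = b0 + b1 * wt + b2 * hp + error
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One coefficient per term in the formula, plus the intercept.
coef(fit)
```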
English is a confusing language. C++ is a confusing language. One may have
far more success learning, say, French if he/she does not like the syntax or
grammar of English, or visual Pascal if the syntax of C++ is not preferred,
rather than changing the language. If one wants to do business in a
particular area, then it generally behooves one to suck it up and learn the
native tongue or hire someone for that part. If one wants the program that
sets the standard for other world-class statistics packages, and which also
happens to have a very amenable license agreement, then it behooves one to
suck it up and learn R.
R is what it is. Anyone who does not like it can use something else and
pay far more for an inferior product that takes longer to do a
calculation and handles less data at once, at the risk of ending up with a
diminished understanding of statistics. That is not to say there is no
room for development in R, but the sort of development you demand will
evolve according to laws similar to those that govern economics and/or
change in spoken language.
You'd need major financial backing and a strong influence over the culture
of those who use R to pull this off. Other than that, you'll have to wait
for the dialect to change over time from the cumulative effect of
contributions from people the world over who all want something different
out of the language.
If someone wants to take on the R challenge for him/herself, however, then
there is likely no better technical support in the world than the R
community, albeit perhaps after dispensing with some of the niceties.
Sincerely,
KeithC.
-----Original Message-----
From: John Sorkin [mailto:jsorkin at grecc.umaryland.edu]
Sent: Tuesday, March 02, 2010 4:46 AM
To: Karl Ove Hufthammer; r-help at stat.math.ethz.ch
Subject: Re: [R] two questions for R beginners
Please take what follows not as an ad hominem statement, but rather as an
attempt to improve what is already an excellent program, that has been built
as a result of many, many hours of dedicated work by many, many unpaid,
unsung volunteers.
It troubles me a bit that when a confusing aspect of R is pointed out the
response is not to try to improve the language so as to avoid the confusion,
but rather to state that the confusion is inherent in the language. I
understand that to make changes that would avoid the confusing aspect of the
language that has been discussed in this thread would take time and effort
by an R wizard (which I am not), time and effort that would not be
compensated in the traditional sense. This does not mean that we should not
acknowledge the confusion. If we want R to be the de facto lingua franca of
statistical analysis, doesn't it make sense to strive for syntax that is as
straightforward and consistent as possible?
Again, please understand that my comment is made with deepest respect for
the many people who have unselfishly contributed to the R project. Many
thanks to each and every one of you.
John
>>> Karl Ove Hufthammer <karl at huftis.org> 3/2/2010 4:00 AM >>>
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch <murdoch at stats.uwo.ca>
wrote:
> Suppose X is a dataframe or a matrix. What would you expect to get
> from X[1]? What about as.vector(X), or as.numeric(X)?
All this of course depends on the type of object one is speaking of. There are
plenty of surprises available, and it's best to use the most logical way of
extracting. E.g., to extract the top-left element of a 2D structure (data
frame or matrix), use 'X[1,1]'.
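A small sketch of the question above, assuming a toy matrix and a data frame that hold the same numbers (the names m and df are made up for illustration):

```r
m  <- matrix(1:6, nrow = 2)                  # 2 x 3 matrix, filled column by column
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)  # the same numbers as a data frame

m[1]          # a matrix is a vector with dimensions: the first element
df[1]         # a data frame is a list of columns: a one-column data frame

m[1, 1]       # top-left element
df[1, 1]      # top-left element, same syntax

as.vector(m)  # drops the dimensions, leaving the six values in column order
```

Single-index subsetting follows the underlying structure, while two-index subsetting behaves the same for both objects.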
Luckily, R provides some shortcuts. For example, you can write 'X[2,3]'
on a data frame, just as if it were a matrix, even though the underlying
structure is completely different. (This doesn't work on a normal list;
there you have to type the whole 'X[[3]][2]'.)
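As a rough illustration (df and lst are hypothetical objects invented for this sketch): row 2, column 3 of a data frame is the second element of its third column, so in list notation the same element is reached as X[[3]][2].

```r
df  <- data.frame(a = 1:3, b = c(10, 20, 30), c = c(0.5, 1.5, 2.5))
lst <- list(a = 1:3, b = c(10, 20, 30), c = c(0.5, 1.5, 2.5))

df[2, 3]       # matrix-style shortcut: row 2, column 3
df[[3]][2]     # list-style: third column, then its second element
lst[[3]][2]    # the only form that works on a plain list

# Matrix-style indexing on a plain list is an error; try() captures it here.
res <- try(lst[2, 3], silent = TRUE)
inherits(res, "try-error")
```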
The behaviour of the 'as.' functions may sometimes be surprising, at least
for me. For example, 'as.data.frame' on a named vector gives a single-column
data frame, instead of a single-row data frame.
(I'm not sure what's the recommended way of converting a named vector to a
single-row data frame, but 'as.data.frame(t(X))' works, even though both 'X'
and 't(X)' look like a row of numbers.)
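A minimal sketch of the two conversions, assuming a made-up named vector x:

```r
x <- c(a = 1, b = 2, c = 3)     # named numeric vector

col_df <- as.data.frame(x)      # one column; the names become row names
row_df <- as.data.frame(t(x))   # t(x) is a 1 x 3 matrix, so one row, three columns

dim(col_df)
dim(row_df)
names(row_df)
```

The difference comes from t(): it turns the vector into a matrix with one row, and as.data.frame() then treats each matrix column as a data frame column.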
> The point is that a dataframe is a list, and a matrix isn't. If users
> don't understand that, then they'll be confused somewhere. Making
> matrices more list-like in one respect will just move the confusion
> elsewhere. The solution is to understand the difference.
My main problem is not understanding the difference, which is easy, but
knowing which type of object I have when I get the output of a function in a
package. If I know the object is a matrix with column names, it's easy enough
to type 'X[,"colname"]' (or 'X["name"]' for a named vector), and if it's a
data frame one may use the shortcut 'X$colname'.
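One way to check what a function has returned before choosing an extraction syntax is sketched below. The objects m, df and v are invented stand-ins for "the output of a function in a package".

```r
m  <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("u", "v")))
df <- as.data.frame(m)
v  <- c(u = 1, v = 2)

# Ask the object what it is rather than guessing from the printed output.
is.matrix(m)
is.data.frame(df)
is.list(df)        # a data frame is also a list
str(df)            # compact view of the structure

m[, "u"]           # matrix: column by name
df[, "u"]          # works on a data frame too
df$u               # data frame shortcut
v["u"]             # named vector: a single index, no comma

# '$' is not defined for matrices; try() captures the error here.
res <- try(m$u, silent = TRUE)
inherits(res, "try-error")
```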
Usually, it *is* documented what the return value of a function is, but just
looking at the output is much faster, and *usually* gives the correct
answer.
For example, 'mean' applied on a data frame gives a named vector, not a data
frame, which is somewhat surprising (given that the columns of a data frame
may be of different types, while the elements of a vector may not). (And
yes, I know that it's *documented* that it returns a named
vector.) On the other hand, perhaps it is surprising that 'mean' works on
data frames at all. :-)
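For readers trying this today: the remark above describes R as it was in 2010. As far as I know, later R versions no longer have a data frame method for 'mean', so the sketch below uses colMeans() and sapply() instead, both of which return a named vector of column means. The data frame is made up for illustration.

```r
df <- data.frame(height = c(1.5, 1.7, 1.9), weight = c(50, 60, 70))

cm <- colMeans(df)       # named numeric vector, one mean per column
sm <- sapply(df, mean)   # same result, applying mean() column by column

cm
is.data.frame(cm)        # FALSE: a plain named vector, not a data frame
```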
--
Karl Ove Hufthammer