[R] R imperfections? -- was: repeated searching of no-missing values

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu Dec 11 21:54:21 CET 2008


replies inline below.


Bert Gunter wrote:
> Replies inline below.
>
>
> [bert (?)]	...?tapply says that the first argument is an **atomic** vector. A
> factor is not an atomic vector. So tapply interprets it as such by looking
> only at its representation, which is as integer values.
> 	
>
>
> [stavros] What is the rationale for this?  If it is just backwards compatibility with
> some long-ago implementation decision, perhaps tapply should be deprecated
> and replaced by something cleaner (perhaps plyr).  If it is something deeper
> than that, it would be useful to know what.
>
> ****
> [bert] Rationale? -- you'll have to ask the developers. As for deprecating (or
> changing) tapply: do you have any idea how much code that could break?! I
> think that is probably a wholly unrealistic suggestion.
>   

do you have any idea how much old code has been broken in the history of
programming just because programming languages moved from version x to
version x+1?  the argument that old code would be broken is repeated
here ad nauseam, literally.  there always is a tradeoff between
protecting the old developers against the need for reimplementation of
existing code and protecting the future developers against the need to
spend days on figuring out how to hack around broken designs and
implementations.

> The way forward is through efforts like Hadley's plyr package.  Among other
> things, that's what packages are for. 

packages play an important role in about every language.  but packages,
especially ones written by third parties, should serve as an *extension*
of the core functionality, and not as a replacement.  perhaps it is just
fine to say that a function from plyr should be used instead of tapply
(which, note, is in the base package).  but perhaps the core stuff
should rather evolve than be duplicated by external patches.

as to the original problem, since you (bert) say:

"?tapply says that the first argument is an **atomic** vector. A factor
is not an atomic vector. So tapply interprets it as such by looking only
at its representation, which is as integer values."

can you explain the following:

is.atomic(as.factor(1:10))
# TRUE

is.atomic(factor(0))
# TRUE

?is.atomic says:

"'is.atomic' returns 'TRUE' if 'x' is an atomic vector (or 'NULL') and
'FALSE' otherwise."

which seems inconsistent with the above, and also with the following:

f = factor(0)
is.atomic(f)
# TRUE
is.vector(f)
# FALSE

?vector says:

"Note that factors are _not_ vectors;  'is.vector' returns 'FALSE'"

if f is not a vector, how can it be an atomic vector?  perhaps
'is.atomic' does not mean what i would naively assume reading the docs; 
with r, one has to learn not to use common sense, as in, e.g., the case
of sort.list.
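
a minimal sketch of where the two answers seem to come from, assuming the
behaviour documented in ?is.vector (it returns 'FALSE' for any object that
carries attributes other than names) and that 'is.atomic' looks only at the
underlying storage type.  the factor and the variable names here are made up
for illustration:

```r
# a small illustrative factor (hypothetical data)
f <- factor(c("b", "a", "b"))

typeof(f)             # "integer": the underlying storage of the codes
is.atomic(f)          # TRUE: only the storage type is tested
is.vector(f)          # FALSE: attributes other than names are present
names(attributes(f))  # "levels" "class"

# as.integer() drops the attributes and keeps the bare integer codes
codes <- as.integer(f)
codes                 # 2 1 2
is.vector(codes)      # TRUE
```

on this reading, 'atomic' in ?is.atomic refers to the storage type, while
'vector' in ?is.vector refers to the absence of extra attributes, so the two
tests are not measuring the same thing.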


> Indeed, as you probably know, packages
> like R.oo and proto allow one to use a whole different programming
> language/paradigm within R, while still taking advantage of all of R's
> existing built-in functionality. Except for possible performance penalties,
> I don't see how you can ask for much more than that.
>   

given how comments such as those of stavros or mine are typically
answered, indeed one cannot expect much more.  the question is, why
would one not want to ask for more?

> So, no, R is certainly not perfect. I'm sure that if they could go back 20
> years with today's knowledge and experience, the developers would do some
> things differently. That's life -- and progress! But I think any objective
> assessment -- and certainly those of us who use it day in and day out in our
> work -- would consider R a truly amazing software product, warts or no. 
>
> Hence, may I suggest that instead of merely pointing out its (often well
> known, btw) imperfections and inelegancies, you instead move to the
> developers' forum and contribute improvements. This is, I believe, a
> standard way for people with programming expertise like yourself to
> contribute to open source development. Although the developers may be a bit
> crotchety at times (I think often appropriately so given the extraordinary
> effort they've put in), I think you would find that they would welcome
> sincere efforts to help them improve R.
>   

again, same send-a-patch talk.  can't you possibly distinguish between
design and implementation?  should every conceptual discussion be
replaced by a flow of patches?  python's peps have already been
mentioned;  another counterexample is jcp.  i agree that contributing
code is desirable, but discarding any other initiative right away is
plainly rude, even if not verbally.


>
>
> 	I think that's all we can expect.  Some have lamented the lack of
> the language's perfect consistency in these matters, but I cannot understand
> how that would be possible given its nature, intended, as it is, to be
> **easily** used for high level data manipulation, graphics, statistical
> analysis etc. as well as programming.
>
>
> As a general rule, consistency makes it *easier* to learn and use a
> language.
> ***
> Of course!
> ***
>   

radio erewan strikes again?  what is so 'of course'?  stavros is right
here, and he is also right in that learning r is not quite as easy as
you seem to imply.  sure, one learns the basics in a minute, but then
it's like a ride with broken brakes.

>  
>
> 	There are just too many possible data structures to expect logical
> consistency in their handling throughout...
>
>
> I am not sure what you mean here. There has been a lot of work in the
> programming language community on consistent handling of abstract structures
> of various types. Some of their insights may be applicable to future
> versions of R.
>
> ***
> No doubt. That's progress. Are you going to write this future version? I
> certainly am not -- and CAN not (being a bear of but little brain)!
> ***
>   

could you actually explain this bit: "There are just too many possible
data structures to expect logical consistency in their handling
throughout... "?  everyone can see that data structures are not handled
particularly consistently in r, but why would such inconsistency be a
must in a modern programming language?

vQ


