[Rd] Wish list
murdoch at stats.uwo.ca
Tue Jan 2 00:53:27 CET 2007
On 1/1/2007 1:10 PM, Gabor Grothendieck wrote:
> On 1/1/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> A few comments thrown in, and some general comments at the bottom.
>> On 1/1/2007 1:28 AM, Gabor Grothendieck wrote:
>>> This is my 2007 New Year wishlist for R features:
>>> 1. Matrix Multiplication
>>> Enhance matrix multiplication to work with multidimensional
>>> arrays such that the last dimension of the first multiplicand
>>> must equal the first dimension of the second. See:
>>> 2. Grid
>>> - logical-valued function as first arg of grid.edit
>>> - transparency under Windows (not sure if this involves grid
>>> or just the Windows graphics device)
>>> - shading patterns
>>> - more interactivity features
>>> - safe way to get name of a grid object, e.g.
>>> names.vpPath <- names.viewport <- function(x) x$name
>>> - safe way to get children of a grid object
>>> getChildren.viewport <- function(x) x$children
>>> and the order; see:
>>> - facility for using a name, viewport or vpPath interchangeably
>>> so that, for example, any of them can be specified in
>>> print.trellis(..., draw.in=...) or draw.key(..., vp=...)
>>> 3. Lattice.
>>> - make panel functions generic
>>> - allow print.trellis args to be specified in xyplot, etc.
>>> - shading patterns (once grid implements them)
>>> - safe way to access lattice:::getStatus and lattice:::updateList
>>> - allow name, viewport or vpPath to be specified in draw.in=
>>> arg of print.trellis (and vp= arg of draw.key?)
>>> - document parameters, i.e. those output from trellis.par.get()
>>> - support for groups in histogram
>>> 4. Higher level Windows clipboard functions.
>>> Since R 2.3.0 R can handle non-text objects
>>> on the Windows clipboard. We now need some higher
>>> level functionality that makes use of that
>>> to read in non-text from the clipboard. For
>>> example, one can select a table on an HTML
>>> page in Internet Explorer and invoke copy
>>> and it will copy it to the clipboard in a
>>> non-text format. If one invokes paste in
>>> Excel, Excel will automatically detect the
>>> non-text format and copy it in the expected
>>> way so that it appears in Excel one table
>>> cell per Excel cell.
>>> However, R does not currently
>>> support this level of integration. (Current
>>> workaround is to paste it into Excel and then copy
>>> it back out of Excel. Excel will insert tabs between
>>> text that is so copied.)
>> R doesn't have HTML parsing built in, so this would be a fairly major
>> addition. It's a much better idea to write a package to do this. If
>> the R clipboard support is missing something that such a package would
>> need, that would be a reasonable addition to R.
>>> 6. Allow attributes to be associated with an environment
>>> variable without having them associated with the environment
>>> itself. This would allow more powerful inheritance in
>>> the case of subclasses of environment.
>>> and subsequent postings in that thread. Any package
>>> that uses the list(env = whatever) idiom to define
>>> objects could make use of this.
>> As I said in that thread, this is not a good suggestion.
> Yes, but I disagree with that assessment and I am not the
> only one.
It doesn't matter how many people disagree: what matters is achieving a
consensus about what change is needed. I don't think you could convince
me that the proposal you made is the right way to solve this problem,
and I know there are others who don't agree with the proposal I made to
solve it. So it is unlikely to be changed, unless someone comes up with
a new proposal.
>>> 7. documentation standards for packages
>>> - NEWS/ChangeLog (also should be accessible from CRAN page for package
>>> and should be included in built version of package)
>>> - package?mypackage
>> I don't understand the second part of this. We already support a
>> package?mypackage topic, and recommend that people put it in. Are you
>> saying packages should be rejected if they don't? That's an awful lot
>> of work you're asking other people to do.
> There should be some guidelines as to what goes into mypackage-package.Rd .
There are, in the Writing R Extensions manual.
>>> 8. Need to be able to distinguish between ordinary missing values
>>> and structurally missing ones.
>> I think this is something that you need to do in a different way. There
>> are tons of possible semantics for what NA should mean. I don't think
>> this should be made more complicated for everyone.
> Although one does not want to overcomplicate things, the fact is that
> there are two issues here, structural and non-structural, and trying to
> force them into a single construct is not simplifying; rather, it
> fails to model adequately what is required.
There aren't just two types of missingness; there are hundreds. For
example, I might have some values missing because I couldn't read the
writing where they were recorded, and others missing because the
instrument failed. In some contexts I might want to distinguish between
those two causes of missingness. But if I assigned special NA values to
them, then I'd need to develop an algebra of how those two kinds of
missingness combine in arithmetical operations. I don't think R should
build in all of that machinery, because it's hopeless to expect to model
all the different ideas of missingness people might want to model.
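As a minimal sketch of that different way, one could keep ordinary NA in the data and record the cause in a parallel vector, so no new NA algebra is needed. The data and the names reading and why_missing below are invented for illustration only.

```r
# Hypothetical example data: NA marks a missing reading, and a parallel
# factor records why it is missing (all names invented for illustration).
reading <- c(4.1, NA, 3.8, NA, 5.0)
why_missing <- factor(c(NA, "illegible", NA, "instrument failure", NA),
                      levels = c("illegible", "instrument failure"))

# Ordinary NA semantics still apply to arithmetic on the data itself.
mean_observed <- mean(reading, na.rm = TRUE)

# The causes can be tabulated or used for subsetting when they matter.
reason_counts <- table(why_missing)
illegible_idx <- which(why_missing == "illegible")
```

Code that does not care about the cause ignores why_missing entirely; code that does care can consult it explicitly.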
>>> 9. bidirectional pipes in Windows
>>> 10. Create a log updated at a regular frequency (daily or real time)
>>> that tracks all changes on CRAN, e.g.
>>> Date(GMT) Package Version Action
>>> 2006-09-20 21:22:01 mypkg 1.0.1 new
>>> 2006-09-20 22:00:23 mypkg2 0.2.1 updated
>>> 11. make integrate generic. Ryacas could use that.
>>> 12. Remove all R CMD dependencies on the find.exe command. find is a
>>> built-in command in Windows and having find.exe on my path causes
>>> problems with other programs.
>> A simpler fix for this would be for you to define a wrapper for R CMD
>> that adds the R tools directory to the path before executing, and
>> removes it afterwards. But this is unnecessary for most people, because
>> Microsoft's find.exe is pretty rarely used.
> Anyone who uses batch files will use it quite a bit. It certainly causes
> me problems on an ongoing basis and is an unacceptable conflict in
> my opinion.
If you're using batch files, the fix I suggested is trivial for you.
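The same wrapper idea can be sketched in R rather than as a batch file. This is only an illustration: with_path is a hypothetical helper, and the tools directory and package name in the usage comment are placeholders.

```r
# Evaluate an expression with `dir` temporarily prepended to PATH,
# restoring the original PATH afterwards (even if the expression fails).
with_path <- function(dir, expr) {
  old <- Sys.getenv("PATH")
  on.exit(Sys.setenv(PATH = old), add = TRUE)
  Sys.setenv(PATH = paste(dir, old, sep = .Platform$path.sep))
  force(expr)  # expr is evaluated lazily, so it sees the modified PATH
}

# Hypothetical usage; the directory and package name are placeholders:
# with_path("C:/Rtools/bin", system("R CMD check mypkg"))
```

Outside the call the path is unchanged, so the tools' find.exe does not shadow the Windows one for other programs.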
> I realize that it's not entirely of R's doing but it would be best if R did not
> make it worse by requiring the use of find.
>>> 13. Make upper/lower case of simplify/SIMPLIFY consistent on all
>>> apply commands and add a simplify= arg to by.
>> It would have been good not to introduce the inconsistency years ago,
>> but it's too late to change now.
> It's not too late to add it to by().
> Also note that the gsubfn package does have a workaround for this. In gsubfn
> one can preface any R function with fn$ and if that is done then the function
> can have a simplify= argument which fn$ intercepts and processes. e.g.
> fn$by(CO2[4:5], CO2, x ~ coef(lm(uptake ~ ., x)), simplify = rbind)
> fn$ can also interpret formulas as functions (and does quasi-Perl interpolation
> in strings), so the formula in the third argument is treated as equivalent
> to the anonymous function: function(x) coef(lm(uptake ~ ., x)).
> More examples are in the gsubfn vignette.
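For comparison, a similar simplification is possible in base R without a simplify= argument by binding the pieces of the by() result together. This is a minimal sketch; grouping by CO2$Type is an assumption made here for illustration and differs from the grouping in the call above.

```r
# Fit uptake ~ conc separately for each level of Type (an assumed grouping),
# then bind the per-group coefficient vectors into one matrix, one row per group.
fits <- by(CO2[4:5], CO2$Type, function(x) coef(lm(uptake ~ ., x)))
coefs <- do.call(rbind, fits)
```

The result has one row per Type and one column per coefficient, which is roughly what simplify = rbind is meant to produce.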
>>> 14. better reporting of location of errors and warnings in R CMD check.
>> This is in the works, but probably not for 2.5.x.
> Great. This will be very welcome.
>>> 15. tcl tile library (needs tcl 8.5 or to be compiled in with 8.4)
>>> 16. extend aggregate to allow vector valued functions:
>>> aggregate(CO2[4:5], CO2[1:2], function(x) c(mean = mean(x), sd = sd(x)))
>>> [summaryBy in doBy package and cast in reshape package can already
>>> do similar things but this seems sufficiently fundamental that it
>>> ought to be in the base of R]
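What item 16 asks for can be approximated in base R with split() and sapply(). The sketch below summarises only the uptake column to keep it short, grouped by the same two columns (Plant and Type) as in the aggregate() call above; the names groups and stats are invented for the example.

```r
# Split uptake by Plant and Type, dropping combinations that never occur,
# then apply a vector-valued summary to each group.
groups <- split(CO2$uptake, list(CO2$Plant, CO2$Type), drop = TRUE)

# sapply() returns one column per group; transpose to get one row per group.
stats <- t(sapply(groups, function(x) c(mean = mean(x), sd = sd(x))))
```

Each row of stats corresponds to one Plant/Type combination, with columns mean and sd.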
>>> 17. All OSes should support input= arg of system.
>>> My previous New Year wishlists are here:
>> To anyone still reading:
>> Many of the suggestions above would improve R, but they're unlikely to
>> happen unless someone volunteers to do them. I'd suggest picking
>> whichever item, from this list or some other, you think is the highest
>> priority, and posting a specific proposal to this list about how to do
>> it. If you get a negative response or no response, move on to the next
>> one, or put it into a contributed package instead.
> I think it works best when contributors develop their software in
> contributed packages since it avoids squabbles with the core group.
> The core group can then integrate these into R itself if it seems warranted.
>> When you make the proposal, consider how much work you're asking other
>> people to do, and how much you're volunteering to do yourself. If
>> you're asking others to do a lot, then the suggestion had better be
>> really valuable to *them*.
> The implementation effort should not be a significant consideration in
> generating wish lists. What should be considered is what is really needed.
> It's better to know what you need and then later decide whether to implement
> it or not than to suppress articulating the need. Otherwise the development
> is driven by what is easy to do rather than what is needed.
I'm not complaining about your wish list, just suggesting what people
should do next if they want to see any of these items (or others)