[R] Function hints
hadley wickham
h.wickham at gmail.com
Mon Jun 19 14:51:07 CEST 2006
One of the recurring themes at the recent useR! conference was that
many people find it difficult to find the functions they need for a
particular task. Sandy Weisberg suggested a small idea he would like
to see: a hints function that, given an object, lists likely
operations. I've done my best to implement this function using the
tools currently available in R, and my code is included at the bottom
of this email (I hope that I haven't just duplicated something already
present in R). I think Sandy's idea is genuinely useful, even in the
limited form provided by my implementation, and I have already
discovered a few useful functions that I was unaware of.
While developing and testing this function, I ran into a few issues
which, I think, reflect underlying problems with the current
documentation system. These are typified by the results of running
hints on an object produced by glm (having class c("glm", "lm")). I
have outlined (very tersely) some possible solutions. Please note
that while these solutions are largely technological, the problem is
at heart sociological: writing documentation is no easier (and perhaps
much harder) than writing a scientific publication, but the rewards
are fewer.
Problems:
* Many functions share the same description (e.g. head, tail).
Solution: each Rd file should describe only one method. Problem:
writing Rd files is tedious, a lot of information is duplicated
between the code and the documentation (e.g. the usage statement),
and some functions share a lot of similar information. Solution: make
it easier to write documentation (e.g. documentation inline with
code), and easier to include certain common descriptions in multiple
methods (e.g. a new include command).
* It is difficult to tell which functions are commonly
used/important. Solution: break down by keywords. Problem: keywords
are not useful at the moment. Solution: make a better list of keywords
available and encourage people to use it. Problem: people won't
unless there is a strong incentive, plus good keywording requires
considerable expertise (especially in building up the list). This is
probably insoluble unless one person systematically keywords all of
the base packages.
* Some functions aren't documented (eg. simulate.lm, formula.glm) -
typically, these are methods where the documentation is in the
generic. Solution: these methods should all be aliased to the generic
(by default?), and R CMD check should be amended to check for this
situation. You could also argue that this is a deficiency with my
function, and easily fixed by automatically referring to the generic
if the specific isn't documented.
* It can't supply suggestions when there isn't an explicit method
(i.e. .default is used), which makes it pretty useless for basic
vectors. This may not really be a problem, as all possible operations
are probably too numerous to list.
* Provides the full name of the function, when best practice is to use
the generic part only when calling the function. However, getting
precise documentation may require the full name. I do the best I can
(returning the generic if the specific is an alias to a documentation
file with the same method name), but this reflects a deeper problem:
the name you should use when calling a function may be different from
the name you use to get documentation.
* Can only display methods from currently loaded packages. This is a
shortcoming of the methods function, but I suspect it is difficult to
find S3 methods without loading a package.
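The fallback mentioned in the third item above (referring to the
generic when the specific method isn't documented) might look
something like the sketch below. The helper doc_topic and its
documented argument are hypothetical names used only for illustration;
in hints, the list of documented topics would presumably come from the
alias column of the help-search database. The sketch also assumes that
the class is whatever follows the last dot, which is wrong for classes
such as data.frame.

```r
# Hypothetical helper (not part of hints): pick the help topic to show
# for a method name. `documented` is a character vector of topics that
# are known to have a help page.
doc_topic <- function(method, documented) {
  # The method itself is documented: use it as is
  if (method %in% documented) return(method)
  # Otherwise strip the last ".class" part and try the generic
  generic <- sub("\\.[^.]*$", "", method)
  if (generic %in% documented) generic else NA_character_
}

documented <- c("simulate", "formula", "summary.glm")
doc_topic("summary.glm", documented)  # "summary.glm"
doc_topic("simulate.lm", documented)  # "simulate"
doc_topic("foo.bar", documented)      # NA
```

Falling back like this would hide the gap in the hints output, but it
would not replace aliasing the methods to the generic's help page.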
Relatively trivial problems:
* Needs a wide display to be effective. Could be dealt with by
breaking the description in a sensible manner (there may already be R
code to do this; please let me know if you know of any).
* Doesn't currently include S4 methods. Solution: add some more code
to wrap showMethods.
* Personally, I think sentence case is more aesthetically pleasing
(and more flexible) than title case.
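On the first item above: strwrap in base R may already be enough for
breaking descriptions at word boundaries. A minimal sketch, with a
made-up description string and an arbitrary width of 40 (both are
assumptions for illustration, not values used by hints):

```r
# Made-up task description, longer than a narrow column allows
desc <- paste("Summarize a fitted generalized linear model and",
              "print the coefficients together with their standard errors")

# Break at word boundaries so that each piece is shorter than 40 characters
wrapped <- strwrap(desc, width = 40)

# One piece per output line
cat(wrapped, sep = "\n")
```

Each wrapped piece could then go on its own row of the hints matrix,
with an empty string in the Function column.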
Hadley
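hints <- function(x) {
  # Fetch the help-search database, building it first if necessary
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }

  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  # All S3 methods for the classes of x, and the id of the help page
  # that each method name is an alias of (NA if undocumented)
  m <- all.methods(classes=class(x))
  m_id <- alias[match(m, alias[,1]), 2]
  # Keywords of each help page (collected, but not used below)
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  # Use the help page name when its generic part matches that of the
  # method; otherwise keep the full method name
  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(seq_len(nrow(f.names)), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])
    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  # Two-column table of function name and task, sorted by name;
  # drop=FALSE keeps a matrix even when there is only one row
  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])), , drop=FALSE]
  hints <- rbind(c("--------", "---------------"), hints)
  rownames(hints) <- rep("", nrow(hints))
  colnames(hints) <- c("Function", "Task")
  hints[is.na(hints)] <- "(Unknown)"
  class(hints) <- "hints"
  hints
}

print.hints <- function(x, ...) print(unclass(x), quote=FALSE)

all.methods <- function(classes) {
  # One row per method, named by the full method name, with columns
  # "name" (generic part) and "class"
  methods <- do.call(rbind, lapply(classes, function(x) {
    m <- methods(class=x)
    t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
  }))
  # Keep only the first method found for each generic
  rownames(methods[!duplicated(methods[,1]), , drop=FALSE])
}

# Split "generic.class" at the last dot
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name=m, class="")
  } else {
    c(name=paste(parts[-length(parts)], collapse="."),
      class=parts[length(parts)])
  }
}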
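
A quick check of methodsplit on a few names may help show what the
rest of the code relies on. The definition is repeated so that the
snippet runs on its own. Note that the class is taken to be whatever
follows the last dot, so a class name that itself contains a dot (such
as data.frame) is split in the wrong place.

```r
# Same definition as above, repeated so the example is self-contained
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name=m, class="")
  } else {
    c(name=paste(parts[-length(parts)], collapse="."),
      class=parts[length(parts)])
  }
}

methodsplit("summary.glm")       # name "summary",    class "glm"
methodsplit("print")             # name "print",      class ""
methodsplit("print.data.frame")  # name "print.data", class "frame"
```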