[Rd] RFC: What should ?foo do?

Fri Apr 25 17:28:40 CEST 2008

Duncan Murdoch wrote:
> Marc Schwartz wrote:
>> Duncan Murdoch wrote:
>>  
>>> Currently ?foo does help("foo"), which looks for a man page with 
>>> alias foo.  If foo happens to be a function call, it will do a bit 
>>> more, so
>>>
>>> ?mean(something)
>>>
>>> will find the mean method for something if mean happens to be an S4 
>>> generic.  There are also the type?foo variations, e.g. methods?foo, 
>>> or package?foo.
>>>
>>> I think these are all too limited.
>>>
>>> The easiest search should be the most permissive.  Users should need 
>>> to do extra work to limit their search to man pages, with exact 
>>> matches, as ? does.
>>>
>>> We don't currently have a general purpose search for "foo", or 
>>> something like it.  We come close with RSiteSearch, and so possibly 
>>> ?foo should mean RSiteSearch("foo"), but
>>> there are problems with that: it can't limit itself to the current 
>>> version of R, and it doesn't work when you're offline (or when 
>>> search.r-project.org is down.)  We also have help.search("foo"), but 
>>> it is too limited. I'd like to have a local search that looks through 
>>> the man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., 
>>> specific to the current R installation, and I think ? should be 
>>> attached to that search.
>>>
>>> Comments, please.
>>>
>>> Duncan Murdoch
>>>     
>>
>> Duncan,
>>
>> I agree in principle with the points that you raise. I suspect that at 
>> least in part, it might assist new users with some of the issues that 
>> were raised in the latest incarnation of the 'we need better 
>> documentation' thread on r-help.
>>
>> I am not convinced that ?foo should do this however. help("foo") 
>> conceptually seems predicated upon the notion that a user is looking 
>> for a reference/help page for a specific function or descriptor called 
>> 'foo'. The user knows the name of the function or descriptor ...
> 
> With the current definition, that's correct, though man("foo") might be 
> a better match to Unix users' expectations for a function that did that.  
> For a naive user, help("foo") suggests that they're looking for help on 
> "foo".

I agree that man("foo") would be consistent with looking for help with a 
known specific function. I also agree that Linux/Unix users would expect 
such behavior. However, I am not sure that Windows users would be so 
inclined. Certainly as a former Windows user and despite many years of 
programming experience in various environments, I would not, out of the 
gate, have instinctively known or thought about man("foo").

That is not to argue against moving in that direction however. In fact, 
as part of any future consolidation of the myriad help and search 
functions, it would make a great deal of sense that ?foo become an alias 
for man("foo") rather than help("foo").

Thus, the other help/search related functions could also consolidate 
around a mechanism with one key distinction: local versus online 
sources.

>> ...  and should not have to wait for a search function to locate it or 
>> conceptually related terms.  If the user has a large number of CRAN 
>> packages installed, such a search can take a rather long time. That's 
>> an issue for example with help.search().
>>   
> 
> As Brian and Hadley said, that's an implementation issue, already being 
> addressed.

My comment there was more an observation than a criticism, and my 
apologies if it was taken as the latter. I think it is reasonable to 
expect that if a useR has 1,300+ CRAN packages installed, it is going to 
take longer to search that infrastructure than if the useR has only a 
few.

I would, however, want to have a reasonable expectation that ?t.test 
returns its result faster than help.search("t test") does. In the former 
case, I am typically looking for a specific function in a package that 
is on the search path. In the latter, I am searching for related 
terms/concepts in all installed packages, etc.
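
As a rough illustration of that difference, here is a minimal sketch of 
the two kinds of lookup. The restriction to the stats package is only my 
assumption to keep the example quick, and the class names in the 
comments are what I would expect from a current R, not a guarantee for 
every version:

```r
# Exact lookup: resolves the alias "t.test" among installed help pages.
# Assigning the result (rather than printing it) avoids opening a viewer.
exact <- utils::help("t.test", package = "stats")

# Conceptual search: matches a pattern against alias, concept and title
# fields. Restricted to one package here only to keep the example quick;
# dropping `package` searches every installed package.
fuzzy <- utils::help.search("t test", package = "stats")

class(exact)         # expected: "help_files_with_topic"
class(fuzzy)         # expected: "hsearch"
head(fuzzy$matches)  # the matching entries found by the search
```

The first call only has to resolve one alias, while the second has to 
consult the search database for every package it covers, which is where 
the time goes with a large library.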

>> That being said and being a firm believer in incrementalism, perhaps 
>> the first step should be to create a new function, called esearch() 
>> [as in extended search] or doc.search() [as in documentation search] 
>> or even search.all(). This new function would facilitate searching all 
>> of the local objects that you list and perhaps others. It would by 
>> default be uber-inclusive of all categories of such objects. It would 
>> support functionality along the lines of help.search() in allowing for 
>> the use of regex and fuzzy matching via grep()/agrep().
>>   
> 
> Definitely there would need to be a new function, with a new name; if we 
> were attaching the name to ? somehow, then it wouldn't matter much what 
> name was used.
> 
> I haven't done it, but I suspect we could introduce special behaviour 
> for ??foo very easily.  We could even have a whole hierarchy:
> 
> ?foo, ??foo, ???foo, ????foo, ...

Conceptually, my initial reaction, which I think is consistent to an 
extent with the differentiation that Peter made in his reply, is 
positive, though as always, there is the risk of confusion.

From the perspective of naive users and the KISS approach, I would tend 
to favor the basic distinction of:

1. ?foo or man("foo") - look for the man page for a known specific 
function in the current search path

2. help.search("foo") - look for conceptual links related to 'foo', with 
some appropriate wrappers that default to either local or online sources.
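
To make that two-way split concrete, here is a minimal sketch. Note that 
man() and doc_search() do not exist in R; both names, and the `where` 
argument, are hypothetical and only illustrate the idea on top of the 
existing help(), help.search() and RSiteSearch():

```r
# Hypothetical man(): exact-match lookup of a known topic. It simply
# delegates to help(), which accepts a length-one character vector.
man <- function(topic) {
  stopifnot(is.character(topic), length(topic) == 1L)
  utils::help(topic)
}

# Hypothetical doc_search(): conceptual search that defaults to local
# sources and goes online only when explicitly asked to.
doc_search <- function(pattern, where = c("local", "online")) {
  where <- match.arg(where)  # errors on anything but "local"/"online"
  switch(where,
    local  = utils::help.search(pattern),
    online = utils::RSiteSearch(pattern)
  )
}
```

With this, man("t.test") would behave like ?t.test, doc_search("t test") 
would search the local installation, and 
doc_search("t test", where = "online") would hand the query to 
RSiteSearch().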

>> The downside of this approach is that we would add yet another search 
>> function to the list of those already available, each of which 
>> searches a focused subset of the potential targets for assistance, 
>> whether local or online. Thus, it would require some level of 
>> understanding of the general structure of the myriad of local and 
>> online resources of R related assistance.
>>   
> 
> Part of the idea behind my suggestion is that it should be somewhat 
> automatic for a new user to learn about the different types of help.  
> One way for this to happen is the current one:  expect them to find and 
> read the manuals.  The suggestion is to make it easier to find the 
> different types.  The risk of this is that exposing a new user to a wide 
> range of different kinds of results would just be confusing.

I will admit a little ambivalence here. Part of me thinks that a useR 
*should*, at minimum, read "An Introduction to R" or at least be inclined 
to look there as their first resource. It does seem that there is some 
expectation from new users that they can just dive in and become 
productive with R immediately, whether or not they have prior 
programming experience and whether or not they have experience with 
other statistical applications. In fact, there is an argument to be made 
that such prior experience can bias their expectations and frame of 
reference.

Reading "Intro" can assist them in beginning to understand the 
conceptual differences in R as compared to these other environments, 
such as methods, vectorized functions, object structures and accessor 
functions, etc.

If a user has found and knows how to use lm() and construct model 
formulae for example, why is it that they don't know about coef(), 
effects(), fitted(), etc. when these are listed on the help page for lm?

Did they not read far enough, or did they skip right over the See Also 
section to the examples?
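
For concreteness, these are the kind of accessor calls I mean. This is 
only a small sketch; the use of the built-in cars data set is my own 
choice of example:

```r
# Fit a simple linear model on the built-in `cars` data.
fit <- lm(dist ~ speed, data = cars)

coef(fit)           # estimated intercept and slope
head(fitted(fit))   # fitted values for the first few observations
head(effects(fit))  # orthogonal effects from the QR decomposition
```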

The first instinct has become to post to r-help (as just happened), 
rather than use the phenomenal resources that this community has already 
made available.

So rather than taking a little time to read a bit more and, in the long 
run, saving themselves the time spent posting and waiting for a reply, 
they default to posting.

It is interesting to note that by users doing this, they are in effect 
providing substantive praise to this community and the support provided 
by the lists, in that they have come to expect a pretty rapid response 
from the community 24x7. I suspect that the volume of certain categories 
of e-mails might be quite different if the typical response time on the 
lists was hours rather than minutes...

The first presumption of a useR should be that the available 
documentation might cover these issues, or that somebody else has quite 
possibly already asked the same question. Thus, if the answer is not in 
"Intro", the next step should be to consider searching the list 
archives.

These are the issues that are covered in the Posting Guide, which 
clearly many don't utilize either.

So, that being the case, how do we provide a conceptual framework for 
seeking assistance in using R and how do we behaviorally modify useRs to 
actually utilize those resources to their own benefit?

I am not looking to solve 100% of the needs, but again, within the notion 
of incrementalism and Pareto's 80/20 Rule, how do we address a 
reasonable majority of the needs? How do we get the biggest bang for the 
investment of time?

>> Perhaps ?help could be augmented a bit in elucidating some of these 
>> issues. The See Also there does not reference apropos() for example 
>> and it might be worthwhile adding something along the lines of the 
>> bullets in the "Do your homework before posting" section in the 
>> Posting Guide. Thus ?help can become something of a "first place to 
>> look - local centralized help resource" for users to identify the 
>> tiered help resources that are available and to also provide a 
>> framework for how to use those resources. One could also have links to 
>> the online pages for R News, R Books, the R Wiki, the R Graph Gallery, 
>> Contributed Documentation, Bioconductor and Other Documentation, so 
>> that users become more aware of help resources beyond the 
>> documentation installed with R by default.
>>   
> 
> Those are probably good ideas, but my guess would be that few users read 
> ?help.

As I note above, somehow we need to get users to look to a central 
resource that is platform independent. That resource should include some 
type of overview of the local and online help resources that are 
available for R, and perhaps a suggested hierarchy of use.

It seems logical to me that such a resource be embedded up front in 
"Intro" with it also being included within the existing help system and 
referenced in the start up banner message.

>> A longer term plan could be to look to consolidate some of these 
>> functions into a single help/search function perhaps circa R version 
>> 3.0.0. That would enable some time for thoughtful consideration and 
>> feedback.
>>   
> 
> As all the recent bug reports show, we don't really get feedback until 
> code is released, so there's not much of an advantage of 3.0.0 (unless 
> we really break the current system) over 2.8.0.

Good point.

Regards,

Marc