[R] The hidden costs of GPL software?

Patrick Burns pburns at pburns.seanet.com
Tue Nov 23 17:45:53 CET 2004


I think John has exactly the right image -- index to a book --
but I disagree with his conclusions.

I read somewhere that an index should not be done by the
author.  It was probably written by someone who was bored
with indexing, but the reasoning was precisely that indices should
be about concepts.  The author of a package will have one
concept for a function but not all of the concepts that come
from various fields of study.  I suspect that no one outside of
finance would think to index "sd" under "volatility", to give a
(not very good) example.

There could be an index builder that accepts a search phrase and
the function or package that is the successful answer to the search.
If this were open, then R users who don't feel qualified to submit
code could contribute to the index. It could also help defuse the
frustration of taking too long to find a function by providing a way
to ensure that the exact same thing doesn't happen to others.
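
As a rough sketch of what such an index builder might look like
(written in Python purely for illustration; the class name, method
names and "package::function" labels are all made up here, not part
of any existing R facility), each contribution is just a pair of
search phrase and successful answer:

```python
from collections import defaultdict


class ConceptIndex:
    """Maps user-supplied search phrases to the functions that answered them."""

    def __init__(self):
        # phrase -> {target: number of users who reported it}
        self._entries = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(phrase):
        # Assumption: only case and whitespace are normalized;
        # no stemming or fuzzy matching is attempted.
        return " ".join(phrase.lower().split())

    def contribute(self, phrase, target):
        """Record that `target` was the successful answer to a search for `phrase`."""
        self._entries[self._normalize(phrase)][target] += 1

    def lookup(self, phrase):
        """Return targets for `phrase`, most frequently reported first."""
        votes = self._entries.get(self._normalize(phrase), {})
        return sorted(votes, key=lambda t: (-votes[t], t))


# Hypothetical contributions from users who eventually found what they wanted.
index = ConceptIndex()
index.contribute("volatility", "stats::sd")
index.contribute("Volatility", "stats::sd")
index.contribute("volatility", "stats::var")
index.contribute("goodness of fit", "stats::chisq.test")
```

Counting repeated contributions gives a crude ranking: an answer that
many people report for the same phrase is listed first by lookup().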

Amazon has a function that says those who bought "The Chicago
Manual of Style" also bought Strunk and White.  In the same way,
the R index could provide a list of terms that overlap the given
search term.  For example if we search for "goodness of fit", then
"hypothesis test" might be one of the related terms that pops up.
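
One simple way to get such related terms, assuming the index stores
which functions were reported as answers for each term, is to rank
the other terms by how many answers they share with the search term.
The entries below are invented for illustration, and the overlap
count is only one of many possible similarity measures:

```python
from collections import Counter

# Hypothetical contributed entries: search term -> functions reported as answers.
entries = {
    "goodness of fit": {"stats::chisq.test", "stats::ks.test", "stats::shapiro.test"},
    "hypothesis test": {"stats::chisq.test", "stats::ks.test", "stats::t.test"},
    "normality": {"stats::shapiro.test"},
    "volatility": {"stats::sd"},
}


def related_terms(term, entries):
    """Rank other terms by how many answers they share with `term`."""
    answers = entries.get(term, set())
    overlap = Counter()
    for other, other_answers in entries.items():
        if other == term:
            continue
        shared = len(answers & other_answers)
        if shared:
            overlap[other] = shared
    # Largest overlap first; ties broken alphabetically.
    return sorted(overlap, key=lambda t: (-overlap[t], t))


related = related_terms("goodness of fit", entries)
```

With these made-up entries, "hypothesis test" comes first for
"goodness of fit" because the two terms share two answers, while
"volatility" shares none and is left out.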

No, I'm not volunteering to build the system.

Patrick Burns

Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

John Fox wrote:

>Dear Duncan,
>
>I don't think that there is an automatic, nearly costless way of providing
>an effective solution to locating R resources. The problem seems to me to be
>analogous to indexing a book. There's an excellent description of what that
>process *should* look like in the Chicago Manual of Style, and it's a lot of
>work. In my experience, most book indexes are quite poor, and automatically
>generated indexes, while not useless, are even worse, since one should index
>concepts, not words. The ideal indexer is therefore the author of the book.
>
>I guess that the question boils down to how important it is to provide an
>analogue of a good index to R. As I said in a previous message, I believe
>that the current search facilities work pretty well -- about as well as one
>could expect of an automatic approach. I don't believe that there's an
>effective centralized solution, so doing something more ambitious than is
>currently available implies farming out the process to package authors. Of
>course, there's no guarantee that all package authors will be diligent
>indexers. 
>
>Regards,
> John
>
>--------------------------------
>John Fox
>Department of Sociology
>McMaster University
>Hamilton, Ontario
>Canada L8S 4M4
>905-525-9140x23604
>http://socserv.mcmaster.ca/jfox 
>-------------------------------- 
>
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch 
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Duncan Murdoch
>>Sent: Monday, November 22, 2004 8:55 AM
>>To: Cliff Lunneborg
>>Cc: r-help at stat.math.ethz.ch
>>Subject: Re: [R] The hidden costs of GPL software?
>>
>>On Fri, 19 Nov 2004 13:59:23 -0800, "Cliff Lunneborg"
>><cliff at ms.washington.edu> quoted John Fox:
>>
>>
>>>Why not, as previously has been proposed, replace the current static
>>>(and, in my view, not very useful) set of keywords in R documentation
>>>with the requirement that package authors supply their own keywords for
>>>each documented object? I believe that this is the intent of the
>>>concept entries in Rd files, but their use certainly is not required or
>>>even actively encouraged. (They're just mentioned in passing in the
>>>Writing R Extensions manual.)
>>
>>That would not be easy and won't happen quickly.  There are some
>>problems:
>>
>> - The base packages mostly don't use \concept. (E.g. base has 365
>>man pages, only about 15 of them use it).  Adding it to each file is
>>a fairly time-consuming task.
>>
>> - Before we started, we'd need to agree as to what they are for.
>>Right now, I think they are mainly used when the name of a concept
>>doesn't match the name of the function that implements it, e.g.
>>"modulo", "remainder", "promise", "argmin", "assertion".  The need
>>for this usage is pretty rare.  If they were used for everything,
>>what would they contain?
>>
>> - Keywording in a useful way is hard.  There are spelling issues
>>(e.g. optimise versus optimize); our fuzzy matching helps with those.
>>But there are also multiple names for the same thing, and multiple
>>meanings for the same name.
>>
>>Duncan Murdoch
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html