[Rd] Argument recycling in substring()

William Dunlap wdunlap at tibco.com
Fri Jun 4 17:56:58 CEST 2010


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Martin Maechler
> Sent: Friday, June 04, 2010 2:46 AM
> To: Hervé Pagès
> Cc: r-devel at stat.math.ethz.ch
> Subject: Re: [Rd] Argument recycling in substring()
> 
> >>>>> "HP" == Hervé Pagès <hpages at fhcrc.org>
> >>>>>     on Thu, 03 Jun 2010 17:53:33 -0700 writes:
> 
>     HP> Hi,
>     HP> According to its man page substring() "expands (its) arguments
>     HP> cyclically to the length of the longest _provided_ none are of
>     HP> zero length".
> 
>     HP> So, as expected, I get an error here:
> 
>     >> substring("abcd", first=2L, last=integer(0))
>     HP> Error in substring("abcd", first = 2L, last = integer(0)) :
>     HP> invalid substring argument(s)
> 
>     HP> But I don't get one here:
> 
>     >> substring(character(0), first=1:2, last=3L)
>     HP> character(0)
> 
>     HP> which is unexpected according to the documentation.
> 
> My gut feeling would say that the documentation should be
> updated in this case, rather than the implementation.
> 
> RFC! other opinions?

I think it would be nice if multiargument vectorized
functions in core R used the rules that are used by
the arithmetic functions (`+`, etc.):
   a) if any argument length is 0, then the output
      length is 0
   b) otherwise the output is the length of the longest
      input
The arithmetic functions also warn if the output length
is not a multiple of some input length.   (They actually
warn 'longer ... length is not a multiple of shorter ...'
and I'm extrapolating that to more than two arguments.)
Most other multi-vectorized functions (e.g., log, pnorm)
don't currently warn.
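
A small sketch of rules (a) and (b) as `+` applies them today;
the variable names are only for illustration:

```r
# Rule (a): a zero-length argument gives zero-length output.
zero_len <- 1:3 + integer(0)
length(zero_len)   # 0

# Rule (b): otherwise the output has the length of the longest input.
recycled <- 1:6 + 1:2
length(recycled)   # 6

# Length 3 is not a multiple of length 2, so `+` warns while still
# recycling; here the warning message is captured instead of printed.
warned <- tryCatch(1:3 + 1:2, warning = function(w) conditionMessage(w))
```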

If they all followed the same rules then it would be easier
to write code involving unfamiliar functions.  The rule
could be stated in one help file and a help file for a
given function could say that arguments x, y, and z,
but not a or b, are 'vectorized', with a link to the one
help file describing vectorization.  Even better, the C
and C++ APIs could be expanded to do the standard
multivectorization so not every function would do it
in its own way.
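
To make the idea concrete, here is a rough R-level sketch of what
such a shared recycling step might look like.  recycle_args() is a
hypothetical helper, not an existing R or C API function, and the
warning text is made up for the example:

```r
# Hypothetical helper (not part of R): recycle all arguments to a
# common length using the rules of the arithmetic functions.
recycle_args <- function(...) {
  args <- list(...)
  if (length(args) == 0L) return(list())
  lens <- lengths(args)
  # Zero rule: any zero-length argument makes the common length zero.
  n <- if (any(lens == 0L)) 0L else max(lens)
  # Warn when the common length is not a multiple of some input length.
  if (n > 0L && any(n %% lens != 0L))
    warning("longest argument length is not a multiple of a shorter argument length")
  lapply(args, rep_len, length.out = n)
}

# All three arguments are recycled to length 2.
ok <- recycle_args("abcd", 1:2, 3L)

# One zero-length argument shrinks every argument to length 0.
empty <- recycle_args(character(0), 1:2, 3L)
```

A function built on such a helper would only need to loop over the
recycled arguments, and its help file could point to one shared
description of the rule.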

Some functions cannot be changed to follow that rule because
it would break too much code (e.g., paste() and cat()).
However, why shouldn't substring return character(0) if
any argument is 0 long?

By the way, the 'zero rule' is there so we don't have to
write so many if(length(x)>0) statements around things like
    which(x) + 1
or
    substring(x, 1, nchar(x)-1)
where the scalar 1 would otherwise cause NAs to arise.
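
A short sketch of that point with zero-length inputs.  The last line
only imitates what padding to the length of the longest argument
would produce; it is not something R does for `+`:

```r
x <- logical(0)
idx <- which(x) + 1      # zero-length result, no NA
length(idx)              # 0

chars <- character(0)
trimmed <- substring(chars, 1, nchar(chars) - 1)
length(trimmed)          # 0

# If the zero-length side were instead stretched to match the scalar 1,
# the missing element would be NA and the sum would be NA as well.
padded <- which(x)[1] + 1
is.na(padded)            # TRUE
```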

[Perhaps I should not state my opinion so forcibly, since,
for legal reasons, I'm not in a position to change core R code.]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com

> 
> 
>     HP> Otherwise, yes substring() will recycle its arguments to the
>     HP> length of the longest:
> 
>     >> substring("abcd", first=1:3, last=4:3)
>     HP> [1] "abcd" "bc"   "cd"
> 
> 
> 
> 
>     HP> Cheers,
>     HP> H.
> 
>     HP> -- 
>     HP> Hervé Pagès
> 
>     HP> Program in Computational Biology
>     HP> Division of Public Health Sciences
>     HP> Fred Hutchinson Cancer Research Center
>     HP> 1100 Fairview Ave. N, M2-B876
>     HP> P.O. Box 19024
>     HP> Seattle, WA 98109-1024
> 
>     HP> E-mail: hpages at fhcrc.org
>     HP> Phone:  (206) 667-5791
>     HP> Fax:    (206) 667-1319
> 
>     HP> ______________________________________________
>     HP> R-devel at r-project.org mailing list
>     HP> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


