[R] Perl vs. R

Don MacQueen macq at llnl.gov
Wed Jun 12 16:14:04 CEST 2002


I believe the following 5 lines do the job (if it looks like 6 lines, 
the email software made the for() loop line into two lines).

inp <- scan('perlcomp.dat',what=list(a='',b=''),sep='\t')
foo <- split(inp$a,inp$b)
sink('pcmp.out')
for (i in seq(foo)) 
cat(names(foo)[i],'\t',paste(sort(foo[[i]]),collapse='\t'),'\n',sep='')
sink()

I tried
   lapply(split(inp$a,inp$b),function(x) cat(names(x),sort(x),'\n'))
instead of the for() loop, but the names(x) part doesn't pick up the 
values of B as needed: each x is just the vector of A values for one 
group, and the B value is the name of the list element, not of x. But 
it doesn't matter, because the for() loop is just as fast.
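
One way to keep the B values available without an explicit for() loop is to iterate over the names of the split list rather than over its elements. The following is only a minimal sketch under stated assumptions: the small in-memory list stands in for the contents of perlcomp.dat, and the name invert_lines is hypothetical, not part of the original script.

```r
# Hypothetical stand-in for the two tab-separated columns read by scan()
inp <- list(a = c("pear", "apple", "fig", "kiwi"),
            b = c("k2", "k1", "k2", "k1"))

# Group the A values by their B key; split() orders the groups by key
foo <- split(inp$a, inp$b)

# Loop over the keys (not the values) so each key is known inside the function
invert_lines <- sapply(
  names(foo),
  function(key) paste(c(key, sort(foo[[key]])), collapse = "\t"),
  USE.NAMES = FALSE
)

# Print one tab-separated line per key to the console
writeLines(invert_lines)
```

To write a file instead of printing, writeLines() also accepts a file name as its second argument, which would replace the sink() calls.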

Without the overhead of starting up R,

>  system.time(source('pcomp.r'))
Read 5642 records
[1] 1.51 0.00 1.72 0.00 0.00

On a 466 MHz G4 Macintosh

>  version
          _                     
platform powerpc-apple-darwin5.5
arch     powerpc               
os       darwin5.5             
system   powerpc, darwin5.5    
status                         
major    1                     
minor    5.0                   
year     2002                  
month    04                    
day      29                    
language R                     


-Don

At 7:23 AM -0400 6/12/02, John Day wrote:
>Prof. Bates,
>
>Thanks for the pointers. I ran your two-liner (the args to 
>write.table() needed to be swapped) and noted the runtime to be 
>about 0.9 secs in CMD BATCH mode, several times slower than the 
>Perl. You were right.
>
>Actually, the code is not correct. The specification required the 
>benchmark code to collect the fields in A and use the 1301 unique 
>codes in B as a key to retrieve the A's appended and sorted in a 
>list. That might require an explicit loop, which will slow it down 
>even more.
>
>But even then, for research and learning purposes, I think I could 
>live with this sluggish performance most of the time, just to avoid 
>having to interface with Perl. It's very convenient to do everything 
>in R. Maybe occasionally use Perl where performance demands it etc.
>
>I have the new John Fox book on order. But will try to find a copy 
>of Venables-Ripley too. I don't have S-Plus, I thought the Fox book 
>might be better for R-only users.
>
>I also want to study Pinheiro-Bates, but must wait until I have 
>grasped the basics.
>
>Thanks,
>John Day
>At 11:14 AM 6/11/02 -0500, you wrote:
>>John Day <jday at csihq.com> writes:
>>
>>>  I am being told that R can process text files and strings as well as
>>>  Perl (and is certainly more elegant).
>>
>>"as well as" is in the eye of the beholder.  Perl is very highly tuned
>>to manipulating text files.  One story of how the name perl came about
>>is as an acronym for "Practical Extraction and Report Language".
>>
>>R is an environment for statistical computing and graphics.  Although
>>there are pattern matching and text substitution functions in R, it is
>>not well suited to writing "one-off" text transformation programs.
>>You will find that starting R probably takes longer than the execution
>>of the perl program.
>>
>>Rather than trying to take a simple benchmark and see how R performs
>>on it, it would be better to learn about the language and see if it
>>fulfills a real need for you.  I would suggest starting with Venables
>>and Ripley's "Modern Applied Statistics with S-PLUS (3rd ed)" or the
>>eagerly-awaited fourth edition of that book slated for publication
>>this summer.
>>
>>Having said all this, I believe your perl program can be coded in R as
>>something like
>>
>>   df <- read.table('infile', header = FALSE, sep = '\t', col = c('a', 'b'))
>>   write.table('outfile', df[order(df$b), c('b', 'a')])
>>
>>although I think it would be better for you to describe what the task
>>is rather than providing perl code to accomplish the task.  I long ago
>>gave up reading other people's perl code and trying to make sense of
>>it.  (In the Python community there is a saying that "Hell is reading
>>other people's Perl code".)
>>
>>>  Being an R neophyte I need a little boost to get started. I have a
>>>  little benchmark program in Perl that reads a delimited file, creates
>>>  an inverted table and spits the file out again in key sorted order.
>>>
>>>
>>>  It's just a few lines of Perl (see below). Can someone write the
>>>  equivalent in R? The benchmark and associated files are available
>>>  from: http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
>>>
>>>
>>>  You'll note on this page that Perl runs the benchmark in 3.5
>>>  secs. That was in 1997. My 5.6.1 version of Perl runs it in 0.18 secs
>>>  now, on my 600 MHz Linux platform. Wondering how fast R will be in
>>>  comparison.
>>>
>>>
>>>  Thanks,
>>>  John Day
>>>
>>>  FYI, here's the Perl source:
>>>
>>>  #!/local/bin/perl
>>>  # invert benchmark in Perl
>>>  # see <url:http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/>
>>>  # Keith Waclena <k-waclena at uchicago.edu>
>>>
>>>  while (<STDIN>) {
>>>       chop;
>>>       ($a, $b) = split(/\t/);
>>>       $B{$b} .= "\t$a";       # gotta lose leading tab later...
>>>  }
>>>
>>>  foreach $b (sort keys %B) {
>>>       # lose the leading tab with substr...
>>>       print "$b\t" . join("\t", sort(split(/\t/, substr($B{$b}, 1)))) . "\n";
>>>  }
>>>
>>> 
>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>  r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>>>  Send "info", "help", or "[un]subscribe"
>>>  (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>>> 
>>>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------