[R] Perl vs. R

Wed Jun 12 18:19:32 CEST 2002

Just for completeness, I'll add that one can invoke Perl routines
directly from R, and vice versa, using the RSPerl package.  If one
already has Perl code for a particular task then, depending on the
specifics of the problem, one can get the best of both worlds by
calling that code from R and getting the results back from the Perl
interpreter directly as R objects.  Reading output from Perl via
connections in R is also a reasonable approach, depending on the
problem, and both of these may become more important as the complexity
of recoding in R grows.
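
As a rough illustration of the connection approach, here is a minimal
sketch.  It assumes a perl executable is on the PATH and a Unix-like
shell for the quoting; the Perl one-liner is only a made-up stand-in
for an existing script.

```r
# Assumption: `perl` is on the PATH and the shell accepts single-quoted
# arguments (Unix-like systems). The one-liner is a hypothetical
# placeholder for a real Perl script.
con <- pipe("perl -e 'print qq{b1\\ta1\\n}'")
lines <- readLines(con)   # opens the pipe, reads Perl's stdout, closes it
close(con)                # destroy the connection object

# Split each line of Perl output into its tab-separated fields
fields <- strsplit(lines, "\t", fixed = TRUE)
print(fields)
```

In practice the command string would name the existing script, for
example pipe("perl invert.pl < perlcomp.dat"), where invert.pl is a
hypothetical file name for the Perl benchmark program.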

 D.

John Day wrote:
> Don,
> Thanks. I'm sure I am not the only one who is learning a lot from this 
> benchmark problem. It's very simple but is an abstract model for virtually 
> all apps: open a file, do something to it, write out a new version of it etc.
> John Day
> 
> At 07:14 AM 6/12/02 -0700, you wrote:
> >I believe the following 5 lines do the job (if it looks like 6 lines, the 
> >email software made the for() loop line into two lines).
> >
> >inp <- scan('perlcomp.dat',what=list(a='',b=''),sep='\t')
> >foo <- split(inp$a,inp$b)
> >sink('pcmp.out')
> >for (i in seq(foo)) cat(names(foo)[i],'\t',paste(sort(foo[[i]]),collapse='\t'),'\n',sep='')
> >sink()
> >
> >I tried
> >   lapply(split(inp$a,inp$b),function(x) cat(names(x),sort(x),'\n'))
> >instead of the for() loop, but the names(x) part doesn't pick up the 
> >values of B as needed. But it doesn't matter, because the for() loop is 
> >just as fast.
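
The names(x) problem arises because lapply() passes each list element
to the function without its name.  One way around it, sketched here
under the assumption of small made-up data in place of perlcomp.dat,
is to iterate over the names and look up each element by key:

```r
# Hypothetical stand-in for the two tab-separated columns of perlcomp.dat
inp <- list(a = c("x3", "x1", "x2", "x9"),
            b = c("k2", "k1", "k2", "k1"))

foo <- split(inp$a, inp$b)   # list of A values, one element per B key

# Iterate over the names so each key is available alongside its values
out <- sapply(names(foo), function(k)
  paste(c(k, sort(foo[[k]])), collapse = "\t"))

# writeLines(out, "pcmp.out") would write the file; cat() prints it here
cat(out, sep = "\n")
```

This builds one output line per key and writes them all at once, so
the sink()/cat() pair is not needed.
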
> >
> >Without the overhead of starting up R,
> >
> >>  system.time(source('pcomp.r'))
> >Read 5642 records
> >[1] 1.51 0.00 1.72 0.00 0.00
> >
> >On a 466 MHz G4 Macintosh
> >
> >>  version
> >          _
> >platform powerpc-apple-darwin5.5
> >arch     powerpc
> >os       darwin5.5
> >system   powerpc, darwin5.5
> >status
> >major    1
> >minor    5.0
> >year     2002
> >month    04
> >day      29
> >language R
> >
> >
> >-Don
> >
> >At 7:23 AM -0400 6/12/02, John Day wrote:
> >>Prof. Bates,
> >>
> >>Thanks for the pointers. I ran your two-liner (the args to write.table() 
> >>needed to be swapped) and noted the runtime to be about 0.9 secs in CMD 
> >>BATCH mode, several times slower than the Perl. You were right.
> >>
> >>Actually, the code is not correct. The specification required the 
> >>benchmark code to collect the fields in A and use the 1301 unique codes 
> >>in B as a key to retrieve the A's appended and sorted in a list. That 
> >>might require an explicit loop, which will slow it down even more.
> >>
> >>But even then, for research and learning purposes, I think I could live 
> >>with this sluggish performance most of the time, just to avoid having to 
> >>interface with Perl. It's very convenient to do everything in R. Maybe 
> >>occasionally use Perl where performance demands it etc.
> >>
> >>I have the new John Fox book on order. But will try to find a copy of 
> >>Venables-Ripley too. I don't have S-Plus, I thought the Fox book might be 
> >>better for R-only users.
> >>
> >>I also want to study Pinheiro-Bates, but must wait until I have grasped 
> >>the basics.
> >>
> >>Thanks,
> >>John Day
> >>At 11:14 AM 6/11/02 -0500, you wrote:
> >>>John Day <jday at csihq.com> writes:
> >>>
> >>>>  I am being told that R can process text files and strings as well as
> >>>>  Perl (and is certainly more elegant).
> >>>
> >>>"as well as" is in the eye of the beholder.  Perl is very highly tuned
> >>>to manipulating text files.  One story of how the name perl came about
> >>>is as an acronym for "Practical Extraction and Report Language".
> >>>
> >>>R is an environment for statistical computing and graphics.  Although
> >>>there are pattern matching and text substitution functions in R, it is
> >>>not well suited to writing "one-off" text transformation programs.
> >>>You will find that starting R probably takes longer than the execution
> >>>of the perl program.
> >>>
> >>>Rather than trying to take a simple benchmark and see how R performs
> >>>on it, it would be better to learn about the language and see if it
> >>>fulfills a real need for you.  I would suggest starting with Venables
> >>>and Ripley's "Modern Applied Statistics with S-PLUS (3rd ed)" or the
> >>>eagerly-awaited fourth edition of that book slated for publication
> >>>this summer.
> >>>
> >>>Having said all this, I believe your perl program can be coded in R as
> >>>something like
> >>>
> >>>   df <- read.table('infile', header = FALSE, sep = '\t', col = c('a', 'b'))
> >>>   write.table('outfile', df[order(df$b), c('b', 'a')])
> >>>
> >>>although I think it would be better for you to describe what the task
> >>>is rather than providing perl code to accomplish the task.  I long ago
> >>>gave up reading other people's perl code and trying to make sense of
> >>>it.  (In the Python community there is a saying that "Hell is reading
> >>>other people's Perl code".)
> >>>
> >>>>  Being an R neophyte I need a little boost to get started. I have a
> >>>>  little benchmark program in Perl that reads a delimited file, creates
> >>>>  an inverted table and spits the file out again in key sorted order.
> >>>>
> >>>>
> >>>>  It's just a few lines of Perl (see below). Can someone write the
> >>>>  equivalent in R? The benchmark and associated files are available
> >>>>  from: http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
> >>>>
> >>>>
> >>>>  You'll note on this page that Perl runs the benchmark in 3.5
> >>>>  secs. That was in 1997. My 5.6.1 version of Perl runs it in 0.18 secs
> >>>>  now, on my 600 MHz Linux platform. Wondering how fast R will be in
> >>>>  comparison.
> >>>>
> >>>>
> >>>>  Thanks,
> >>>>  John Day
> >>>>
> >>>>  FYI, here's the Perl source:
> >>>>
> >>>>  #!/local/bin/perl
> >>>>  # invert benchmark in Perl
> >>>>  # see <url:http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
> >>>>  # Keith Waclena <k-waclena at uchicago.edu>
> >>>>
> >>>>  while (<STDIN>) {
> >>>>       chop;
> >>>>       ($a, $b) = split(/\t/);
> >>>>       $B{$b} .= "\t$a";       # gotta lose leading tab later...
> >>>>  }
> >>>>
> >>>>  foreach $b (sort keys %B) {
> >>>>       # lose the leading tab with substr...
> >>>>       print "$b\t" . join("\t", sort(split(/\t/, substr($B{$b}, 1)))) . "\n";
> >>>>  }
> >>>>
> >>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >>>>  r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> >>>>  Send "info", "help", or "[un]subscribe"
> >>>>  (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> >>>>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >>
> >
> >
> >--
> >--------------------------------------
> >Don MacQueen
> >Environmental Protection Department
> >Lawrence Livermore National Laboratory
> >Livermore, CA, USA
> >--------------------------------------
> 

-- 
_______________________________________________________________

Duncan Temple Lang                duncan at research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       
         http://cm.bell-labs.com/stat/duncan