[Rd] Rgeneric.py assists in rearranging generic function definitions [inline]
Ross Boylan
ross at biostat.ucsf.edu
Mon Jan 25 20:46:55 CET 2010
On Thu, 2010-01-21 at 11:38 -0800, Ross Boylan wrote:
> I've attached a script I wrote that pulls all the setGeneric definitions
> out of a set of R files and puts them in a separate file, default
> allGenerics.R. I thought it might help others who find themselves in a
> similar situation.
>
> The "situation" was that I had to change the order in which files in my
> package were parsed; the scheme in which the generic definition is in
> the "first" file that has the corresponding setMethod breaks under
> re-ordering. So I pulled out all the definitions and put them first.
>
> In retrospect, it is clearly preferable to create allGenerics.R from
> the start. If you didn't, and discover you should have, the script
> automates the conversion.
>
> Thanks to everyone who helped me with my packaging problems. The
> package finally made it to CRAN as
> http://cran.r-project.org/web/packages/mspath/index.html. I'll send a
> public notice of that to the general R list.
>
> Ross Boylan
Apparently the attachment didn't make it through. I've pasted
Rgeneric.py below.
#! /usr/bin/python
# python 2.5 required for with statement
from __future__ import with_statement
# Rgeneric.py extracts setGeneric definitions from R sources and
# writes them to a special file, while removing them from the
# original.
#
# Context: In a system with several R files, having generic
# definitions sprinkled throughout, there are errors arising from the
# sequencing of files, or of definitions within files. In general,
# changing the order in which files are parsed (e.g., by the Collate:
# field in DESCRIPTION) will break things even when they were
# working. For example, a setMethod may occur before the
# corresponding setGeneric, and then fail. Given that it is not safe
# to call setGeneric twice for the same function, the cleanest
# solution may be to move all the generic definitions to a separate
# file that will be read before any of the setMethod's. Rgeneric.py
# helps automate that process.
#
# It is, of course, preferable not to get into this situation in the
# first place, for example by creating an allGenerics.R file as you
# go.
# Typical usage: ./Rgeneric.py *.R
# Will create allGenerics.R with all the extracted generic
# definitions, including any preceding comments.
# Rewrites the *.R files, replacing the setGeneric's with comments
# indicating the generic has moved to allGenerics.R.
# *.R.old has the original .R files.
#
# The program does not work for all conceivable styles. In
# particular, it assumes that
# 1. setGeneric is immediately followed by an open parenthesis and
# a quoted name of the function. Subsequent parts of the
# definition may be split across lines and have interspersed
# comments.
#
# 2. Comments precede the definition. They are optional, and will
# be left in place in the .R file and copied to allGenerics.R.
#
# 3. If you first define an ordinary function foo, and then do
# setGeneric("foo") the setGeneric will be moved to
# allGenerics.R. It will not work properly there; you should
# make manual adjustments such as moving it back to the
# original. The code at the bottom reports on all such
# definitions, and then lists all the generic functions processed.
#
# 4. allGenerics.R will contain generic definitions in the order of
# files examined, and in the order they are defined within the
# file. This is to preserve context for the comments, in
# particular for comments which apply to a block of
# definitions. If you would like something else, e.g.,
# alphabetical ordering, you should post-process the AllForKey
# object created at the bottom of this file.
#
# There are program (not command line) options to do a read-only scan,
# and a class to hold the results, which can be inspected in various
# ways.
# Copyright 2010 Regents of University of California
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# See <http://www.gnu.org/licenses/> for the full license.
# Author: Ross Boylan <ross at biostat.ucsf.edu>
#
# Revision History:
#
# 1.0 2010-01-21 Initial release.
import os, os.path, re, sys
class ParseGeneric:
    """Extract setGeneric functions and preceding comments in one file.
    states of the parser:
      findComment -- look for start of comment
      inComment -- found comment; accumulate and look for end
      inGeneric -- extract setGeneric definition.
    Typical use:
      p = ParseGeneric()
      results = p.parse("myfile.R")
    or
      p.parse("myfile.R")
      results = p.generics()
    """

    def __init__(self):
        self.reStartComment = re.compile(r"^\s*#")
        self.reInComment = re.compile(r"^(\s*#)|(\s*$)", re.DOTALL)
        self.reStartGeneric = re.compile(r"^([^#]*)(\s*setGeneric\(\"([^\"]+)\".*)$", re.DOTALL)
        self._gfname = "allGenerics.R"
        if os.path.exists(self._gfname):
            os.remove(self._gfname)

    def parse(self, fname, makeOutput=True):
        "parse the entire file. Return list of generics"
        self._fname = fname
        self._state = self.findComment
        self._generics = [] # results will go here
        self._currentGeneric = None # holds current parse
        if makeOutput:
            ofname = fname+".new"
            self._ofname = ofname
            self._ofile = open(ofname, "w")
            self._gfile = open(self._gfname, "a")
        else:
            self._ofname = None
            self._ofile = None
            self._gfile = None
        try:
            with open(fname, "r") as fin:
                if self._gfile:
                    self._gfile.write("\n\n########## generics from %s #############\n\n"%fname)
                for line in fin:
                    self._state(line)
            return self.generics()
        finally:
            if makeOutput:
                self.cleanup()

    def cleanup(self):
        "Final processing when we output a revised file"
        if self._ofile:
            self._ofile.close()
            self._ofile = None
        if self._gfile:
            self._gfile.close()
            self._gfile = None
        backupName = self.fileName()+".old"
        if os.path.exists(backupName):
            os.remove(backupName)
        # on Unix, but not MS Windows, preceding step is unnecessary
        os.rename(self.fileName(), backupName)
        os.rename(self._ofname, self.fileName())

    def fileName(self):
        return self._fname

    def write(self, line):
        if self._ofile:
            self._ofile.write(line)

    def stripGeneric(self, pre, name):
        """strip generic function name from file.
        pre is preceding material on line, before setGeneric."""
        if not self._ofile:
            return
        if pre and not pre.isspace():
            self._ofile.write(pre+"\n")
        self._ofile.write("# %s generic definition stripped out"%name)
        if self._gfname and self._gfile:
            self._ofile.write(" and put in %s.\n"%(self._gfname))
        else:
            self._ofile.write(".\n")

    def currentGeneric(self):
        "Return current generic, creating it if necessary--for internal use"
        s = self._currentGeneric
        if s:
            return s
        s = SetGeneric(self.fileName())
        self._currentGeneric = s
        return s

    def findComment(self, line):
        "look for start of a comment"
        if self.reStartComment.match(line):
            self.currentGeneric().addComment(line)
            self._state = self.inComment
        if not self.checkGeneric(line):
            self.write(line)

    def inComment(self, line):
        "scan through a comment"
        if self.reInComment.match(line):
            self.currentGeneric().addComment(line)
        elif self.checkGeneric(line):
            return
        else:
            self._state = self.findComment
            self._currentGeneric = None
        self.write(line)

    def checkGeneric(self, line):
        "True if line starts generic definition"
        m = self.reStartGeneric.match(line)
        if m:
            self._state = self.inGeneric
            self._parenDepth = 0
            self._commas = 0
            self.stripGeneric(m.group(1), m.group(3))
            self.currentGeneric().setName(m.group(3))
            self.inGeneric(m.group(2))
            return True
        return False

    def inGeneric(self, line):
        "extract entire generic definition"
        # i is incremented at the top of the loop, so inside the loop body
        # it is 1 past the index of the character c being examined
        i = 0
        for c in line:
            i += 1
            if c == "(":
                self._parenDepth += 1
            elif c == ",":
                self._commas += 1
            elif c == ")":
                self._parenDepth -= 1
                if self._parenDepth <= 0:
                    # line[0:i] ends with the closing parenthesis
                    self.currentGeneric().addDef(line[0:i])
                    post = line[i:len(line)]
                    if not post.isspace():
                        self.write(post)
                    return self.makeGeneric(self._commas)
        self.currentGeneric().addDef(line)

    def makeGeneric(self, ncommas):
        "Record generic based on _currentGeneric. It has ncommas+1 arguments"
        self.currentGeneric().setNargs(ncommas+1)
        self._generics.append(self.currentGeneric())
        if self._gfile:
            self._gfile.write("%s\n"%(self.currentGeneric().asText()))
        self._currentGeneric = None
        self._state = self.findComment

    def generics(self):
        "return list of SetGeneric instances I found"
        return self._generics

class SetGeneric:
    """Describes a single generic function definition."""

    def __init__(self, sourceFile):
        "sourceFile <String> where this generic was defined"
        self._comment = []
        self._code = []
        self._file = sourceFile

    def addComment(self, line):
        "Add a line that is a comment"
        self._comment.append(line)

    def addDef(self, line):
        self._code.append(line)

    def setName(self, genericName):
        "set name of function being defined"
        self._name = genericName

    def setNargs(self, nargs):
        self._nargs = nargs

    def isFull(self):
        """True if generic definition is complete in itself,
        rather than relying on an existing regular function definition."""
        return self._nargs > 1

    def name(self):
        return self._name

    def file(self):
        "<String> file name where generic was defined"
        return self._file

    def hasComment(self):
        "True if there is a comment defined for me"
        return len(self._comment) > 0

    def comment(self):
        "return comment as a (possibly multi-line) string"
        return "".join(self._comment)

    def code(self):
        "return definition as (possibly multi-line) string"
        return "".join(self._code)

    def asText(self):
        return self.comment() + self.code()

    def __str__(self):
        "Summary description"
        if self.hasComment():
            wc = "with"
        else:
            wc = "without"
        if self.isFull():
            f = ""
        else:
            f = "(need prior plain fn def)"
        return "setGeneric(%s) %s comment %s"%(self.name(), wc, f)

class AllForKey:
    "track situations in which there may be more than one entry per key"

    def __init__(self):
        "values are lists containing the real values"
        self._dict = dict()

    def addKey(self, key, value):
        vs = self._dict.setdefault(key, [])
        vs.append(value)

    def keys(self):
        return self._dict.keys()

    def values(self, key):
        "return LIST of values for key"
        return self._dict[key]

    def duplicateKeys(self):
        "return a list of all keys with multiple entries"
        ks = [ k for (k, v) in self._dict.iteritems() if len(v)>1 ]
        return ks

p = ParseGeneric()
all = AllForKey()
for fn in sys.argv[1:len(sys.argv)]:
    #print fn
    xs = p.parse(fn)
    for x in xs:
        all.addKey(x.name(), x)
dups = all.duplicateKeys()
if dups:
    print "There were duplicates."
    dups.sort()
    # list every duplicated generic, with the files that define it
    for k in dups:
        print "%s: "%k ,
        for v in all.values(k):
            print "%s "%(v.file()) ,
        print
else:
    print "No Duplicates"
keys = all.keys()
keys.sort()
print "Report for all definitions"
for key in keys:
    print "%s: "%key ,
    for v in all.values(key):
        wc = "" if v.hasComment() else "without comment "
        f = "" if v.isFull() else "PARTIAL DEFINITION "
        if not v.isFull():
            print "%s%s in %s; "%(f, wc , v.file()) ,
    print
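
Assumption 1 in the script's header comment (setGeneric immediately followed by an open parenthesis and a double-quoted name) comes directly from the reStartGeneric pattern. The following is a minimal sketch, not part of Rgeneric.py, that applies the same pattern to a few made-up R lines to show what is and is not recognized. The helper name match_generic and the sample lines are hypothetical, and the sketch is written for Python 3, unlike the script itself.

```python
import re

# Same pattern as ParseGeneric.reStartGeneric in the script above.
RE_START_GENERIC = re.compile(
    r"^([^#]*)(\s*setGeneric\(\"([^\"]+)\".*)$", re.DOTALL)


def match_generic(line):
    """Return (text before setGeneric, generic name), or None if no match.

    Hypothetical helper for illustration; it is not part of Rgeneric.py.
    """
    m = RE_START_GENERIC.match(line)
    if m is None:
        return None
    return m.group(1), m.group(3)


# Made-up sample lines of R source.
samples = [
    'setGeneric("area", function(shape) standardGeneric("area"))\n',
    '  setGeneric("perimeter")\n',
    "setGeneric('area')\n",      # single quotes: not recognized
    'setGeneric ("area")\n',     # space before the parenthesis: not recognized
    '# setGeneric("area")\n',    # commented out: not recognized
]

if __name__ == "__main__":
    for s in samples:
        print(repr(s), "->", match_generic(s))
```

Lines written in one of the unrecognized styles are left untouched in the original .R file, so it may be worth searching the sources for setGeneric after a run to catch any that were missed.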
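
The inGeneric method in the script finds the end of a definition by counting parentheses, and counts commas to tell a full definition from a bare setGeneric("foo"). Here is a standalone sketch of that idea, again for Python 3. The function name scan_call and the sample input are hypothetical. Like the script, it counts parentheses and commas that appear inside strings or comments, so it is only as reliable as the coding style it is applied to.

```python
def scan_call(lines):
    """Scan lines that begin with a call such as 'setGeneric("f", ...'.

    Returns (definition, rest, ncommas): the text up to and including the
    parenthesis that closes the call, whatever follows it on that line,
    and the number of commas seen up to that point.
    Hypothetical helper for illustration; it is not part of Rgeneric.py.
    """
    depth = 0
    ncommas = 0
    collected = []
    for line in lines:
        for i, c in enumerate(line):
            if c == "(":
                depth += 1
            elif c == ",":
                ncommas += 1
            elif c == ")":
                depth -= 1
                if depth <= 0:
                    # Keep everything through the closing parenthesis.
                    collected.append(line[:i + 1])
                    return "".join(collected), line[i + 1:], ncommas
        # Call not closed yet: the whole line belongs to the definition.
        collected.append(line)
    raise ValueError("call is not closed")


if __name__ == "__main__":
    full = ['setGeneric("area",\n',
            '  function(shape) standardGeneric("area"))\n']
    print(scan_call(full))
    print(scan_call(['setGeneric("perimeter")  # existing function\n']))
```

A comma count of zero corresponds to the one-argument form, which the script reports as a partial definition because it relies on an ordinary function defined earlier.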