[Rd] Rgeneric.py assists in rearranging generic function definitions [inline]

Ross Boylan ross at biostat.ucsf.edu
Mon Jan 25 20:46:55 CET 2010


On Thu, 2010-01-21 at 11:38 -0800, Ross Boylan wrote:
> I've attached a script I wrote that pulls all the setGeneric definitions
> out of a set of R files and puts them in a separate file, default
> allGenerics.R.  I thought it might help others who find themselves in a
> similar situation.
> 
> The "situation" was that I had to change the order in which files in my
> package were parsed; the scheme in which the generic definition is in
> the "first" file that has the corresponding setMethod breaks under
> re-ordering.  So I pulled out all the definitions and put them first.
> 
> In retrospect, it is clearly preferable to create allGenerics.R from
> the start.  If you didn't, and discover you should have, the script
> automates the conversion.
> 
> Thanks to everyone who helped me with my packaging problems.  The
> package finally made it to CRAN as
> http://cran.r-project.org/web/packages/mspath/index.html.  I'll send a
> public notice of that to the general R list.
> 
> Ross Boylan

Apparently the attachment didn't make it through. I've pasted
Rgeneric.py below.

#! /usr/bin/python
# python 2.5 required for with statement
from __future__ import with_statement

# Rgeneric.py extracts setGeneric definitions from R sources and 
# writes them to a special file, while removing them from the
# original.
#
# Context: In a system with several R files, having generic
# definitions sprinkled throughout, there are errors arising from the
# sequencing of files, or of definitions within files.  In general,
# changing the order in which files are parsed (e.g., by the Collate:
# field in DESCRIPTION) will break things even when they were
# working.  For example, a setMethod may occur before the
# corresponding setGeneric, and then fail.  Given that it is not safe
# to call setGeneric twice for the same function, the cleanest
# solution may be to move all the generic definitions to a separate
# file that will be read before any of the setMethod's.  Rgeneric.py
# helps automate that process.
#
# It is, of course, preferable not to get into this situation in the
# first place, for example by creating an allGenerics.R file as you
# go.

# Typical usage: ./Rgeneric.py *.R
# Will create allGenerics.R with all the extracted generic
# definitions, including any preceding comments.
# Rewrites the *.R files, replacing the setGeneric's with comments
# indicating the generic has moved to allGenerics.R.
# *.R.old has the original .R files.
#
# The program does not work for all conceivable styles.  In
# particular, it assumes that
#    1. setGeneric is immediately followed by an open parenthesis and
#       a quoted name of the function.  Subsequent parts of the
#       definition may be split across lines and have interspersed
#       comments.
#
#    2. Comments precede the definition.  They are optional, and will
#       be left in place in the .R file and copied to allGenerics.R.
#
#    3. If you first define an ordinary function foo, and then do
#       setGeneric("foo") the setGeneric will be moved to
#       allGenerics.R.  It will not work properly there; you should
#       make manual adjustments such as moving it back to the
#       original.  The code at the bottom reports on all such
#       definitions, and then lists all the generic functions processed.
#
#    4. allGenerics.R will contain generic definitions in the order of
#       files examined, and in the order they are defined within the
#       file.  This is to preserve context for the comments, in
#       particular for comments which apply to a block of
#       definitions.  If you would like something else, e.g.,
#       alphabetical ordering, you should post-process the AllForKey
#       object created at the bottom of this file.


#
# There are program (not command line) options to do a read-only scan,
# and a class to hold the results, which can be inspected in various
# ways.

#     Copyright 2010 Regents of University of California
#
#     This program is free software: you can redistribute it and/or modify
#     it under the terms of the GNU General Public License as published by
#     the Free Software Foundation, either version 3 of the License, or
#     (at your option) any later version.
#
#     This program is distributed in the hope that it will be useful,
#     but WITHOUT ANY WARRANTY; without even the implied warranty of
#     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#     GNU General Public License for more details.
#
#     See <http://www.gnu.org/licenses/> for the full license.

# Author: Ross Boylan <ross at biostat.ucsf.edu>
#
# Revision History:
#
# 1.0 2010-01-21 Initial release.

import os, os.path, re, sys

class ParseGeneric:
    """Extract setGeneric functions and preceding comments in one file.
    states of the parser:
    findComment -- look for start of comment
    inComment -- found comment; accumulate and look for end
    inGeneric -- extract setGeneric definition.

    Typical use:
    p = ParseGeneric()
    results = p.parse("myfile.R")
    
    or 
    p.parse("myfile.R")
    results = p.generics()
    """

    def __init__(self):
        self.reStartComment = re.compile(r"^\s*#")
        self.reInComment = re.compile(r"^(\s*#)|(\s*$)", re.DOTALL)
        self.reStartGeneric = re.compile(r"^([^#]*)(\s*setGeneric\(\"([^\"]+)\".*)$", re.DOTALL)
        self._gfname = "allGenerics.R"
        if os.path.exists(self._gfname):
            os.remove(self._gfname)

    def parse(self, fname, makeOutput=True):
        "parse the entire file.  Return list of generics"
        self._fname = fname
        self._state = self.findComment
        self._generics = []  # results will go here
        self._currentGeneric = None  # holds current parse
        if makeOutput:
            ofname = fname+".new"
            self._ofname = ofname
            self._ofile = open(ofname, "w")
            self._gfile = open(self._gfname, "a")
        else:
            self._ofname = None
            self._ofile = None
            self._gfile = None
        try:
            with open(fname, "r") as fin:
                if self._gfile:
                    self._gfile.write("\n\n########## generics from %s #############\n\n"%fname)
                for line in fin:
                    self._state(line)
                return self.generics()
        finally:
            if makeOutput:
                self.cleanup()

    def cleanup(self):
        "Final processing when we output a revised file"
        if self._ofile:
            self._ofile.close()
            self._ofile = None
        if self._gfile:
            self._gfile.close()
            self._gfile = None
        backupName = self.fileName()+".old"
        if os.path.exists(backupName):
            os.remove(backupName)
        # on Unix, but not MS Windows, preceding step is unnecessary
        os.rename(self.fileName(), backupName)
        os.rename(self._ofname, self.fileName())

    def fileName(self):
        return self._fname

    def write(self, line):
        if self._ofile:
            self._ofile.write(line)

    def stripGeneric(self, pre, name):
        """strip generic function name from file.  
        pre is preceding material on line, before setGeneric."""
        if not self._ofile:
            return
        if pre and not pre.isspace():
            self._ofile.write(pre+"\n")
        self._ofile.write("# %s generic definition stripped out"%name)
        if self._gfname and  self._gfile:
            self._ofile.write(" and put in %s.\n"%(self._gfname))
        else:
            self._ofile.write(".\n")

    def currentGeneric(self):
        "Return current generic, creating it if necessary--for internal use"
        s = self._currentGeneric
        if s:
            return s
        s = SetGeneric(self.fileName())
        self._currentGeneric = s
        return s

    def findComment(self, line):
        "look for start of a comment"
        if self.reStartComment.match(line):
            self.currentGeneric().addComment(line)
            self._state = self.inComment
        if not self.checkGeneric(line):
            self.write(line)

    def inComment(self, line):
        "scan through a comment"
        if self.reInComment.match(line):
            self.currentGeneric().addComment(line)
        elif self.checkGeneric(line):
            return
        else:
            self._state = self.findComment
            self._currentGeneric = None
        self.write(line)

    def checkGeneric(self, line):
        "True if line starts generic definition"
        m = self.reStartGeneric.match(line)
        if m:
            self._state = self.inGeneric
            self._parenDepth = 0
            self._commas = 0
            self.stripGeneric(m.group(1), m.group(3))
            self.currentGeneric().setName(m.group(3))
            self.inGeneric(m.group(2))
            return True
        return False

    def inGeneric(self, line):
        "extract entire generic definition"
        i = 0  # incremented first in the loop, so i is 1 past the current character
        for c in line:
            i += 1
            if c == "(":
                self._parenDepth += 1
            elif c == ",":
                self._commas += 1
            elif c == ")":
                self._parenDepth -= 1
                if self._parenDepth <= 0:
                    self.currentGeneric().addDef(line[0:i])
                    post = line[i:len(line)]
                    if not post.isspace():
                        self.write(post)
                    return self.makeGeneric(self._commas)
        self.currentGeneric().addDef(line)

    def makeGeneric(self, ncommas):
        "Record generic based on _currentGeneric.  It has ncommas+1 arguments"
        self.currentGeneric().setNargs(ncommas+1)
        self._generics.append(self.currentGeneric())
        if self._gfile:
            self._gfile.write("%s\n"%(self.currentGeneric().asText()))
        self._currentGeneric = None
        self._state = self.findComment

    def generics(self):
        "return list of SetGeneric instances I found"
        return self._generics

class SetGeneric:
    """Describes a single generic function definition."""
    
    def __init__(self, sourceFile):
        "sourceFile <String> where this generic was defined"
        self._comment = []
        self._code = []
        self._file = sourceFile

    def addComment(self, line):
        "Add a line that is a comment"
        self._comment.append(line)

    def addDef(self, line):
        self._code.append(line)

    def setName(self, genericName):
        "set name of function being defined"
        self._name = genericName

    def setNargs(self, nargs):
        self._nargs = nargs

    def isFull(self):
        """True if generic definition is complete in itself,
        rather than relying on an existing regular function definition."""
        return self._nargs > 1

    def name(self):
        return self._name

    def file(self):
        "<String> file name where generic was defined"
        return self._file

    def hasComment(self):
        "True if there is a comment defined for me"
        return len(self._comment) > 0

    def comment(self):
        "return comment as a (possibly multi-line) string"
        return "".join(self._comment)

    def code(self):
        "return definition as (possibly multi-line) string"
        return "".join(self._code)

    def asText(self):
        return self.comment() + self.code()

    def __str__(self):
        "Summary description"
        if self.hasComment():
            wc = "with"
        else:
            wc = "without"
        if self.isFull():
            f = ""
        else:
            f = "(need prior plain fn def)"
        return "setGeneric(%s) %s comment %s"%(self.name(), wc, f)

class AllForKey:
    "track situations in which there may be more than one entry per key"
    def __init__(self):
        "values are lists containing the real values"
        self._dict = dict()

    def addKey(self, key, value):
        vs = self._dict.setdefault(key, [])
        vs.append(value)

    def keys(self):
        return self._dict.keys()

    def values(self, key):
        "return LIST of values for key"
        return self._dict[key]

    def duplicateKeys(self):
        "return a list of all keys with multiple entries"
        ks = [ k for (k, v) in self._dict.iteritems() if len(v)>1 ] 
        return ks

p = ParseGeneric()
all = AllForKey()
for fn in sys.argv[1:]:
    #print fn
    xs = p.parse(fn)
    for x in xs:
        all.addKey(x.name(), x)
dups = all.duplicateKeys()
if dups:
    print "There were duplicates."
    dups.sort()
    for k in dups:
        print "%s: "%k ,
        for v in all.values(k):
            print "%s "%(v.file()) ,
        print
else:
    print "No Duplicates"
keys = all.keys()
keys.sort()
print "Report for all definitions"
for key in keys:
    print "%s: "%key ,
    for v in all.values(key):
        wc = "" if v.hasComment() else "without comment "
        f = "" if v.isFull() else "PARTIAL DEFINITION "
        print "%s%s in %s; "%(f, wc , v.file()) ,
    print

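A small standalone sketch may make the extraction step easier to follow. It is not part of Rgeneric.py: extract_generic and the sample R lines are made up for illustration, and only the regular expression is taken from the script. It is written to run under Python 3, and like the script it assumes parentheses never appear inside strings or comments within the definition.

```python
import re

# Same pattern the script uses to spot the start of a generic definition:
# group 1 = text before setGeneric, group 2 = the call onward, group 3 = name.
START_GENERIC = re.compile(r'^([^#]*)(\s*setGeneric\("([^"]+)".*)$', re.DOTALL)


def extract_generic(lines):
    """Return (name, definition_text) for the first setGeneric call, or None.

    Hypothetical helper: counts parentheses until the depth returns to zero.
    """
    name = None
    depth = 0
    pieces = []
    for line in lines:
        if name is None:
            m = START_GENERIC.match(line)
            if not m:
                continue  # not the start of a generic; keep looking
            name = m.group(3)
            line = m.group(2)  # scan only from setGeneric onward
        for i, c in enumerate(line):
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth <= 0:
                    # i + 1 is one past the closing parenthesis
                    pieces.append(line[:i + 1])
                    return name, "".join(pieces)
        pieces.append(line)  # definition continues on the next line
    return None


# Made-up sample input: a comment, a two-line generic, then a method.
lines = [
    '# Area of a shape\n',
    'setGeneric("area",\n',
    '           function(shape, ...) standardGeneric("area"))\n',
    'setMethod("area", "Circle", function(shape, ...) pi * shape@r^2)\n',
]
name, definition = extract_generic(lines)
```

Here definition ends at the parenthesis that closes the setGeneric call, so the setMethod line that follows is left alone.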

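The header comments mention post-processing the collected definitions, for instance to get alphabetical ordering. A minimal sketch of that idea follows, assuming plain dictionaries stand in for the SetGeneric instances (with the real script one would use x.name(), x.file() and x.asText() instead). The function names, file names and definitions here are hypothetical, and the code targets Python 3.

```python
from collections import defaultdict


def group_by_name(records):
    """Map each generic name to the list of records defining it."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["name"]].append(rec)
    return grouped


def alphabetical_text(grouped):
    """Join the definition texts in alphabetical order of generic name."""
    parts = []
    for name in sorted(grouped):
        for rec in grouped[name]:
            parts.append(rec["text"])
    return "\n".join(parts) + "\n"


# Made-up records, listed in the order the files were scanned.
records = [
    {"name": "perimeter", "file": "shapes.R",
     "text": 'setGeneric("perimeter", function(shape) standardGeneric("perimeter"))'},
    {"name": "area", "file": "shapes.R",
     "text": 'setGeneric("area", function(shape) standardGeneric("area"))'},
    {"name": "area", "file": "solids.R",
     "text": 'setGeneric("area", function(shape) standardGeneric("area"))'},
]

grouped = group_by_name(records)
# Names defined more than once need manual attention before use.
duplicates = sorted(n for n, recs in grouped.items() if len(recs) > 1)
output = alphabetical_text(grouped)
```

Sorting this way loses the grouping that lets one comment apply to a block of definitions, which is why the script itself keeps file order.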
