[Bioc-devel] TypeInfo

Martin Morgan mtmorgan at fhcrc.org
Fri Mar 9 01:31:35 CET 2007


> Question: I have a function
>
> setGeneric("vsn2",
>   function(x, reference, strata, ...)
>    standardGeneric("vsn2"))
>
> and methods like:
>
> setMethod("vsn2", "ExpressionSet",
>    function(x,  reference, strata, ...)
>       vsnMatrix(exprs(x),  reference, strata, ...))
>
> for which not _all_ arguments are typed in the signature (vsnMatrix is a
> 'normal' R function). As far as I understand (and it may be wrong),
> these arguments can also not be typed with TypeInfo. So is this style of
> programming discouraged?

Don't want to be the arbiter of good style. setGeneric has an argument
'signature' that can be used to indicate which of the named arguments
are actually used for dispatch (likely not reference or strata). Once
the args are present in the signature, you can write a TypeInfo
specification that allows for only valid values of reference or
strata. The '...' comes up again below, and probably reflects a
tradeoff between programmer convenience and user-friendliness.
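
As a minimal sketch of the 'signature' argument (using only the methods
package; the generic name vsn2demo and its toy method are made up for
illustration and are not part of vsn):

```r
library(methods)

# Hypothetical generic with the same argument list as vsn2. Only 'x'
# takes part in method dispatch; 'reference' and 'strata' are ordinary
# arguments that a method is free to ignore or check by other means.
setGeneric("vsn2demo",
  function(x, reference, strata, ...) standardGeneric("vsn2demo"),
  signature = "x")

# Toy method: dispatches on the class of 'x' alone.
setMethod("vsn2demo", "matrix",
  function(x, reference, strata, ...) dim(x))

dispatchArgs <- vsn2demo@signature   # arguments used for dispatch
res <- vsn2demo(matrix(0, 2, 3))     # calls the "matrix" method
```

Here dispatchArgs names only x, so methods are selected on x and the
remaining arguments are left for other kinds of checking.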

> Martin wrote:
>> If you'd like an example,
>> point to a suitable function and I'll generate something
>
> The function "justvsn" would be appropriate.

Ah yes, a nice simple example. If you were doing this to use within
the context of R, then you might, in your package, simply

typeInfo(justvsn) <-
  SimultaneousTypeSpecification(
    TypedSignature(x="ExpressionSet"),
    returnType="ExpressionSet")

This says that justvsn's argument x must be (or inherit from) an
ExpressionSet, and that the return type is guaranteed to be an
ExpressionSet. Let's try:

> res <- justvsn(new("ExpressionSet", exprs=matrix(rnorm(50),10,5)))
vsn: 10 x 5 matrix (1 stratum). 100% done.
> justvsn(matrix(rnorm(50),10,5))
Error: TypeInfo could not match signature.
Supplied arguments and their types:
  x: matrix
  ...: name
Available signature(s):
  [SimultaneousTypeSpecification]
    [TypedSignature]
      x: is(x, c('ExpressionSet'))  [InheritsTypeTest] 
    returnType: is(returnType, c('ExpressionSet'))  [InheritsTypeTest] 

Looks good, basically. You can 'ask' justvsn about its arguments:

> typeInfo(justvsn)
An object of class "SimultaneousTypeSpecification"
[[1]]
[TypedSignature]
  x: is(x, c('ExpressionSet'))  [InheritsTypeTest] 

Slot "returnType":
An object of class "InheritsTypeTest"
[1] "ExpressionSet"

and this can be extracted programmatically (to be used to construct a
gui, for instance).
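
To make the idea concrete without relying on TypeInfo internals, here
is a hand-rolled sketch in base R of the same two ingredients: checking
arguments and return value with is(), and keeping the specification
around so it can be read back. The helper withTypes and the attribute
name "typeSpec" are hypothetical; this is not how TypeInfo is
implemented.

```r
library(methods)

# Hypothetical helper: wrap 'f' so that named arguments and the return
# value are checked with is(), and store the specification on the wrapper.
withTypes <- function(f, args, returnType) {
  wrapper <- function(...) {
    supplied <- list(...)
    # Match positional arguments to the formal names of 'f'.
    matched <- as.list(
      match.call(f, as.call(c(list(as.name("f")), supplied))))[-1]
    for (nm in names(args)) {
      if (!is(matched[[nm]], args[[nm]]))
        stop(sprintf("argument '%s' must inherit from '%s'",
                     nm, args[[nm]]))
    }
    res <- do.call(f, supplied)
    if (!is(res, returnType))
      stop(sprintf("return value must inherit from '%s'", returnType))
    res
  }
  attr(wrapper, "typeSpec") <- list(args = args, returnType = returnType)
  wrapper
}

# Toy function: centre the columns of a matrix.
centre <- withTypes(function(x) sweep(x, 2, colMeans(x)),
                    args = c(x = "matrix"), returnType = "matrix")

ok <- centre(matrix(1:6, 2, 3))                       # passes the check
bad <- tryCatch(centre(1:6),                          # not a matrix
                error = function(e) conditionMessage(e))
spec <- attr(centre, "typeSpec")                      # read the spec back
```

A gui builder could walk spec$args to decide which input widgets to
offer, which is the kind of use the typeInfo() output above is meant to
support.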

The justvsn manual implies (mistakenly ;) that the argument x can be
either ExpressionSet or matrix. We could add this type of restriction
with

typeInfo(justvsn) <-
  SimultaneousTypeSpecification(
    TypedSignature(x="ExpressionSet"),
    TypedSignature(x="matrix"),
    returnType="ExpressionSet")


which says that the arguments can be of type either x="ExpressionSet",
OR of type x="matrix"; regardless, the return type is
ExpressionSet. Unfortunately,

> justvsn(matrix(rnorm(50),10,5))
vsn: 10 x 5 matrix (1 stratum). 100% done.
Error in function (classes, fdef, mtable)  : 
	unable to find an inherited method for function "exprs<-", for signature "matrix", "matrix"

TypeInfo lets the matrix through, but justvsn tries to use exprs<- on
x!

There are fancier ways of adding signatures and specifying tests, and
in principle the tests that you perform (e.g., that there be more than
two columns in the exprs matrix) could be moved to TypeInfo. This
would, however, get in the way of some things, like programmatically
extracting information about your function parameters.

In terms of the original goal, we need to be a bit thoughtful. I guess
the ArrayExpress people would like this function to be exposed as a
web service. Well, generic users of the web likely don't have
ExpressionSet objects lying around. Further, the ... don't work well
for structuring user input or for documenting what parameters need to
be provided.

I'd create a parameter class to encapsulate ..., identifying likely
parameters and reasonable default values, maybe along the lines of:

setClass("JustVSNParameter",
         representation=representation(
           strata="factor",
           lts.quantile="numeric",
           subsample="integer"),
         prototype=prototype(
           strata=factor(integer(0), levels="all"),
           lts.quantile=1,
           subsample=0L))
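
A parameter class can also carry its own sanity checks. The following
is only a sketch: the class name DemoVSNParameter is made up so as not
to clash with the class above, and the allowed range for lts.quantile
(0.5 to 1) is my reading of the vsn documentation and should be
checked against the package before being relied on.

```r
library(methods)

# Hypothetical variant of the parameter class with a validity method.
setClass("DemoVSNParameter",
         representation = representation(
           lts.quantile = "numeric",
           subsample = "integer"),
         prototype = prototype(
           lts.quantile = 1,
           subsample = 0L),
         validity = function(object) {
           q <- object@lts.quantile
           # Assumed range; verify against the vsn manual.
           if (length(q) != 1L || is.na(q) || q < 0.5 || q > 1)
             "lts.quantile must be a single number in [0.5, 1]"
           else TRUE
         })

defaults <- new("DemoVSNParameter")                    # prototype values
custom <- new("DemoVSNParameter", lts.quantile = 0.9)  # override one slot
msg <- tryCatch(new("DemoVSNParameter", lts.quantile = 2),
                error = function(e) conditionMessage(e))
```

The defaults are then documented in one place (the prototype), and an
out-of-range value is rejected when the object is created rather than
somewhere inside the computation.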

I'd then write a wrapper around justvsn that exposed a reasonable data
type (maybe the ArrayExpress people want to have the lingua-franca
representation of expression data being MAGE objects, and they've
figured out how to write a converter between MAGE and a numeric matrix
of some kind?):

webVsn <- function(expressionMatrix, justVSNParameter) {
    ## collect the slots of the parameter object into a named list
    klass <- class(justVSNParameter)
    args <- lapply(slotNames(klass),
                   function(elt) slot(justVSNParameter, elt))
    names(args) <- slotNames(klass)
    ## list() makes sure c() builds a plain argument list
    res <- do.call("justvsn",
                   c(list(new("ExpressionSet", exprs=expressionMatrix)),
                     args))
    exprs(res)
}

typeInfo(webVsn) <-
  SimultaneousTypeSpecification(
    TypedSignature(
      expressionMatrix="matrix",
      justVSNParameter="JustVSNParameter"),
    returnType="matrix")

and then

> res <- webVsn(matrix(rnorm(50),10,5), new("JustVSNParameter"))
vsn: 10 x 5 matrix (1 stratum). 100% done.
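
The slots-to-arguments step in webVsn can be tried without vsn or
Biobase installed. In this sketch everything is a stand-in: rescale
plays the role of justvsn, and DemoParameter, paramsToArgs and
demoWrapper are hypothetical names.

```r
library(methods)

# Hypothetical parameter class with two tuning values.
setClass("DemoParameter",
         representation = representation(scale = "numeric",
                                         offset = "numeric"),
         prototype = prototype(scale = 2, offset = 1))

# Stand-in for justvsn: any function with named tuning arguments.
rescale <- function(x, scale, offset) x * scale + offset

# Turn the slots of a parameter object into a named argument list.
paramsToArgs <- function(p) {
  nms <- slotNames(class(p))
  args <- lapply(nms, function(elt) slot(p, elt))
  names(args) <- nms
  args
}

# Wrapper: data first, then one named argument per slot.
demoWrapper <- function(m, p)
  do.call(rescale, c(list(m), paramsToArgs(p)))

out <- demoWrapper(matrix(1:4, 2, 2), new("DemoParameter"))
```

Because the argument names come from the slot names, adding a slot to
the parameter class is enough to pass a new argument through, provided
the wrapped function accepts an argument of that name.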
 
There are a couple of additional points that need to be considered,
perhaps in consultation with the ArrayExpress people. One is that a
generic 'matrix' really over-states what justvsn is suitable for --
it's not any old matrix, but rather a matrix of expression values
collected from a single experiment on a common chip (etc.). You might
provide a class to convey this information,

setClass("WebVSNMatrix", contains="matrix")

and modify webVsn appropriately. As hinted at earlier, a better
solution might be a mapping between ExpressionSet and MAGE
(non-trivial for a complete mapping, but not so bad to translate just
the 'exprs' matrix into BioDataCube, and the reverse). You personally
wouldn't want to do this, but the ArrayExpress people might.
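
A small sketch of what such a class buys you, using only the methods
package (the class is called DemoVSNMatrix here to keep the example
separate from the one above):

```r
library(methods)

# A matrix subclass: same data, but a more specific class label.
setClass("DemoVSNMatrix", contains = "matrix")

m <- new("DemoVSNMatrix", matrix(rnorm(50), 10, 5))

isMatrix <- is(m, "matrix")                       # still usable as a matrix
plainIsDemo <- is(matrix(0), "DemoVSNMatrix")     # plain matrix does not qualify
d <- dim(m)                                       # matrix behaviour is kept
```

A type specification written against the subclass would therefore
accept m but turn away an arbitrary matrix, while code inside the
function can keep treating the argument as a matrix.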

Another point is that 'matrix' and other R objects are not quite
strongly typed enough -- typeof(matrix(0)) is "double", whereas
typeof(matrix(0L)) is "integer", for instance. For this reason our
RWebServices package created classes IntegerMatrix, etc., to map types
more strongly. Again this is an issue to discuss with ArrayExpress,
and the solution depends on how objects in R get mapped to Java.
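
The distinction is easy to see at the prompt. The helpers describe and
isIntegerMatrix below are hypothetical and only illustrate the kind of
stricter test one might want; they are not part of RWebServices.

```r
# Both objects are matrices, but their storage types differ.
describe <- function(m) c(class = class(m)[1], typeof = typeof(m))

dbl <- describe(matrix(0))     # class "matrix", typeof "double"
int <- describe(matrix(0L))    # class "matrix", typeof "integer"

# Hypothetical stricter test: a matrix whose storage type is integer.
isIntegerMatrix <- function(m) is.matrix(m) && is.integer(m)

strictInt <- isIntegerMatrix(matrix(0L))
strictDbl <- isIntegerMatrix(matrix(0))
```

A check on class alone cannot tell the two apart, which matters once
the values have to land in a Java int[] or double[].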

For fun, I spent literally 10 minutes taking the stuff above and
putting it into a package WebVSN (with NAMESPACE and dependencies
including Biobase, TypeInfo, and vsn). I then ran RWebServices to
create first a Java representation of the data classes and a
'WebVSN' Java program capable of invoking the methods webVsn (matrix)
and webVsn1 (ExpressionSet), and second all the web bindings to expose
this as a regular web service. Here's the top of the Java class for
JustVSNParameter (in package webVSN):

package org.bioconductor.packages.webVSN;

	/**
	* This file was auto-generated by R function 
	* createJavaBean Thu Mar  8 15:49:02 2007. 
	* It represents the S4 Class JustVSNParameter in R package WebVSN. 
	*/


public class JustVSNParameter extends org.bioconductor.packages.rservices.RObject  {
	private org.bioconductor.packages.rservices.RFactor strata;
	private org.bioconductor.packages.rservices.RNumeric ltsrquantile;
	private org.bioconductor.packages.rservices.RInteger subsample;

	public JustVSNParameter() {
		this.strata = new org.bioconductor.packages.rservices.RFactor();
		this.ltsrquantile = new org.bioconductor.packages.rservices.RNumeric();
		this.subsample = new org.bioconductor.packages.rservices.RInteger();
	}

Here's the top of the Java class for ExpressionSet:

package org.bioconductor.packages.biobase;

	/**
	* This file was auto-generated by R function 
	* createJavaBean Thu Mar  8 15:49:02 2007. 
	* It represents the S4 Class ExpressionSet in R package Biobase. 
	*	Container for high-throughput assays and
	*	experimental metadata. ExpressionSet class is
	*	derived from eSet, and requires a matrix named
	*	exprs as assayData member.
	*/


public class ExpressionSet extends org.bioconductor.packages.rservices.RObject  {
	private org.bioconductor.packages.biobase.AssayDataFactory assayData;
	private org.bioconductor.packages.biobase.AnnotatedDataFrame phenoData;
	private org.bioconductor.packages.biobase.AnnotatedDataFrame featureData;
	private org.bioconductor.packages.biobase.MIAME experimentData;
	private org.bioconductor.packages.rservices.RChar annotation;
	private org.bioconductor.packages.biobase.Versions r__classVersion__;

	public ExpressionSet() {

You can see it's capturing the structure of the object, and even
extracted documentation from Biobase.

Here's an edited version of the service that gets exposed, with a
method for ExpressionSet and a method for matrix.

package org.bioconductor.rserviceJms.services.WebVSN;
import org.apache.activemq.ActiveMQConnectionFactory;
import javax.jms.ConnectionFactory;
import javax.jms.Connection;
import org.bioconductor.packages.rservices.RServicesConnection;
import java.util.*;


public class WebVSN {
	private ConnectionFactory connectionFactory;
	private String queueName;
	private long timeout;
	private org.bioconductor.packages.webVSN.WebVSN myWebVSN;

	public WebVSN() throws Exception {
    // snip -- RWebServices has a queue for multiple requests
	}

	public org.bioconductor.packages.biobase.ExpressionSet webVsn1(
        org.bioconductor.packages.biobase.ExpressionSet expressionMatrix,
        org.bioconductor.packages.webVSN.JustVSNParameter justVSNParameter)
      throws java.rmi.RemoteException {
      // snip
	}

	public org.bioconductor.packages.rservices.RMatrix webVsn(
        org.bioconductor.packages.rservices.RMatrix expressionMatrix,
        org.bioconductor.packages.webVSN.JustVSNParameter
        justVSNParameter)
      throws java.rmi.RemoteException {
      // snip
	}
}

Martin


Wolfgang Huber <huber at ebi.ac.uk> writes:

> Dear Robert and Martin,
>
> thanks for your answers! That is really encouraging and interesting.
> I had read the R News article / vignette (but then was surprised at
> seeing so little use in madman/Rpacks and in the svn log of the package
> directory.)
>
>
>> The input arguments to S4 methods are already strongly typed, so for
>> these methods the benefit of TypeInfo is restricted to specifying
>> return values.
>
> Question: I have a function
>
> setGeneric("vsn2",
>   function(x, reference, strata, ...)
>    standardGeneric("vsn2"))
>
> and methods like:
>
> setMethod("vsn2", "ExpressionSet",
>    function(x,  reference, strata, ...)
>       vsnMatrix(exprs(x),  reference, strata, ...))
>
> for which not _all_ arguments are typed in the signature (vsnMatrix is a
> 'normal' R function). As far as I understand (and it may be wrong),
> these arguments can also not be typed with TypeInfo. So is this style of
> programming discouraged?
>
>  Best wishes
>   Wolfgang
>
>
>
> Martin Morgan wrote:
>> Hi Wolfgang --
>> 
>> Yes, you should do it!
>> 
>> There's a vignette in TypeInfo written in the form of an R News
>> article (though not published as such) that outlines the basic steps &
>> benefits. 
>> 
>> 
>> It's used in our RWebServices packages, which should make it to Rpacks
>> real soon now.  We've exposed functionality of DNAcopy, PROcess and
>> affy as web services (http://cabig.bioconductor.org/axis/services;
>> affy is not there yet, these are not meant to be 'production'
>> services). There's a powerpoint presentation at
>> http://tinyurl.com/34u4lq discussing some issues; slides 7-15
>> illustrate some of the functionality, slides 27-28 sketch TypeInfo
>> approaches.
>> 
>> TypeInfo provides functionality that is useful for ensuring that
>> incoming and return arguments match specified type. This means that
>> the code in the functions can focus on the operations they perform,
>> rather than checking that the arguments are valid.
>> 
>> TypeInfo also provides the language 'reflection' that's necessary when
>> moving from a weakly typed language like R to strongly typed languages
>> like Java or xsd-based web representations of data and methods. It
>> also makes creation of, e.g., graphical widgets easier.
>> 
>> There are costs to TypeInfo, in that the argument and return types are
>> actually checked (could be computationally expensive, but probably not
>> in terms of the overall functionality of vsn). Also, TypeInfo offers
>> advanced features that are beyond what the ArrayExpress people want --
>> basically, you'll want to stick to SimultaneousTypeSpecification with
>> TypedSignature matching arguments to specific classes (TypeInfo could,
>> in principle, do things like make sure that the lengths of different
>> input vectors were the same, or that there were no NA values, or...)
>> 
>> Which also points to an important issue, that to be useful in the web
>> services environment TypeInfo has to identify either primitive types
>> (e.g., "numeric") or S4 classes; S3 classes are not structured enough
>> to be useful.
>> 
>> The input arguments to S4 methods are already strongly typed, so for
>> these methods the benefit of TypeInfo is restricted to specifying
>> return values.
>> 
>> In terms of the ArrayExpress request, my experience has been that the
>> web environment probably wants to 'see' only selected functions or
>> functions that integrate a series of steps into a convenient work
>> flow. The basic issues are: big data transfer (which will make repeated
>> 'interactive' calls impossibly slow); restricted functionality (side
>> effects like image creation are hard to deal with / need to be made into
>> main effects in the web context); and maybe a need to expose only
>> methods and parameters that are suitable for consumption by the
>> general populace rather than the full gamut of options relevant to
>> exploratory analyses or research statisticians. For these reasons, we
>> ended up writing 'wrapper' packages that contain a small number of
>> 'work flow' functions that take and return S4 classes, and that have
>> TypeInfo applied.
>> 
>> We did some initial investigation with vsn and it seemed quite
>> possible to add the necessary information. If you'd like an example,
>> point to a suitable function and I'll generate something for you.
>> 
>> Martin
>> 
>> Wolfgang Huber <huber at ebi.ac.uk> writes:
>> 
>>> Hi Martin, Robert, Seth
>>>
>>> what's up with "TypeInfo"? Seems that nobody has been using it so far in 
>>> Bioconductor:
>>>
>>> huber at lobito:~/madman/Rpacks$ find . -name DESCRIPTION \
>>>      -exec grep -H TypeInfo {} \;
>>> ./TypeInfo/DESCRIPTION:Package: TypeInfo
>>> ......and nothing else......
>>>
>>> and the last substantial svn commits are from 2005.
>>>
>>> Misha's person at ArrayExpress who wants to put Web-GUIs on 
>>> bioc-packages has asked me to provide the functions in vsn with 
>>> TypeInfo, is that something worthwhile?
>>>
>>> Best wishes
>>>    Wolfgang
>>>
>>> ------------------------------------------------------------------
>>> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>
>>> _______________________________________________
>>> Bioc-devel at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
>
>
> -- 
> Best wishes
>  Wolfgang
>
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


