[Bioc-devel] plotPCA for BiocGenerics

Thomas Dybdal Pedersen thomasp85 at gmail.com
Tue Oct 21 09:54:35 CEST 2014

While I tend to agree with you that PCA is too big an operation to be hidden within a plotting function (MDS is an edge-case I would say), I can’t see how we can ever reach a point where there is only one generic plot function. In the case of PCA there is a number of different plot-types that can all lay claim to the plot function of a PCA class, for instance scoreplot, scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 barplot, leverage vs. distance-to-model… (you get the idea). So while having some very well-thought out classes for very common result types such as PCA, this class would still need a lot of different plot methods such as plotScores, plotScree etc (or plot(…, type=‘score’), but I don’t find that very appealing). Expanding beyond PCA only muddles the water even more - there are very few interesting data structures that only have one visual representation to-rule-them-all…

just my 2c


> Date: Mon, 20 Oct 2014 18:50:48 -0400
> From: Kevin Coombes <kevin.r.coombes at gmail.com>
> Well. I have two responses to that.
> First, I think it would be a lot better/easier for users if (most) 
> developers could make use of the same plot function for "basic" classes 
> like PCA.
> Second, if you think the basic PCA plotting routine needs enhancements, 
> you still have two options.  On the one hand, you could (as you said) 
> try to convince the maintainer of PCA to add what you want.  If it's 
> generally valuable, then he'd probably do it --- and other classes that 
> use it would benefit.  On the other hand, if it really is a special 
> enhancement that only makes sense for your class, then you can derive a 
> class from the basic PCA class
>     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
>  and implement your own version of the "plot" generic for this class.  
> And you could tweak the "as.PCA" function so it returns an object of the 
> mySpecialPCA class. And the user could still just "plot" the result 
> without hacving to care what's happening behind the scenes.
> On 10/20/2014 5:59 PM, Michael Love wrote:
>> Ah, I see now. Personally, I don't think Bioconductor developers 
>> should have to agree on single plotting functions for basic classes 
>> like 'PCA' (because this logic applies equally to the situation of all 
>> Bioconductor developers agreeing on single MA-plot, a single 
>> variance-mean plot, etc). I think letting developers define their 
>> plotPCA makes contributions easier (I don't have to ask the owner of 
>> plot.PCA to incorporate something), even though it means we have a 
>> growing list of generics.
>> Still you have a good point about splitting computation and plotting. 
>> In practice, we subset the rows so PCA is not laborious.
>> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes 
>> <kevin.r.coombes at gmail.com <mailto:kevin.r.coombes at gmail.com>> wrote:
>>    Hi,
>>    I don't see how it needs more functions (as long as you can get
>>    developers to agree).  Suppose that someone can define a reusable
>>    PCA class.  This will contain a single "plot" generic function,
>>    defined once and reused by other classes. The existing "plotPCA"
>>    interface can also be implemented just once, in this class, as
>>        plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>>    This can be exposed to users of your class through namespaces. 
>>    Then the only thing a developer needs to implement in his own
>>    class is the single "as.PCA" function.  And he/she would have
>>    already been rquired to implement this as part of the old
>>    "plotPCA" function.  So it can be extracted from that, and the
>>    developer doesn't have to reimplement the visualization code from
>>    the PCA class.
>>    Best,
>>      Kevin
>>    On 10/20/2014 5:15 PM, davide risso wrote:
>>>    Hi Kevin,
>>>    I see your points and I agree (especially for the specific case
>>>    of plotPCA that involves some non trivial computations).
>>>    On the other hand, having a wrapper function that starting from
>>>    the "raw" data gives you a pretty picture (with virtually zero
>>>    effort by the user) using a sensible choice of parameters that
>>>    are more or less OK for RNA-seq data is useful for practitioners
>>>    that just want to look for patterns in the data.
>>>    I guess it would be the same to have a PCA method for each of the
>>>    objects and then using the plot method on those new objects, but
>>>    that would just create a lot more objects and functions than the
>>>    current approach (like Mike was saying).
>>>    Your "as.pca" or "performPCA" approach would be definitely better
>>>    if all the different methods would create objects of the *same*
>>>    PCA class, but since we are talking about different packages, I
>>>    don't know how easy it would be to coordinate. But perhaps this
>>>    is the way we should go.
>>>    Best,
>>>    davide
>>>    On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>>>    <kevin.r.coombes at gmail.com <mailto:kevin.r.coombes at gmail.com>> wrote:
>>>        Hi,
>>>        It depends.
>>>        The "traditional" R approach to these matters is that you (a)
>>>        first perform some sort of an analysis and save the results
>>>        as an object and then (b) show or plot what you got.  It is
>>>        part (b) that tends to be really generic, and (in my opinion)
>>>        should have really generic names -- like "show" or "plot" or
>>>        "hist" or "image".
>>>        With PCA in particular, you usually have to perform a bunch
>>>        of computations in order to get the principal components from
>>>        some part of the data.  As I understand it now, these
>>>        computations are performed along the way as part of the
>>>        various "plotPCA" functions.  The "R way" to do this would be
>>>        something like
>>>            pca <- performPCA(mySpecialObject)  # or
>>>        as.PCA(mySpecialObject)
>>>            plot(pca) # to get the scatter plot
>>>        This apporach has the user-friendly advantage that you can
>>>        tweak the plot (in terms of colors, symbols, ranges, titles,
>>>        etc) without having to recompute the principal components
>>>        every time. (I often find myself re-plotting the same PCA
>>>        several times, with different colors or symbols for different
>>>        factrors associated with the samples.) In addition, you could
>>>        then also do something like
>>>            screeplot(pca)
>>>        to get a plot of the percentages of variance explained.
>>>        My own feeling is that if the object doesn't know what to do
>>>        when you tell it to "plot" itself, then you haven't got the
>>>        right abstraction.
>>>        You may still end up needing generics for each kind of
>>>        computation you want to perform (PCA, RLE, MA, etc), which is
>>>        why I suggested an "as.PCA" function.  After all, "as" is
>>>        already pretty generic.  In the long run, l this would herlp
>>>        BioConductor developers, since they wouldn't all have to
>>>        reimplement the visualization code; they would just have to
>>>        figure out how to convert their own object into a PCA or RLE
>>>        or MA object.
>>>        And I know that this "plotWhatever" approach is used
>>>        elsewhere in BioConductor, and it has always bothered me. It
>>>        just seemed that a post suggesting a new generic function
>>>        provided a reasonable opportunity to point out that there
>>>        might be a better way.
>>>        Best,
>>>          Kevin
>>>        PS: My own "ClassDicsovery" package, which is available from
>>>        RForge via
>>>        **|install.packages("ClassDiscovery",
>>>        repos="http://R-Forge.R-project.org"
>>>        <http://R-Forge.R-project.org>)|**
>>>        includes a "SamplePCA" class that does something roughly
>>>        similar to this for microarrays.
>>>        PPS (off-topic): The worst offender in base R -- because it
>>>        doesn't use this "typical" approch -- is the "heatmap"
>>>        function.  Having tried to teach this function in several
>>>        different classes, I have come to the conclusion that it is
>>>        basically unusable by mortals.  And I think the problem is
>>>        that it tries to combine too many steps -- clustering rows,
>>>        clustering columns, scaling, visualization -- all in a single
>>>        fiunction
>>>        On 10/20/2014 3:47 PM, davide risso wrote:
>>>>        Hi Kevin,
>>>>        I don't agree. In the case of EDASeq (as I suppose it is the
>>>>        case for DESeq/DESeq2) plotting the principal components of
>>>>        the count matrix is only one of possible exploratory plots
>>>>        (RLE plots, MA plots, etc.).
>>>>        So, in my opinion, it makes more sense from an object
>>>>        oriented point of view to have multiple plotting methods for
>>>>        a single "RNA-seq experiment" object.
>>>>        In addition, this is the same strategy adopted elsewhere in
>>>>        Bioconductor, e.g., for the plotMA method.
>>>>        Just my two cents.
>>>>        Best,
>>>>        davide
>>>>        On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>>>>        <kevin.r.coombes at gmail.com
>>>>        <mailto:kevin.r.coombes at gmail.com>> wrote:
>>>>            I understand that breaking code is a problem, and that
>>>>            is admittedly the main reason not to immediately adopt
>>>>            my suggestion.
>>>>            But as a purely logical exercise, creating a "PCA"
>>>>            object X or something similar and using either
>>>>                plot(X)
>>>>            or
>>>>            plot(as.PCA(mySpecialObject))
>>>>            is a much more sensible use of object-oriented
>>>>            programming/design. This requires no new generics (to
>>>>            write or to learn).
>>>>            And you could use it to transition away from the current
>>>>            system by convincing the various package maintainers to
>>>>            re-implement plotPCA as follows:
>>>>            plotPCA <- function(object, ...) {
>>>>              plot(as.PCA(object), ...)
>>>>            }
>>>>            This would be relatively easy to eventually deprecate
>>>>            and teach users to switch to the alternative.
>>>>            On 10/20/2014 1:07 PM, Michael Love wrote:
>>>>>            hi Kevin,
>>>>>            that would imply there is only one way to plot an
>>>>>            object of a given class. Additionally, it would break a
>>>>>            lot of code.?
>>>>>            best,
>>>>>            Mike
>>>>>            On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>>>>>            <kevin.r.coombes at gmail.com
>>>>>            <mailto:kevin.r.coombes at gmail.com>> wrote:
>>>>>                But shouldn't they all really just be named "plot"
>>>>>                for the appropriate objects?  In which case, there
>>>>>                would already be a perfectly good generic....
>>>>>                On Oct 20, 2014 10:27 AM, "Michael Love"
>>>>>                <michaelisaiahlove at gmail.com
>>>>>                <mailto:michaelisaiahlove at gmail.com>> wrote:
>>>>>                    I noticed that 'plotPCA' functions are defined
>>>>>                    in EDASeq, DESeq2, DESeq,
>>>>>                    affycoretools, Rcade, facopy, CopyNumber450k,
>>>>>                    netresponse, MAIT (maybe
>>>>>                    more).
>>>>>                    Sounds like a case for BiocGenerics.
>>>>>                    best,
>>>>>                    Mike
>>>>>                    [[alternative HTML version deleted]]
>>>>>                    _______________________________________________
>>>>>                    Bioc-devel at r-project.org
>>>>>                    <mailto:Bioc-devel at r-project.org> mailing list
>>>>>                    https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>            ------------------------------------------------------------------------
>>>>            <http://www.avast.com/> 	
>>>>            This email is free from viruses and malware because
>>>>            avast! Antivirus <http://www.avast.com/> protection is
>>>>            active.
>>>>        -- 
>>>>        Davide Risso, PhD
>>>>        Post Doctoral Scholar
>>>>        Division of Biostatistics
>>>>        School of Public Health
>>>>        University of California, Berkeley
>>>>        344 Li Ka Shing Center, #3370
>>>>        Berkeley, CA 94720-3370
>>>>        E-mail: davide.risso at berkeley.edu
>>>>        <mailto:davide.risso at berkeley.edu>
>>>        ------------------------------------------------------------------------
>>>        <http://www.avast.com/> 	
>>>        This email is free from viruses and malware because avast!
>>>        Antivirus <http://www.avast.com/> protection is active.
>>>    -- 
>>>    Davide Risso, PhD
>>>    Post Doctoral Scholar
>>>    Division of Biostatistics
>>>    School of Public Health
>>>    University of California, Berkeley
>>>    344 Li Ka Shing Center, #3370
>>>    Berkeley, CA 94720-3370
>>>    E-mail: davide.risso at berkeley.edu <mailto:davide.risso at berkeley.edu>
>>    ------------------------------------------------------------------------
>>    <http://www.avast.com/> 	
>>    This email is free from viruses and malware because avast!
>>    Antivirus <http://www.avast.com/> protection is active.
> ---
> This email is free from viruses and malware because avast! Antivirus protection is active.
> 	[[alternative HTML version deleted]]
> ------------------------------
> _______________________________________________
> Bioc-devel mailing list
> Bioc-devel at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> End of Bioc-devel Digest, Vol 127, Issue 43
> *******************************************

More information about the Bioc-devel mailing list