[Bioc-devel] plotPCA for BiocGenerics

Kevin Coombes kevin.r.coombes at gmail.com
Tue Oct 21 00:50:48 CEST 2014


Well. I have two responses to that.

First, I think it would be a lot better/easier for users if (most) 
developers could make use of the same plot function for "basic" classes 
like PCA.

Second, if you think the basic PCA plotting routine needs enhancements, 
you still have two options.  On the one hand, you could (as you said) 
try to convince the maintainer of PCA to add what you want.  If it's 
generally valuable, then he'd probably do it --- and other classes that 
use it would benefit.  On the other hand, if it really is a special 
enhancement that only makes sense for your class, then you can derive a 
class from the basic PCA class
     setClass("mySpecialPCA", contains = "PCA")  # plus other stuff here
and implement your own method for the "plot" generic for this class.
And you could tweak the "as.PCA" function so it returns an object of the 
mySpecialPCA class. And the user could still just "plot" the result 
without having to care what's happening behind the scenes.
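
As a rough sketch of how the pieces could fit together (not code from any
existing package): the "PCA" class, its slot names, the "group" slot, and
the matrix method for "as.PCA" below are all assumptions made up for
illustration; only prcomp() and the S4 machinery are real.

```r
library(methods)

# Hypothetical reusable class: it only stores the results of the computation.
setClass("PCA", slots = c(scores = "matrix", sdev = "numeric"))

# Each package would supply an as.PCA method for its own class.
# A plain matrix (rows = samples) stands in for "mySpecialObject" here.
setGeneric("as.PCA", function(object, ...) standardGeneric("as.PCA"))
setMethod("as.PCA", "matrix", function(object, ...) {
  fit <- prcomp(object, ...)
  new("PCA", scores = fit$x, sdev = fit$sdev)
})

# The visualization code is written once, for the shared class.
setMethod("plot", signature("PCA", "missing"), function(x, y, ...) {
  plot(x@scores[, 1], x@scores[, 2], xlab = "PC1", ylab = "PC2", ...)
})

# A derived class with one extra (hypothetical) slot and its own plot method,
# which adds colors and then reuses the base-class plot.
setClass("mySpecialPCA", contains = "PCA", slots = c(group = "factor"))
setMethod("plot", signature("mySpecialPCA", "missing"), function(x, y, ...) {
  plot(as(x, "PCA"), col = as.integer(x@group), pch = 16, ...)
})

# The old interface reduces to a thin wrapper.
plotPCA <- function(object, ...) plot(as.PCA(object), ...)
```

With these definitions, a user would compute once with
pca <- as.PCA(m) and then call plot(pca) as often as needed with
different graphical parameters, while plotPCA(m) keeps working for
existing code.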

On 10/20/2014 5:59 PM, Michael Love wrote:
> Ah, I see now. Personally, I don't think Bioconductor developers 
> should have to agree on single plotting functions for basic classes 
> like 'PCA' (because this logic applies equally to the situation of all 
> Bioconductor developers agreeing on single MA-plot, a single 
> variance-mean plot, etc). I think letting developers define their 
> plotPCA makes contributions easier (I don't have to ask the owner of 
> plot.PCA to incorporate something), even though it means we have a 
> growing list of generics.
>
> Still you have a good point about splitting computation and plotting. 
> In practice, we subset the rows so PCA is not laborious.
>
>
> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes 
> <kevin.r.coombes at gmail.com> wrote:
>
>     Hi,
>
>     I don't see how it needs more functions (as long as you can get
>     developers to agree).  Suppose that someone can define a reusable
>     PCA class.  This will contain a single "plot" generic function,
>     defined once and reused by other classes. The existing "plotPCA"
>     interface can also be implemented just once, in this class, as
>
>         plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>
>     This can be exposed to users of your class through namespaces. 
>     Then the only thing a developer needs to implement in his own
>     class is the single "as.PCA" function.  And he/she would have
>     already been required to implement this as part of the old
>     "plotPCA" function.  So it can be extracted from that, and the
>     developer doesn't have to reimplement the visualization code from
>     the PCA class.
>
>     Best,
>       Kevin
>
>
>     On 10/20/2014 5:15 PM, davide risso wrote:
>>     Hi Kevin,
>>
>>     I see your points and I agree (especially for the specific case
>>     of plotPCA that involves some non trivial computations).
>>
>>     On the other hand, having a wrapper function that starting from
>>     the "raw" data gives you a pretty picture (with virtually zero
>>     effort by the user) using a sensible choice of parameters that
>>     are more or less OK for RNA-seq data is useful for practitioners
>>     that just want to look for patterns in the data.
>>
>>     I guess it would be the same to have a PCA method for each of the
>>     objects and then using the plot method on those new objects, but
>>     that would just create a lot more objects and functions than the
>>     current approach (like Mike was saying).
>>
>>     Your "as.pca" or "performPCA" approach would be definitely better
>>     if all the different methods would create objects of the *same*
>>     PCA class, but since we are talking about different packages, I
>>     don't know how easy it would be to coordinate. But perhaps this
>>     is the way we should go.
>>
>>     Best,
>>     davide
>>
>>
>>
>>     On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>>     <kevin.r.coombes at gmail.com> wrote:
>>
>>         Hi,
>>
>>         It depends.
>>
>>         The "traditional" R approach to these matters is that you (a)
>>         first perform some sort of an analysis and save the results
>>         as an object and then (b) show or plot what you got.  It is
>>         part (b) that tends to be really generic, and (in my opinion)
>>         should have really generic names -- like "show" or "plot" or
>>         "hist" or "image".
>>
>>         With PCA in particular, you usually have to perform a bunch
>>         of computations in order to get the principal components from
>>         some part of the data.  As I understand it now, these
>>         computations are performed along the way as part of the
>>         various "plotPCA" functions.  The "R way" to do this would be
>>         something like
>>             pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
>>             plot(pca) # to get the scatter plot
>>         This approach has the user-friendly advantage that you can
>>         tweak the plot (in terms of colors, symbols, ranges, titles,
>>         etc) without having to recompute the principal components
>>         every time. (I often find myself re-plotting the same PCA
>>         several times, with different colors or symbols for different
>>         factors associated with the samples.) In addition, you could
>>         then also do something like
>>             screeplot(pca)
>>         to get a plot of the percentages of variance explained.
>>
>>         My own feeling is that if the object doesn't know what to do
>>         when you tell it to "plot" itself, then you haven't got the
>>         right abstraction.
>>
>>         You may still end up needing generics for each kind of
>>         computation you want to perform (PCA, RLE, MA, etc), which is
>>         why I suggested an "as.PCA" function.  After all, "as" is
>>         already pretty generic.  In the long run, this would help
>>         Bioconductor developers, since they wouldn't all have to
>>         reimplement the visualization code; they would just have to
>>         figure out how to convert their own object into a PCA or RLE
>>         or MA object.
>>
>>         And I know that this "plotWhatever" approach is used
>>         elsewhere in Bioconductor, and it has always bothered me. It
>>         just seemed that a post suggesting a new generic function
>>         provided a reasonable opportunity to point out that there
>>         might be a better way.
>>
>>         Best,
>>           Kevin
>>
>>         PS: My own "ClassDiscovery" package, which is available from
>>         R-Forge via
>>             install.packages("ClassDiscovery",
>>                              repos="http://R-Forge.R-project.org")
>>         includes a "SamplePCA" class that does something roughly
>>         similar to this for microarrays.
>>
>>         PPS (off-topic): The worst offender in base R -- because it
>>         doesn't use this "typical" approach -- is the "heatmap"
>>         function.  Having tried to teach this function in several
>>         different classes, I have come to the conclusion that it is
>>         basically unusable by mortals.  And I think the problem is
>>         that it tries to combine too many steps -- clustering rows,
>>         clustering columns, scaling, visualization -- all in a single
>>         function.
>>
>>
>>         On 10/20/2014 3:47 PM, davide risso wrote:
>>>         Hi Kevin,
>>>
>>>         I don't agree. In the case of EDASeq (as I suppose it is the
>>>         case for DESeq/DESeq2) plotting the principal components of
>>>         the count matrix is only one of possible exploratory plots
>>>         (RLE plots, MA plots, etc.).
>>>         So, in my opinion, it makes more sense from an object
>>>         oriented point of view to have multiple plotting methods for
>>>         a single "RNA-seq experiment" object.
>>>
>>>         In addition, this is the same strategy adopted elsewhere in
>>>         Bioconductor, e.g., for the plotMA method.
>>>
>>>         Just my two cents.
>>>
>>>         Best,
>>>         davide
>>>
>>>         On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>>>         <kevin.r.coombes at gmail.com> wrote:
>>>
>>>             I understand that breaking code is a problem, and that
>>>             is admittedly the main reason not to immediately adopt
>>>             my suggestion.
>>>
>>>             But as a purely logical exercise, creating a "PCA"
>>>             object X or something similar and using either
>>>                 plot(X)
>>>             or
>>>                 plot(as.PCA(mySpecialObject))
>>>             is a much more sensible use of object-oriented
>>>             programming/design. This requires no new generics (to
>>>             write or to learn).
>>>
>>>             And you could use it to transition away from the current
>>>             system by convincing the various package maintainers to
>>>             re-implement plotPCA as follows:
>>>
>>>             plotPCA <- function(object, ...) {
>>>               plot(as.PCA(object), ...)
>>>             }
>>>
>>>             This would be relatively easy to eventually deprecate
>>>             and teach users to switch to the alternative.
>>>
>>>
>>>             On 10/20/2014 1:07 PM, Michael Love wrote:
>>>>             hi Kevin,
>>>>
>>>>             that would imply there is only one way to plot an
>>>>             object of a given class. Additionally, it would break a
>>>>             lot of code.
>>>>
>>>>             best,
>>>>
>>>>             Mike
>>>>
>>>>             On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>>>>             <kevin.r.coombes at gmail.com> wrote:
>>>>
>>>>                 But shouldn't they all really just be named "plot"
>>>>                 for the appropriate objects?  In which case, there
>>>>                 would already be a perfectly good generic....
>>>>
>>>>                 On Oct 20, 2014 10:27 AM, "Michael Love"
>>>>                 <michaelisaiahlove at gmail.com> wrote:
>>>>
>>>>                     I noticed that 'plotPCA' functions are defined
>>>>                     in EDASeq, DESeq2, DESeq,
>>>>                     affycoretools, Rcade, facopy, CopyNumber450k,
>>>>                     netresponse, MAIT (maybe
>>>>                     more).
>>>>
>>>>                     Sounds like a case for BiocGenerics.
>>>>
>>>>                     best,
>>>>
>>>>                     Mike
>>>>
>>>>                     _______________________________________________
>>>>                     Bioc-devel at r-project.org mailing list
>>>>                     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>>
>>>         -- 
>>>         Davide Risso, PhD
>>>         Post Doctoral Scholar
>>>         Division of Biostatistics
>>>         School of Public Health
>>>         University of California, Berkeley
>>>         344 Li Ka Shing Center, #3370
>>>         Berkeley, CA 94720-3370
>>>         E-mail: davide.risso at berkeley.edu
>>
>
>


