[Bioc-devel] plotPCA for BiocGenerics

Kevin Coombes kevin.r.coombes at gmail.com
Mon Oct 20 22:26:18 CEST 2014


Hi,

It depends.

The "traditional" R approach to these matters is that you (a) first 
perform some sort of an analysis and save the results as an object and 
then (b) show or plot what you got.  It is part (b) that tends to be 
really generic, and (in my opinion) should have really generic names -- 
like "show" or "plot" or "hist" or "image".

With PCA in particular, you usually have to perform a bunch of 
computations in order to get the principal components from some part of 
the data.  As I understand it now, these computations are performed 
along the way as part of the various "plotPCA" functions. The "R way" to 
do this would be something like
     pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
     plot(pca) # to get the scatter plot
This approach has the user-friendly advantage that you can tweak the 
plot (in terms of colors, symbols, ranges, titles, etc) without having 
to recompute the principal components every time. (I often find myself 
re-plotting the same PCA several times, with different colors or symbols 
for different factors associated with the samples.) In addition, you 
could then also do something like
     screeplot(pca)
to get a plot of the percentages of variance explained.
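
As a rough sketch of this compute-once, plot-many workflow: the example below assumes base R's prcomp() as a stand-in for the hypothetical performPCA(), and the data, the "group" factor, and the "batch" factor are all simulated for illustration. Note that stats::screeplot() draws the component variances; the proportions are computed separately here.

```r
# Simulated data: 20 samples (rows) by 50 features (columns).
set.seed(42)
expr  <- matrix(rnorm(20 * 50), nrow = 20)
group <- factor(rep(c("A", "B"), each = 10))   # hypothetical sample factor
batch <- factor(rep(c("x", "y"), times = 10))  # another hypothetical factor

# (a) Compute once; prcomp() plays the role of performPCA() here.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# (b) Re-plot the same scores as often as needed, with no recomputation.
plot(pca$x[, 1:2], col = as.integer(group), pch = 16, main = "Colored by group")
plot(pca$x[, 1:2], col = as.integer(batch), pch = 17, main = "Colored by batch")

# Variances of the components, plus the proportion of variance explained.
screeplot(pca)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```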

My own feeling is that if the object doesn't know what to do when you 
tell it to "plot" itself, then you haven't got the right abstraction.

You may still end up needing generics for each kind of computation you 
want to perform (PCA, RLE, MA, etc), which is why I suggested an 
"as.PCA" function.  After all, "as" is already pretty generic.  In the 
long run, this would help Bioconductor developers, since they 
wouldn't all have to reimplement the visualization code; they would just 
have to figure out how to convert their own object into a PCA or RLE or 
MA object.
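
A minimal sketch of how this could look, using S3 for brevity (Bioconductor packages would more likely use S4). The "PCA" class, the as.PCA generic, and the matrix method are all hypothetical and exist only for illustration; prcomp() does the actual computation.

```r
# Hypothetical generic: each package would supply a method for its own class.
as.PCA <- function(x, ...) UseMethod("as.PCA")

# Illustrative method for a plain matrix (samples in rows); wraps prcomp().
as.PCA.matrix <- function(x, ...) {
  fit <- prcomp(x, ...)
  structure(list(scores = fit$x, sdev = fit$sdev), class = "PCA")
}

# One shared plot method, written once and reused by every package.
plot.PCA <- function(x, comps = c(1, 2), ...) {
  plot(x$scores[, comps], ...)
  invisible(x)
}

# Transitional wrapper so that existing plotPCA() calls keep working.
plotPCA <- function(object, ...) {
  plot(as.PCA(object), ...)
}

set.seed(1)
m <- matrix(rnorm(10 * 6), nrow = 10)
p <- as.PCA(m)                    # convert once
plot(p, col = "blue", pch = 16)   # generic plot() dispatches to plot.PCA
plotPCA(m, col = "red")           # old-style call, same result
```

With this arrangement a package author only writes the as.PCA method; the plotting code lives in one place.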

And I know that this "plotWhatever" approach is used elsewhere in 
Bioconductor, and it has always bothered me. It just seemed that a post 
suggesting a new generic function provided a reasonable opportunity to 
point out that there might be a better way.

Best,
   Kevin

PS: My own "ClassDiscovery" package, which is available from R-Forge via
     install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
includes a "SamplePCA" class that does something roughly similar to this 
for microarrays.

PPS (off-topic): The worst offender in base R -- because it doesn't use 
this "typical" approach -- is the "heatmap" function.  Having tried to 
teach this function in several different classes, I have come to the 
conclusion that it is basically unusable by mortals. And I think the 
problem is that it tries to combine too many steps -- clustering rows, 
clustering columns, scaling, visualization -- all in a single function.
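
For comparison, here is a sketch of those same steps done one at a time with base R functions. The data are simulated, and the choices of row scaling, Euclidean distance, and the hclust() defaults are assumptions made for illustration, not a claim about what heatmap() does internally.

```r
# Simulated matrix: 12 features (rows) by 8 samples (columns).
set.seed(7)
x <- matrix(rnorm(12 * 8), nrow = 12,
            dimnames = list(paste0("gene", 1:12), paste0("s", 1:8)))

z      <- t(scale(t(x)))        # step 1: center and scale each row
row_hc <- hclust(dist(z))       # step 2: cluster the rows
col_hc <- hclust(dist(t(z)))    # step 3: cluster the columns

# Step 4: reorder by the two clusterings, then draw the image.
ordered <- z[row_hc$order, col_hc$order]
image(t(ordered), axes = FALSE, main = "Scaled, clustered matrix")
```

Each intermediate object (z, row_hc, col_hc) can be inspected or swapped out before anything is drawn.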

On 10/20/2014 3:47 PM, davide risso wrote:
> Hi Kevin,
>
> I don't agree. In the case of EDASeq (as I suppose it is the case for 
> DESeq/DESeq2) plotting the principal components of the count matrix is 
> only one of possible exploratory plots (RLE plots, MA plots, etc.).
> So, in my opinion, it makes more sense from an object oriented point 
> of view to have multiple plotting methods for a single "RNA-seq 
> experiment" object.
>
> In addition, this is the same strategy adopted elsewhere in 
> Bioconductor, e.g., for the plotMA method.
>
> Just my two cents.
>
> Best,
> davide
>
> On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes 
> <kevin.r.coombes at gmail.com> wrote:
>
>     I understand that breaking code is a problem, and that is
>     admittedly the main reason not to immediately adopt my suggestion.
>
>     But as a purely logical exercise, creating a "PCA" object X or
>     something similar and using either
>         plot(X)
>     or
>         plot(as.PCA(mySpecialObject))
>     is a much more sensible use of object-oriented programming/design.
>     This requires no new generics (to write or to learn).
>
>     And you could use it to transition away from the current system by
>     convincing the various package maintainers to re-implement plotPCA
>     as follows:
>
>     plotPCA <- function(object, ...) {
>       plot(as.PCA(object), ...)
>     }
>
>     This would be relatively easy to eventually deprecate and teach
>     users to switch to the alternative.
>
>
>     On 10/20/2014 1:07 PM, Michael Love wrote:
>>     hi Kevin,
>>
>>     that would imply there is only one way to plot an object of a
>>     given class. Additionally, it would break a lot of code.
>>
>>     best,
>>
>>     Mike
>>
>>     On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>>     <kevin.r.coombes at gmail.com> wrote:
>>
>>         But shouldn't they all really just be named "plot" for the
>>         appropriate objects?  In which case, there would already be a
>>         perfectly good generic....
>>
>>         On Oct 20, 2014 10:27 AM, "Michael Love"
>>         <michaelisaiahlove at gmail.com> wrote:
>>
>>             I noticed that 'plotPCA' functions are defined in EDASeq,
>>             DESeq2, DESeq,
>>             affycoretools, Rcade, facopy, CopyNumber450k,
>>             netresponse, MAIT (maybe
>>             more).
>>
>>             Sounds like a case for BiocGenerics.
>>
>>             best,
>>
>>             Mike
>>
>>                     [[alternative HTML version deleted]]
>>
>>             _______________________________________________
>>             Bioc-devel at r-project.org mailing list
>>             https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
>
>
>
> -- 
> Davide Risso, PhD
> Post Doctoral Scholar
> Division of Biostatistics
> School of Public Health
> University of California, Berkeley
> 344 Li Ka Shing Center, #3370
> Berkeley, CA 94720-3370
> E-mail: davide.risso at berkeley.edu


