[Bioc-devel] plotPCA for BiocGenerics
Thomas Dybdal Pedersen
thomasp85 at gmail.com
Tue Oct 21 09:54:35 CEST 2014
While I tend to agree with you that PCA is too big an operation to be hidden within a plotting function (MDS is an edge case, I would say), I can't see how we can ever reach a point where there is only one generic plot function. In the case of PCA there are a number of different plot types that can all lay claim to the plot function of a PCA class, for instance scoreplot, scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 barplot, leverage vs. distance-to-model… (you get the idea). So even with some very well-thought-out classes for very common result types such as PCA, such a class would still need a lot of different plot methods such as plotScores, plotScree etc. (or plot(…, type='score'), but I don't find that very appealing). Expanding beyond PCA only muddies the waters even more - there are very few interesting data structures that have only one visual representation to-rule-them-all…
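To make the point concrete, here is a minimal sketch in R of one computed result with several plot functions on top of it. Everything named here is an assumption for illustration only: the "PCA" class, performPCA(), varianceExplained(), plotScores() and plotScree() are hypothetical and do not refer to any existing Bioconductor class or generic. Only base R (prcomp, plot, barplot) and the built-in USArrests data are used.

```r
# Hypothetical sketch: "PCA", performPCA(), plotScores() and plotScree()
# are illustrative names, not an existing Bioconductor API.
library(methods)

setClass("PCA", representation(scores = "matrix", sdev = "numeric"))

# Compute the decomposition once; every plot below reuses the stored result.
performPCA <- function(x, ...) {
  fit <- prcomp(x, ...)
  new("PCA", scores = fit$x, sdev = fit$sdev)
}

# Proportion of total variance carried by each component.
varianceExplained <- function(object) {
  object@sdev^2 / sum(object@sdev^2)
}

# One function per view, rather than a single plot(..., type = ) switch.
plotScores <- function(object, dims = c(1, 2), ...) {
  plot(object@scores[, dims], ...)
}

plotScree <- function(object, ...) {
  barplot(varianceExplained(object),
          names.arg = colnames(object@scores),
          ylab = "Proportion of variance", ...)
}

pca <- performPCA(USArrests, scale. = TRUE)
plotScores(pca, col = "steelblue")  # score plot of PC1 vs PC2
plotScree(pca)                      # scree plot from the same object
```

Both views are equally reasonable candidates for plot(pca), which is why a single generic plot method does not settle the question.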
just my 2c
best
Thomas
> Date: Mon, 20 Oct 2014 18:50:48 -0400
> From: Kevin Coombes <kevin.r.coombes at gmail.com>
>
> Well. I have two responses to that.
>
> First, I think it would be a lot better/easier for users if (most)
> developers could make use of the same plot function for "basic" classes
> like PCA.
>
> Second, if you think the basic PCA plotting routine needs enhancements,
> you still have two options. On the one hand, you could (as you said)
> try to convince the maintainer of PCA to add what you want. If it's
> generally valuable, then he'd probably do it --- and other classes that
> use it would benefit. On the other hand, if it really is a special
> enhancement that only makes sense for your class, then you can derive a
> class from the basic PCA class
> setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
> and implement your own version of the "plot" generic for this class.
> And you could tweak the "as.PCA" function so it returns an object of the
> mySpecialPCA class. And the user could still just "plot" the result
> without having to care what's happening behind the scenes.
>
> On 10/20/2014 5:59 PM, Michael Love wrote:
>> Ah, I see now. Personally, I don't think Bioconductor developers
>> should have to agree on single plotting functions for basic classes
>> like 'PCA' (because this logic applies equally to the situation of all
>> Bioconductor developers agreeing on single MA-plot, a single
>> variance-mean plot, etc). I think letting developers define their
>> plotPCA makes contributions easier (I don't have to ask the owner of
>> plot.PCA to incorporate something), even though it means we have a
>> growing list of generics.
>>
>> Still you have a good point about splitting computation and plotting.
>> In practice, we subset the rows so PCA is not laborious.
>>
>>
>> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
>> <kevin.r.coombes at gmail.com> wrote:
>>
>> Hi,
>>
>> I don't see how it needs more functions (as long as you can get
>> developers to agree). Suppose that someone can define a reusable
>> PCA class. This will contain a single "plot" generic function,
>> defined once and reused by other classes. The existing "plotPCA"
>> interface can also be implemented just once, in this class, as
>>
>> plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>>
>> This can be exposed to users of your class through namespaces.
>> Then the only thing a developer needs to implement in his own
>> class is the single "as.PCA" function. And he/she would have
>> already been required to implement this as part of the old
>> "plotPCA" function. So it can be extracted from that, and the
>> developer doesn't have to reimplement the visualization code from
>> the PCA class.
>>
>> Best,
>> Kevin
>>
>>
>> On 10/20/2014 5:15 PM, davide risso wrote:
>>> Hi Kevin,
>>>
>>> I see your points and I agree (especially for the specific case
>>> of plotPCA that involves some non trivial computations).
>>>
>>> On the other hand, having a wrapper function that starting from
>>> the "raw" data gives you a pretty picture (with virtually zero
>>> effort by the user) using a sensible choice of parameters that
>>> are more or less OK for RNA-seq data is useful for practitioners
>>> that just want to look for patterns in the data.
>>>
>>> I guess it would be the same to have a PCA method for each of the
>>> objects and then using the plot method on those new objects, but
>>> that would just create a lot more objects and functions than the
>>> current approach (like Mike was saying).
>>>
>>> Your "as.pca" or "performPCA" approach would be definitely better
>>> if all the different methods would create objects of the *same*
>>> PCA class, but since we are talking about different packages, I
>>> don't know how easy it would be to coordinate. But perhaps this
>>> is the way we should go.
>>>
>>> Best,
>>> davide
>>>
>>>
>>>
>>> On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>>> <kevin.r.coombes at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> It depends.
>>>
>>> The "traditional" R approach to these matters is that you (a)
>>> first perform some sort of an analysis and save the results
>>> as an object and then (b) show or plot what you got. It is
>>> part (b) that tends to be really generic, and (in my opinion)
>>> should have really generic names -- like "show" or "plot" or
>>> "hist" or "image".
>>>
>>> With PCA in particular, you usually have to perform a bunch
>>> of computations in order to get the principal components from
>>> some part of the data. As I understand it now, these
>>> computations are performed along the way as part of the
>>> various "plotPCA" functions. The "R way" to do this would be
>>> something like
>>> pca <- performPCA(mySpecialObject) # or as.PCA(mySpecialObject)
>>> plot(pca) # to get the scatter plot
>>> This approach has the user-friendly advantage that you can
>>> tweak the plot (in terms of colors, symbols, ranges, titles,
>>> etc) without having to recompute the principal components
>>> every time. (I often find myself re-plotting the same PCA
>>> several times, with different colors or symbols for different
>>> factors associated with the samples.) In addition, you could
>>> then also do something like
>>> screeplot(pca)
>>> to get a plot of the percentages of variance explained.
>>>
>>> My own feeling is that if the object doesn't know what to do
>>> when you tell it to "plot" itself, then you haven't got the
>>> right abstraction.
>>>
>>> You may still end up needing generics for each kind of
>>> computation you want to perform (PCA, RLE, MA, etc), which is
>>> why I suggested an "as.PCA" function. After all, "as" is
>>> already pretty generic. In the long run, this would help
>>> Bioconductor developers, since they wouldn't all have to
>>> reimplement the visualization code; they would just have to
>>> figure out how to convert their own object into a PCA or RLE
>>> or MA object.
>>>
>>> And I know that this "plotWhatever" approach is used
>>> elsewhere in Bioconductor, and it has always bothered me. It
>>> just seemed that a post suggesting a new generic function
>>> provided a reasonable opportunity to point out that there
>>> might be a better way.
>>>
>>> Best,
>>> Kevin
>>>
>>> PS: My own "ClassDiscovery" package, which is available from
>>> R-Forge via
>>> install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
>>> includes a "SamplePCA" class that does something roughly
>>> similar to this for microarrays.
>>>
>>> PPS (off-topic): The worst offender in base R -- because it
>>> doesn't use this "typical" approach -- is the "heatmap"
>>> function. Having tried to teach this function in several
>>> different classes, I have come to the conclusion that it is
>>> basically unusable by mortals. And I think the problem is
>>> that it tries to combine too many steps -- clustering rows,
>>> clustering columns, scaling, visualization -- all in a single
>>> function.
>>>
>>>
>>> On 10/20/2014 3:47 PM, davide risso wrote:
>>>> Hi Kevin,
>>>>
>>>> I don't agree. In the case of EDASeq (as I suppose it is the
>>>> case for DESeq/DESeq2) plotting the principal components of
>>>> the count matrix is only one of possible exploratory plots
>>>> (RLE plots, MA plots, etc.).
>>>> So, in my opinion, it makes more sense from an object
>>>> oriented point of view to have multiple plotting methods for
>>>> a single "RNA-seq experiment" object.
>>>>
>>>> In addition, this is the same strategy adopted elsewhere in
>>>> Bioconductor, e.g., for the plotMA method.
>>>>
>>>> Just my two cents.
>>>>
>>>> Best,
>>>> davide
>>>>
>>>> On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>>>> <kevin.r.coombes at gmail.com> wrote:
>>>>
>>>> I understand that breaking code is a problem, and that
>>>> is admittedly the main reason not to immediately adopt
>>>> my suggestion.
>>>>
>>>> But as a purely logical exercise, creating a "PCA"
>>>> object X or something similar and using either
>>>> plot(X)
>>>> or
>>>> plot(as.PCA(mySpecialObject))
>>>> is a much more sensible use of object-oriented
>>>> programming/design. This requires no new generics (to
>>>> write or to learn).
>>>>
>>>> And you could use it to transition away from the current
>>>> system by convincing the various package maintainers to
>>>> re-implement plotPCA as follows:
>>>>
>>>> plotPCA <- function(object, ...) {
>>>> plot(as.PCA(object), ...)
>>>> }
>>>>
>>>> This would be relatively easy to eventually deprecate
>>>> and teach users to switch to the alternative.
>>>>
>>>>
>>>> On 10/20/2014 1:07 PM, Michael Love wrote:
>>>>> hi Kevin,
>>>>>
>>>>> that would imply there is only one way to plot an
>>>>> object of a given class. Additionally, it would break a
>>>>> lot of code.
>>>>>
>>>>> best,
>>>>>
>>>>> Mike
>>>>>
>>>>> On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>>>>> <kevin.r.coombes at gmail.com> wrote:
>>>>>
>>>>> But shouldn't they all really just be named "plot"
>>>>> for the appropriate objects? In which case, there
>>>>> would already be a perfectly good generic....
>>>>>
>>>>> On Oct 20, 2014 10:27 AM, "Michael Love"
>>>>> <michaelisaiahlove at gmail.com> wrote:
>>>>>
>>>>> I noticed that 'plotPCA' functions are defined
>>>>> in EDASeq, DESeq2, DESeq,
>>>>> affycoretools, Rcade, facopy, CopyNumber450k,
>>>>> netresponse, MAIT (maybe
>>>>> more).
>>>>>
>>>>> Sounds like a case for BiocGenerics.
>>>>>
>>>>> best,
>>>>>
>>>>> Mike
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>
>>>> --
>>>> Davide Risso, PhD
>>>> Post Doctoral Scholar
>>>> Division of Biostatistics
>>>> School of Public Health
>>>> University of California, Berkeley
>>>> 344 Li Ka Shing Center, #3370
>>>> Berkeley, CA 94720-3370
>>>> E-mail: davide.risso at berkeley.edu
>
> End of Bioc-devel Digest, Vol 127, Issue 43
> *******************************************