[Bioc-devel] Adding additional validity checks when calling setter methods of inherited S4Vectors::DataFrame class

Pariksheet Nanda p@n79 @end|ng |rom p|tt@edu
Sun Jan 5 02:45:33 CET 2025


Hi Lluís,

 > So your setter method `[<-`  could have this call inside it, and this
 > would prevent the creation of an invalid object.

Thank you for solving my issue!  An earlier attempt at something similar 
failed because I was passing no arguments to callNextMethod(); retrying 
with your suggestion helped me figure out that I need to pass the 
arguments to the inherited setter explicitly.  My wrapper below first 
makes a copy to use for the call to the next method and only assigns 
the result if validObject() does not throw an error:

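setMethod(
     "[<-",
     "CategoriesDataFrame",
     function(x, i, j, ..., value) {
         ## Work on a copy so that x is left untouched if validation fails.
         y <- x
         ## Delegate the assignment to the inherited (DFrame) setter,
         ## passing the arguments explicitly.
         y <- callNextMethod(y, i, j, ..., value = value)
         ## validObject() throws an error if y violates the class validity.
         validObject(y)
         ## Return the validated object; R assigns it back to the caller.
         y
     })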
## NB: Above signature per output of:
## selectMethod("[<-", "DFrame")
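
For anyone who wants to try the pattern without the toppgene package, 
here is a minimal sketch.  ToyCategoriesDataFrame and its single PValue 
check are hypothetical stand-ins I made up for illustration, not the 
real CategoriesDataFrame class, and I have only exercised the 
x[i, j] <- value form of the setter:

```r
library(S4Vectors)

## Hypothetical stand-in class: a DFrame whose PValue column must lie in [0, 1].
setClass("ToyCategoriesDataFrame", contains = "DFrame")
setValidity("ToyCategoriesDataFrame", function(object) {
    if (!("PValue" %in% colnames(object)))
        return("column PValue is required")
    p <- object[["PValue"]]
    if (anyNA(p) || any(p < 0 | p > 1))
        return("column PValue must contain values in [0, 1]")
    TRUE
})

## Same wrapper pattern as above: delegate, validate, then return.
setMethod("[<-", "ToyCategoriesDataFrame",
    function(x, i, j, ..., value) {
        y <- callNextMethod(x, i, j, ..., value = value)
        validObject(y)
        y
    })

toy <- new("ToyCategoriesDataFrame",
           DataFrame(PValue = c(0.05, 0.05),
                     row.names = c("Coexpression", "ToppCell")))

## A value within bounds is accepted.
toy[, "PValue"] <- 0.01

## An out-of-range value should now fail inside the setter, leaving toy as is.
rejected <- tryCatch({
    toy[, "PValue"] <- 2
    FALSE
}, error = function(e) TRUE)
```

The tryCatch() is only there so the sketch runs to completion; in 
interactive use the failed assignment simply stops with the validity 
error and the original object is unchanged.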


 > Why do you use a new S4 class for parameter input? Your comments
 > on what made you try this route would be helpful, thanks!

The whole point of using S4 classes is to have checks that ensure object 
validity and, when an object is invalid, to fail fast / fail early.

The reason for choosing a DataFrame representation in particular is that 
there are 19 sets of parameters (Coexpression, ..., ToppCell), each with 6 
values (PValue, MinGenes, ..., Enabled) to potentially edit, which can 
get tedious.  Every parameter set has the same fields (PValue, 
MinGenes, ..., Enabled), so DataFrame semantics give users an intuitive 
way to set multiple parameters at once.  Especially now with the DFplyr 
package, users can use dplyr verbs to fine-tune these parameters.

Pariksheet


On 1/4/25 5:51 AM, Lluís Revilla wrote:
> Hi Pariksheet,
> 
> You can check if the object is valid inside your methods.
> I use something like:
> 
> method <- function(...) {
>      # Input processing
>      validObject(object)
>      object
> }
> 
> So your setter method `[<-`  could have this call inside it, and this
> would prevent the creation of an invalid object.
> In this case, even keeping the class and without adding the validity
> check on the [<- method, the validation of the input data could be
> done inside the function making the requests to the website.
> 
> I am part of a working group trying to make it easier for people to
> use classes on Bioconductor.
> I would like to write a guide about when to use classes and how to
> choose between the different object-oriented paradigms available in R
> (including the new S7). Why do you use a new S4
> class  for parameter input? Your comments on what made you try this
> route would be helpful, thanks!
> 
> Best,
> 
> Lluís
> 
> 
> On Fri, 3 Jan 2025 at 01:17, Nanda, Pariksheet via Bioc-devel
> <bioc-devel using r-project.org> wrote:
>>
>> Hello S4-class boffins,
>>
>> How bad of an idea is it to inherit from an S4Vectors::DataFrame / DFrame S4 class to impose additional constraints on it? I'm writing a lightweight wrapper around the ToppGene web API (github com/ImmuSystems-Lab/toppgene/blob/main/R/categories.R) [1] and while it's functional, it currently only runs JSON web queries using default values. To pass non-default values, each category queried needs to have associated parameters within some boundaries. While it's intuitive for a Bioconductor user to see and manipulate DataFrames containing the parameters, the trouble I'm seeing is that validObject() is of course not automagically run on the dispatched S4Vectors setters, and I don't know how to inject validObject() into the process without rewriting / repeating a lot of the S4Vectors method implementation internals; callNextMethod() does not seem like it would work? Currently, the only time validObject() is called is when invoking the constructor, CategoriesDataFrame(), and because code is worth a thousand words, see below the "---" line for what I mean.
>>
>> Pariksheet
>>
>> [1] Yes, I'm trying to avoid the GitHub URL from being mangled into illegible horrors by removing the protocol prefix and the dot before the domain, so you'll have to add at least the latter back in to visit the GitHub page.
>>
>> ---
>>
>>> devtools::load_all()
>> [...]
>>
>>> cats <- CategoriesDataFrame()
>>
>>> cats
>> ToppGene CategoriesDataFrame with 19 enabled categories
>>                                PValue MinGenes MaxGenes MaxResults Correction Enabled
>> Coexpression                    0.05        2     1500         50        FDR    TRUE
>> CoexpressionAtlas               0.05        2     1500         50        FDR    TRUE
>> Computational                   0.05        2     1500         50        FDR    TRUE
>> Cytoband                        0.05        2     1500         50        FDR    TRUE
>> Disease                         0.05        2     1500         50        FDR    TRUE
>> Domain                          0.05        2     1500         50        FDR    TRUE
>> Drug                            0.05        2     1500         50        FDR    TRUE
>> GeneFamily                      0.05        2     1500         50        FDR    TRUE
>> GeneOntologyBiologicalProcess   0.05        2     1500         50        FDR    TRUE
>> GeneOntologyCellularComponent   0.05        2     1500         50        FDR    TRUE
>> GeneOntologyMolecularFunction   0.05        2     1500         50        FDR    TRUE
>> HumanPheno                      0.05        2     1500         50        FDR    TRUE
>> Interaction                     0.05        2     1500         50        FDR    TRUE
>> MicroRNA                        0.05        2     1500         50        FDR    TRUE
>> MousePheno                      0.05        2     1500         50        FDR    TRUE
>> Pathway                         0.05        2     1500         50        FDR    TRUE
>> Pubmed                          0.05        2     1500         50        FDR    TRUE
>> TFBS                            0.05        2     1500         50        FDR    TRUE
>> ToppCell                        0.05        2     1500         50        FDR    TRUE
>> ------------------------------
>> Values allowed by ToppGene are:
>>    PValue: [0, 1] <numeric>
>>    MinGenes: [1, 5000] <integer>
>>    MaxGenes: [2, 5000] <integer>
>>    MaxResults: [1, 5000] <integer>
>>    Correction: {None, FDR, Bonferroni} <character>
>>
>> ## This next line should not complete without an error!  But it does.
>>> cats[, "PValue"] <- 2
>>
>> ## Explicitly calling validObject() will point out the problem post-hoc,
>> ## but not prevent the above assignment.
>>> validObject(cats)
>> Error in validObject(cats) :
>>    invalid class “CategoriesDataFrame” object: column PValue must contain values <= 1
>>


