[Bioc-devel] How best to remap S4Vectors::Hits indices?

Hervé Pagès hpages at fredhutch.org
Fri May 25 19:31:32 CEST 2018


Hi Pariksheet,

On 05/22/2018 04:57 PM, Pariksheet Nanda wrote:
> Hi folks,
> 
> I'm working on a package that does some trivial GRanges position
> classifications; primarily to standardize nomenclature according to the
> literature in workflows.
> 
> The API for S4Vectors::Hits() generally doesn't seem amenable to modifying
> Hits objects, except for the remapHits() feature (which I see under
> the covers really generates a new Hits object).

Exactly. And that is the case for any object in R that is not a
reference object (i.e. that is not an environment, external pointer,
or reference class instance). Modifying it always generates a new
object. For example, replacing a column of a data frame with
my_df$foo <- value, or a slot of an S4 object with my_object@foo <- value,
generates a new object. So adding setter methods for Hits objects
wouldn't change that.
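
A minimal sketch of that copy-versus-reference behavior, using only base
R and the methods package. The "Toy" class and all variable names are
made up for illustration; they are not part of S4Vectors.

```r
library(methods)

# A data frame is not a reference object: modifying a copy
# leaves the original untouched.
my_df <- data.frame(foo = 1:3)
my_df2 <- my_df
my_df2$foo <- c(10L, 20L, 30L)

# Same for an instance of an ordinary S4 class ("Toy" is hypothetical).
setClass("Toy", representation(foo = "integer"))
my_object <- new("Toy", foo = 1:3)
my_object2 <- my_object
my_object2@foo <- 4:6

# An environment is a reference object: 'env' and 'env2' are the
# same object, so the change is visible through both names.
env <- new.env()
env$foo <- 1:3
env2 <- env
env2$foo <- 4:6

identical(my_df$foo, 1:3)      # original data frame unchanged
identical(my_object@foo, 1:3)  # original S4 object unchanged
identical(env$foo, 4:6)        # environment modified in place
```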

The only reason we don't provide from()/queryHits() or
to()/subjectHits() setters is that we've not been able to identify
use cases that justify having them so far. For those use cases where
the 'from' and 'to' slots both need to be modified (in an atomic way),
calling the Hits() constructor to generate a new object does the job.

> 
> I was hoping someone could take a quick look at a short function I'm using
> to subset and reindex Hits in the da_tss() function:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_coregenomics_nascentrna_blob_a2d9d10564c3a88759237b56ec49d0d3e73f6d16_R_classify.R-23L70&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=NZbAyFFpxVrRnJ_wgmnGDVpP3zsiyUN-I1CYW18k--I&s=wdseZzTGLbMSi02jPq5IsSaOgUlJVn_Pbqop_swCpjc&e=
> Yes, to illustrate the problem I'm having, I've directly used the @-style
> S4 access which is, of course, a terrible thing to do because it defeats
> the purpose of S4 object validation, which is why I'm e-mailing the list
> for an alternative.  I feel like casting to something like a data.frame,
> changing the indices, and changing back to Hits would be wasteful and
> improperly using the Bioconductor framework?

No need to cast the object to a data.frame. That would indeed be
wasteful. Just compute the new 'from' and 'to' vectors then do
'Hits(from, to, nLnode=nLnode(hits), nRnode=nRnode(hits))'
to create the modified object ('hits' being the original object).
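
As a rough sketch of how this could look for the "subset, then revert
to the original indices" case: the 'keep' vector, the lengths, and the
toy hits below are all made-up values, not taken from the package in
question. Note one assumption that goes beyond the call above: because
the remapped 'from' values here index into a longer original object,
nLnode is set to that original length rather than to nLnode(hits).

```r
library(S4Vectors)

# Hypothetical setup: the original query has 10 elements, and only the
# elements at positions 'keep' were compared against a subject of length 4.
n_query_orig <- 10L
keep <- c(2L, 5L, 9L)

# Toy hits computed on the subset: 'from' indexes into 'keep',
# not into the original query.
hits <- Hits(from = c(1L, 1L, 3L), to = c(2L, 4L, 1L),
             nLnode = length(keep), nRnode = 4L)

# Compute the new 'from' vector, then build a new Hits object in one
# step. The original 'hits' object is left untouched.
new_from <- keep[from(hits)]
remapped <- Hits(new_from, to(hits),
                 nLnode = n_query_orig, nRnode = nRnode(hits))

from(remapped)  # indices into the original query
```

Going through the constructor means the new object is validated, which
direct @ slot assignment would bypass.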

Hope this helps,
H.

> 
> Here are the corresponding tests that run the da_tss() function:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_coregenomics_nascentrna_blob_a2d9d10564c3a88759237b56ec49d0d3e73f6d16_tests_testthat_test-2Dclassifiers.R&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=NZbAyFFpxVrRnJ_wgmnGDVpP3zsiyUN-I1CYW18k--I&s=tljRQI1QSZvtWBVQ6nZxvkvHDEhsHLEuileQTDJZAu0&e=
> 
> What it comes down to is this:
> I want to compare a subset of GRanges for hits, but revert to the original
> GRanges indices when returning the results.
> 
> Thanks for any advice!
> Pariksheet

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


