[Bioc-devel] Changes to DataFrame

Pages, Herve hpages at fredhutch.org
Wed Aug 28 03:39:59 CEST 2019

Hi developers,

Short story: these changes shouldn't affect you but I recommend you read 
the long story just in case.

Long story:

Some of you may have already noticed that I have been making changes to the 
DataFrame class. The idea is to "make room" for other data-frame-like 
containers by having DataFrame become a virtual class with no slots, with 
concrete subclasses providing specific 
representations/implementations. This will make it easier to experiment 
with on-disk data frame representations (e.g. SQL-based or Parquet-based) 
and to have these data-frame-like containers be re-usable in any place 
where a DataFrame object is currently expected. The typical use cases we 
have in mind are on-disk storage of the metadata columns of 
Vector-like derivatives and on-disk storage of the colData slot of a 
SummarizedExperiment object or derivative.
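
To illustrate the general pattern (a virtual class with no slots, plus 
concrete subclasses that each bring their own representation), here is a 
minimal sketch using only the methods package. All the names in it (Table, 
MemTable, FileTable, columnNames, describe) are hypothetical and chosen for 
illustration; this is not the S4Vectors code.

```r
library(methods)

# Hypothetical virtual class: no slots, cannot be instantiated.
setClass("Table", representation("VIRTUAL"))

# Two hypothetical concrete subclasses with different representations.
setClass("MemTable", contains = "Table", slots = c(data = "list"))
setClass("FileTable", contains = "Table", slots = c(path = "character"))

# Each subclass implements the same generic in its own way.
setGeneric("columnNames", function(x) standardGeneric("columnNames"))
setMethod("columnNames", "MemTable", function(x) names(x@data))
setMethod("columnNames", "FileTable", function(x) {
    # Read only the header line of a simple, unquoted CSV file.
    strsplit(readLines(x@path, n = 1), ",", fixed = TRUE)[[1]]
})

# Code written against the virtual class works with any subclass.
describe <- function(x) {
    stopifnot(is(x, "Table"))
    paste(columnNames(x), collapse = ", ")
}

mem <- new("MemTable", data = list(id = 1:3, score = c(0.1, 0.5, 0.9)))

path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,0.1", "2,0.5"), path)
disk <- new("FileTable", path = path)

describe(mem)
describe(disk)
```

Both calls to describe() go through the same code path even though one 
object lives in memory and the other only points to a file, which is the 
kind of interchangeability the redesign is after.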

The first round of changes I made was to introduce the DFrame class as a 
subclass of DataFrame, and to have the DataFrame() constructor return a 
DFrame object instead of a DataFrame. Note that DFrame uses exactly the 
same internal representation as DataFrame (i.e. it does not add any slot 
to the current representation of DataFrame) so for now DFrame and 
DataFrame objects are equivalent (but this will change in the future 
when DataFrame "loses" its slots). However, you should no longer see 
DataFrame instances. More precisely: unless you use new("DataFrame", 
...) (which you should not, you should always use the DataFrame() 
constructor instead), you will always get DFrame instances instead of 
DataFrame instances. In order to make this change as transparent as 
possible to the end-user, show() still reports that the object is a 
DataFrame. Note that this is actually true because is(x, "DataFrame") is 
true on a DFrame object x so we are not lying, just hiding the truth ;-)
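
A quick way to check this in a session is sketched below. It assumes 
S4Vectors >= 0.23.20 is installed; the column names are made up for the 
example.

```r
# Assumes a Bioconductor installation with S4Vectors >= 0.23.20.
library(S4Vectors)

df <- DataFrame(id = 1:3, label = c("a", "b", "c"))

# The constructor returns a DFrame instance ...
class(df)

# ... which still satisfies any check written against DataFrame.
is(df, "DataFrame")

# A strict class comparison no longer matches, so prefer is() in tests.
class(df) == "DataFrame"
```

Code that tests with is(x, "DataFrame") keeps working unchanged, while 
code that compares class(x) to "DataFrame" directly is the kind that this 
round of changes can break.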

The only situation where you'll still see a DataFrame instance is when 
you use readRDS(), load(), or data() to deserialize an object that was 
created before these changes. Nothing wrong with these "old" objects 
though: they're still valid objects and should keep working as before.
Note however that their population will naturally start to shrink from 
now on until they completely disappear at some point in the future. FWIW 
we've actually started to consider some strategies/tricks to accelerate 
their eradication from planet earth.

I made similar changes to the DataFrameList class and subclasses.

These changes are in S4Vectors 0.23.20 and IRanges 2.19.14.

I think I've taken care of all software packages that this first round 
of changes broke. Let me know if I didn't.

We're still a long way from having DataFrame be a virtual class with no 
slots (and with DFrame being its "canonical" subclass i.e. providing the 
current in-memory representation) so expect more changes in the future.

I'll report back here as we make significant progress on this but the 
next major round of changes should not happen before the next BioC 
release (i.e. when we start the BioC 3.11 6-month devel cycle).



Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
