[Bioc-devel] Numeric Operation on DataFrame

Hervé Pagès hpages at fredhutch.org
Tue Jan 16 18:47:53 CET 2018


Hi,

I think I remember it was once suggested on this list that DataFrame
objects with numeric columns could support math/summarization
operations, like data.frame objects do (can't find the thread
to provide the link, sorry).

I'll mention that wrapping a DataFrame object (or any matrix-like or
array-like object) in a DelayedArray object is one way to enable
this:

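   library(DelayedArray)
   # dataTableS4 is the DataFrame from Dario's example quoted below
   dataTableS4 <- DataFrame(aFeature = 1:5, anotherFeature = 5:1)
   M <- DelayedArray(dataTableS4)
   colMeans(M)
   #      aFeature anotherFeature
   #             3              3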

This should not copy the DataFrame, so it should be more memory-efficient
than calling as.data.frame() on it. In addition, it transparently uses
the internal DelayedArray machinery: some operations are delayed (e.g.
in colMeans(log(M[-1, ])), both the subsetting and the log() are
delayed), and non-delayed operations (e.g. colMeans()) use
block processing.
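
To make the delayed vs. non-delayed distinction concrete, here is a
minimal sketch. It assumes that DelayedArray() accepts a DataFrame as a
seed, as described above; the names M2, res and ref are only
illustrative.

```r
library(S4Vectors)
library(DelayedArray)

dataTableS4 <- DataFrame(aFeature = 1:5, anotherFeature = 5:1)
M <- DelayedArray(dataTableS4)

# Subsetting and log() are only recorded as delayed operations here;
# no data is realized yet, and M2 is still a DelayedArray.
M2 <- log(M[-1, ])

# colMeans() is not delayed: it realizes the data block by block.
res <- colMeans(M2)

# Reference computation on an ordinary in-memory matrix.
ref <- colMeans(log(as.matrix(as.data.frame(dataTableS4))[-1, ]))
```

Both res and ref should hold the same column means; only the route
taken to compute them differs.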

Note that wrapping a DataFrame with Rle columns in a DelayedArray
object also works.
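
A small sketch of the Rle case, under the same assumption that a
DataFrame can serve as a DelayedArray seed. The object names and the
column values are made up for illustration.

```r
library(S4Vectors)
library(DelayedArray)

# Hypothetical run-length encoded columns
x <- c(1, 1, 2, 2, 2)
y <- c(5, 5, 5, 1, 1)
rleTable <- DataFrame(aFeature = Rle(x), anotherFeature = Rle(y))

M_rle <- DelayedArray(rleTable)
rle_means <- colMeans(M_rle)

# Reference values computed from the plain vectors
ref_means <- c(mean(x), mean(y))
```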

Pete's DelayedMatrixStats package will extend DelayedArray capabilities
by giving you access to all the summarization functions defined in
the matrixStats package.
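
As a rough illustration, and assuming DelayedMatrixStats is installed
and provides colMedians() for DelayedMatrix objects, usage could look
like this (med and ref_med are illustrative names):

```r
library(S4Vectors)
library(DelayedArray)
library(DelayedMatrixStats)

dataTableS4 <- DataFrame(aFeature = 1:5, anotherFeature = 5:1)
M <- DelayedArray(dataTableS4)

# matrixStats-style summarization applied to the wrapped DataFrame
med <- colMedians(M)

# Reference computed with base R on an ordinary matrix
ref_med <- apply(as.matrix(as.data.frame(dataTableS4)), 2, median)
```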

That being said, it would be nice if math/summarization operations
worked directly on DataFrame objects like they do on ordinary
data frames. This could naturally be extended to DataFrame objects
with numeric Rle columns.

H.

On 01/16/2018 06:29 AM, Michael Lawrence wrote:
> Please be more specific about the desired operations, or, better, submit a
> pull request with them. colMeans() in particular was intentionally omitted
> because it depends on having homogeneous data, which is better suited for a
> matrix, not a data frame.
> 
> On Mon, Jan 15, 2018 at 10:00 PM, Dario Strbenac
> <dstr7320 at uni.sydney.edu.au> wrote:
> 
>> Good day,
>>
>> Would it be useful to provide the same operations which can be done to a
>> data.frame for a DataFrame in a future release of S4Vectors? For example,
>>
>> dataTable <- data.frame(aFeature = 1:5, anotherFeature = 5:1)
>> colMeans(dataTable)
>> #  aFeature anotherFeature
>> #         3              3
>> dataTableS4 <- DataFrame(aFeature = 1:5, anotherFeature = 5:1)
>> colMeans(dataTableS4)
>>      Error in colMeans(dataTableS4) :
>>          'x' must be an array of at least two dimensions
>>
>> --------------------------------------
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
