[Bioc-devel] rbind,DataFrame vs rbind.data.frame

Hervé Pagès hpages at fhcrc.org
Wed Apr 30 19:36:15 CEST 2014


Thanks! I hadn't seen the test in test_IRanges-class.R. In the process
of moving things from IRanges to S4Vectors, I keep running into code that
was written a long time ago. When the code looks suspect, I try running
it in various ways to check its behavior. I was doing this for our
internal utility rbind.mcols() when I ran into this rbind,DataFrame
vs rbind.data.frame discrepancy. There are other problems with
rbind.mcols() and I might come back to them later.
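
For the record, the discrepancy can be sketched in base R alone. This is
only an illustration, not the actual rbind,DataFrame code: it assumes the
old method behaved like "combine the columns, then coerce the result to
the type of the column in the first argument", as Michael describes. The
variable names (first, rest, combined, coerced) are made up for the
example.

```r
# rbind.data.frame: the logical NA column is promoted to integer.
df <- rbind(data.frame(aa = NA), data.frame(aa = 1:2))
class(df$aa)   # "integer"

# Hypothetical stand-in for the old rbind,DataFrame behavior:
# combine the pieces, then force the type of the first piece.
first <- NA    # logical, like DataFrame(aa=NA)$aa
rest  <- 1:2   # integer, like DataFrame(aa=1:2)$aa
combined <- c(first, rest)                           # integer: NA 1 2
coerced  <- as.vector(combined, mode = typeof(first))
coerced        # NA TRUE TRUE
```

Dropping the final coercion step leaves the integer vector NA 1 2, which
matches what rbind.data.frame returns.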

H.


On 04/30/2014 04:58 AM, Michael Lawrence wrote:
> There is a pre-historic coercion in rbind,DataFrame that dates back to
> XDataFrame that turns every combined column into the class of the column
> in the first argument to rbind(). At least with Emacs VCS I couldn't get
> it to go back far enough in the logs. There is a comment there that I no
> longer understand and may no longer be relevant.
>
> So I got rid of that line, and somewhat amusingly there was a test
> almost *exactly* like your example except it expected the undesired
> result. I think Patrick wrote it but it's been touched too many times
> over the years to easily know, and I'm not sure what he was thinking.
>
> Checked it in; we'll see how it goes.
>
> Thanks,
> Michael
>
>
> On Tue, Apr 29, 2014 at 11:24 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>     I noticed this difference between DataFrame vs data.frame when doing
>     rbind():
>
>        > rbind(data.frame(aa=NA), data.frame(aa=1:2))
>          aa
>        1 NA
>        2  1
>        3  2
>
>        > rbind(DataFrame(aa=NA), DataFrame(aa=1:2))
>        DataFrame with 3 rows and 1 column
>                 aa
>          <logical>
>        1        NA
>        2      TRUE
>        3      TRUE
>
>     If the DataFrame with NAs is put after the DataFrame with integers,
>     things look better:
>
>        > rbind(DataFrame(aa=1:3), DataFrame(aa=NA))
>        DataFrame with 4 rows and 1 column
>                 aa
>          <integer>
>        1         1
>        2         2
>        3         3
>        4        NA
>
>     As a consequence, combining 2 Vector objects, one with no metadata cols
>     and one with metadata cols, will lose the metadata values (they get
>     coerced to logical) if the object with no metadata cols is put first:
>
>        ir1 <- IRanges(1:2, 5)
>        ir2 <- IRanges(11:13, 14)
>        mcols(ir2) <- DataFrame(score=2:0)
>
>     Then:
>
>        > mcols(c(ir1, ir2))
>        DataFrame with 5 rows and 1 column
>              score
>          <logical>
>        1        NA
>        2        NA
>        3      TRUE
>        4      TRUE
>        5     FALSE
>
>     How hard would it be to bring the rbind,DataFrame method in line with
>     rbind.data.frame?
>
>     Thanks,
>     H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


