[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Jan Gorecki j@goreck| @end|ng |rom w|t@edu@p|
Fri May 17 05:48:50 CEST 2019


Hi Gabriel

> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with rows
> but no columns a very strange concept.

The distinction between matrices and data.frames is the crux in this case.
From the dimensional modelling point of view, a matrix can have non-zero
rows and zero columns, but a data.frame (assuming it maps to a database
table structure) should never have non-zero rows and zero columns.
This kind of issue was raised before in our issue tracker:
https://github.com/Rdatatable/data.table/issues/2422
You should find that discussion useful.
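
To make the two structures concrete, here is a minimal sketch in base R.
It is only an illustration and is not taken from the linked issue; the
names m, df and df0 are made up for the example. It shows that base R
currently permits both a matrix and a data.frame with rows but no columns,
whatever one thinks such a data.frame should mean:

```r
# A matrix with 3 rows and no columns.
m <- matrix(character(), nrow = 3, ncol = 0)
dim(m)

# A data.frame with 3 rows and no columns, obtained here by
# selecting zero columns from an existing data.frame.
df <- data.frame(aa = c("a", "b", "c"))
df0 <- df[, integer(0), drop = FALSE]
dim(df0)
```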

Best,
Jan Gorecki


On Fri, May 17, 2019 at 8:11 AM Pages, Herve <hpages using fredhutch.org> wrote:
>
> On 5/16/19 17:48, Gabriel Becker wrote:
>
> Hi Herve,
>
> Inline.
>
>
>
> On Thu, May 16, 2019 at 4:45 PM Pages, Herve <hpages using fredhutch.org> wrote:
> Hi Gabe,
>
>    ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(data.frame(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(data.frame(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(cbind(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(cbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    nrow(rbind(aa="a", AA="A"))
>    # [1] 2
>
>    nrow(rbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
> Sure, but
>
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
> [1] 1
>
>
> Ah, I see now.
>
> But:
>
>   > data.frame(aa = c("a", "b", "c"), AA = character())
>   Error in data.frame(aa = c("a", "b", "c"), AA = character()) :
>     arguments imply differing number of rows: 3, 0
>
> and
>
>   > mapply(`*`, 1:5, integer(0))
>   Error in mapply(`*`, 1:5, integer(0)) :
>     zero-length inputs cannot be mixed with those of non-zero length
>
> So I would declare rbind(aa = c("a", "b", "c"), AA = character()) inconsistent rather than making the case that rbind(aa = character(), AA = character()) needs to change.
>
> Cheers,
>
> H.
>
>
> So even if I ultimately "lose" this debate (which really wouldn't shock me; even if R-core did agree with me, there's backwards compatibility to consider), you have to concede that the current behavior is more complicated than the above acknowledges.
>
> By rights of the invariance that you and Hadley are advocating, as far as I understand it, the last should give 2 rows, one of which is all NAs, rather than giving only one row as it currently does (and, I assume, always has).
>
> So there are two different behavior patterns that could coherently (and internally consistently) be generalized to apply to the rbind(character(), character()) case, not just one. I'm making the case that the other one (that length-0 vectors do not add rows because they don't contain data) would be equally valid and, to N>1 people, at least equally intuitive.
>
> Best,
> ~G
>
> hmmm... not sure why ncol(cbind(aa=character(0), AA=character(0))) or
> nrow(rbind(aa=character(0), AA=character(0))) should do anything
> different from what they do.
>
> In my experience, and more generally speaking, the desire to treat
> 0-length vectors as a special case that deviates from the
> non-zero-length case has never been productive.
>
> H.
>
>
> On 5/16/19 13:17, Gabriel Becker wrote:
> > Hi all,
> >
> > Apologies if this has been asked before (a quick google didn't find it for
> > me), and I know this is a case of behaving as documented, but it's so
> > unintuitive (to me at least) that I figured I'd bring it up here anyway. I
> > figure it's probably not going to be changed, but I'm happy to submit a
> > patch if this is something R-core feels can/should change.
> >
> > So I recently got bitten by the fact that
> >
> >> nrow(rbind(character(), character()))
> > [1] 2
> >
> >
> > I was checking whether the result of an rbind call had more than one row,
> > and that unexpectedly returned TRUE, causing all sorts of shenanigans
> > downstream as I'm sure you can imagine.
> >
> > Now I know that from ?rbind
> >
> >>       For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
> >>       are ignored unless the result would have zero rows (columns), for
> >>       S compatibility.  (Zero-extent matrices do not occur in S3 and are
> >>       not ignored in R.)
> >
> > But there are a couple of things here. First, for the rbind case this
> > reads as "if there would be zero columns, the vectors will not be
> > ignored". This wording implies to me that not ignoring the vectors is a
> > remedy to the "problem" of the potential for a zero-column return, but
> > that's not the case. The result still has 0 columns, it just does not also
> > have zero rows. So even if the behavior is not changed, perhaps this
> > wording can be massaged for clarity?
> >
> > The other issue, which I admit is likely a problem with my intuition, but
> > which I don't think I'm alone in having, is that even if I can't have a 0x0
> > matrix (which is what I'd prefer) I would have expected/preferred a 1x0
> > matrix, the reasoning being that if we must avoid a 0x0 return value, we
> > would do the minimum required to avoid it, which is to not ignore the first
> > length-0 vector, to ensure a non-zero-extent matrix, but then ignore the
> > remaining ones as they contain information for 0 new rows.
> >
> > Of course I can program around this now that I know the behavior, but
> > again, it's so unintuitive (even for someone with a fairly well developed
> > intuition for R's sometimes "quirky" behavior) that I figured I'd bring it
> > up.
> >
> > Thoughts?
> >
> > Best,
> > ~G
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages using fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
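
For reference, the two rbind cases debated in the quoted thread can be
reproduced with the short sketch below. The helper count_data_rows() is
hypothetical (it is not part of base R and was not proposed by anyone in
the thread); it only illustrates one way to program around the behavior,
under the assumption that zero-length inputs should never count as rows:

```r
# Two zero-length vectors: neither is ignored, so the result is 2 x 0.
both_empty <- rbind(aa = character(), AA = character())
dim(both_empty)

# One non-empty vector: the zero-length one is ignored, so the result is 1 x 3.
one_empty <- rbind(aa = c("a", "b", "c"), AA = character())
dim(one_empty)

# Hypothetical helper: drop zero-length inputs before binding, so that
# empty vectors never contribute rows.
count_data_rows <- function(...) {
  parts <- list(...)
  parts <- parts[lengths(parts) > 0L]
  if (length(parts) == 0L) {
    return(0L)
  }
  nrow(do.call(rbind, parts))
}

count_data_rows(character(), character())
count_data_rows(c("a", "b", "c"), character())
```

With such a helper, a check like "more than one row" no longer depends on
whether all of the inputs happen to be empty.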


