[Rd] I() in merge (was: Re: xftrm is more than 100x slower for AsIs than for character vectors)
Hilmar Berger
h||m@r@berger @end|ng |rom gmx@de
Tue Jul 16 09:08:18 CEST 2024
Dear all,
actually, it is not clear to me why merge still protects the added
Row.names column with I(). This seems to stem from a time when R would
automatically convert character vectors to factors when they were
inserted into a data.frame. However, I can't reproduce this behaviour in
current versions of R, even with data.frames generated with
stringsAsFactors = TRUE. Maybe the I() inserted in r39026 can be removed
altogether?
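
For reference, a minimal sketch of the check I mean, assuming R >= 4.0.0
(where stringsAsFactors defaults to FALSE). The data frame and the
variable names are made up for illustration, and the cbind() calls only
mimic the step described in the quoted message below; they are not
copied from the merge source:

```r
# Illustrative data only; assumes R >= 4.0.0.
df <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"),
                 stringsAsFactors = TRUE)

# Prepend the row names as a plain character column, without I().
plain <- cbind(Row.names = row.names(df), df)

# The same step with the I() protection.
protected <- cbind(Row.names = I(row.names(df)), df)

# Inspect what class each variant ends up with.
print(class(plain$Row.names))
print(class(protected$Row.names))
```

If the unprotected column stays character here, the I() no longer serves
its original purpose, at least for this code path.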
Best regards
Hilmar
On 14.07.24 19:09, HB via R-devel wrote:
> Dear Ivan,
>
> thanks for the confirmation and the proposed patch.
>
> I just wanted to add some notes on the relevance of this: base::merge with by.x=0 or by.y=0 (i.e. matching on row.names) automatically adds a column Row.names, which is I(row.names(x)), to the corresponding input table (I() has been used since revision 39026 to avoid conversion of character to factor). When this column is used for sorting (sort=TRUE is the default in merge; sorting on it should happen at least if all.x=TRUE or all.y=TRUE), execution becomes slower.
>
> xtfrm.AsIs is unchanged since its addition in r50992 (likely unrelated to the former).
>
> So I guess that this just went unnoticed since it will not cause problems on small data frames.
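
A rough way to see both points from the quoted notes, as a sketch only:
the toy data frames, the vector size n and the variable names are
assumptions for illustration, and the timings will differ between
machines and R versions.

```r
# Toy inputs matched on row names (by = 0).
x <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
y <- data.frame(b = 4:6, row.names = c("r3", "r1", "r2"))
m <- merge(x, y, by = 0)

# The added Row.names column carries the AsIs class.
print(class(m$Row.names))

# Compare xtfrm() on a plain character vector and on its I() wrapper.
set.seed(1)
n <- 1e4  # arbitrary size; increase to make the gap more visible
s <- sprintf("row%06d", sample.int(n))
t_chr  <- system.time(xtfrm(s))[["elapsed"]]
t_asis <- system.time(xtfrm(I(s)))[["elapsed"]]
print(c(character = t_chr, AsIs = t_asis))
```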
>
> Best regards
>
> Hilmar