[Rd] I() in merge (was: Re: xtfrm is more than 100x slower for AsIs than for character vectors)

Hilmar Berger h||m@r@berger @end|ng |rom gmx@de
Tue Jul 16 09:08:18 CEST 2024


Dear all,

actually, it is not clear to me why the Row.names column that merge
adds is still protected with I(). This seems to stem from a time when R
would automatically convert character vectors to factors when they were
inserted into a data.frame. However, I can't reproduce this behaviour in
current versions of R, even in data.frames generated with
stringsAsFactors = TRUE. Maybe the I() inserted in r39026 can be removed
altogether?
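
For illustration, a minimal sketch of what I tried. It assumes R >= 4.0.0
(where stringsAsFactors defaults to FALSE); the object names are made up,
and the cbind() calls only mimic how merge is described as adding the
column, they are not copied from the merge source:

```r
# Sketch only: assumes R >= 4.0.0; all object names are hypothetical.
x <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
y <- data.frame(b = 4:6, row.names = c("r1", "r2", "r3"))

# Row names added as a column with the I() protection
protected <- cbind(Row.names = I(row.names(x)), x)
class(protected$Row.names)   # "AsIs"

# The same column without I(): it stays character, no factor conversion
plain <- cbind(Row.names = row.names(x), x)
class(plain$Row.names)       # "character"

# Inserting into a data.frame created with stringsAsFactors = TRUE
df <- data.frame(g = c("u", "v", "w"), stringsAsFactors = TRUE)
df$Row.names <- c("r1", "r2", "r3")
class(df$g)                  # "factor"
class(df$Row.names)          # "character"

# Matching on row names (by = 0) adds the Row.names column to the result
m <- merge(x, y, by = 0)
class(m$Row.names)
```

Older versions of R, where stringsAsFactors defaulted to TRUE, may well
behave differently for the unprotected cbind() call; I have not checked
those here.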

Best regards

Hilmar
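
P.S. A rough way to compare sorting a character vector with and without
I(), in case someone wants to check the effect on their own R version.
This is only a sketch: the key format and the vector length are arbitrary
choices of mine, and the timings will depend on the R version and
machine.

```r
# Sketch only: arbitrary test data, unique keys so the ordering is unambiguous.
set.seed(1)
n <- 1e4
keys <- sprintf("id%06d", sample.int(n))

# order() on a plain character vector
plain_time <- system.time(o_plain <- order(keys))

# order() on the same vector wrapped in I(); classed objects go through xtfrm()
asis_time <- system.time(o_asis <- order(I(keys)))

# Both calls should produce the same permutation
identical(o_plain, o_asis)

# Compare elapsed times (values vary by R version and hardware)
c(plain = plain_time[["elapsed"]], AsIs = asis_time[["elapsed"]])
```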

On 14.07.24 19:09, HB via R-devel wrote:
> Dear Ivan,
>
> thanks for the confirmation and the proposed patch.
>
> I just wanted to add some notes regarding the relevance of this: base::merge using by.x=0 or by.y=0 (i.e. matching on row.names) will automatically add a column Row.names, which is I(row.names(x)), to the corresponding input table (using I() since revision 39026 to avoid conversion of character to factor). When this column is used for sorting (sort=TRUE by default in merge; this should happen at least if all.x=TRUE or all.y=TRUE), this will result in slower execution.
>
> xtfrm.AsIs is unchanged since its addition in r50992 (likely unrelated to the former).
>
> So I guess that this just went unnoticed since it will not cause problems on small data frames.
>
> Best regards
>
> Hilmar


