[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Sat Feb 22 21:43:58 CET 2020


>>>>> Martin Maechler 
>>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:

>>>>> William Dunlap 
>>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:

    >> If we change the behavior of NULL--[[--assignment from

    >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)

    >> to

    >> `[[<-`(NULL, 1, "a" ) # gives  list("a")

    >> then we have more consistency there *and* your bug is fixed too.
    >> Of course, in other situations back-compatibility would be
    >> broken as well.

    >> Would that change the result of
    >> L <- list(One=1) ; L$Two[[1]] <- 2
    >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))

    >> and the result of
    >> F <- 1L ; levels(F)[[1]] <- "one"
    >> from structure(1L, levels="one") to structure(1L, levels=list("one"))?

    > Yes (twice).

    > This is indeed what happens in current R-devel, as I had
    > committed the proposition above yesterday.
    > So R-devel (with svn rev >= 77844 )  does this :

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = list(2))
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = list("one"))
    >> 

    > but I find that still considerably more logical than current
    > (pre R-devel) R's

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = 2)
    >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
    > list(One = 1, Two = list(2:3))
    >> 
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = "one")
    >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
    > structure(1L, .Label = list(c("one", "TWO")))
    >> 


    >> This change would make L$Name[[1]] <- value act like L$Name$one <- value
    >> in cases when L did not have a component named "Name" and value
    >> had length 1.

    > (I don't entirely get what you mean, but)
    > indeed,
    > the  [[<-  assignments will be closer to corresponding $<-  assignments...
    > which I thought would be another good thing about the change.

    >> I have seen users use [[<- where [<- is more appropriate in cases like
    >> this.  Should there be a way to generate warnings about the change in
    >> behavior as you've done with other syntax changes?

    > Well, good question.
    > I'd guess one would get such warnings "all over the place",  and
    > if a warning is given only once per session it may not be
    > effective  ... also the warning would be confusing to the 99.9% of R users who
    > don't even get what we are talking about here ;-)

    > Thank you for your comments.. I did not get too many.
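
For reference, the primitive calls under discussion can be tried in
isolation.  This is only a sketch: the variable names are made up, and
the length-one `[[<-` result is printed rather than asserted, because
that is exactly the case that differs between released R and
R-devel (svn rev >= 77844).

```r
# Length-one value: the case that changed
# (atomic vector before the change, list after it).
single_value <- `[[<-`(NULL, 1, "a")
str(single_value)

# Longer value: a list both before and after the change.
long_value <- `[[<-`(NULL, 1, 2:3)
str(long_value)

# Single-bracket assignment on NULL: an atomic vector in both cases.
single_bracket <- `[<-`(NULL, 1, "a")
str(single_bracket)
```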

Well, there's one situation where semi-experienced package
authors are bitten by the new R-devel behavior...

I'm seeing a few dozen CRAN packages breaking in R-devel >= r77844.

One case is exactly as you (Bill) mention above: people using
dd[[.]] <- ..   where they should use single [.].

In one package, I see an inefficient for loop over all rows of a
data frame 'dd'

for(i in 1:nrow(dd)) {

 ...

 dd$<nonexisting_column>[[i]] <-  <one character string>

}

This used to work -- as said quite inefficiently:
for i=1 it created the **full** data frame column  and then,
once the column exists, it presumably does assign one entry
after the other...

Now this code breaks (later!) in the package, because the
new column ends up as a *list* of strings, instead of a vector
of strings.

I think there are quite a few such cases also in other CRAN
packages which now break with the latest R-devel.
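
A minimal sketch of the pattern, using a made-up data frame 'dd' and a
made-up column 'label' (not the code of the package in question).  The
type of the new column depends on the R version, so it is only printed
here, not asserted:

```r
dd <- data.frame(id = 1:3)

# Grow a column that does not exist yet, one entry at a time, with [[i]].
for (i in seq_len(nrow(dd))) {
  dd$label[[i]] <- paste0("row-", dd$id[i])
}

str(dd$label)      # character vector or list of strings, depending on R version
is.list(dd$label)  # code that later assumes a character vector may break if TRUE
```

Either way the individual entries are the same strings; what differs is
only whether they sit in a vector or in a list.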

Coming back to Bill Dunlap's question: Should we not warn here?
And now when our toplevel list is a data frame, maybe we should
warn indeed, if we can easily limit ourselves to such "bizarre"
ways of growing a data frame  ...


  dd $ foo [[i]] <- vv

<==>

  `*tmp*` <- dd
  dd <- `$<-`(`*tmp*`, "foo", value = `[[<-`(`*tmp*`$foo, i, vv))
  rm(`*tmp*`)
  
but then really we have the same problem as previously: The
 `[[<-`(NULL, i, vv)  part does not "know" anything about the
fact that we are in a data frame column creation context.
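
The expansion can be replayed by hand to see that the inner call gets
only NULL, the index and the value.  Again just a sketch with made-up
objects; note that the column name has to be passed to `$<-` explicitly
(here as the string "foo"):

```r
dd <- data.frame(x = 1:3)
i  <- 1L
vv <- "a"

# The nested replacement as one would normally write it.
direct <- dd
direct$foo[[i]] <- vv

# The same thing spelled out step by step.
manual <- dd
`*tmp*` <- manual
inner <- `[[<-`(`*tmp*`$foo, i, vv)   # `*tmp*`$foo is NULL: no data frame in sight
manual <- `$<-`(`*tmp*`, "foo", value = inner)
rm(`*tmp*`)

identical(direct, manual)
```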

If the R package author had used  '[i]' instead of '[[i]]'
he|she would have been safe

(as they would be if they worked more efficiently and created
the whole variable as a vector and only then added it to the
data frame ... but then, it seems people want to perpetuate the
claim of R to be slow ... even if it's them who make R run
slowly ... ;-))
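
Both safe variants, sketched with the same made-up 'dd' and 'label'
names as assumptions; neither depends on how `[[<-` treats NULL:

```r
dd <- data.frame(id = 1:3)

# Variant 1: the same loop, but with single-bracket assignment.
dd_loop <- dd
for (i in seq_len(nrow(dd_loop))) {
  dd_loop$label[i] <- paste0("row-", dd_loop$id[i])
}

# Variant 2: build the whole column as a vector, then add it once.
dd_vec <- dd
dd_vec$label <- paste0("row-", dd_vec$id)

# Both give a plain character column with the same contents.
stopifnot(is.character(dd_loop$label),
          identical(dd_loop$label, dd_vec$label))
```

Variant 2 is the efficient one: a single assignment instead of one
data frame modification per row.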


