[Rd] Subscripting issues unrelated to [Subscripting fails if name of element is "" (PR#8161)]

Fri Oct 7 19:38:26 CEST 2005

Jens,

This is a completely separate issue.  In indexing, character NA matches 
the name "NA".  That was a bug, but it is nothing to do with the subject 
line or PR#8161, and for the record let's keep this separate.  The 
`critical point' is not to build a theory around misunderstandings of 
several unrelated examples.
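
To make the distinction concrete, here is a minimal sketch. The vector is made up for illustration and simply reuses the four names from the report quoted below. It shows only that the missing string and the string "NA" are different values and that match() tells them apart; it makes no claim about what `[` does in any particular R version.

```r
# Illustrative vector, reusing the four names from the quoted report.
x <- 1:4
names(x) <- c(NA, "NA", "a", "")

# The missing string and the two-character string "NA" are distinct values.
na_flags <- is.na(names(x))                   # TRUE only for the first name
same     <- identical(NA_character_, "NA")    # FALSE

# match() compares values, so each name finds its own position,
# and indexing by those positions recovers every element.
pos  <- match(names(x), names(x))
vals <- unname(x[pos])
```

Indexing by position via match() sidesteps name matching inside `[` altogether, which is why it is a convenient reference point when checking the other examples.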

You say

>> get("$")(lx, as.character(NA))

goes wrong.  Now the documentation has x$name for 'name' a symbol or a 
character string, and you have passed an _expression_ and got an 
appropriate error message,

Error in get("$")(lx, as.character(NA)) : invalid subscript type

If you don't see that, please review the section in R-lang or the Blue 
Book.  Equally

>  get("$")(lx, as.character("a"))
Error in get("$")(lx, as.character("a")) : invalid subscript type

so it is nothing to do with NA or "" or the subject line here.
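
As a small sketch of the point (the list here is a made-up one, not the lx from the report): `$` takes its second argument unevaluated, so a call in that position is an error, whereas `[[` evaluates its index and so accepts a computed name.

```r
lx <- list(a = 3, b = 4)   # illustrative list, not the one from the report

# `$` does not evaluate its second argument; a call there is an error.
res <- tryCatch(get("$")(lx, as.character("a")), error = function(e) e)
failed <- inherits(res, "error")

# `[[` evaluates its index, so a name computed at run time works.
nm  <- as.character("a")
val <- lx[[nm]]
```

If the name is only known at run time, `[[` is the form the documentation intends for that purpose.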

On the other hand

>> substitute(lx$y, list(y=as.character(NA)))

is lx$NA, using a name (and no longer a character NA).
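
A sketch of why the substitute() route differs, again with a made-up list and an ordinary name rather than NA: substitute() builds the call first, so by the time `$` is evaluated its second argument is already a string constant or a symbol, not an expression to be computed. The as.name() variant is my own addition for comparison.

```r
lx <- list(a = 3)   # illustrative list, not the one from the report

# substitute() splices the value in before `$` ever sees it.
cl_string <- substitute(lx$y, list(y = "a"))          # second argument is a string
cl_symbol <- substitute(lx$y, list(y = as.name("a"))) # second argument is a symbol

# Both are ordinary calls and evaluate to the same element.
val_string <- eval(cl_string)
val_symbol <- eval(cl_symbol)
```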

Brian

On Fri, 7 Oct 2005, "Jens Oehlschlägel" wrote:

> Dear Brian,
>
> Thanks for picking this up.
> I think the critical point is that it is not a single isolated bug and it
> would be a major effort to get this stuff consistent, because it (and its
> implications) seems to be spread all over the code. The laudable
> efforts to properly sort out "NA" vs. as.character(NA) have not been fully
> successful yet, and "" is a similar issue. Please consider the following;
> sorry for the length:
>
>
> # ERROR 1
>
> # I agree that c() disallows "" and NA names
> # it makes sense discouraging users from using such names
>
>> c(as.character(NA)=1)
> Error: syntax error in line "c(as.character(NA)="
>> c("NA"=2, "a"=3)
> NA  a
> 2  3
>> c(""=4)
> Error: attempt to use zero-length variable name
>
> # however, "NA" must be expected as a legal name, e.g. when importing data
> # and in your example specifying "no-name" in fact results in a "" name
>
>> names(c(a=1, 2))
> [1] "a" ""
>>
>
> # My interpretation is that the user specifies a mixture of elements with
> # and without names, and therefore the no-names must be coerced to ""
> # names, and in principle that's completely fine
>
> # an element of a character vector is defined to be either as.character(NA)
> # OR "NA" OR "" or another positive-length string
> # (which is complicated enough)
> # formally the names are an attribute (character vector) of an object and
> # can be manipulated as such
>
>> x <- 1:4
>> names(x) <- c(NA, "NA", "a", "")
>> names(x)
> [1] NA   "NA" "a"  ""
>> # and in principle all of those can be properly distinguished
>> x[match(names(x), names(x))]
> <NA>   NA    a
>   1    2    3    4
>
>
> # introducing a fifth non-name state that sometimes equals "" and sometimes
> # does not introduces inconsistency into the language,
> # e.g. the fact that elements can be selected by their name but not by their
> # non-name.
> # Thus currently selecting by names is a mess from a consistency perspective
>
>
>> x[names(x)]
> <NA> <NA>    a <NA>
>   1    1    3   NA
>
> # in the following subscripting with "" works, but not with "NA"
>> for (i in names(x))
> + print(x[[i]])
> [1] 1
> [1] 1
> [1] 3
> [1] 4
>
>
> # ERROR 1a: If failing on "NA" is not a bug, I switch from programming to
> # Kafka
>> x["NA"]
> <NA>
>   1
> # ERROR 1b: clearly wrong
>> x[["NA"]]
> [1] 1
> # ERROR 1c: and from my humble understanding failing on "" is a bug as well
>> x[""]
> <NA>
>  NA
> # whereas interestingly this is correct
>> x[[""]]
> [1] 4
>
>
> # I think it is obvious how to remove these inconsistencies
> # (as long as we do not disallow "" in names altogether,
> #  which is almost impossible, since every user can legally set the names
> #  vector in a variety of ways)
>
> # these are not easy, but perfectly fine
>> x[as.character(NA)]
> <NA>
>   1
>> x[as.integer(NA)]
> <NA>
>  NA
>
> # and these are the really debatable, difficult ones
>> x[NA]
> <NA> <NA> <NA> <NA>
>  NA   NA   NA   NA
>> x[as.logical(NA)]
> <NA> <NA> <NA> <NA>
>  NA   NA   NA   NA
>
>
>
> ## ERROR 2+3: the above inconsistencies generalize to lists
>
> lx <- as.list(x)
>
>> lx
> $"NA"		(ERROR 2a)
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> [[4]]		(ERROR 2b)
> [1] 4
>
> # and should read
>
>> lx
> $NA		(  or $as.character(NA) for clarity and warning )
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> $""
> [1] 4
>
>
> # Note that - except for printing - match works perfectly in
>> lx[match(names(lx), names(lx))]
> $"NA"
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> [[4]]
> [1] 4
>
> # and also in
>> for (i in match(names(lx), names(lx)))
> + print(lx[[i]])
> [1] 1
> [1] 2
> [1] 3
> [1] 4
>
>
> # Of course I consider the following behaviour as inconsistent
>> lx[names(lx)]
> $"NA"
> [1] 1
>
> $"NA"
> [1] 1		(ERROR 3a)
>
> $a
> [1] 3
>
> $"NA"
> NULL		(ERROR 3b)
>
>
> # using [[ the second one fails
>> for (i in names(lx))
> + print(lx[[i]])
> [1] 1
> [1] 1		(ERROR 3c)
> [1] 3
> [1] 4		(interestingly correct)
>
>
> # finally note that this works
>> eval(substitute(lx$y, list(y=as.character(NA))))
> # but not this
>> get("$")(lx, as.character(NA))
> Error in get("$")(lx, as.character(NA)) : invalid subscript type
> # and both go wrong with "NA"
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595