[Rd] Subscripting issues unrelated to [Subscripting fails if name of element is "" (PR#8161)]
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Oct 7 19:38:26 CEST 2005
Jens,
This is a completely separate issue. In indexing, character NA matches
the name "NA". That was a bug, but it is nothing to do with the subject
line or PR#8161, and for the record let's keep this separate. The
`critical point' is not to build a theory around misunderstandings of
several unrelated examples.
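A minimal sketch of the distinction between the string "NA" and a missing character index. It assumes an R release in which the bug mentioned above is fixed, so that a character NA no longer matches the name "NA"; the vector x and the variable names are made up for illustration and are not the objects from the quoted message.

```r
# Hypothetical named vector; one element is literally named "NA".
x <- c(a = 1, "NA" = 2, b = 3)

# The string "NA" is an ordinary name and selects the second element.
by_string <- x["NA"]

# A missing character index is not a name at all; with the bug fixed
# (an assumption about the R version in use) it matches nothing and
# yields a missing value.
by_missing <- x[NA_character_]

print(by_string)
print(by_missing)
```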
You say
>> get("$")(lx, as.character(NA))
goes wrong. Now the documentation has x$name for 'name' a symbol or a
character string, and you have passed an _expression_ and got an
appropriate error message,
Error in get("$")(lx, as.character(NA)) : invalid subscript type
If you don't see that, please review the section in R-lang or the Blue
Book. Equally
> get("$")(lx, as.character("a"))
Error in get("$")(lx, as.character("a")) : invalid subscript type
so it is nothing to do with NA or "" or the subject line here.
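The point about `$` can be sketched as follows. The list lst is hypothetical (it is not the lx from the quoted message), and the use of tryCatch to capture the error is only for illustration: `$` takes its second argument literally, whereas `[[` evaluates its index.

```r
# Hypothetical list for illustration.
lst <- list(a = 1, b = 2)

# `$` accepts a symbol or a literal character string as the name.
via_symbol <- lst$a
via_string <- get("$")(lst, "a")

# A call such as as.character("a") is not evaluated by `$`; it is seen
# as an expression, so the subscript is rejected with an error.
res <- tryCatch(
  get("$")(lst, as.character("a")),
  error = function(e) conditionMessage(e)
)
print(res)

# `[[` evaluates its index, so a computed name works there.
via_computed <- lst[[as.character("a")]]
```

If the name has to be computed at run time, `[[` is the form to use.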
On the other hand
>> substitute(lx$y, list(y=as.character(NA)))
is lx$NA, using a name (and no longer a character NA).
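A small sketch of what substitute() does in such a call: the value is spliced into the expression before `$` ever sees it, so `$` receives a name rather than a call to be evaluated. The list lst and the name a are made up for illustration; the example deliberately uses an ordinary name rather than NA.

```r
# Hypothetical list for illustration.
lst <- list(a = 1, b = 2)

# Replace the placeholder y by the symbol a; lst is left untouched
# because it is not bound in the substitution list.
e <- substitute(lst$y, list(y = as.name("a")))

print(e)              # the constructed call
value <- eval(e)      # evaluating it extracts the element named "a"
print(value)
```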
Brian
On Fri, 7 Oct 2005, "Jens Oehlschlägel" wrote:
> Dear Brian,
>
> Thanks for picking this up.
> I think the critical point is that it is not a single isolated bug and it
> would be a major effort to get this stuff consistent, because it (and its
> implications) seems to be spread all over the code. The commendable
> efforts to properly sort out "NA" vs. as.character(NA) have not been fully
> successful yet, and "" is a similar issue. Please consider the following;
> sorry for the length:
>
>
> # ERROR 1
>
> # I agree that c() disallows "" and NA names
> # it makes sense discouraging users from using such names
>
>> c(as.character(NA)=1)
> Error: syntax error in line "c(as.character(NA)="
>> c("NA"=2, "a"=3)
> NA a
> 2 3
>> c(""=4)
> Error: attempt to use a variable name of length 0
>
> # however, "NA" must be expected as a legal name, e.g. when importing data
> # and in your example specifying "no-name" in fact results in a "" name
>
>> names(c(a=1, 2))
> [1] "a" ""
>>
>
> # My interpretation is that the user specifies a mixture of elements with
> # and without names, and therefore the unnamed ones must be coerced to ""
> # names, and in principle that's completely fine
>
> # each element of a character vector is either as.character(NA) OR "NA" OR
> # "" or another positive-length string (which is complicated enough)
> # formally the names are an attribute (a character vector) of an object and
> # can be manipulated as such
>
>> x <- 1:4
>> names(x) <- c(NA, "NA", "a", "")
>> names(x)
> [1] NA "NA" "a" ""
>> # and in principle all of those can be properly distinguished
>> x[match(names(x), names(x))]
> <NA> NA a
> 1 2 3 4
>
>
> # introducing a fifth non-name state that sometimes equals "" and sometimes
> # does not introduces inconsistency into the language,
> # e.g. the fact that elements can be selected by their name but not by their
> # non-name
> # Thus currently selecting by names is a mess from a consistency perspective
>
>
>> x[names(x)]
> <NA> <NA> a <NA>
> 1 1 3 NA
>
> # in the following subscripting with "" works, but not with "NA"
>> for (i in names(x))
> + print(x[[i]])
> [1] 1
> [1] 1
> [1] 3
> [1] 4
>
>
> # ERROR 1a: If failing on "NA" is not a bug, I switch from programming to
> # Kafka
>> x["NA"]
> <NA>
>
> 1
> # ERROR 1b: clearly wrong
>> x[["NA"]]
> [1] 1
> # ERROR 1c: and from my humble understanding failing on "" is a bug as well
>> x[""]
> <NA>
> NA
> # whereas interestingly this is correct
>> x[[""]]
> [1] 4
>
>
> # I think it is obvious how to remove these inconsistencies
> # (as long as we do not disallow "" in names altogether,
> # which is almost impossible, since every user can legally set the names
> # vector in a variety of ways)
>
> # these are not easy, but perfectly fine
>> x[as.character(NA)]
> <NA>
> 1
>> x[as.integer(NA)]
> <NA>
> NA
>
> # and these are the genuinely debatable, difficult ones
>> x[NA]
> <NA> <NA> <NA> <NA>
> NA NA NA NA
>> x[as.logical(NA)]
> <NA> <NA> <NA> <NA>
> NA NA NA NA
>
>
>
> ## ERROR 2+3: the above inconsistencies generalize to lists
>
> lx <- as.list(x)
>
>> lx
> $"NA" (ERROR 2a)
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> [[4]] (ERROR 2b)
> [1] 4
>
> # and should read
>
>> lx
> $NA ( or $as.character(NA) for clarity and warning )
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> $""
> [1] 4
>
>
> # Note that - except for printing - match works perfectly in
>> lx[match(names(lx), names(lx))]
> $"NA"
> [1] 1
>
> $"NA"
> [1] 2
>
> $a
> [1] 3
>
> [[4]]
> [1] 4
>
> # and also in
>> for (i in match(names(lx), names(lx)))
> + print(lx[[i]])
> [1] 1
> [1] 2
> [1] 3
> [1] 4
>
>
> # Of course I consider the following behaviour as inconsistent
>> lx[names(lx)]
> $"NA"
> [1] 1
>
> $"NA"
> [1] 1 (ERROR 3a)
>
> $a
> [1] 3
>
> $"NA"
> NULL (ERROR 3b)
>
>
> # using [[ the second one fails
>> for (i in names(lx))
> + print(lx[[i]])
> [1] 1
> [1] 1 (ERROR 3c)
> [1] 3
> [1] 4 (interestingly correct)
>
>
> # finally note that this works
>> eval(substitute(lx$y, list(y=as.character(NA))))
> # but not this
>> get("$")(lx, as.character(NA))
> Error in get("$")(lx, as.character(NA)) : invalid subscript type
> # and both go wrong with "NA"
>
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595