[R] Odd behaviour of removing 'nothing' from an array or data frame
Richard.Cotton at hsl.gov.uk
Tue Oct 31 15:50:50 CET 2006
Thanks for the reply Peter, though I'm not quite convinced.
> > dubious.records <- integer(0)
> > identical(dubious.records, -dubious.records)
> [1] TRUE
> how can peoples.heights[-dubious.records,] be different from
> peoples.heights[dubious.records,]?
Tell me if I'm being willfully ignorant here, but I'm sure they should be
different. In the first case, the minus sign represents arithmetic negation,
so it is correct that dubious.records and -dubious.records are identical.
However, in the second case, inside the square brackets, the minus sign
represents set complement, not negation, so dubious.records and
-dubious.records are not the same.
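
For reference, a small sketch of what R currently does with these two cases. The names empty and neg_empty are only for illustration. It shows that negating a zero-length vector gives back a zero-length vector, so the value handed to "[" is the same either way:

```r
# Negating an empty integer vector yields another empty integer vector,
# so the sign is gone before "[" ever sees the index.
empty <- integer(0)
neg_empty <- -empty

length(neg_empty)             # 0
identical(empty, neg_empty)   # TRUE

x <- 1:10
# Both index values are the same, so both subsets are integer(0).
identical(x[empty], x[neg_empty])   # TRUE
```
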
If x = runif(10), then x[-c(2,3,5)] means "remove from x the values at the
second, third and fifth position".
By extension x[-integer(0)] should mean "remove from x no values", and not
"remove from x all values", which is the current behaviour.
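
A minimal sketch of the current behaviour and of two possible ways to recode around it, assuming a logical mask or setdiff() is acceptable in place of a negated which(). The names drop_idx and keep_mask and the sample values are illustrative only:

```r
x <- c(1.67, 1.85, 1.75)

# Negative indexing works when there is something to drop...
x[-c(2, 3)]                    # 1.67

# ...but an empty index selects nothing, with or without the minus sign.
drop_idx <- which(x > 2.5)     # integer(0)
x[-drop_idx]                   # numeric(0)

# Logical indexing keeps everything when no element matches.
keep_mask <- !(x > 2.5)
x[keep_mask]                   # 1.67 1.85 1.75

# Alternative that keeps the position vector: remove positions by set difference.
x[setdiff(seq_along(x), drop_idx)]   # 1.67 1.85 1.75
```

Either form returns all of x when there is nothing to remove, which is the result I would have expected from x[-integer(0)].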
Regards,
Richie.
Mathematical Sciences Unit
HSL
Buxton
SK17 9JN
01298 21(x8672)
pd at pubhealth.ku.dk wrote on 31/10/2006 14:27:05:
> Richard.Cotton at hsl.gov.uk writes:
>
> > I've just found some behaviour which strikes me as odd, but I'm not sure
> > whether it's a bug or a feature. If you don't mind, I'd like to explain
> > via a couple of examples.
> >
> > Let x = 1:10.
> > Then intuitively, to me at least, the command x[-integer(0)] should leave
> > x untouched. However the actual output under R2.4.0 is integer(0).
> >
> > A slightly more involved example demonstrates why I think this behaviour
> > is back to front.
> > First we define a data frame, in this case some people, with their
> > heights.
> > peoples.heights = data.frame(names = c("Alice", "Bob", "Carol"),
> >     heights = c(1.67, 1.85, 175))
> >
> > To make sure the heights are sensible, we define a filter to remove
> > impossibly tall people.
> > dubious.records = which(peoples.heights$heights > 2.5) #3
> > peoples.heights = peoples.heights[-dubious.records,]
> >
> > This all works fine since dubious.records is not empty. However, if all
> > the records had been entered properly, then we would get
> > #dubious.records = integer(0)
> >
> > Then the command peoples.heights = peoples.heights[-dubious.records,]
> > strips all the rows to give
> > #[1] names heights
> > #<0 rows> (or 0-length row.names)
> >
> > i.e. instead of removing the bad records, I've lost everything.
> > I know that it's possible to recode this so problems don't occur, but the
> > point is that the answer is unexpected.
> >
> > Can anybody explain if this behaviour is intentional or useful in some
> > way, or is it an oversight?
>
> Consistency! It's not particularly useful, but it follows from general
> principles, which it in the long run doesn't pay to depart from.
>
> The issue is that the result of using an indexing operator ("[")
> should depend only on the _value_ of its argument, not the expression
> used to compute it. Just like you most likely expect log(2+2) not to be
> different from log(4). And since
>
> > dubious.records <- integer(0)
> > identical(dubious.records, -dubious.records)
> [1] TRUE
>
> how can peoples.heights[-dubious.records,] be different from
> peoples.heights[dubious.records,]?
>
> R could actually look at the expression and act on the minus sign, but
> that way lies madness. Consider
>
> keep <- -dubious.records
> drop <- dubious.records
>
> peoples.heights[keep,]
> peoples.heights[-dubious.records,]
> peoples.heights[-keep,]
>
> etc... I think you'll get the picture.
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907