[R] 2 small problems: integer division and the nature of NA
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Fri Feb 4 17:42:03 CET 2005
Denis Chabot <chabotd at globetrotter.net> writes:
> Hi,
>
> I'm wondering why
>
> 48 %/% 2 gives 24
> but
> 4.8 %/% 0.2 gives 23...
> I'm not trying to round up here, but to find out how many times
> something fits into something else, and the answer should have been
> the same for both examples, no?
Well, you can't trust floating-point numbers to give you an exact
result:
> 4.8 / 0.2 - 24
[1] -3.552714e-15
and even
> (48/10) / (2/10) - 24
[1] -3.552714e-15
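The basic issue is that tenths are not exactly representable in
binary floating point: 4.8 is stored as slightly less than 4.8 and 0.2
as slightly more than 0.2, so the quotient lands just below 24 and %/%
rounds it down to 23. I think very few people even expected you to
use integer division on non-integers, but I note that the claim on the
help page actually holds: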
> 0.2 * 4.8 %/% 0.2 + 4.8 %% 0.2 == 4.8
[1] TRUE
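If the goal is to count how many times one decimal quantity fits into
another, something along these lines may help. It is only a sketch:
fits_count and its tol default are hypothetical names and values chosen
for illustration, not anything provided by R, and the right tolerance
depends on your data.

```r
# Hypothetical helper (not part of R): count how many whole times y fits
# into x, forgiving representation error up to `tol`.
fits_count <- function(x, y, tol = 1e-9) {
  floor(x / y + tol)
}

x <- 4.8
y <- 0.2

print(x, digits = 17)       # the value actually stored for 4.8
print(y, digits = 17)       # the value actually stored for 0.2
print(x / y, digits = 17)   # the quotient, which falls just short of 24

isTRUE(all.equal(x / y, 24))      # TRUE: equal within all.equal's tolerance
fits_count(x, y)                  # 24
round(x * 10) %/% round(y * 10)   # 24: integer division on whole tenths
```

The last line sidesteps the problem by converting to whole units first,
which is usually the safer route when the inputs are known to be
multiples of a tenth. The tolerance version will also count up a
quotient that is genuinely within tol below a whole number, so it is a
trade-off rather than a fix.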
> On a different topic, I like the behavior of NAs better in R than in
> SAS (at least they are not considered the smallest value for a
> variable), but at the same time I am surprised that the sum of NAs is
> 0 instead of NA.
>
> The sum of a vector having at least one NA but also valid data gives
> NA if we do not specify na.rm=T. But with na.rm=T, we are telling sum
> to give the sum of valid data, ignoring NAs that do not tell us
> anything about the value of a variable. I found out while getting the
> sum of small subsets of my data (such as when subsetting by several
> variables), sometimes a "cell" only contained NAs for my response
> variable. I would have expected the sum to be NA in such cases, as I
> do not have a single data point telling me the value of my response
> here. But R tells me the sum was zero in that cell! Was this behavior
> considered "desirable" when sum was built? If not, any hope it will be
> fixed?
Yes, it was, and no, there isn't. With na.rm=T the NAs are dropped
before summing, so a cell containing only NAs leaves nothing to add. In
math, the sum over an empty index set is zero, which has some nice
consistency properties (the sum over a disjoint union of sets is the
sum of the sums over each set, for instance).
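For cells where you would rather see NA when there is no valid data at
all, a small wrapper is one option. The following is a sketch only:
sum_or_na is a hypothetical name, not a base R function, and it also
returns NA for a zero-length vector, which may or may not be what you
want.

```r
# The empty sum is zero, and that keeps sums over disjoint pieces consistent.
a <- c(1, 2, 3)
b <- numeric(0)
sum(b)                          # 0
sum(c(a, b)) == sum(a) + sum(b) # TRUE

# With na.rm = TRUE an all-NA vector becomes empty before summing.
sum(c(NA, NA, NA), na.rm = TRUE)  # 0

# Hypothetical wrapper (not part of R): NA when no valid data are present,
# otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(c(NA, NA, NA))  # NA
sum_or_na(c(1, NA, 2))    # 3
```

Such a wrapper can be passed to tapply or aggregate in place of sum
when summarising by several variables.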
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907