[R] NA, where no NA should (could!) be!
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Dec 20 23:28:38 CET 2008
Oliver Bandel wrote:
> Sarah Goslee <sarah.goslee <at> gmail.com> writes:
>
>> I think we need the reproducible example requested in
>> the posting guide.
>
> ====================
> for ( datum in names(weblog_by_date) )
> {
>   print(datum)
>   selected <- weblog_by_date[[datum]]
>
>   res_size_by_host <- tapply( selected$size, selected$host, sum)
>   mycat <- function(a,b) cat(paste(a, "==>", b, "\n"))
>   mapply( mycat, selected$size, selected$host )
>   print( res_size_by_host )
>
>   print( "is there any NA?!")
>   print( any( is.na(selected$size)) )
>
> }
> ====================
Why do so many people have such trouble with the word "reproducible"? We
can't reproduce that without access to weblog_by_date!
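For what it's worth, a self-contained stand-in takes only a few lines. This is a sketch only. The data frame weblog, its columns date/host/size and all the values are hypothetical, guessed from the quoted loop. It also assumes host is a factor, which is what data.frame() produced by default for character columns before R 4.0.0.

```r
# Hypothetical stand-in for weblog_by_date: column names and values are
# guesses based on the quoted loop, not the real data.
weblog <- data.frame(
  date = c("2008-12-19", "2008-12-19", "2008-12-20", "2008-12-20"),
  host = factor(c("94.101.145.110", "94.23.3.220", "10.0.0.1", "10.0.0.1")),
  size = c(100, 250, 40, 60)
)

# Splitting the data frame by date keeps ALL levels of the host factor
# in every per-date piece, including hosts absent on that date.
weblog_by_date <- split(weblog, weblog$date)

selected <- weblog_by_date[["2008-12-20"]]
res_size_by_host <- tapply(selected$size, selected$host, sum)
print(res_size_by_host)           # hosts not seen on this date show NA
print(any(is.na(selected$size)))  # yet the size column has no NA
```

With this toy data the two hosts that do not occur on 2008-12-20 come out as NA while any(is.na(selected$size)) is FALSE, the same pattern as in the quoted printout.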
Anyway, I think it is tapply that is behaving in a way you don't expect:
> x <- factor(1,levels=1:2)
> tapply(1,x,sum)
1 2
1 NA
which is kind of surprising, since the sum over an empty set is usually
zero. However, that _is_ what the documentation for tapply says:
When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
any data in it. If 'FUN' returns a single atomic value for each
such cell (e.g., functions 'mean' or 'var') and when 'simplify' is
'TRUE', 'tapply' returns a multi-way array containing the values,
and 'NA' for the empty cells.
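A quick check of that documented behaviour, using the same x as above: FUN is never called for the empty level, which is where the NA comes from. The counter environment is purely illustrative. Note also that R 3.4.0, released well after this thread, gave tapply a `default` argument (itself defaulting to NA) for these empty cells; the last part assumes such a version.

```r
x <- factor(1, levels = 1:2)

# Illustrative counter: records how often tapply() invokes FUN.
counter <- new.env()
counter$n <- 0
counting_sum <- function(v) {
  counter$n <- counter$n + 1
  sum(v)
}

res_na <- tapply(1, x, counting_sum)
print(res_na)     # level "2" is NA
print(counter$n)  # FUN ran once, for level "1" only

# The value one might expect instead: the sum over an empty set.
print(sum(numeric(0)))  # 0

# R >= 3.4.0: fill empty cells with a chosen value instead of NA.
res_zero <- tapply(1, x, sum, default = 0)
print(res_zero)
```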
A passable workaround is
> sapply(split(1,x),sum)
1 2
1 0
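The workaround works because split() keeps unused factor levels as zero-length vectors, and sum() of a zero-length vector is 0. If you would rather have the empty hosts disappear than show up as 0, dropping the unused levels before calling tapply is another option. A sketch, with made-up data (the data frame and the hosts a/b/c are hypothetical) standing in for one day of the weblog:

```r
# Hypothetical per-date subset: host still carries level "b", which has no rows.
selected <- data.frame(
  size = c(10, 20, 5),
  host = factor(c("a", "a", "c"), levels = c("a", "b", "c"))
)

# split() yields numeric(0) for level "b", and sum(numeric(0)) is 0.
by_split <- sapply(split(selected$size, selected$host), sum)
print(by_split)    # a = 30, b = 0, c = 5

# Alternative: drop unused levels first, so the empty cell never exists.
by_dropped <- tapply(selected$size, droplevels(selected$host), sum)
print(by_dropped)  # a = 30, c = 5
```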
>
>
>
> At the end of the printouts, it gives me:
>
> =======================
> 94.101.145.110 94.23.3.220
> NA NA
> [1] "is there any NA?!"
> [1] FALSE
> =======================
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907