[R] aggregate, by, tapply bug or not?

Petr Pikal petr.pikal at precheza.cz
Thu Jan 24 16:05:11 CET 2002


On 24 Jan 2002 at 11:54, Agustin Lobo wrote:

> 
> In the case of the *apply functions, the parameters follow
> the name of the function. I.e., if you want to compute a mean
> with na.rm=T (which for one single vector would be
> mean(mivector,na.rm=T)), then
> 
> apply(mat,1,mean,na.rm=T)
> 
> Agus
> 
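
A minimal sketch of the point above, with a made-up matrix (the name mat and its values are only for illustration): any arguments written after the function name are handed on to that function for every row.

```r
# Made-up 2 x 3 matrix with one missing value (illustrative data only)
mat <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)

# na.rm = TRUE follows the function name and is passed on to mean()
# each time apply() calls it on a row
row_means <- apply(mat, 1, mean, na.rm = TRUE)
print(row_means)
```

Without na.rm = TRUE the first row mean would be NA, because that row contains the missing value.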


Thanks to all.

Actually this works for mean, sum, var and sd with na.rm=T. My problem is with 
weighted.mean. It works as a standalone function, but inside any aggregation 
function it causes a warning and it ***does not compute correctly***.

> weighted.mean(lll[rrr==2001],ttt[rrr==2001])
[1] -0.9257375

> tapply(lll,rrr,weighted.mean,ttt)
      1997       1998       1999       2000       2001 
-0.4495764 -0.4956762 -0.4920173 -0.9416626 -0.9455542 
Warning messages: 
1: longer object length
        is not a multiple of shorter object length in: x * w 
<snip>
5: longer object length
        is not a multiple of shorter object length in: x * w 

I traced the problem to ***lapply*** (probably the workhorse for all aggregate 
functions - see the enclosed code)

> lapply(split(lll,rrr),weighted.mean,ttt)

$"1997"
[1] -0.4495764

<snip>
$"2001"
[1] -0.9455542

Warning messages: 
1: longer object length
        is not a multiple of shorter object length in: x * w 
<snip>
5: longer object length
        is not a multiple of shorter object length in: x * w
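
One possible reading of the warning, sketched with made-up data (x, w and g below are hypothetical stand-ins for lll, ttt and rrr): split() and tapply() divide only their first argument by group, while an extra argument such as the weights is passed whole to every call, so x * w multiplies vectors of different lengths. Splitting the positions instead keeps values and weights aligned. Note that current R versions stop with an error rather than a warning when weighted.mean gets x and w of different lengths.

```r
# Hypothetical stand-ins for lll (values), ttt (weights) and rrr (years)
x <- c(1, 2, 3, 4, 5)
w <- c(1, 1, 2, 1, 3)
g <- c(2000, 2000, 2000, 2001, 2001)

# Split the positions by group, then subset values and weights
# with the same index vector inside the function
wm <- tapply(seq_along(x), g, function(i) weighted.mean(x[i], w[i]))

# For comparison: the standalone call on one year's subset
direct_2001 <- weighted.mean(x[g == 2001], w[g == 2001])
print(wm)
print(direct_2001)
```

With this approach the per-group result for 2001 agrees with the standalone call, because each group only ever sees its own weights.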



I used a modified version of weighted.mean which works alone

> weighted.mean.modif(lll[rrr==2001],ttt[rrr==2001])
[1] -0.9257375

weighted.mean.modif <- function (x, w) 
{
    if (missing(w)) 
        w <- rep(1, length(x))
    # keep only the pairs where neither x nor w is missing
    i <- complete.cases(x, w)
    w <- w[i]
    x <- x[i]
    sw <- sum(w)
    sum(x * w)/sw
}

but using it in any aggregate function causes an error, and debugging does not 
give me any hints.

> tapply(lll,rrr,weighted.mean,ttt)
Error in complete.cases(...) : not all arguments have the same length

debug: rval <- .Internal(lapply(X, FUN))
Browse[1]> 
Error in complete.cases(...) : not all arguments have the same length

and this is completely beyond my ability to solve.
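
A sketch of one way a two-argument function might be called per group, assuming the length error comes from each group being paired with the full weight vector: split both vectors by the same grouping and let mapply pair the pieces. The function wm_na is a hypothetical NA-tolerant weighted mean written for this example, and the data are made up.

```r
# Hypothetical NA-tolerant weighted mean (illustrative helper, not base R)
wm_na <- function(x, w) {
    ok <- complete.cases(x, w)      # drop pairs with a missing value or weight
    sum(x[ok] * w[ok]) / sum(w[ok])
}

# Made-up data: values, weights and a grouping variable
x <- c(1, NA, 3, 4, 5)
w <- c(1, 2, 3, 1, 3)
g <- c("a", "a", "a", "b", "b")

# split() both vectors by the same groups; mapply() then calls
# wm_na(x_piece, w_piece) once per group, so lengths always match
res <- mapply(wm_na, split(x, g), split(w, g))
print(res)
```

Here complete.cases always receives two vectors of the same length, since both come from the same group.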
 
I use R 1.4.0 Windows version,

lll is some property of a product
rrr are years
ttt is tonnage of the product

They all have the same length (226), but the number of observations varies 
from year to year:

> tapply(lll,rrr,length)
1997 1998 1999 2000 2001 
  48   51   40   42   45

Could anybody please tell me where the mistake is?



> Dr. Agustin Lobo
> Instituto de Ciencias de la Tierra (CSIC)
> Lluis Sole Sabaris s/n
> 08028 Barcelona SPAIN
> tel 34 93409 5410
> fax 34 93411 0012
> alobo at ija.csic.es
> 
> 
> On Thu, 24 Jan 2002, Petr Pikal wrote:
> 
> > Dear R users
> > 
> > I searched some sources but I did not find an answer. Please give me
> > some hint on the following problem.
> > 
> > I would like to compute a summary statistic for some vector for
> > different factor levels. I know I can use tapply or aggregate, but I
> > do not know if there is a way to use a function with several (two)
> > input variables (like weighted.mean).
> > 
> > I wrote a simple function for a weighted mean by factor level
> > fff<-function(x,fact,w)
> > {
> > ws<-tapply(w,fact,sum)
> > newx<-x*w
> > tapply(newx,fact,sum)/ws
> > }
> > 
> > which can handle this particular case, but does some more general
> > solution exist for using FUN(X1,X2) in aggregation procedures (tapply,
> > aggregate, by) directly?
> > 
> > Thank you
> > Petr Pikal
> > petr.pikal at precheza.cz
> > p.pik at volny.cz
> > 
> > 
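
One general pattern for the FUN(X1,X2) question in the quoted message, sketched under the assumption that the variables can be put in one data frame (the data frame d and its column names are made up for illustration): splitting the data frame keeps every column together, so the function can use two of them per group.

```r
# Made-up data frame; column names are illustrative only
d <- data.frame(x    = c(1, 2, 3, 4),
                w    = c(1, 3, 1, 1),
                fact = c("a", "a", "b", "b"))

# Each piece of the split is a data frame holding one factor level,
# so x and w inside the function always belong to the same rows
res <- sapply(split(d, d$fact), function(p) weighted.mean(p$x, p$w))
print(res)

# Cross-check against the sum(x * w) / sum(w) form used in fff above
check <- tapply(d$x * d$w, d$fact, sum) / tapply(d$w, d$fact, sum)
```

The same anonymous function could be given to by(d, d$fact, ...), which splits a data frame in the same way.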
Petr Pikal
petr.pikal at precheza.cz
p.pik at volny.cz




