[R] Tapply.

Petr PIKAL petr.pikal at precheza.cz
Wed Apr 28 08:05:19 CEST 2010


Hi

steven mosher <moshersteven at gmail.com> wrote on 27.04.2010 17:04:04:

> Thanks,
> 
>  I had been wondering what Drop did. That makes it more clear.
>  
> While I have code that loops and does the problem correctly, I wanted to
> do things the R way and be fast and terse. hehe.
> 
> So:
>     ID          d    y  jan ...
> 11264402000     1 1987   NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> 11264402000     3 1987   NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> 
> in words : for each id, for each year return
>      the max of jan,feb,.....over d
>      the min of jan, feb....  over d
>      the mean of jan,feb.. over d
>      the (max+min)/2 of jan, feb...over d
>      the count of d for jan.feb......
>      the results of a function called with all elements of this id
> 

something like

aggregate(data[, months], list(id, y), my.summary)

where my.summary is a function that computes all the required values and
returns them in an appropriate form. Group by id and year (y), so that each
chunk contains the rows for all values of d.

In words: split the selected data into chunks according to the list of
indices, apply the required function to each chunk, and return the result.
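
A minimal sketch of that idea, assuming a small toy data frame in the layout
you showed and grouping by ID and year, since you asked for results per id
and per year. The function my.summary, the toy values and the two-month
subset are only illustrative, not your real data. Note that aggregate calls
the function once per month column within each chunk.

```r
# Toy data in the layout from the question (illustrative values only).
data <- data.frame(
  ID  = rep(11264402000, 5),
  d   = c(0, 1, 3, 0, 1),
  y   = c(1987, 1987, 1987, 1988, 1988),
  jan = c(241, NA, NA, 238, 238),
  feb = c(NA, NA, NA, 246, 246)
)
months <- c("jan", "feb")

# Hypothetical summary function: receives one month column of one
# (ID, year) chunk and always returns a named vector of length 5.
my.summary <- function(x) {
  ok <- x[!is.na(x)]
  if (length(ok) == 0) {
    # No observations in this chunk: only the count is defined.
    return(c(max = NA, min = NA, mean = NA, mid = NA, n = 0))
  }
  c(max = max(ok), min = min(ok), mean = mean(ok),
    mid = (max(ok) + min(ok)) / 2, n = length(ok))
}

# One row per (ID, year); each month becomes a matrix column whose
# sub-columns are max, min, mean, mid and n.
res <- aggregate(data[, months], list(ID = data$ID, y = data$y), my.summary)
print(res)
```

Each month in the result is a matrix column, so res$jan[, "mean"] gives the
January means per ID and year. Returning the same length from my.summary in
every case is what lets aggregate simplify the result this way.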

Regards
Petr



> Anyway, your kind attention has been greatly appreciated.
> 
>  
>      
> 
> 

> On Tue, Apr 27, 2010 at 2:40 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
> r-help-bounces at r-project.org wrote on 26.04.2010 17:05:54:
> 
> > I guess my problem was seeing a bunch of examples where they pulled a
> > variable from a dataframe..
> >
> >   tapply(df$data, index=list(..

> df$data returns a vector, and so does e.g. df[,5] unless you use the
> drop=FALSE option
> 
> >
> > and I assumed that the df$data was just generalizable to a collection of
> > vectors, a vector of vector being a vector

> df[,1:15] is not a vector of vectors. R can sometimes give you a nasty
> surprise with object types and modes, but changing the type of an object
> merely by selecting some part of it would be quite problematic.
> 
> see
> 
> str(df$data)
> str(df[, 1])
> str(df[,1, drop=FALSE])
> str(df[,1:15])
> 
> Regards
> Petr
> 
> 
> 
> >
> > Thanks.
> >
> > On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >
> > > Hi
> > >
> > >
> > > steven mosher <moshersteven at gmail.com> wrote on 26.04.2010 10:21:37:
> > >
> > > > That fails:
> > > >
> > > > The manual says:
> > > >
> > > > tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
> > >
> > > > Arguments
> > > >
> > > > X
> > > >
> > > > an atomic object, typically a vector.
> > > >
> > > > INDEX
> > > >
> > > > list of factors, each of same length as X. The elements are coerced to
> > > > factors by as.factor.
> > > >
> > > > my error says:
> > >
> > > >
> > > > Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
> > > >
> > > >   arguments must have same length
> > > >
> > > > The issue that I have is I don't understand what the requirements for
> > > > the list of factors are. In my example DF$Years is a sequence of
> > > > years..1979,1980,1982,1983, 1987.. like that with missing years: so
> > > > when the manual says "list of factors, each the same length as X",
> > > > what does that mean? I could have a DF with 20 rows and only two
> > > > different years. or 20 rows and 20 different years.
> > > >
> > > > Suppose:
> > > >
> > > > a<- c(1,2,3,4)
> > > > > b<-c(2,3,4,5)
> > > > > df=data.frame(a,b)
> > > > > length(df)
> > >
> > > A data frame is neither a vector nor atomic but a list, hence length(df)
> > > gives you the number of columns. It is similar to the length of a list:
> > >
> > > > lll<-list(a=1, b=2, c=3)
> > > > length(lll)
> > > [1] 3
> > > >
> > >
> > > If you accept that the first argument of tapply has to be a vector,
> > > you cannot put a data frame there.
> > >
> > > Next, the second argument has to be a list of factors, so you can put
> > > several factors there, each of the same length as the first argument
> > > (a vector).
> > >
> > > If you want to perform an aggregating operation on a whole data frame,
> > > you should consider
> > >
> > > ?by or ?aggregate
> > >
> > > Other options are plyr or doBy packages.
> > >
> > > The syntax for aggregate is quite similar to tapply, except that the
> > > first argument can be a data frame.
> > >
> > > Regards
> > > Petr
> > >
> > >
> > > >
> > > > The length of DF is 2.
> > > > Does that mean the "list of factors, each of same length as X." would
> > > > have to be 2? that doesnt seem to make sense.
> > > >
> > > >
> > > >
> > > > On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> > > > Hi
> > > >
> > > > r-help-bounces at r-project.org wrote on 26.04.2010 06:52:55:
> > > >
> > > > > Having some difficulties with understanding how tapply works and
> > > > > getting return values I expect
> > > > >
> > > > > Data: dataframe. DF  DF$Id $D $Year.......
> > > > >
> > > > >  Id             D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > > > >  11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > > > >  11264402000    0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > > > >  11264402000    1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> > > > >  11264402000    0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > > > >  11264402000    1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> > > > >  11264402000    0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> > > > >  11264402000    1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> > > > >  11264402000    0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > > > >  11264402000    0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > >  11264402000    1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > >  11264402000    3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > >  11264402000    0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> > > > >  11264402000    1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > > > >  11264402000    3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > > > >  11264402000    0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> > > > >  11264402000    1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> > > > >  11264402000    3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> > > > >
> > > > > and the result should be a dataframe of column means by year with the
> > > > > variable D dropped (or kept doesnt matter)
> > > > >
> > > > >  11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > > > >  11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > > > >  11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > > > >  11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> > > > >  11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > > > >  11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > >  11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> > > > >  11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> > > > >
> > > > >  It would seem that Tapply should work
> > > > >  result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> > >
> > > > Why colMeans?  It is a function used instead of apply(...,.. ,mean).
> > > >
> > > > Maybe you want
> > > >
> > > > result<-tapply( DF[,1:15], DF$Year, mean,na.rm=T)
> > > >
> > > > Regards
> > > > Petr
> > > >
> > > > >
> > > > >  but i get errors about the length of arguments, which
> > > > >
> > > > >    [[alternative HTML version deleted]]
> > > > >
> > > > > ______________________________________________
> > > > > R-help at r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >


