[R] Tapply.
Petr PIKAL
petr.pikal at precheza.cz
Mon Apr 26 11:43:32 CEST 2010
Hi
steven mosher <moshersteven at gmail.com> wrote on 26.04.2010 10:21:37:
> That fails:
>
> The manual says:
>
> tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
> Arguments
>
> X
>
> an atomic object, typically a vector.
>
> INDEX
>
> list of factors, each of same length as X. The elements are coerced to factors by
> as.factor.
>
> my error says:
>
> Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
>
> arguments must have same length
>
> The issue that I have is I don't understand what the requirements for the list of factors
> are. In my example DF$Year is a sequence of years: 1979, 1980, 1982, 1983, 1987...
> like that, with missing years. So when the manual says "list of factors, each of same
> length as X", what does that mean? I could have a DF with 20 rows and only two
> different years, or 20 rows and 20 different years.
>
> Suppose:
>
> a<- c(1,2,3,4)
> > b<-c(2,3,4,5)
> > df=data.frame(a,b)
> > length(df)
A data frame is neither a vector nor atomic. It is a list, hence length(df)
gives you the number of columns. It is similar to the length of a list:
> lll<-list(a=1, b=2, c=3)
> length(lll)
[1] 3
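
To make the mismatch concrete, here is a small sketch using the df from the question. The year vector is a made-up grouping variable with one value per row, not part of the original data.

```r
# The data frame from the question: 4 rows, 2 columns
a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 5)
df <- data.frame(a, b)

length(df)    # 2: number of columns (list elements)
nrow(df)      # 4: number of rows
length(df$a)  # 4: a single column is an atomic vector

# Hypothetical grouping variable, one value per row
year <- c(1980, 1980, 1981, 1981)

length(year) == length(df)    # FALSE: 4 values versus 2 columns
length(year) == length(df$a)  # TRUE: 4 values versus 4 elements
```

So an index with one value per row matches the length of a column, but not the length of the data frame itself.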
>
If you accept that the first argument of tapply has to be a vector, you
cannot put a data frame there.
Next, the second argument has to be a list of factors, so you can put
several factors there, each of the same length as the first argument (a vector).
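
As an illustration of calls that satisfy both requirements, the sketch below uses a toy data frame. The values are invented; only the column names follow the question. Looping over the columns with sapply is one possible workaround, not the only one.

```r
# Toy data (hypothetical values) with missing entries, as in the question
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 240, 250, 236),
  Feb  = c(210, NA, 251, 237)
)

# One column at a time: X is an atomic vector and the index has
# one value per element of X
jan_means <- tapply(DF$Jan, DF$Year, mean, na.rm = TRUE)

# Several columns: repeat the same tapply call for each column in turn;
# sapply collects the per-column results into a matrix (years in rows)
month_means <- sapply(DF[, c("Jan", "Feb")],
                      function(x) tapply(x, DF$Year, mean, na.rm = TRUE))
```

Note that a group containing only NA values gives NaN when na.rm = TRUE, since there is nothing left to average.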
If you want to perform an aggregating operation on a whole data frame, you
should consider
?by or ?aggregate
Other options are the plyr or doBy packages.
The syntax for aggregate is quite similar to tapply, only the first argument
can be a data frame.
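
A minimal sketch of the aggregate route, again on invented toy values with column names borrowed from the question:

```r
# Toy data (hypothetical values); column names follow the question
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 240, 250, 236),
  Feb  = c(210, NA, 251, 237)
)

# The first argument may be a data frame; 'by' is a list of grouping
# variables, each with one value per row. na.rm = TRUE is passed on to mean.
result <- aggregate(DF[, c("Jan", "Feb")],
                    by = list(Year = DF$Year),
                    FUN = mean, na.rm = TRUE)
```

The result is a data frame with one row per year and one column of means per month column.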
Regards
Petr
>
> The length of DF is 2.
> Does that mean the "list of factors, each of same length as X." would have to be
> 2? That doesn't seem to make sense.
>
>
>
> On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
>
> r-help-bounces at r-project.org wrote on 26.04.2010 06:52:55:
>
> > Having some difficulties with understanding how tapply works and getting
> > return values I expect
> >
> > Data: dataframe. DF DF$Id $D $Year.......
> >
> >          Id    D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > 11264402000    0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > 11264402000    1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> > 11264402000    0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > 11264402000    1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> > 11264402000    0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> > 11264402000    1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> > 11264402000    0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > 11264402000    0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > 11264402000    1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > 11264402000    3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > 11264402000    0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> > 11264402000    1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > 11264402000    3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > 11264402000    0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> > 11264402000    1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> > 11264402000    3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >
> > and the result should be a dataframe of column means by year with the
> > variable D dropped (or kept doesnt matter)
> >
> > 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> > 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> > 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >
> > It would seem that Tapply should work
> > result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> Why colMeans? It is a function used in place of apply(...,.. ,mean).
>
> Maybe you want
>
> result<-tapply( DF[,1:15], DF$Year, mean,na.rm=T)
>
> Regards
> Petr
>
> >
> > but i get errors about the length of arguments, which