[R] Output of tapply function as data frame

Deepayan Sarkar deep@y@n@@@rk@r @end|ng |rom gm@||@com
Thu Mar 28 03:40:37 CET 2024


For more complicated examples, the (relatively new) array2DF()
function is also useful:

> with(data, tapply(count, Date, mean)) |> array2DF()
        Var1    Value
1 2024-03-23 5.416667
2 2024-03-24 5.500000
3 2024-03-25 6.000000
4 2024-03-26 4.476190
5 2024-03-27 6.538462
6 2024-03-28 5.200000

or

> tapply(data, ~ Date, with, mean(count)) |> array2DF(responseName = "count")
        Date    count
1 2024-03-23 5.416667
2 2024-03-24 5.500000
3 2024-03-25 6.000000
4 2024-03-26 4.476190
5 2024-03-27 6.538462
6 2024-03-28 5.200000
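
To get from there to a file, which was the original question, something
along these lines should work. This is only a sketch: the test data use
fixed dates rather than Sys.Date(), the file name "AB.txt" is a made-up
placeholder written to tempdir(), and array2DF() is assumed to be
available (it was added in R 4.3.0, as far as I know).

```r
# Reproducible stand-in for the poster's data (fixed dates, not Sys.Date()).
set.seed(2024)
dates <- seq(as.Date("2024-03-23"), by = "day", length.out = 6)
data <- data.frame(
  Date  = sample(dates, 100L, replace = TRUE),
  count = sample(10L, 100L, replace = TRUE)
)

# tapply() returns a 1-d array whose names are the group labels;
# array2DF() turns it into a data frame with the labels as a column.
res <- with(data, tapply(count, Date, mean)) |>
  array2DF(responseName = "count")
names(res)[1] <- "Date"

# Write both columns to a text file; "AB.txt" is a hypothetical name.
out_file <- file.path(tempdir(), "AB.txt")
write.table(res, out_file, quote = FALSE, row.names = FALSE)

# Read the file back to check that the dates made it into the output.
check <- read.table(out_file, header = TRUE)
```

With row.names = FALSE the dates are written as an ordinary first column,
so nothing depends on the names attribute that print() shows on the
terminal.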

Best,
-Deepayan

On Wed, 27 Mar 2024 at 13:15, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> At 04:30 on 27/03/2024, Ogbos Okike wrote:
> > Warm greetings to you all.
> >
> > Using the tapply function below:
> > data <- read.table("FD1month", col.names = c("Dates", "count"))
> > x <- data$count
> > f <- factor(data$Dates)
> > AB <- tapply(x, f, mean)
> >
> >
> > I made a simple calculation. The result, stored in AB, is of the form
> > below. But my attempt to write AB to a file as a data frame fails. When I
> > use write.table, it only produces the count column and strips off the
> > first column (date).
> >
> > 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01
> >  -4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942  -4.430705
> > 2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
> >  -3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196 -10.072579
> > 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01
> >  -5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600  -4.127980
> > 2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01
> >  -3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155  -4.183578
> > 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01
> >  -4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994  -4.347428
> > 2008-10-01 2008-11-01 2008-12-01
> >  -4.029918  -4.260326  -4.454224
> >
> > But the normal format I wish to display only appears on the terminal,
> > leading me to copy it and paste into a text file. That is, when I enter AB
> > on the terminal, it returns a format in the form:
> >
> > 2008-02-01  -4.183578
> > 2008-03-01  -4.336969
> > 2008-04-01  -3.813306
> > 2008-05-01  -4.296579
> > 2008-06-01  -4.575095
> > 2008-07-01  -4.036036
> > 2008-08-01  -4.727994
> > 2008-09-01  -4.347428
> > 2008-10-01  -4.029918
> > 2008-11-01  -4.260326
> > 2008-12-01  -4.454224
> >
> > Now, my question: How do I write out two columns displayed by AB on the
> > terminal to a file?
> >
> > I have tried using AB<-data.frame(AB) but it doesn't work either.
> >
> > Many thanks for your time.
> > Ogbos
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> The main trick is to pipe to as.data.frame. But the result will have only
> one column, so you must assign the dates from the data frame's row names.
> I also include an aggregate solution.
>
>
>
> # create a test data set
> set.seed(2024)
> data <- data.frame(
>    Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
>    count = sample(10L, 100L, TRUE)
> )
>
> # coerce tapply's result to class "data.frame"
> res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
> # assign a dates column from the row names
> res$Date <- row.names(res)
> # cosmetics
> names(res)[2:1] <- names(data)
> # note that the row names are still tapply's names vector
> # and that the columns order is not Date/count. Both are fixed
> # after the calculations.
> res
> #>               count       Date
> #> 2024-03-22 5.416667 2024-03-22
> #> 2024-03-23 5.500000 2024-03-23
> #> 2024-03-24 6.000000 2024-03-24
> #> 2024-03-25 4.476190 2024-03-25
> #> 2024-03-26 6.538462 2024-03-26
> #> 2024-03-27 5.200000 2024-03-27
>
> # fix the columns' order and drop the leftover row names
> res <- res[2:1]
> row.names(res) <- NULL
>
>
>
> # better all in one instruction
> aggregate(count ~ Date, data, mean)
> #>         Date    count
> #> 1 2024-03-22 5.416667
> #> 2 2024-03-23 5.500000
> #> 3 2024-03-24 6.000000
> #> 4 2024-03-25 4.476190
> #> 5 2024-03-26 6.538462
> #> 6 2024-03-27 5.200000
>
>
>
> Also,
> I'm glad to help as always, but Ogbos, you have been an R-Help
> contributor for quite a while; please post data in dput format. Given
> the problem, the output of the following is more than enough.
>
>
> dput(head(data, 20L))
>
>
> Hope this helps,
>
> Rui Barradas
>