[R] Take the maximum of every 12 columns
Miluji Sb
milujisb at gmail.com
Wed Feb 21 12:00:38 CET 2018
Dear Henrik,
This is great - thank you so much!
Sincerely,
Milu
On Tue, Feb 20, 2018 at 10:12 PM, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
> It looks like the OP uses a data.frame, so in order to use matrixStats
> (I'm the author) one would have to pay the price of coercing to a matrix
> before calling matrixStats::rowMaxs(). However, if the original data
> could just as well live in a matrix, then matrixStats should be
> computationally efficient for this task. (I've seen cases where an
> original matrix was turned into a data.frame just because that is what
> is commonly used elsewhere and because the user may not pay attention
> to the differences between matrices and data.frames.)
>
> If the original data were a matrix 'X', then one can do the
> following with matrixStats:
>
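> library(matrixStats)
> # Offsets 0, 12, ..., ncol(X) - 12: the last block is the final 12 columns
> Y <- sapply(seq(from = 0, to = ncol(X) - 12, by = 12), FUN = function(offset) {
>   rowMaxs(X, cols = offset + 1:12)
> })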
>
> which avoids internal temporary copies required when using regular
> subsetting, e.g.
>
> Y <- sapply(seq(from = 0, to = ncol(X) - 12, by = 12), FUN = function(offset) {
>   rowMaxs(X[, offset + 1:12])
> })
>
> Subsetting data frames by columns is already efficient, so the same
> argument does not apply there.
>
> /Henrik
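A base-R sketch of the same block-wise row maxima, for readers without matrixStats installed. It assumes a numeric matrix whose column count is a multiple of the block width; the helper name `block_row_max` is hypothetical. Offsets are derived from `ncol(X)` so the last block ends at the last column, and `pmax()` over the block's columns stands in for `rowMaxs()`. Unlike `rowMaxs(X, cols = ...)`, each column is extracted here as a temporary vector, so this does not share the copy-avoiding advantage described above.

```r
# Hypothetical helper: row-wise maxima over consecutive blocks of `width`
# columns of a numeric matrix, using base R only (no matrixStats).
block_row_max <- function(X, width = 12) {
  stopifnot(is.matrix(X), ncol(X) %% width == 0)
  # Offsets 0, width, 2 * width, ..., ncol(X) - width: the last block
  # ends exactly at the last column.
  offsets <- seq(from = 0, to = ncol(X) - width, by = width)
  vapply(offsets, function(offset) {
    # One vector per column in the block; pmax() compares them
    # element-wise, giving the maximum of each row within the block.
    cols <- lapply(offset + seq_len(width), function(j) X[, j])
    do.call(pmax, cols)
  }, numeric(nrow(X)))
}

# Usage: 2 rows x 24 columns -> 2 rows x 2 blocks
X <- matrix(1:48, nrow = 2)
Y <- block_row_max(X, width = 12)
# Y is a 2 x 2 matrix; column 1 holds c(23, 24), column 2 holds c(47, 48)
```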
>
> On Tue, Feb 20, 2018 at 10:00 AM, Ista Zahn <istazahn at gmail.com> wrote:
> > On Tue, Feb 20, 2018 at 11:58 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >
> >> Ista, et al.: efficiency?
> >> (Note: I needed to correct my previous post: do.call() is required for
> >> pmax() over the data frame)
> >>
> >> > x <- data.frame(matrix(runif(12e6), ncol=12))
> >>
> >> > system.time(r1 <- do.call(pmax,x))
> >> user system elapsed
> >> 0.049 0.000 0.049
> >>
> >> > system.time(r2 <- apply(x,1,max))
> >> user system elapsed
> >> 2.162 0.045 2.207
> >>
> >> ## about 45 times slower!
> >>
> >> > identical(r1,r2)
> >> [1] TRUE
> >>
> >> pmax() is there for a reason.
> >> Or is there something I am missing?
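Why `do.call(pmax, x)` works here: a data frame is a list of equal-length columns, so `do.call()` hands each column to `pmax()` as a separate argument, and `pmax()` returns the element-wise (per-row) maximum in one vectorized pass, whereas `apply()` first coerces the data frame to a matrix and then calls `max()` once per row. A small check on a toy data frame (values made up for illustration):

```r
# Toy data frame: 3 rows, 3 columns (illustrative values only)
df <- data.frame(a = c(1, 5, 3), b = c(4, 2, 6), c = c(0, 7, 1))

# do.call() passes df$a, df$b, df$c as separate arguments to pmax(),
# which compares them element-wise: one maximum per row.
r_pmax <- do.call(pmax, df)

# apply() coerces df to a matrix and calls max() once per row.
r_apply <- apply(df, 1, max)

print(r_pmax)               # [1] 4 7 6
identical(r_pmax, r_apply)  # [1] TRUE
```

One caveat of passing the columns by name: a column called `na.rm` would be matched to `pmax()`'s own `na.rm` argument; `do.call(pmax, unname(as.list(df)))` sidesteps that.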
> >>
> >
> >
> > Personal preference, I think. I prefer the consistency of apply(). If
> > speed is an issue, matrixStats is both consistent and fast:
> >
> > library(matrixStats)
> > x <- matrix(runif(12e6), ncol=12)
> >
> > system.time(r1 <- do.call(pmax,as.data.frame(x)))
> > ## user system elapsed
> > ## 0.109 0.000 0.109
> > system.time(r2 <- apply(x,1,max))
> > ## user system elapsed
> > ## 1.292 0.024 1.321
> > system.time(r3 <- rowMaxs(x))
> > ## user system elapsed
> > ## 0.044 0.000 0.044
> >
> > pmax() is a fine alternative for the special case of max.
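For a matrix without missing values there is also a vectorized base-R route that needs no extra package: `max.col()` returns the column index of each row's maximum, and indexing the matrix by (row, column) pairs then pulls out the values. This is a swapped-in technique, not one proposed in the thread, and unlike `rowMaxs(x, na.rm = TRUE)` it does not skip `NA`s. The helper name `row_max_base` is hypothetical. `ties.method = "first"` is used because with the default `"random"` ties are detected with a relative tolerance, so the selected entry could differ slightly from the exact maximum.

```r
# Hypothetical helper: row maxima of a numeric matrix with base R only.
# Assumes the matrix contains no NAs.
row_max_base <- function(m) {
  # Column index of the first maximum in each row (exact comparison)
  j <- max.col(m, ties.method = "first")
  # Index the matrix with (row, column) pairs to extract those values
  m[cbind(seq_len(nrow(m)), j)]
}

m <- matrix(c(3, 9, 2, 1, 8, 5), nrow = 2)
row_max_base(m)  # [1] 8 9
```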
> >
> > Best,
> > Ista
> >
> >
> >
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> >>
> >> On Tue, Feb 20, 2018 at 8:16 AM, Miluji Sb <milujisb at gmail.com> wrote:
> >>
> >>> This is what I was looking for. Thank you everyone!
> >>>
> >>> Sincerely,
> >>>
> >>> Milu
> >>>
> >>>
> >>> On Tue, Feb 20, 2018 at 5:10 PM, Ista Zahn <istazahn at gmail.com> wrote:
> >>>
> >>>> Hi Milu,
> >>>>
> >>>> byapply(df, 12, function(x) apply(x, 1, max))
> >>>>
> >>>> You might also be interested in the matrixStats package.
> >>>>
> >>>> Best,
> >>>> Ista
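The same result can be sketched without `byapply()`: because a data frame is a list of columns, `split.default()` can cut it into blocks of 12 columns, and `do.call(pmax, block)` (the pmax() approach Bert describes in this thread) gives each block's row maxima. The grouping vector mirrors the one in Milu's `byapply()`; the helper name `block_max_df` is hypothetical.

```r
# Hypothetical helper: row maxima of every `by` columns of a data frame.
block_max_df <- function(df, by = 12) {
  nc <- ncol(df)
  # Group labels 1,1,...,2,2,...: same grouping rule as byapply()
  groups <- rep(seq_len(ceiling(nc / by)), each = by, length.out = nc)
  # split.default() splits the list of columns, so each block is a
  # data frame holding (up to) `by` columns
  blocks <- split.default(df, groups)
  # Element-wise maximum across the columns of each block
  sapply(blocks, function(block) do.call(pmax, block))
}

df <- as.data.frame(matrix(1:48, nrow = 2))  # 2 rows, 24 columns V1..V24
res <- block_max_df(df, 12)
# res[, 1] is c(23, 24) and res[, 2] is c(47, 48)
```

Because each block stays a data frame, a trailing block with a single column still works.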
> >>>>
> >>>> On Tue, Feb 20, 2018 at 9:55 AM, Miluji Sb <milujisb at gmail.com> wrote:
> >>>> > Dear all,
> >>>> >
> >>>> > I have monthly data in wide format. I am only providing data (at the
> >>>> > bottom of the email) for the first 24 columns, but I have 2880 columns
> >>>> > in total.
> >>>> >
> >>>> > I would like to take the max of every 12 columns. I have taken the mean
> >>>> > of every 12 columns with the following code:
> >>>> >
> >>>> > byapply <- function(x, by, fun, ...)
> >>>> > {
> >>>> >   # Create index list
> >>>> >   if (length(by) == 1)
> >>>> >   {
> >>>> >     nc <- ncol(x)
> >>>> >     split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
> >>>> >   } else # 'by' is a vector of groups
> >>>> >   {
> >>>> >     nc <- length(by)
> >>>> >     split.index <- by
> >>>> >   }
> >>>> >   index.list <- split(seq(from = 1, to = nc), split.index)
> >>>> >
> >>>> >   # Pass index list to fun using sapply() and return object;
> >>>> >   # drop = FALSE keeps a one-column group two-dimensional
> >>>> >   sapply(index.list, function(i)
> >>>> >   {
> >>>> >     do.call(fun, list(x[, i, drop = FALSE], ...))
> >>>> >   })
> >>>> > }
> >>>> >
> >>>> > ## Compute annual means
> >>>> > y <- byapply(df, 12, rowMeans)
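To see how `byapply()` groups the columns when `by` is a single number, its index construction can be run on its own. A sketch with 26 columns (a made-up count, deliberately not a multiple of 12), showing that the last group is short:

```r
nc <- 26   # illustrative column count, not from the original data
by <- 12
# Same construction as in byapply(): labels 1 (x12), 2 (x12), 3 (x2)
split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
# Column indices grouped by label: 1:12, 13:24, 25:26
index.list <- split(seq(from = 1, to = nc), split.index)
lengths(index.list)  # group sizes 12, 12 and 2
```

When a group ends up with a single column (for example with `nc <- 25`), `x[, i]` on a matrix or data frame returns a plain vector, which `rowMeans()` rejects; indexing with `x[, i, drop = FALSE]` keeps it two-dimensional.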
> >>>> >
> >>>> > How can I replace rowMeans with a function that takes the maximum? I am
> >>>> > a bit baffled. Any help will be appreciated. Thank you.
> >>>> >
> >>>> > Sincerely,
> >>>> >
> >>>> > Milu
> >>>> >
> >>>> > ###
> >>>> > dput(droplevels(head(x, 5)))
> >>>> > structure(list(
> >>>> >   X0 = c(295.812103271484, 297.672424316406, 299.006805419922,
> >>>> >     297.631500244141, 298.372741699219),
> >>>> >   X1 = c(295.361328125, 297.345092773438, 298.067504882812,
> >>>> >     297.285339355469, 298.275268554688),
> >>>> >   X2 = c(294.279602050781, 296.401550292969, 296.777984619141,
> >>>> >     296.089111328125, 297.540374755859),
> >>>> >   X3 = c(292.103118896484, 294.253601074219, 293.773803710938,
> >>>> >     293.916229248047, 296.129943847656),
> >>>> >   X4 = c(288.525024414062, 291.274505615234, 289.502777099609,
> >>>> >     290.723388671875, 293.615112304688),
> >>>> >   X5 = c(286.018371582031, 288.748565673828, 286.463134765625,
> >>>> >     288.393951416016, 291.710266113281),
> >>>> >   X6 = c(285.550537109375, 288.159149169922, 285.976501464844,
> >>>> >     287.999816894531, 291.228271484375),
> >>>> >   X7 = c(289.136962890625, 290.751159667969, 290.170257568359,
> >>>> >     291.796203613281, 293.423248291016),
> >>>> >   X8 = c(292.410003662109, 292.701263427734, 294.25244140625,
> >>>> >     295.320404052734, 295.248199462891),
> >>>> >   X9 = c(293.821655273438, 294.139068603516, 296.630157470703,
> >>>> >     296.963531494141, 296.036224365234),
> >>>> >   X10 = c(294.532531738281, 295.366607666016, 297.677551269531,
> >>>> >     296.715911865234, 296.564178466797),
> >>>> >   X11 = c(295.869476318359, 297.010070800781, 299.330169677734,
> >>>> >     297.627593994141, 297.964935302734),
> >>>> >   X12 = c(295.986236572266, 297.675567626953, 299.056671142578,
> >>>> >     297.598907470703, 298.481842041016),
> >>>> >   X13 = c(295.947601318359, 297.934448242188, 298.745391845703,
> >>>> >     297.704925537109, 298.819091796875),
> >>>> >   X14 = c(294.654327392578, 296.722717285156, 297.0986328125,
> >>>> >     296.508239746094, 297.822021484375),
> >>>> >   X15 = c(292.176361083984, 294.49658203125, 293.888305664062,
> >>>> >     294.172149658203, 296.117095947266),
> >>>> >   X16 = c(288.400726318359, 291.029113769531, 289.361907958984,
> >>>> >     290.566772460938, 293.554016113281),
> >>>> >   X17 = c(285.665222167969, 288.293029785156, 286.118957519531,
> >>>> >     288.105285644531, 291.429382324219),
> >>>> >   X18 = c(285.971252441406, 288.3798828125, 286.444580078125,
> >>>> >     288.495880126953, 291.447326660156),
> >>>> >   X19 = c(288.79296875, 290.357543945312, 289.657928466797,
> >>>> >     291.449066162109, 293.095275878906),
> >>>> >   X20 = c(291.999877929688, 292.838348388672, 293.840362548828,
> >>>> >     294.412322998047, 294.941253662109),
> >>>> >   X21 = c(293.615447998047, 294.028106689453, 296.072296142578,
> >>>> >     296.447387695312, 295.824615478516),
> >>>> >   X22 = c(294.719848632812, 295.392028808594, 297.453216552734,
> >>>> >     297.114288330078, 296.883209228516),
> >>>> >   X23 = c(295.634429931641, 296.783294677734, 298.592346191406,
> >>>> >     297.469512939453, 297.832122802734)),
> >>>> >   .Names = c("X0", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8",
> >>>> >     "X9", "X10", "X11", "X12", "X13", "X14", "X15", "X16", "X17",
> >>>> >     "X18", "X19", "X20", "X21", "X22", "X23"),
> >>>> >   row.names = c(NA, 5L), class = "data.frame")
> >>>> >
> >>>> >
> >>>> > ______________________________________________
> >>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> > and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>>
> >>
> >