[R] Take the maximum of every 12 columns

Tue Feb 20 22:12:32 CET 2018

It looks like the OP uses a data.frame, so in order to use matrixStats
(I'm the author) one would have to pay the price of coercing to a matrix
before using matrixStats::rowMaxs().  However, if the original data
could equally well live in a matrix, then matrixStats should be
computationally efficient for this task.  (I've seen cases where an
original matrix was turned into a data.frame just because that is what
is commonly used elsewhere and because the user may not have paid
attention to the differences between matrices and data frames.)

If the original data would be a matrix 'X', then one can do the
following with matrixStats:

## Offsets 0, 12, ..., 2868: the last block covers columns 2869:2880
Y <- sapply(seq(from = 0, to = 2880 - 12, by = 12), FUN = function(offset) {
   rowMaxs(X, cols = offset + 1:12)
})

which avoids internal temporary copies required when using regular
subsetting, e.g.

## Same offsets, but X[, ...] creates a temporary 12-column copy each time
Y <- sapply(seq(from = 0, to = 2880 - 12, by = 12), FUN = function(offset) {
   rowMaxs(X[, offset + 1:12])
})
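
As a minimal sketch of the two variants on made-up data (not the OP's):
the 5-by-24 matrix 'X' and the names 'offsets', 'Y1' and 'Y2' below are
assumptions for illustration only.  Both calls should give the same
5-by-2 matrix of block-wise row maxima; the only intended difference is
whether the columns are selected inside rowMaxs() or by subsetting first.

```r
library(matrixStats)

set.seed(42)
# Hypothetical stand-in data: 5 rows, 24 monthly columns (2 blocks of 12)
X <- matrix(runif(5 * 24), nrow = 5, ncol = 24)

# Zero-based start of each block of 12 columns: 0, 12
offsets <- seq(from = 0, to = ncol(X) - 12, by = 12)

# Variant 1: let rowMaxs() select the columns via its 'cols' argument
Y1 <- sapply(offsets, FUN = function(offset) {
  rowMaxs(X, cols = offset + 1:12)
})

# Variant 2: subset the matrix first, then take the row maxima
Y2 <- sapply(offsets, FUN = function(offset) {
  rowMaxs(X[, offset + 1:12])
})

# Both variants should agree
stopifnot(isTRUE(all.equal(Y1, Y2)))
```

With the full data, 'offsets' would run from 0 to 2868 and each result
would have 240 columns, one per block of 12.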

Subsetting data frames by columns is already efficient, so the same
argument does not apply there.
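
For a data frame, one possible base-R sketch combines the same offsets
with the do.call(pmax, ...) idea from earlier in this thread.  The data
frame 'df' below is a made-up 5-by-24 stand-in, and 'offsets' and 'Y'
are illustrative names, not anything from the OP's code.

```r
set.seed(1)
# Hypothetical stand-in for the OP's data frame: 5 rows, 24 monthly columns
df <- as.data.frame(matrix(runif(5 * 24), nrow = 5, ncol = 24))

# Zero-based start of each block of 12 columns: 0, 12
offsets <- seq(from = 0, to = ncol(df) - 12, by = 12)

# df[offset + 1:12] selects 12 columns and is still a list of columns;
# do.call() passes each column to pmax() as a separate argument, so
# pmax() returns the element-wise (that is, row-wise) maximum
Y <- sapply(offsets, FUN = function(offset) {
  do.call(pmax, df[offset + 1:12])
})

# One row per original row, one column per block of 12
dim(Y)
```

This stays in base R and avoids coercing the data frame to a matrix.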

/Henrik

On Tue, Feb 20, 2018 at 10:00 AM, Ista Zahn <istazahn at gmail.com> wrote:
> On Tue, Feb 20, 2018 at 11:58 AM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
>
>> Ista, et al.: efficiency?
>> (Note: I needed to correct my previous post: do.call() is required for
>> pmax() over the data frame)
>>
>> > x <- data.frame(matrix(runif(12e6), ncol=12))
>>
>> > system.time(r1 <- do.call(pmax,x))
>>    user  system elapsed
>>   0.049   0.000   0.049
>>
>> > identical(r1,r2)
>> [1] FALSE
>> > system.time(r2 <- apply(x,1,max))
>>    user  system elapsed
>>   2.162   0.045   2.207
>>
>> ## About 45 times slower (2.207 s vs 0.049 s elapsed)!
>>
>> > identical(r1,r2)
>> [1] TRUE
>>
>> pmax() is there for a reason.
>> Or is there something I am missing?
>>
>
>
> Personal preference I think. I prefer the consistency of apply. If speed
> is an issue matrixStats is both consistent and fast:
>
> library(matrixStats)
> x <- matrix(runif(12e6), ncol=12)
>
> system.time(r1 <- do.call(pmax,as.data.frame(x)))
>   ##  user  system elapsed
>   ## 0.109   0.000   0.109
> system.time(r2 <- apply(x,1,max))
>   ##  user  system elapsed
>   ## 1.292   0.024   1.321
> system.time(r3 <- rowMaxs(x))
>   ##  user  system elapsed
>   ## 0.044   0.000   0.044
>
> pmax is a fine alternative for max special case.
>
> Best,
> Ista
>
>
>
>>
>> Cheers,
>> Bert
>>
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Tue, Feb 20, 2018 at 8:16 AM, Miluji Sb <milujisb at gmail.com> wrote:
>>
>>> This is what I was looking for. Thank you everyone!
>>>
>>> Sincerely,
>>>
>>> Milu
>>>
>>>
>>>
>>> On Tue, Feb 20, 2018 at 5:10 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>>
>>>> Hi Milu,
>>>>
>>>> byapply(df, 12, function(x) apply(x, 1, max))
>>>>
>>>> You might also be interested in the matrixStats package.
>>>>
>>>> Best,
>>>> Ista
>>>>
>>>> On Tue, Feb 20, 2018 at 9:55 AM, Miluji Sb <milujisb at gmail.com> wrote:
>>>> >  Dear all,
>>>> >
>>>> > I have monthly data in wide format, I am only providing data (at the
>>>> > bottom of the email) for the first 24 columns but I have 2880 columns in
>>>> > total.
>>>> >
>>>> > I would like to take max of every 12 columns. I have taken the mean of
>>>> > every 12 columns with the following code:
>>>> >
>>>> > byapply <- function(x, by, fun, ...)
>>>> > {
>>>> >   # Create index list
>>>> >   if (length(by) == 1)
>>>> >   {
>>>> >     nc <- ncol(x)
>>>> >     split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
>>>> >   } else # 'by' is a vector of groups
>>>> >   {
>>>> >     nc <- length(by)
>>>> >     split.index <- by
>>>> >   }
>>>> >   index.list <- split(seq(from = 1, to = nc), split.index)
>>>> >
>>>> >   # Pass index list to fun using sapply() and return object
>>>> >   sapply(index.list, function(i)
>>>> >   {
>>>> >     do.call(fun, list(x[, i], ...))
>>>> >   })
>>>> > }
>>>> >
>>>> > ## Compute annual means
>>>> > y <- byapply(df, 12, rowMeans)
>>>> >
>>>> > How can I switch rowMeans with a command that takes the maximum? I am
>>>> > a bit baffled. Any help will be appreciated. Thank you.
>>>> >
>>>> > Sincerely,
>>>> >
>>>> > Milu
>>>> >
>>>> > ###
>>>> > dput(droplevels(head(x, 5)))
>>>> > structure(list(X0 = c(295.812103271484, 297.672424316406,
>>>> > 299.006805419922, 297.631500244141, 298.372741699219), X1 = c(295.361328125,
>>>> > 297.345092773438, 298.067504882812, 297.285339355469, 298.275268554688
>>>> > ), X2 = c(294.279602050781,
>>>> > 296.401550292969, 296.777984619141, 296.089111328125, 297.540374755859
>>>> > ), X3 = c(292.103118896484, 294.253601074219, 293.773803710938,
>>>> > 293.916229248047, 296.129943847656), X4 = c(288.525024414062,
>>>> > 291.274505615234, 289.502777099609, 290.723388671875, 293.615112304688
>>>> > ), X5 = c(286.018371582031, 288.748565673828, 286.463134765625,
>>>> > 288.393951416016, 291.710266113281), X6 = c(285.550537109375,
>>>> > 288.159149169922, 285.976501464844, 287.999816894531, 291.228271484375
>>>> > ), X7 = c(289.136962890625, 290.751159667969, 290.170257568359,
>>>> > 291.796203613281, 293.423248291016), X8 = c(292.410003662109,
>>>> > 292.701263427734, 294.25244140625, 295.320404052734, 295.248199462891
>>>> > ), X9 = c(293.821655273438, 294.139068603516, 296.630157470703,
>>>> > 296.963531494141, 296.036224365234), X10 = c(294.532531738281,
>>>> > 295.366607666016, 297.677551269531, 296.715911865234, 296.564178466797
>>>> > ), X11 = c(295.869476318359, 297.010070800781, 299.330169677734,
>>>> > 297.627593994141, 297.964935302734), X12 = c(295.986236572266,
>>>> > 297.675567626953, 299.056671142578, 297.598907470703, 298.481842041016
>>>> > ), X13 = c(295.947601318359, 297.934448242188, 298.745391845703,
>>>> > 297.704925537109, 298.819091796875), X14 = c(294.654327392578,
>>>> > 296.722717285156, 297.0986328125, 296.508239746094, 297.822021484375
>>>> > ), X15 = c(292.176361083984, 294.49658203125, 293.888305664062,
>>>> > 294.172149658203, 296.117095947266), X16 = c(288.400726318359,
>>>> > 291.029113769531, 289.361907958984, 290.566772460938, 293.554016113281
>>>> > ), X17 = c(285.665222167969, 288.293029785156, 286.118957519531,
>>>> > 288.105285644531, 291.429382324219), X18 = c(285.971252441406,
>>>> > 288.3798828125, 286.444580078125, 288.495880126953, 291.447326660156
>>>> > ), X19 = c(288.79296875, 290.357543945312, 289.657928466797,
>>>> > 291.449066162109, 293.095275878906), X20 = c(291.999877929688,
>>>> > 292.838348388672, 293.840362548828, 294.412322998047, 294.941253662109
>>>> > ), X21 = c(293.615447998047, 294.028106689453, 296.072296142578,
>>>> > 296.447387695312, 295.824615478516), X22 = c(294.719848632812,
>>>> > 295.392028808594, 297.453216552734, 297.114288330078, 296.883209228516
>>>> > ), X23 = c(295.634429931641, 296.783294677734, 298.592346191406,
>>>> > 297.469512939453, 297.832122802734)), .Names = c("X0", "X1",
>>>> > "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11",
>>>> > "X12", "X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20",
>>>> > "X21", "X22", "X23"), row.names = c(NA, 5L), class = "data.frame")
>>>> >
>>>> >
>>>> >         [[alternative HTML version deleted]]
>>>> >
>>>> > ______________________________________________
>>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>