[R] vectorization of loops in R
Jan van der Laan
Wed Nov 17 14:32:04 CET 2021
Have a look at the base functions tapply and aggregate.
For example see:

https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays,
https://online.stat.psu.edu/stat484/lesson/9/9.2,
or ?tapply and ?aggregate.
Also your current code seems to contain an error: `s = df[df$y == i,]`
should be `s = df$z[df$y == i]` I think.
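A minimal sketch of what the tapply/aggregate suggestion could look like on the example data from the question. The set.seed() call and the names `m` and `a` are assumptions added here for illustration only; they are not part of the original code.

```r
# Example data from the question; the seed is an assumption added only
# so that the result is repeatable.
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# tapply: split z by the groups in y and apply mean() to each piece.
# The result is a named 1-d array, comparable to m built by the loop.
m <- tapply(df$z, df$y, mean)

# aggregate: the same computation, returned as a data frame
# with one row per value of y and the group mean in column z.
a <- aggregate(z ~ y, data = df, FUN = mean)
```

Both calls replace the explicit loop and the repeated append(); which one fits better probably depends on whether a named vector or a data frame is more convenient downstream.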
HTH,
Jan
On 17-11-2021 14:20, Luigi Marongiu wrote:
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
>                 y = rep(letters[1:5],3),
>                 z = rnorm(15),
>                 stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
>   s = df[df$y == i,]
>   m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> > (m)
> a b c d e
> 0.6355382 0.4218053 0.7256680 0.8320783 0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you