[R] vectorization of loops in R

Jan van der Laan
Wed Nov 17 14:32:04 CET 2021


Have a look at the base functions tapply and aggregate.

For example, see:
- https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays,
- https://online.stat.psu.edu/stat484/lesson/9/9.2,
- or ?tapply and ?aggregate.
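
A minimal sketch of both functions applied to the example data from the original message. The `set.seed(1)` call and the names `m_tapply` and `m_aggregate` are illustrative additions, not part of the original code:

```r
# Same toy data as in the original message (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# tapply: split z by the groups in y and apply mean() to each piece;
# the result is a named 1-d array with one element per group
m_tapply <- tapply(df$z, df$y, mean)

# aggregate: the same computation, returned as a data frame
# with one row per group and the columns y and z
m_aggregate <- aggregate(z ~ y, data = df, FUN = mean)

print(m_tapply)
print(m_aggregate)
```

Both calls handle the grouping internally, so no explicit loop over `unique(df$y)` is needed.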

Also, your current code subsets all columns of the data frame in each 
iteration: `s = df[df$y == i,]` could be `s = df$z[df$y == i]` (and then 
`mean(s)` instead of `mean(s$z)`), I think.
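
A minimal sketch of the loop with `s = df$z[df$y == i]` in place, assuming the example data from the original message (the `set.seed(1)` call is an illustrative addition). Since `s` is then a plain numeric vector, the mean is taken with `mean(s)` rather than `mean(s$z)`:

```r
# Same toy data as in the original message (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

m <- vector()
for (i in unique(df$y)) {
  s <- df$z[df$y == i]     # only the z values that belong to group i
  m <- append(m, mean(s))  # s is a numeric vector, so mean(s), not mean(s$z)
}
names(m) <- unique(df$y)
print(m)
```

This loop still compares every element of `df$y` against `i` once per group, so with a million groups a single `tapply` or `aggregate` call is likely the faster route.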

HTH,
Jan






On 17-11-2021 14:20, Luigi Marongiu wrote:
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
>                 y = rep(letters[1:5],3),
>                 z = rnorm(15),
>                 stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
>   s = df[df$y == i,]
>   m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> (m)
> #          a          b          c          d          e
> # -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how should I write the procedure in vectorized form?
> Thank you