[R] vectorization of loops in R

Kevin Thorpe kevin.thorpe using utoronto.ca
Wed Nov 17 14:28:15 CET 2021


If I follow what you are trying to do, you want the mean of z for each value of y.

tapply(df$z, df$y, mean)
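
A minimal sketch of that call on the example data from the question, checked against the original loop. The loop subsets the whole data frame once per unique value of y, so its cost grows with (number of rows) x (number of groups); tapply() splits z by y once. The set.seed() call and the names m_tapply, m_loop and m_rowsum are additions for illustration only, and the rowsum() variant is just one possible alternative, not something I have timed on your data.

```r
# Rebuild the example data from the question (set.seed added for reproducibility).
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# Group means in one call: split z by y, then apply mean() to each group.
m_tapply <- tapply(df$z, df$y, mean)

# The original loop, kept only to confirm that both approaches agree.
m_loop <- vector()
for (i in unique(df$y)) {
  s <- df[df$y == i, ]
  m_loop <- append(m_loop, mean(s$z))
}
names(m_loop) <- unique(df$y)

# tapply() returns groups in sorted order, so align by name before comparing.
stopifnot(isTRUE(all.equal(as.vector(m_tapply[names(m_loop)]),
                           as.vector(m_loop))))

# Possible alternative: group sums divided by group counts, using base R only.
sums     <- rowsum(df$z, df$y)   # one-column matrix, rows named by group
counts   <- table(df$y)          # number of rows per group
m_rowsum <- sums[, 1] / as.vector(counts[rownames(sums)])
```

Note that tapply() returns the groups in sorted order of y, whereas the loop returns them in order of first appearance, so compare by name rather than by position.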


> On Nov 17, 2021, at 8:20 AM, Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
>                 y = rep(letters[1:5],3),
>                 z = rnorm(15),
>                 stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
>   s = df[df$y == i,]
>   m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> > (m)
> a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you

-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael’s Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe using utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


