[R] Replacing for loop with tapply!?
Kjetil Brinchmann Halvorsen
kjetil at acelerate.com
Fri Jun 10 17:55:53 CEST 2005
Sander Oom wrote:
>Dear all,
>
>We have a large data set with temperature data for weather stations
>across the globe (15000 stations).
>
>For each station, we need to calculate the number of days a certain
>temperature is exceeded.
>
>So far we used the following S code, where mat88 is a matrix containing
>rows of 365 daily temperatures for each of 15000 weather stations:
>
> m <- 37
> n <- 2
> outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
> for(i in 1:nrow(mat88)) {
>   # i <- 3
>   row1 <- as.data.frame(df88[i, ])
>   temprow37 <- select.rows(row1, row1 > m)
>   temprow39 <- select.rows(row1, row1 > m + n)
>   temprow41 <- select.rows(row1, row1 > m + 2 * n)
>   outmat88[i, 1] <- max(row1, na.rm = T)
>   outmat88[i, 2] <- count.rows(temprow37)
>   outmat88[i, 3] <- count.rows(temprow39)
>   outmat88[i, 4] <- count.rows(temprow41)
> }
> outmat88
>
>
>
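What you need is not tapply but apply (tapply groups a vector by a
factor; apply works over the rows or columns of a matrix). Something like
  apply(mat88, 1, function(x) sum(x > 30))
where your threshold should replace 30 and the 1 refers to rows. For
multiple thresholds:
  apply(mat88, 1, function(x) c(sum(x > 20), sum(x > 25), sum(x > 30)))
Note that this second call returns one column per station, so wrap it
in t() if you want one row per station. If the data contain missing
values, use sum(x > 30, na.rm = TRUE), otherwise the count for that
station will be NA.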
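
To make this concrete, here is a minimal sketch using the thresholds from
the loop above (37, 39 and 41). The small mat88 below is made-up toy data
standing in for the real 15000 x 365 matrix, and the column names are only
illustrative. It also shows rowSums() on a logical matrix as a possible
alternative that avoids calling an R function once per row; I have not
timed either version on the real data.

```r
# Toy stand-in for mat88: 3 stations x 5 days (made-up values, one NA)
mat88 <- matrix(c(36, 38, 40, 42, NA,
                  30, 31, 32, 33, 34,
                  41, 42, 43, 37, 39),
                nrow = 3, byrow = TRUE)

m <- 37
n <- 2
thresholds <- c(m, m + n, m + 2 * n)   # 37, 39, 41

# Count, per row, the days above each threshold.
# apply() returns one column per station here, hence the t().
counts <- t(apply(mat88, 1, function(x)
  sapply(thresholds, function(th) sum(x > th, na.rm = TRUE))))

# Same layout as the original outmat88: row maximum, then the three counts
outmat88 <- cbind(apply(mat88, 1, max, na.rm = TRUE), counts)
colnames(outmat88) <- c("max", "gt37", "gt39", "gt41")  # illustrative names
outmat88

# Alternative: compare the whole matrix at once and sum each row
counts2 <- sapply(thresholds, function(th) rowSums(mat88 > th, na.rm = TRUE))
```

Both counts and counts2 have one row per station and one column per
threshold, so either can be bound to the row maxima as shown.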
Kjetil
>We have transferred the data to a more potent Linux box running R, but
>still hope to speed up the code.
>
>I know a for loop should be avoided when looking for speed. I also know
>the answer is in something like tapply, but my understanding of these
>commands is still too limited to see the solution. Could someone show me
>the way!?
>
>Thanks in advance,
>
>Sander.
>
>
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
-- Mahdi Elmandjra