[R] Replacing for loop with tapply!?

Kjetil Brinchmann Halvorsen kjetil at acelerate.com
Fri Jun 10 17:55:53 CEST 2005


Sander Oom wrote:

>Dear all,
>
>We have a large data set with temperature data for weather stations 
>across the globe (15000 stations).
>
>For each station, we need to calculate the number of days a certain 
>temperature is exceeded.
>
>So far we used the following S code, where mat88 is a matrix containing 
>rows of 365 daily temperatures for each of 15000 weather stations:
>
>	m <- 37
>	n <- 2
>	outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
>	for(i in 1:nrow(mat88)) {
>		# i <- 3
>		row1 <- as.data.frame(df88[i,  ])
>		temprow37 <- select.rows(row1, row1 > m)
>		temprow39 <- select.rows(row1, row1 > m + n)
>		temprow41 <- select.rows(row1, row1 > m + 2 * n)
>		outmat88[i, 1] <- max(row1, na.rm = T)
>		outmat88[i, 2] <- count.rows(temprow37)
>		outmat88[i, 3] <- count.rows(temprow39)
>		outmat88[i, 4] <- count.rows(temprow41)
>	}
>	outmat88
>
>  
>
What you need is not tapply but apply. Something like
   apply(mat88, 1, function(x) sum(x > 30))

where your threshold should replace 30, and the `1' tells apply to work
over rows (2 would mean columns). For multiple thresholds:

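apply(mat88, 1, function(x) c( sum(x>20), sum(x>25), sum(x>30)))

Note that when the function returns a vector, apply returns one column
per station, so wrap the call in t() if you want one row per station.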
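A minimal sketch of the whole loop rewritten along these lines. It assumes a small simulated matrix (5 hypothetical stations, random temperatures, one made-up NA) in place of the real 15000 x 365 data, and it assumes missing readings should simply be ignored (na.rm = TRUE). It also shows a swapped-in alternative, rowSums() on a logical matrix, which avoids calling an R function once per row and is usually faster on large matrices.

```r
# Toy stand-in for mat88: 5 hypothetical stations x 365 days of simulated
# temperatures (the real matrix has 15000 rows).
set.seed(1)
mat88 <- matrix(rnorm(5 * 365, mean = 30, sd = 5), nrow = 5)
mat88[2, 10] <- NA   # one made-up missing reading, to exercise na.rm

m <- 37   # base threshold, as in the original loop
n <- 2    # step between the three thresholds

# apply() version: the function returns a length-4 vector per row, so apply()
# gives a 4 x nrow(mat88) matrix; t() turns it into one row per station.
outmat88 <- t(apply(mat88, 1, function(x)
  c(max(x, na.rm = TRUE),
    sum(x > m, na.rm = TRUE),
    sum(x > m + n, na.rm = TRUE),
    sum(x > m + 2 * n, na.rm = TRUE))))

# Vectorised alternative: comparing the whole matrix gives a logical matrix,
# and rowSums() counts the TRUE values in each row.
outmat88b <- cbind(apply(mat88, 1, max, na.rm = TRUE),
                   rowSums(mat88 > m, na.rm = TRUE),
                   rowSums(mat88 > m + n, na.rm = TRUE),
                   rowSums(mat88 > m + 2 * n, na.rm = TRUE))

all.equal(unname(outmat88), unname(outmat88b))  # TRUE
```

Both give a matrix with the same four columns as outmat88 in the original loop: the row maximum and the counts above m, m + n and m + 2n.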

Kjetil

>We have transferred the data to a more potent Linux box running R, but 
>still hope to speed up the code.
>
>I know a for loop should be avoided when looking for speed. I also know 
>the answer is in something like tapply, but my understanding of these 
>commands is still too limited to see the solution. Could someone show me
>the way!?
>
>Thanks in advance,
>
>Sander.
>  
>
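For contrast, tapply splits a single vector by a grouping factor and applies a function to each group, so it fits data stored in long format (one row per station and day) rather than a station-by-day matrix. A minimal sketch, assuming a hypothetical long-format data frame long88 with made-up columns station and temp:

```r
# Hypothetical long-format data: one row per (station, day) reading.
long88 <- data.frame(
  station = rep(c("A", "B", "C"), times = c(4, 3, 5)),
  temp    = c(36, 38, 40, 35,  39, 41, 30,  37, 38, 42, 36, 44)
)

# tapply() groups temp by station and counts readings above 37 in each group.
hot_days <- tapply(long88$temp, long88$station, function(x) sum(x > 37))
hot_days  # A: 2, B: 2, C: 3
```

With the data already in a matrix, apply (or rowSums) is the direct replacement for the loop.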


-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra









More information about the R-help mailing list