[R] Replacing for loop with tapply!?

Petr Pikal petr.pikal at precheza.cz
Sat Jun 11 12:28:38 CEST 2005


Hi

On 10 Jun 2005 at 20:05, Sander Oom wrote:

> Dear all,
> 
> Dimitris and Andy, thanks for your great help. I have progressed to
> the following code which runs very fast and effectively:
> 
> mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
> mat[mat>45] <- NA

> mat<-NA

With this you redefine mat as a single logical value:

> str(mat)
 logi NA
>

so your code stops with an error, because rowSums() needs an object with dimensions:

+      apply(mat, 1, max, na.rm=TRUE))
Error in rowSums(mat > temp, na.rm = TRUE) : 
        'x' must be an array of at least two dimensions
>
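
A small sketch of the difference, in case it helps. Assigning NA to the whole object replaces the matrix with a single logical value, while assigning into an index keeps the dimensions. The 3 x 4 matrix and the names mat_row, mat_all and mat_keep are only illustrations, not your data.

```r
# Illustrative 3 x 4 matrix (made-up values, not the original data)
mat <- matrix(1:12, nrow = 3, ncol = 4)

# Assigning into a row keeps the matrix structure
mat_row <- mat
mat_row[3, ] <- NA
dim(mat_row)          # 3 4

# Assigning to the whole object replaces it with a length-one logical
mat_all <- mat
mat_all <- NA
dim(mat_all)          # NULL, so rowSums() and apply() refuse it

# To blank every cell but keep the dimensions, index with empty brackets
mat_keep <- mat
mat_keep[] <- NA
dim(mat_keep)         # 3 4
```

So if the aim was to simulate missing data, indexing (mat[i, ] or mat[]) is probably what you want rather than mat <- NA.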

If only one row of your matrix is full of NAs, the code just gives a 
warning and still computes a result, with -Inf as the maximum for 
that row. 

> mat[3,]<-NA
> temps <- c(35, 37, 39)
> ind <- rbind(
+      t(sapply(temps, function(temp)
+        rowSums(mat > temp, na.rm=TRUE) )),
+      rowSums(!is.na(mat), na.rm=FALSE),
+      apply(mat, 1, max, na.rm=TRUE))
Warning message:
no finite arguments to max; returning -Inf 
> ind <- t(ind)
> ind
      [,1] [,2] [,3] [,4] [,5]
 [1,]    5    5    3    9   48
 [2,]    1    1    1    9   42
 [3,]    0    0    0    0 -Inf
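
If the -Inf is unwanted, one possible way around it is to wrap max() so that a row without any observed value gives NA. This is only a sketch: row_max_safe is a made-up helper name (not a base R function) and the small matrix is invented for illustration.

```r
# Row maximum that returns NA (not -Inf) when a row has no observed values.
# row_max_safe is a hypothetical helper name, not part of base R.
row_max_safe <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# Illustrative data: row 3 is entirely missing
mat <- matrix(c(36, 38, 50,
                40, NA, 12,
                NA, NA, NA), nrow = 3, byrow = TRUE)

row_max <- apply(mat, 1, row_max_safe)
row_max   # 50 40 NA, and no warning is issued
```

Whether NA is the right value for such a row depends on how the result is used later.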
 
> mat
> temps <- c(35, 37, 39)
> ind <- rbind(
>      t(sapply(temps, function(temp)
>        rowSums(mat > temp, na.rm=TRUE) )),
>      rowSums(!is.na(mat), na.rm=FALSE),
>      apply(mat, 1, max, na.rm=TRUE))
> ind <- t(ind)
> ind
> 
> However, some weather stations have missing values for the whole year.
> Unfortunately, the code breaks down (when uncommenting mat<-NA).
> 
> I have tried 'ifelse' statements in the functions, but it becomes even
> more of a mess. I could subset the matrix beforehand, but this would
> mean merging with a complete matrix afterwards to make it compatible
> with other years. That would slow things down.
> 
> How can I make the code robust for rows containing all missing values?


which(rowSums(!is.na(mat))==0) 
This gives you the indices of the rows of your matrix in which all 
values are NA, and you can use it to fine-tune your code. What you 
need to do depends on the result you want, i.e. how the ind matrix 
should look after processing a mat with one or more rows full of NAs.
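
As a rough sketch of how this index could be used with your code: compute ind as before, then overwrite the entries for the empty rows. The choice of NA for the counts and the maximum (and 0 for the number of valid days) is only my assumption about what you want, and the seed and the name empty are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
mat[mat > 45] <- NA
mat[3, ] <- NA                      # one station with no data at all

temps <- c(35, 37, 39)
empty <- which(rowSums(!is.na(mat)) == 0)   # rows with every value NA

# Same computation as in the thread; suppressWarnings() only hides the
# "no finite arguments to max" warning caused by the empty rows
ind <- suppressWarnings(t(rbind(
  t(sapply(temps, function(temp) rowSums(mat > temp, na.rm = TRUE))),
  rowSums(!is.na(mat)),
  apply(mat, 1, max, na.rm = TRUE))))

# Assumed convention: counts and maximum are NA for empty rows, while the
# number of valid days (column 4) stays 0
ind[empty, c(1:3, 5)] <- NA
ind[empty, ]
```

This keeps ind at the full number of rows, so no subsetting and merging is needed to stay compatible with other years.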

HTH
Petr


> 
> Thanks for your help,
> 
> Sander.
> 
> Dimitris Rizopoulos wrote:
> > for the maximum you could use something like:
> > 
> > ind[, 1] <- apply(mat, 2, max)
> > 
> > I hope it helps.
> > 
> > Best,
> > Dimitris
> > 
> > ----
> > Dimitris Rizopoulos
> > Ph.D. Student
> > Biostatistical Centre
> > School of Public Health
> > Catholic University of Leuven
> > 
> > Address: Kapucijnenvoer 35, Leuven, Belgium
> > Tel: +32/16/336899
> > Fax: +32/16/337015
> > Web: http://www.med.kuleuven.ac.be/biostat/
> >      http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
> > 
> > 
> > 
> > ----- Original Message ----- 
> > From: "Sander Oom" <slist at oomvanlieshout.net>
> > To: "Dimitris Rizopoulos" <dimitris.rizopoulos at med.kuleuven.be>
> > Cc: <r-help at stat.math.ethz.ch>
> > Sent: Friday, June 10, 2005 12:10 PM
> > Subject: Re: [R] Replacing for loop with tapply!?
> > 
> > 
> >>Thanks Dimitris,
> >>
> >>Very impressive! Much faster than before.
> >>
> >>Thanks to the newfound R.basic, I can simply rotate the result with
> >>rotate270{R.basic}:
> >>
> >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
> >>>temps <- c(37, 39, 41)
> >>>#################
> >>>#ind <- matrix(0, length(temps), ncol(mat))
> >>>ind <- matrix(0, 4, ncol(mat))
> >>>(startDate <- date())
> >>[1] "Fri Jun 10 12:08:01 2005"
> >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
> >>>ind[4, ] <- colMeans(max(mat))
> >>Error in colMeans(max(mat)) : 'x' must be an array of at least two
> >>dimensions
> >>>(endDate <- date())
> >>[1] "Fri Jun 10 12:08:02 2005"
> >>>ind <- rotate270(ind)
> >>>ind[1:10,]
> >>   V4 V3 V2 V1
> >>1   0 56 75 80
> >>2   0 46 53 60
> >>3   0 50 58 67
> >>4   0 60 72 80
> >>5   0 59 68 76
> >>6   0 55 67 74
> >>7   0 62 77 93
> >>8   0 45 57 67
> >>9   0 57 68 75
> >>10  0 61 66 76
> >>
> >>However, I have not managed to get the row maximum using your
> >>method? It should be 50 for most rows, but my first guess code
> >>gives an error!
> >>
> >>Any suggestions?
> >>
> >>Sander
> >>
> >>
> >>
> >>Dimitris Rizopoulos wrote:
> >>>maybe you are looking for something along these lines:
> >>>
> >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
> >>>temps <- c(37, 39, 41)
> >>>#################
> >>>ind <- matrix(0, length(temps), ncol(mat))
> >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
> >>>ind
> >>>
> >>>
> >>>I hope it helps.
> >>>
> >>>Best,
> >>>Dimitris
> >>>
> >>>
> >>>----- Original Message ----- 
> >>>From: "Sander Oom" <slist at oomvanlieshout.net>
> >>>To: <r-help at stat.math.ethz.ch>
> >>>Sent: Friday, June 10, 2005 10:50 AM
> >>>Subject: [R] Replacing for loop with tapply!?
> >>>
> >>>
> >>>>Dear all,
> >>>>
> >>>>We have a large data set with temperature data for weather
> >>>>stations across the globe (15000 stations).
> >>>>
> >>>>For each station, we need to calculate the number of days a
> >>>>certain temperature is exceeded.
> >>>>
> >>>>So far we used the following S code, where mat88 is a matrix
> >>>>containing rows of 365 daily temperatures for each of 15000
> >>>>weather stations:
> >>>>
> >>>>m <- 37
> >>>>n <- 2
> >>>>outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
> >>>>for(i in 1:nrow(mat88)) {
> >>>># i <- 3
> >>>>row1 <- as.data.frame(df88[i,  ])
> >>>>temprow37 <- select.rows(row1, row1 > m)
> >>>>temprow39 <- select.rows(row1, row1 > m + n)
> >>>>temprow41 <- select.rows(row1, row1 > m + 2 * n)
> >>>>outmat88[i, 1] <- max(row1, na.rm = T)
> >>>>outmat88[i, 2] <- count.rows(temprow37)
> >>>>outmat88[i, 3] <- count.rows(temprow39)
> >>>>outmat88[i, 4] <- count.rows(temprow41)
> >>>>}
> >>>>outmat88
> >>>>
> >>>>We have transferred the data to a more potent Linux box running R,
> >>>>but still hope to speed up the code.
> >>>>
> >>>>I know a for loop should be avoided when looking for speed. I also
> >>>>know the answer is in something like tapply, but my understanding
> >>>>of these commands is still too limited to see the solution. Could
> >>>>someone show me the way!?
> >>>>
> >>>>Thanks in advance,
> >>>>
> >>>>Sander.
> >>>>--
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal at precheza.cz



