[R] identifying when one element of a row has a positive number
Daisy Englert Duursma
daisy.duursma at gmail.com
Fri Jan 28 10:49:00 CET 2011
Hello,
Thanks to everyone for the multiple answers. Josh, thanks for the
function. My 12 datasets have over 500,000 rows, so your answer is
greatly appreciated.
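
For anyone who finds this thread later: the per-row apply() calls can probably be avoided altogether. Below is a minimal sketch of a fully vectorised variant built on rowSums() and max.col(). The helper name one_presence_absence() is hypothetical (it is not from any of the replies), and the sketch assumes the variable columns are numeric with no missing values.

```r
## Example data from the original question
df1 <- data.frame(x = seq(1860, 1950, by = 10),
                  y = seq(-290, -200, by = 10),
                  ANN = c(3, 0, 0, 0, 1, 0, 1, 1, 0, 0),
                  CTA = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 2),
                  GLM = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0))

## Hypothetical helper (a sketch, not code from the thread).
## Assumes numeric columns without NA values.
one_presence_absence <- function(dat) {
  pos <- as.matrix(dat) > 0        # TRUE where a value is positive
  n_pos <- unname(rowSums(pos))    # count of positive values per row
  cn <- colnames(pos)
  ## max.col() returns the column index of each row's maximum.
  ## pos + 0 is a 0/1 matrix marking positives; 1 - pos marks zeros.
  first_pos <- max.col(pos + 0, ties.method = "first")
  first_zero <- max.col(1 - pos, ties.method = "first")
  data.frame(
    ## name of the only positive column, else NA
    one_presence = ifelse(n_pos == 1L, cn[first_pos], NA_character_),
    ## name of the only non-positive column, else NA
    one_absence = ifelse(n_pos == ncol(pos) - 1L, cn[first_zero],
                         NA_character_),
    stringsAsFactors = FALSE
  )
}

res <- one_presence_absence(df1[, c("ANN", "CTA", "GLM")])
df2 <- cbind(df1, res)
```

The index returned by max.col() is only used for rows where exactly one column qualifies, so the tie-breaking rule does not affect the result. I have not timed this against the other solutions on the full datasets.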
Cheers,
Daisy
On Thu, Jan 27, 2011 at 9:10 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Hi,
>
> This problem seemed deceptively simple to me. After chasing a
> considerable number of dead ends, I came up with fg(). It lacks the
> elegance of Dennis' solution, but (particularly for large datasets),
> it is substantially faster. I still feel like I'm missing something,
> but....
>
> ###############################################
> ## Data
> df1 <- data.frame(x = seq(1860,1950,by=10),
>   y = seq(-290,-200,by=10), ANN = c(3,0,0,0,1,0,1,1,0,0),
>   CTA = c(0,1,0,0,0,0,1,0,0,2), GLM = c(0,0,2,0,0,0,0,1,0,0))
> ## larger test dataset
> dftest <- do.call("rbind", rep(list(df1), 100))
>
>
> f <- function(x) ifelse(sum(x > 0) == 1L, names(which(x > 0)), NA)
> g <- function(x) ifelse(sum(x > 0) == 2L, names(which(x == 0L)), NA)
>
> fg <- function(dat) {
>   cnames <- colnames(dat)
>   dat <- dat > 0; z <- rowSums(dat)
>   z1 <- z == 1L; z2 <- z == 2L; rm(z)
>   output <- matrix(NA, nrow = nrow(dat), ncol = 2)
>   ## drop = FALSE keeps a matrix even when only one row matches
>   output[z1, 1] <- apply(dat[z1, , drop = FALSE], 1, function(x) cnames[x])
>   output[z2, 2] <- apply(dat[z2, , drop = FALSE], 1, function(x) cnames[!x])
>   return(output)
> }
>
> ## Compare times on larger dataset
> system.time(cbind(apply(dftest[, 3:5], 1, f),
>   apply(dftest[, 3:5], 1, g)))
> system.time(fg(dftest[, 3:5]))
>
> ## compare times under repetitions
> system.time(for (i in 1:100) cbind(apply(df1[, 3:5], 1, f),
>   apply(df1[, 3:5], 1, g)))
> system.time(for (i in 1:100) fg(df1[, 3:5]))
> ###############################################
>
> Josh
>
>
> On Thu, Jan 27, 2011 at 12:36 AM, Dennis Murphy <djmuser at gmail.com> wrote:
>> Hi:
>>
>> Try this:
>>
>> f <- function(x) ifelse(sum(x > 0) == 1L, names(which(x > 0)), NA)
>> g <- function(x) ifelse(sum(x > 0) == 2L, names(which(x == 0L)), NA)
>> > apply(df1[, 3:5], 1, f)
>> [1] "ANN" "CTA" "GLM" NA "ANN" NA NA NA NA "CTA"
>> > apply(df1[, 3:5], 1, g)
>> [1] NA NA NA NA NA NA "GLM" "CTA" NA NA
>>
>> HTH,
>> Dennis
>>
>> On Wed, Jan 26, 2011 at 9:36 PM, Daisy Englert Duursma <
>> daisy.duursma at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am not sure where to begin with this problem or what to search for
>>> in r-help. I just don't know what to call this.
>>>
>>> I have 5 columns: the first 2 are the x, y locations and the last
>>> three are variables about those locations.
>>>
>>> x<-seq(1860,1950,by=10)
>>> y<-seq(-290,-200,by=10)
>>> ANN<-c(3,0,0,0,1,0,1,1,0,0)
>>> CTA<-c(0,1,0,0,0,0,1,0,0,2)
>>> GLM<-c(0,0,2,0,0,0,0,1,0,0)
>>> df1<-as.data.frame(cbind(x,y,ANN,CTA,GLM))
>>>
>>> What I would like to produce is an additional column that tells when
>>> only 1 of the three variables has a value greater than 0. I would like
>>> this new column to give the name of the variable. Likewise, I would
>>> like a column that tells when only one of the three variables for a
>>> given row has a value of 0. For my example the new columns would be:
>>>
>>> one_presence<-c("ANN","CTA","GLM","NA","ANN","NA","NA","NA","NA","CTA")
>>> one_absence<-c("NA","NA","NA","NA","NA","NA","GLM","CTA","NA","NA")
>>>
>>> The end result should look like
>>>
>>> df2<-(cbind(df1,one_presence,one_absence))
>>>
>>> I am sure I can do this with a loop or maybe grep but I am out of ideas.
>>>
>>> Any help would be appreciated.
>>>
>>> Cheers,
>>> Daisy
>>>
>>> --
>>> Daisy Englert Duursma
>>>
>>> Room E8C156
>>> Dept. Biological Sciences
>>> Macquarie University NSW 2109
>>> Australia
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
--
Daisy Englert Duursma
Room E8C156
Dept. Biological Sciences
Macquarie University NSW 2109
Australia