[R] OK - I got the data - now what? :-)

David Winsemius dwinsemius at comcast.net
Sun Jul 5 20:33:50 CEST 2009


On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:

>>> snipped preamble
>>>
>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>> test<-round(test,2)
>>>
>>> #Make array ragged
>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>> test$C6[7]<-0
>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>
>>> test
>>>
>>> #C1 always the same so calculate it only once
>>> StartCol <- which(names(test)=="C1")
>>>
>>> #Print row 3 explicitly
>>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>>
>>> #Row 6 fails because 0 is not found
>>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>>
>>> EndCol <- which(test[6,] == 0.0)[1]-1
>>> EndCol
>>>
>> It's getting a bit Baroque, but here is a solution that handles an NA:
>> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>>                              ncol(test),   which(test[6,] == 0.0)[1]-1 )
>>            ]
>> #####-----
>>    C1   C2   C3   C4   C5   C6
>> 6 0.33 0.84 0.51 0.86 0.84 0.15
>> Maybe an R-meister can offer something more compact?
>
>
> So let's wait for some R-meister, I'd write even more ....
>
> Reasons: testing for exactly zero after possible calculations is a
> bit dangerous. ifelse() is designed for vectorized operations and is
> not efficient for scalar operations, particularly since both
> expressions are evaluated, so if () else would be preferable, though
> we could use min() instead. Finally, a:b could end up as 5:3 without
> a warning, so I'd use seq() instead.
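
A small sketch of the pitfalls named above, using made-up values rather than the test data frame (x, res, end and last are names invented for this illustration):

```r
# 1. Exact comparison with zero can fail after floating-point arithmetic.
x <- 0.1 + 0.2 - 0.3
x == 0                      # FALSE with IEEE 754 doubles
isTRUE(all.equal(x, 0))     # TRUE: equal within a small tolerance

# 2. The colon operator silently counts down when from > to.
5:3                         # 5 4 3

# seq() with an explicit positive 'by' stops with an error instead.
res <- try(seq(from = 5, to = 3, by = 1), silent = TRUE)
inherits(res, "try-error")  # TRUE

# 3. For a single condition, if () else evaluates only the branch taken.
end <- NA                   # pretend no zero was found
last <- if (is.na(end)) 8 else end - 1
last                        # 8
```

The error from seq() is arguably the more useful behaviour here, since a reversed range would otherwise select columns in the wrong order without any hint.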
>
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y)
>     isTRUE(all.equal(x,y)), 0))[1]

This appears to be a learning moment for me. Do I have it right that
the first argument to sapply(), test[6,], gets passed element-wise to
the first parameter of the function, x, and that the second argument,
0, gets passed via recycling to the second parameter, y, through the
... mechanism of sapply()?
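
To make the question concrete, here is a minimal sketch of how sapply() hands extra arguments to the function. The names row, near, hits and same are made up for illustration. As far as I can tell, the extra argument is not recycled element by element; the same value is passed whole to every call.

```r
# A one-row data frame is a list of columns, so sapply() visits
# one column (here a length-1 vector) at a time.
row <- data.frame(C1 = 0.33, C2 = 0.84, C3 = 0, C4 = 0)

near <- function(x, y) isTRUE(all.equal(x, y))

# The trailing 0 travels through '...' and is matched to y in every call.
hits <- sapply(row, near, 0)
hits
#    C1    C2    C3    C4
# FALSE FALSE  TRUE  TRUE

# Spelled out, each call is near(row[[i]], 0):
same <- c(C1 = near(row[[1]], 0), C2 = near(row[[2]], 0),
          C3 = near(row[[3]], 0), C4 = near(row[[4]], 0))
identical(hits, same)   # TRUE

which(hits)[1]          # position of the first zero column (3, named C3)
```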

> test[6, seq(from = StartCol,
>             to = min(c(temp - 1, ncol(test)), na.rm = TRUE), by = 1)]

I had tried a min() solution and got Inf in return when there was an
NA in the vector, but did not realize that min() had an na.rm argument.
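
For the record, a short sketch of how min() treats missing values, with made-up numbers (n stands in for ncol(test); temp, n and the m_* names are invented here). It also shows one way an Inf can turn up, which may or may not be what happened in my attempt:

```r
temp <- NA_integer_     # what which(...)[1] yields when no zero is found
n <- 8L                 # stands in for ncol(test)

m_default <- min(c(temp - 1L, n))                # NA: missing values propagate
m_narm    <- min(c(temp - 1L, n), na.rm = TRUE)  # 8: the NA is dropped first

# Inf appears when nothing is left to take the minimum of.
# R also issues a warning in these cases, suppressed here.
m_empty  <- suppressWarnings(min(integer(0)))         # Inf
m_all_na <- suppressWarnings(min(NA, na.rm = TRUE))   # Inf

m_default; m_narm; m_empty; m_all_na
```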

Thanks for the meisterhaft corrections.

>
>
> Best,
> Uwe Ligges

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



