[R] OK - I got the data - now what? :-)

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Jul 5 19:19:26 CEST 2009



David Winsemius wrote:
> 
> On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:
> 
>> On Sun, Jul 5, 2009 at 8:18 AM, David 
>> Winsemius<dwinsemius at comcast.net> wrote:
>>>
>>> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>>>
>>>>
>>>>
>>>> David Winsemius wrote:
>>>>>
>>>>> So if your values are calculated from other values then consider using
>>>>> all.equal()
>>>>> And repeated applications of the testing criteria process are 
>>>>> effective:
>>>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>>>   C1   C2   C3
>>>>> 3 0.52 0.66 0.51
>>>>> (and a warning that does not seem accurate to me.)
>>>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>>>  numerical expression has 3 elements: only the first used
>>>>
>>>>
>>>> David,
>>>>
>>>> # which(test[3,] == 0.0)
>>>> [1] 6 7 8
>>>>
>>>> and in a:b a and b must be length 1 vectors (scalars) otherwise just 
>>>> the
>>>> first element (in this case 6) is used.
>>>>
>>>> That leads us to the conclusion that writing the line above is not 
>>>> really
>>>> the cleanest way or you intended something different ....
>>>
>>> Thanks, Uwe. I see my confusion. I did want 6 to be used and it
>>> looks as
>>> though I would not be getting in trouble this way, but a cleaner method
>>> would be to access only the first element of which(test[3, ] == 0):
>>>
>>> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>>>
>>>>
>>>> David
>>>
>>>>> Seems to me that all of the elements were used. I cannot explain that
>>>>> warning but am pretty sure it can be ignored.
>>>>>
>>>
>>> David
>>
>> OK - making lots more headway. Thanks for your help.
>>
>> QUESTION: How do I handle the case where I'm testing for 0 and don't
>> find it? In this case I need all of the row from C1:C6.
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test<-round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> test
>>
>> #C1 always the same so calculate it only once
>> StartCol <- which(names(test)=="C1")
>>
>> #Print row 3 explicitly
>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>
>> #Row 6 fails because 0 is not found
>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>
>> EndCol <- which(test[6,] == 0.0)[1]-1
>> EndCol
>>
> 
> It's getting a bit Baroque, but here is a solution that handles an NA:
> 
> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>                               ncol(test),   which(test[6,] == 0.0)[1]-1 )
>             ]
> #####-----
>     C1   C2   C3   C4   C5   C6
> 6 0.33 0.84 0.51 0.86 0.84 0.15
> 
> 
> Maybe an R-meister can offer something more compact?


So let's wait for some R-meister; in the meantime, I'd write even more ....

Reasons: testing for exactly zero after possible calculations is a bit
dangerous. ifelse() is designed for vectorized operations and is not
efficient for scalar ones, so if () else would be preferable, but we
could use min() instead. Finally, a:b could end up as 5:3 without a
warning, so I'd use seq() instead.
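
A minimal sketch of those points, on made-up values rather than the
data from this thread (all variable names here are only for
illustration):

```r
# 1. Exact comparison with zero can fail after floating-point arithmetic.
x <- 0.3 - 0.1 - 0.2                     # mathematically zero
exact_zero  <- (x == 0)                  # FALSE with IEEE 754 doubles
approx_zero <- isTRUE(all.equal(x, 0))   # TRUE: equal within tolerance

# 2. a:b silently counts downwards when b < a ...
down <- 3:2                              # the integer vector 3, 2

# ... whereas seq() with an explicit by = 1 signals an error instead.
res <- tryCatch(seq(from = 3, to = 2, by = 1),
                error = function(e) "error")

# 3. min() with na.rm = TRUE copes with "no zero found" (NA) without
#    needing ifelse(): the NA is dropped and the last column index wins.
first_zero <- NA_integer_                # what which(...)[1] gives if no match
end_col <- min(c(first_zero - 1, 8), na.rm = TRUE)   # 8
```

The error from seq() is deliberate: a reversed range then stops the
code instead of quietly selecting the wrong columns.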

Hence I'd prefer:

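# index of the first (near-)zero entry in row 6; NA if there is none
temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x, y)), 0))[1]
# columns from StartCol up to the one before that zero, or to the last column
test[6, seq(from = StartCol,
            to = min(c(temp - 1, ncol(test)), na.rm = TRUE),
            by = 1)]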
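
If this is needed for more than one row, the two lines could be wrapped
in a small function. The following is only a sketch: row_until_zero()
and the data frame demo are hypothetical names and values, not something
from the thread, and like the code above it looks for the first zero
anywhere in the row, not only from the start column onwards.

```r
# Hypothetical helper: return the columns of row i from start_col up to,
# but not including, the first (near-)zero entry; if the row contains no
# zero, return everything from start_col to the last column.
row_until_zero <- function(df, i, start_col) {
  is_zero <- vapply(df[i, ], function(x) isTRUE(all.equal(x, 0)), logical(1))
  first_zero <- which(is_zero)[1]          # NA when the row has no zero
  end_col <- min(c(first_zero - 1, ncol(df)), na.rm = TRUE)
  df[i, seq(from = start_col, to = end_col, by = 1)]
}

# Small fixed example instead of runif(), so the result is reproducible.
demo <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.52, 0.33, 0.10),
                   C2 = c(0.66, 0.84, 0.20),
                   C3 = c(0.51, 0.51, 0.00),
                   C4 = c(0.00, 0.86, 0.00))
start_col <- which(names(demo) == "C1")

r1 <- row_until_zero(demo, 1, start_col)   # columns C1 to C3
r2 <- row_until_zero(demo, 2, start_col)   # no zero: columns C1 to C4
r3 <- row_until_zero(demo, 3, start_col)   # columns C1 to C2
```

Something like lapply(seq_len(nrow(demo)), function(i)
row_until_zero(demo, i, start_col)) would then give a list with one
(ragged) entry per row.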


Best,
Uwe Ligges



> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>



