[R] OK - I got the data - now what? :-)

Sun Jul 5 19:42:41 CEST 2009

2009/7/5 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
>
>
> David Winsemius wrote:
>>
>> On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:
>>
>>> On Sun, Jul 5, 2009 at 8:18 AM, David Winsemius<dwinsemius at comcast.net>
>>> wrote:
>>>>
>>>> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>>>>
>>>>>
>>>>>
>>>>> David Winsemius wrote:
>>>>>>
>>>>>> So if your values are calculated from other values then consider using
>>>>>> all.equal()
>>>>>> And repeated applications of the testing criteria process are
>>>>>> effective:
>>>>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>>>>  C1   C2   C3
>>>>>> 3 0.52 0.66 0.51
>>>>>> (and a warning that does not seem accurate to me.)
>>>>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>>>>  numerical expression has 3 elements: only the first used
>>>>>
>>>>>
>>>>> David,
>>>>>
>>>>> # which(test[3,] == 0.0)
>>>>> [1] 6 7 8
>>>>>
>>>>> and in a:b a and b must be length 1 vectors (scalars) otherwise just the
>>>>> first element (in this case 6) is used.
>>>>>
>>>>> That leads us to the conclusion that writing the line above is not really
>>>>> the cleanest way or you intended something different ....
>>>>
>>>> Thanks, Uwe. I see my confusion. I did want 6 to be used, and it looks as
>>>> though I would not be getting into trouble this way, but a cleaner method
>>>> would be to access only the first element of which(test[3, ] == 0):
>>>>
>>>> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>>>>
>>>>>
>>>>> David
>>>>
>>>>>> Seems to me that all of the elements were used. I cannot explain that
>>>>>> warning but am pretty sure it can be ignored.
>>>>>>
>>>>
>>>> David
>>>
>>> OK - making lots more headway. Thanks for your help.
>>>
>>> QUESTION: How do I handle the case where I'm testing for 0 and don't
>>> find it? In this case I need all of the row from C1:C6.
>>>
>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>> test<-round(test,2)
>>>
>>> #Make array ragged
>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>> test$C6[7]<-0
>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>
>>> test
>>>
>>> #C1 always the same so calculate it only once
>>> StartCol <- which(names(test)=="C1")
>>>
>>> #Print row 3 explicitly
>>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>>
>>> #Row 6 fails because 0 is not found
>>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>>
>>> EndCol <- which(test[6,] == 0.0)[1]-1
>>> EndCol
>>>
>>
>> It's getting a bit Baroque, but here is a solution that handles an NA:
>>
>> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>>                              ncol(test),   which(test[6,] == 0.0)[1]-1 )
>>            ]
>> #####-----
>>    C1   C2   C3   C4   C5   C6
>> 6 0.33 0.84 0.51 0.86 0.84 0.15
>>
>>
>> Maybe an R-meister can offer something more compact?
>
>
> So let's wait for some R-meister, I'd write even more ....
>
> Reason: testing for exactly zero after possible calculations is a bit
> dangerous and ifelse() is designed for vectorized operations but is not
> efficient for scalar operations, particularly since both expressions are
> evaluated, so if() else would be preferable, but we could use min() instead.
> Finally, a:b could end up in 5:3 without a warning and I'd use seq()
> instead.
>
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE),
>         by = 1)]
>
>

I appreciate both of the answers. I don't completely understand them,
but I do appreciate them. Thanks!
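
For anyone else puzzling over the replies above, here is a minimal sketch of
the three points Uwe raises, as far as I can tell. It uses made-up values
rather than the test data frame, and the variable names (down, res, x, vals,
first_zero, end) are only illustrative:

```r
# 1. a:b silently counts downwards when a > b ...
down <- 5:3                                 # 5 4 3, no warning

# ... whereas seq() with an explicit positive step stops with an error
res <- try(seq(from = 5, to = 3, by = 1), silent = TRUE)
seq_failed <- inherits(res, "try-error")    # TRUE

# 2. A value that is zero "on paper" may not be exactly zero after arithmetic
x <- 0.1 + 0.2 - 0.3
exact_zero <- (x == 0)                      # FALSE
near_zero  <- isTRUE(all.equal(x, 0))       # TRUE

# 3. which(...)[1] is NA when nothing matches, and min(..., na.rm = TRUE)
#    then falls back to the other candidate (here the full length)
vals <- c(0.33, 0.84, 0.51)                 # made-up row with no zero in it
first_zero <- which(vals == 0)[1]           # NA
end <- min(c(first_zero - 1, length(vals)), na.rm = TRUE)   # 3
```

So the min() call plays the role of the is.na() test in David's ifelse()
version, and seq(..., by = 1) guards against an accidental backwards range.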

I was wondering whether it's easy to simply test the last column for
==0, and if true run the previous command, if false just return
everything up to the end of the row?
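
Something like the following is what I have in mind. It is only a sketch:
trim_row and demo are names I made up, it assumes the zeros are padding that
only ever appears at the end of a row, and it looks only at the columns from
C1 onwards. It uses a plain == 0 test, which should be fine for padding zeros
that were assigned rather than calculated:

```r
# Hypothetical helper: return row i from column C1 up to (not including)
# the first padding zero, or up to the end of the row if there is no padding.
trim_row <- function(df, i, start = which(names(df) == "C1")) {
  cols <- start:ncol(df)
  vals <- unlist(df[i, cols])
  n_keep <- if (vals[length(vals)] != 0) {
    length(vals)                  # last column is non-zero: keep everything
  } else {
    which(vals == 0)[1] - 1       # stop just before the first zero
  }
  df[i, cols[seq_len(n_keep)], drop = FALSE]
}

# Small made-up data frame: row 1 is full, row 2 is padded after C1
demo <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.5, 0.3), C2 = c(0.6, 0), C3 = c(0.7, 0))

full_row   <- trim_row(demo, 1)   # columns C1, C2, C3
padded_row <- trim_row(demo, 2)   # column C1 only
```

Using seq_len() means a row whose C1 is already 0 comes back with no columns
instead of a backwards range.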

Currently my data is one experiment per row, but that's wasting space
as most experiments only take 20% of the row and 80% of the row is
filled with 0's. I might want to make the array narrower and have a
flag somewhere in the first 10 columns that says that this row is a
continuation of the previous row. That way I could pack the
array better, use less memory, and when I do finally test for 0 I'd have
a shorter line to traverse.

Just an idea.

Anyway, I suspect either of these will suit my short term needs. On to
the next step.

Cheers,
Mark