[R] Vectorization instead of loops problem

Bert Gunter gunter.berton at gene.com
Sun Dec 4 21:07:48 CET 2011


Inline below

On Sun, Dec 4, 2011 at 10:29 AM, Costas Vorlow <costas.vorlow at gmail.com> wrote:
> Dear Bert,
>
> You are right (obviously).
>
> Apologies for any inconvenience caused. I thought my problem was a simple one
> with an obvious answer that eluded me.
>
> As for your justified questions:
>
> 2: The answer is "all",
>
> hence:
>
> 3. would include overlapping sets (I guess), but this does not matter for
> the time being. I didn't give it much thought, admittedly... If I got 1 &
> 2 right, I could have modified the code for point 3 (if the answer to 2 !=
> "all"), so I did not consider it when I was formulating my query. However, I
> can see now why this is confusing.
>
> Anyways, thanks again for the pointers.
>
> BTW, is there a good & quick read/guide on vectorization in R that one could
> recommend? That would minimize my queries at least in the list. :-)

Vectorization is a central paradigm in R, so practically all books on
the S language discuss it. The "R Language Definition" manual that
ships with R is pretty comprehensive, but Venables and Ripley's MASS and
S Programming books, Patrick Burns's website tutorials (he has several
well suited for beginners), John Chambers's "Programming with R", etc. are
just a few among many. It is impossible for me to be more specific than
that.

-- Bert
>
> Apologies again and best regards,
> Costas
>
> On 4 December 2011 17:45, Bert Gunter <gunter.berton at gene.com> wrote:
>>
>> Costas: (and thanks for giving us your name)
>>
>> which(x == 1)
>>
>> gives you the indices where x is 1 (up to floating point equality --
>> you did not specify whether your x values are integers or calculated
>> as floating point, and that certainly makes a difference). You can
>> then use simple indexing to get the y values. No loops needed.
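A minimal R sketch of the `which()`-plus-indexing approach, using the example data from the original post. It assumes that a window is wanted for every 1 (overlapping windows included, as Costas later confirmed) and, as an assumption, that 1s with fewer than three preceding values are simply dropped, which matches the sample output in the original post. The names `idx`, `offsets`, `win` and `res` are illustrative only. `outer()` builds every needed y index in one call, so no explicit loop is required.

```r
# Example data from the original post
x <- c(1, 2, 0, 0, 1, 2, 1, 0)
y <- c(30, -40, 50, 25, -5, -10, 5, 40)

# Positions where x is 1. x holds the integers 0, 1, 2, so == is exact here;
# for computed floating-point x, a tolerance such as abs(x - 1) < 1e-8 is safer.
idx <- which(x == 1)

# Assumption: skip 1s that have fewer than 3 preceding values
idx <- idx[idx > 3]

# Row k of 'offsets' holds idx[k] - 1, idx[k] - 2, idx[k] - 3 (most recent first)
offsets <- outer(idx, 1:3, "-")

# Index y with all offsets at once, then restore the one-row-per-window layout
win <- matrix(y[offsets], nrow = length(idx), ncol = 3)

# Optional: one list element per window, as asked for in the original post
res <- split(win, row(win))
win
res
```

For the sample data, the rows of `win` are `25 50 -40` and `-10 -5 25`, the two windows listed in the original post.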
>>
>> However, let's explore why your question may have been too poorly
>> formed to get the answer you seek:
>>
>> 1. What if the index of the first 1 is 3 or less? -- Do you want to
>> ignore the (fewer than 3) preceding values or just choose as many as
>> you can?
>>
>> 2. What if, as in your example, several 1's occur in x? Do you want
>> the 3 preceding values for all of them or just the first?
>>
>> 3. If the answer to 2 is "all of them," what if several 1's are less
>> than 3 indices apart -- do you want to include the overlapping sets of
>> 3 y's -- or what?
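Questions 1 and 3 can also be handled without loops. A minimal R sketch, assuming the "choose as many as you can" reading of question 1: every 1 in x gets a row, and positions that would fall before the start of y are filled with NA. Overlapping windows (question 3) need no special treatment, because each row is indexed independently. The window length `k` and the other names are illustrative, not from the thread.

```r
x <- c(1, 2, 0, 0, 1, 2, 1, 0)
y <- c(30, -40, 50, 25, -5, -10, 5, 40)
k <- 3  # number of preceding y values wanted (illustrative parameter)

idx <- which(x == 1)                    # keep every 1, including early ones
offsets <- outer(idx, seq_len(k), "-")  # row j: idx[j] - 1, ..., idx[j] - k
offsets[offsets < 1] <- NA              # positions before the start of y
win <- matrix(y[offsets], nrow = length(idx), ncol = k)  # NA index gives NA
win
```

With the sample data, the row for position 1 is all NA, and the rows for positions 5 and 7 both contain y[4] = 25, i.e. the windows overlap. Assuming y itself contains no NA, the "ignore" reading of question 1 is then just `win[rowSums(is.na(win)) == 0, , drop = FALSE]`.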
>>
>> My point is that "etc. etc." is simply inadequate as a coherent or
>> useful problem description in your post. You _must_ be explicit,
>> complete, and concise. This can be hard. Indeed, it may require
>> considerable thought and effort. I have found -- and others have often
>> noted here -- that going through such an exercise itself often reveals
>> a solution. But be that as it may, the Posting Guide is actually an
>> excellent, comprehensive discussion of how to ask good questions in
>> forums like this. Read it. Follow it.
>>
>> ... and to be fair, your post below is, imho, probably above average
>> as posts go, allowing me to focus on specific points that I thought
>> required clarification. Quite a few posts here of late have been so
>> muddled and incoherent that I had no clue what the OP wanted. And it's
>> not English as a second language. I am a language ignoramus and speak
>> only English, so I am happy to tolerate poor grammar and vocabulary
>> from someone for whom English is only one of several languages in
>> which they can communicate. The problem is poor thinking, not poor
>> English.
>>
>> Best,
>> Bert
>>
>> On Sun, Dec 4, 2011 at 7:18 AM, Costas Vorlow <costas.vorlow at gmail.com> wrote:
>> > Hello,
>> >
>> > I am having problems vectorizing the following (i.e. doing it without a
>> > for/next/while loop):
>> >
>> > I have 2 sequences such as:
>> >
>> > x, y
>> > 1, 30
>> > 2, -40
>> > 0, 50
>> > 0, 25
>> > 1, -5
>> > 2, -10
>> > 1, 5
>> > 0, 40
>> >
>> > etc etc
>> >
>> > The first sequence (x) takes integer numbers only: 0, 1, 2
>> > The sequence y can be anything...
>> >
>> > I want to be able to retrieve (in a list if possible) the last 3 values of
>> > the y sequence before a value of 1 is encountered in the x sequence, i.e.:
>> >
>> > On line 5 in the above dataset, x is 1, so I need to capture the values
>> > 25, 50 and -40 of the y sequence.
>> >
>> > So the outcome (if a list) should look something like:
>> >
>> > [1],[25,50,-40]
>> > [2],[-10,-5,25] # as member #7 of x sequence is 1...
>> >
>> > etc. etc.
>> >
>> > Can I do the above avoiding for/next or while loops?
>> > I am not sure I can explain it better. Any help or pointers would be
>> > extremely welcome.
>> >
>> > Best regards,
>> > Costas
>> >
>> >
>> > --
>> >
>> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> > |c|o|s|t|a|s|@|v|o|r|l|o|w|.|o|r|g|
>> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
