[R] Indexing by logical vectors

David Winsemius dwinsemius at comcast.net
Tue Jul 20 14:24:50 CEST 2010


On Jul 20, 2010, at 1:12 AM, Christian Raschke wrote:

> On Tue, 2010-07-20 at 10:12 +1000, Bill.Venables at csiro.au wrote:
>> As far as I know the answer to your question is "No", but there are  
>> things you can do to improve the readability of your code.  One  
>> thing I find useful is to avoid using "$" as much as possible and  
>> to favour things like with() and within().
>>
>
> Thank you for your answer. I had not looked at within() for this until
> now.
>
>> The first thing you might do is think about choosing shorter names,  
>> of course.  If that's not possible, you could try something like  
>> this.
>>
>> ensureNN <- function(x) {  # "ensure non-negative"
>> 	is.na(x[x < 0]) <- TRUE
>> 	x
>> }
>
> This approach would essentially require a different function for the
> different operations to be performed on the vector. I suppose that
> assigning NA based on a certain condition is probably the most common
> use, but in the end I have other cases, where the logical vector is
> obtained from other operations or where the value that is assigned is
> different case by case; for example,
>
> levels(something.long)[levels(something.long) %in% LETTERS[1:3]] <- "Z"
>
> So given that your general answer above to my inquiry was "No", I will
> keep experimenting and I'll also have another look at with() and
> within().
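
A generic version of the ensureNN() idea could cover both the "set to NA" case and the levels() case quoted above. The following is only a sketch: replaceWhere() is a hypothetical helper name, not something in base R or proposed in this thread, and the example data are made up. It takes the condition as a function, so the long name does not have to be repeated inside the index.

```r
# Hypothetical helper (not part of base R): replace the elements of x
# for which test(x) is TRUE with value.
replaceWhere <- function(x, test, value) {
  hit <- which(test(x))  # which() drops NA results, so existing NAs are skipped
  x[hit] <- value
  x
}

set.seed(1)
some.data.frame <- data.frame(some.long.variable.name = rnorm(10),
                              some.other.long.variable.name = rnorm(10))

# Negative values become NA; the data frame name is not repeated
some.data.frame <- within(some.data.frame, {
  some.other.long.variable.name <-
    replaceWhere(some.other.long.variable.name, function(v) v < 0, NA)
})

# The factor-level case: levels A, B and C are merged into "Z"
something.long <- factor(c("A", "B", "C", "D", "E"))
levels(something.long) <- replaceWhere(levels(something.long),
                                       function(l) l %in% LETTERS[1:3], "Z")
```

Passing the condition as a function is one possible design; base R's replace() does something similar, but takes a precomputed index rather than a function.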

You might want to look at the sqldf package. In the year or two since
it was released, I have noticed that it can sometimes do rather amazing
operations with minimal code. The sort of operations you anticipate
(transformations dependent on logical criteria) seem to be good
candidates for a database-oriented syntax.
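
As a rough illustration of what that might look like, here is a minimal sketch. It assumes the sqldf package is installed; the data frame d and its columns are invented for the example, and the query uses SQLite's CASE expression, which sqldf passes through to its default SQLite backend.

```r
# Sketch only: needs the sqldf package; d, id and x are made-up names.
if (requireNamespace("sqldf", quietly = TRUE)) {
  d <- data.frame(id = 1:5, x = c(0.4, -1.2, -0.3, 2.1, 0))
  # sqldf() finds d in the calling environment and runs the query in SQLite;
  # SQL NULL comes back to R as NA
  res <- sqldf::sqldf("SELECT id,
                              CASE WHEN x < 0 THEN NULL ELSE x END AS x
                       FROM d")
}
```

The column name x is written twice in the query, but the data frame name only once, and the condition reads much like the original a[a < 0] <- NA.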

-- 
David.

>
> Thanks again!
>
>
>>
>> some.data.frame <- within(some.data.frame, {
>>  some.long.variable.name <- ensureNN(some.long.variable.name)
>>  some.other.long.variable.name <- ensureNN(some.other.long.variable.name)
>> })
>>
>> Of course if you wanted to do this to all variables in a data frame  
>> you could do
>>
>> some.data.frame <- data.frame(lapply(some.data.frame, ensureNN))
>>
>> and it all happens, no questions asked.  (I can see a generic  
>> function emerging here, perhaps...)
>>
>> W.
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Christian Raschke
>> Sent: Tuesday, 20 July 2010 9:16 AM
>> To: r-help at r-project.org
>> Subject: [R] Indexing by logical vectors
>>
>> Dear R-Listers,
>>
>> My question concerns indexing vectors by logical vectors that are based
>> on the original vector. Consider the following simple example to
>> hopefully make clear what I mean:
>>
>> a <- rnorm(10)
>> a[a<0] <- NA
>>
>> However, I am now working with multiple data frames that I received,
>> where each of them has nicely descriptive, yet long names(). In my
>> scripts there are many instances where operations similar to the one
>> above are required. Again a simple example:
>>
>>
>> some.data.frame <- data.frame(some.long.variable.name=rnorm(10),
>> some.other.long.variable.name=rnorm(10))
>>
>> some.data.frame$some.other.long.variable.name[
>>   some.data.frame$some.other.long.variable.name < 0] <- NA
>>
>>
>> The fact that the names are so long makes things not very readable in
>> the script and hard to debug. Is there a way in R to refer to the "self"
>> of whatever is being indexed? I am looking for something like
>>
>> some.data.frame$some.other.long.variable.name[.self < 0] <- NA
>>
>> that would accomplish the same result as above. Or is there another
>> concise, but less messy way to do this? I prefer not attaching the
>> data.frames, and partial matching makes things even more messy since many
>> names() are very similar. I know I could just rename everything, but I'd
>> like to learn if there is an easy or obvious way to do this in R that I
>> have missed so far.
>>
>> I would appreciate any advice, and I apologize if this topic has been
>> discussed before.
>>
>>
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-redhat-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


