[R] Indexing by logical vectors
Christian Raschke
crasch2 at tigers.lsu.edu
Tue Jul 20 07:12:05 CEST 2010
On Tue, 2010-07-20 at 10:12 +1000, Bill.Venables at csiro.au wrote:
> As far as I know the answer to your question is "No", but there are things you can do to improve the readability of your code. One thing I find useful is to avoid using "$" as much as possible and to favour things like with() and within().
>
Thank you for your answer. I had not looked at within() for this until
now.
> The first thing you might do is think about choosing shorter names, of course. If that's not possible, you could try something like this.
>
> ensureNN <- function(x) { # "ensure non-negative"
> is.na(x[x < 0]) <- TRUE
> x
> }
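A quick check of what ensureNN() does, plus a variant that indexes with which(). The name ensureNN2 is hypothetical and does not come from the thread. If x already contains NA, the logical index x < 0 is NA at those positions, and the subscripted assignment in the original may then fail with "NAs are not allowed in subscripted assignments". which() returns only the TRUE positions, so it skips the NAs.

```r
# Bill Venables' helper from the thread: negative values become NA.
ensureNN <- function(x) { # "ensure non-negative"
  is.na(x[x < 0]) <- TRUE
  x
}

# Hypothetical variant: which() gives the positions where x < 0 is TRUE,
# skipping positions where the comparison itself is NA.
ensureNN2 <- function(x) {
  x[which(x < 0)] <- NA
  x
}

ensureNN(c(-1, 2, -3, 4))    # NA  2 NA  4
ensureNN2(c(-1, NA, 2, -5))  # NA NA  2 NA
```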
This approach would essentially require a separate function for each
operation to be performed on the vector. Assigning NA based on a
condition is probably the most common use, but I also have cases where
the logical vector is obtained from other operations, or where the
assigned value differs from case to case; for example,
levels(something.long)[levels(something.long) %in% LETTERS[1:3]] <- "Z"
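One way to cover both the condition and the assigned value with a single helper is to pass the condition as a function of the vector. This is only a sketch; replaceIf() is a hypothetical name, not a base R function. Note that assigning duplicated values to levels() merges those levels, which is what the example above relies on.

```r
# Hypothetical helper: replace the elements of x for which test(x) is TRUE.
# `test` receives the vector itself, so the condition does not repeat x.
replaceIf <- function(x, test, value) {
  x[which(test(x))] <- value
  x
}

replaceIf(c(-1, 2, -3), function(v) v < 0, NA)   # NA  2 NA

# The factor-levels case: levels A, B, C are all mapped to "Z" and merged.
f <- factor(c("A", "B", "D", "C"))
levels(f) <- replaceIf(levels(f), function(l) l %in% LETTERS[1:3], "Z")
levels(f)   # "Z" "D"
```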
So, given that your general answer to my inquiry was "No", I will
keep experimenting and take another look at with() and
within().
Thanks again!
>
> some.data.frame <- within(some.data.frame, {
> some.long.variable.name <- ensureNN(some.long.variable.name)
> some.other.long.variable.name <- ensureNN(some.other.long.variable.name)
> })
>
> Of course if you wanted to do this to all variables in a data frame you could do
>
> some.data.frame <- data.frame(lapply(some.data.frame, ensureNN))
>
> and it all happens, no questions asked. (I can see a generic function emerging here, perhaps...)
>
> W.
>
>
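The "generic function" Bill hints at could look something like the following S3 sketch. The methods are hypothetical and not from the thread. The data-frame method uses x[] <- lapply(...), which replaces the columns in place and keeps the result a data.frame with its original names and row names. By contrast, data.frame() re-checks names by default and may alter non-syntactic ones.

```r
# Hypothetical S3 generic: one name that works on a single vector
# or on every column of a data frame.
ensureNN <- function(x, ...) UseMethod("ensureNN")

# Default method: negative values become NA (which() skips existing NAs).
ensureNN.default <- function(x, ...) {
  x[which(x < 0)] <- NA
  x
}

# Data-frame method: apply the generic to each column, keeping the
# data.frame structure, names, and row names.
ensureNN.data.frame <- function(x, ...) {
  x[] <- lapply(x, ensureNN)
  x
}

some.data.frame <- data.frame(some.long.variable.name = c(-1, 2),
                              some.other.long.variable.name = c(3, -4))
some.data.frame <- ensureNN(some.data.frame)
```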
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Christian Raschke
> Sent: Tuesday, 20 July 2010 9:16 AM
> To: r-help at r-project.org
> Subject: [R] Indexing by logical vectors
>
> Dear R-Listers,
>
> My question concerns indexing vectors by logical vectors that are based
> on the original vector. Consider the following simple example to
> hopefully make clear what I mean:
>
> a <- rnorm(10)
> a[a<0] <- NA
>
> However, I am now working with multiple data frames that I received,
> each of which has nicely descriptive, yet long, names(). In my
> scripts there are many instances where operations similar to the one
> above are required. Again a simple example:
>
>
> some.data.frame <- data.frame(some.long.variable.name=rnorm(10),
> some.other.long.variable.name=rnorm(10))
>
> some.data.frame$some.other.long.variable.name[some.data.frame$some.other.long.variable.name
> < 0] <- NA
>
>
> Because the names are so long, the script is hard to read and hard to
> debug. Is there a way in R to refer to the "self" of whatever is being
> indexed? I am looking for something like
>
> some.data.frame$some.other.long.variable.name[.self < 0] <- NA
>
> that would accomplish the same result as above. Or is there another
> concise but less messy way to do this? I prefer not to attach the
> data.frames, and partial matching makes things even messier since many
> names() are very similar. I know I could just rename everything, but I'd
> like to learn if there is an easy or obvious way to do this in R that I
> have missed so far.
>
> I would appreciate any advice, and I apologize if this topic has been
> discussed before.
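For comparison, here is Bill's within() suggestion applied directly to the example above, keeping the original indexing form. It is a sketch; set.seed() is added only to make the run repeatable. Inside within() the columns are visible by name, so the "some.data.frame$" prefix disappears, although the column name still appears twice.

```r
set.seed(1)  # only for a repeatable example
some.data.frame <- data.frame(some.long.variable.name = rnorm(10),
                              some.other.long.variable.name = rnorm(10))

# Columns are looked up by name inside within(); the modified data frame
# is returned, so it has to be assigned back.
some.data.frame <- within(some.data.frame,
  some.other.long.variable.name[some.other.long.variable.name < 0] <- NA)
```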
>
>
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-redhat-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
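A user-defined replacement function gets fairly close to the hypothetical `.self` syntax. This is a sketch, and the name `where<-` is invented for illustration. R evaluates where(target, test) <- value as target <- `where<-`(target, test, value = value), so the long target expression is written only once. Only the replacement function is needed; no where() getter has to exist.

```r
# Hypothetical replacement function (the name `where<-` is made up here).
# The condition is a function applied to the target vector itself.
`where<-` <- function(x, test, value) {
  x[which(test(x))] <- value
  x
}

some.data.frame <- data.frame(some.long.variable.name = c(1, -2, 3),
                              some.other.long.variable.name = c(-1, 2, -3))

# The long column name appears only once.
where(some.data.frame$some.other.long.variable.name, function(v) v < 0) <- NA
some.data.frame$some.other.long.variable.name   # NA  2 NA

# It also composes with other replacement functions, e.g. factor levels.
f <- factor(c("A", "B", "D"))
where(levels(f), function(l) l %in% LETTERS[1:3]) <- "Z"
levels(f)   # "Z" "D"
```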