[R] Searching for specific values in a matrix
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Jul 27 22:00:19 CEST 2009
On Jul 27, 2009, at 3:50 PM, Mehdi Khan wrote:
> The problem is, it works with the example data I gave. However, it
> does NOT work with the data set I have, which has 600,000 rows. The
> class is still a data frame.
So the problem must be in your data, or what you think is in your
data. Somehow you're constructing a "boolean query" that returns false
for every row. As long as you're not getting any memory errors, the
size of your data doesn't change the mechanics of how this would work.
I suspect you're not getting <0 rows> for every possible query you can
come up with, right?
Look at the first 10 lines of your dataset and try to select some rows
from your entire data.frame by using values you can see in the first
10 rows you've just looked at.
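Here is a minimal R sketch of that check. The small `df` below is a hypothetical stand-in for the real 600,000-row data.frame, and its column names follow the example data later in this thread.

```r
# Hypothetical stand-in for the real data.frame (assumption: same column
# names as the example data in this thread)
df <- data.frame(coln  = c(17209, 17210, 17211),
                 attr1 = c("D", "BC", "B"),
                 LAT   = c(38.27645, 38.36304, 41.67121),
                 stringsAsFactors = FALSE)

print(head(df, 10))      # look at the first rows
str(df)                  # check each column's type (factor, character, numeric)
print(unique(df$attr1))  # the exact values stored in attr1

# Query the whole data.frame with a value you can see in the first rows
hits <- df[df$attr1 == "BC" & !is.na(df$attr1), ]
print(nrow(hits))
```

If a query built this way returns rows but your original one does not, the difference lies in the query (or in what the column really contains), not in the size of the data.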
I'm expecting this would work, in which case I'm not sure how much
more help I can provide.
-steve
> On Mon, Jul 27, 2009 at 12:15 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
>
> On Jul 27, 2009, at 2:54 PM, Mehdi Khan wrote:
>
> I am able to return the first column, but anything else returns this:
> <0 rows> (or 0-length row.names)
>
> Any idea?
>
> I'm not sure what you're doing.
>
> The result you're getting happens when no rows "pass" the logical
> test you are using to index the rows of your data.frame.
>
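The following uses a hypothetical three-row data.frame to illustrate that point. When the logical test is FALSE for every row, indexing with it keeps no rows, and printing the result shows the `<0 rows>` message.

```r
# Hypothetical three-row data.frame
df <- data.frame(coln  = c(17209, 17210, 17211),
                 attr1 = c("D", "BC", "B"),
                 stringsAsFactors = FALSE)

keep <- df$attr1 == "XYZ"   # no row matches, so every element is FALSE
print(keep)
res <- df[keep, ]           # same columns, but no rows
print(res)                  # reports "<0 rows>"
print(nrow(res))
```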
> Can you show the code that you are using (based on the example data
> you gave) that is giving you the <0 rows> result?
>
> -steve
>
> On Tue, Jul 21, 2009 at 12:49 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
>
> On Jul 21, 2009, at 3:27 PM, Mehdi Khan wrote:
>
> I understand your explanation about the test for even numbers.
> However, I am still a bit confused about how to go about finding a
> particular value. Here is an example data set:
>
> col #  attr1  attr2  attr3        LON       LAT
> 17209      D     NA     NA  -122.9409  38.27645
> 17210     BC     NA     NA  -122.9581  38.36304
> 17211      B     NA     NA  -123.6851  41.67121
> 17212     BC     NA     NA  -123.0724  38.93073
> 17213      C     NA     NA  -123.7240  41.84403
> 17214   <NA>    464     NA  -122.9430  38.30988
> 17215      C     NA     NA  -123.4442  40.65369
> 17216     BC     NA     NA  -122.9389  38.31551
> 17217      C     NA     NA  -123.0747  38.97998
> 17218      C     NA     NA  -123.6580  41.59610
> 17219      C     NA     NA  -123.4513  40.70992
> 17220      C     NA     NA  -123.0901  39.06473
> 17221     BC     NA     NA  -123.0653  38.94845
> 17222     BC     NA     NA  -122.9464  38.36808
> 17223   <NA>    464     NA  -123.0143  38.70205
> 17224   <NA>     NA      5  -122.8609  37.94137
> 17225   <NA>     NA      5  -122.8628  37.95057
> 17226   <NA>     NA      7  -122.8646  37.95978
>
> For future reference, perhaps post this in a form that's easy for us
> to paste into a running R session, like so:
>
> df <- data.frame(
>   coln  = c(17209, 17210, 17211, 17212, 17213, 17214, 17215, 17216, 17217,
>             17218, 17219, 17220, 17221, 17222, 17223, 17224, 17225, 17226),
>   attr1 = c("D", "BC", "B", "BC", "C", NA, "C", "BC", "C", "C", "C", "C",
>             "BC", "BC", NA, NA, NA, NA),
>   attr2 = c(NA, NA, NA, NA, NA, 464, NA, NA, NA, NA, NA, NA, NA, NA, 464,
>             NA, NA, NA),
>   attr3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>             5, 5, 7),
>   LON   = c(-122.9409, -122.9581, -123.6851, -123.0724, -123.7240, -122.9430,
>             -123.4442, -122.9389, -123.0747, -123.6580, -123.4513, -123.0901,
>             -123.0653, -122.9464, -123.0143, -122.8609, -122.8628, -122.8646),
>   LAT   = c(38.27645, 38.36304, 41.67121, 38.93073, 41.84403, 38.30988,
>             40.65369, 38.31551, 38.97998, 41.59610, 40.70992, 39.06473,
>             38.94845, 38.36808, 38.70205, 37.94137, 37.95057, 37.95978))
>
>
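Typing a constructor like this by hand is tedious for a large data set. Base R's `dput()` prints R code that recreates an object, so one option (not something used in this thread) is to `dput()` the first few rows and paste that output into the email. The sketch below uses a hypothetical `big` data.frame as a stand-in for the real one.

```r
# Hypothetical stand-in for the large data.frame
big <- data.frame(coln  = c(17209, 17210, 17211, 17212),
                  attr1 = c("D", "BC", "B", NA),
                  LAT   = c(38.27645, 38.36304, 41.67121, 38.93073),
                  stringsAsFactors = FALSE)

# Print R code that rebuilds the first three rows; this text can be pasted
# into an email and then into someone else's R session
dput(head(big, 3))

# Round-trip check: parsing and evaluating the deparsed text rebuilds the rows
txt  <- paste(deparse(head(big, 3)), collapse = "\n")
copy <- eval(parse(text = txt))
print(all.equal(copy, head(big, 3)))
```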
> If I wanted to find the row with Lat = 37.95978
>
> Using an "indexing vector":
>
> R> lats <- df$LAT == 37.95978
> # or, with the %~% operator defined earlier in the thread:
> # lats <- df$LAT %~% 37.95978
> R> df[lats,]
> coln attr1 attr2 attr3 LON LAT
> 18 17226 <NA> NA 7 -122.8646 37.95978
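The `==` test above works because the LAT values were typed in from the same printed digits. In a large data set, a number that *prints* as 37.95978 may be stored with more digits than `print()` shows. In that case `==` matches nothing, which is one plausible source of `<0 rows>`. The `%~%` operator mentioned in the comment is not reproduced in this message, so the sketch below defines a hypothetical tolerance-based version. The `tol` argument and its 1e-6 default are assumptions.

```r
# Hypothetical tolerance-based equality operator; the thread's own %~% is not
# shown in this message, so this definition is an assumption
`%~%` <- function(x, y, tol = 1e-6) !is.na(x) & abs(x - y) < tol

# Hypothetical data: the last LAT carries one digit more than print() shows
df <- data.frame(coln = c(17224, 17225, 17226),
                 LAT  = c(37.94137, 37.95057, 37.9597801))

print(df)                     # LAT displays as 37.95978 at the default 7 digits
print(df$LAT, digits = 10)    # reveals the extra digit

exact  <- df[df$LAT == 37.95978, ]     # stored value differs from the literal
approx <- df[df$LAT %~% 37.95978, ]    # matches within the tolerance
print(nrow(exact))
print(nrow(approx))
```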
>
> Using the "subset" function:
>
> R> subset(df, LAT == 37.95978)
> coln attr1 attr2 attr3 LON LAT
> 18 17226 <NA> NA 7 -122.8646 37.95978
>
>
> , how would I do that? How would I find the rows with BC?
>
> R> subset(df, attr1 == 'BC')
> coln attr1 attr2 attr3 LON LAT
> 2 17210 BC NA NA -122.9581 38.36304
> 4 17212 BC NA NA -123.0724 38.93073
> 8 17216 BC NA NA -122.9389 38.31551
> 13 17221 BC NA NA -123.0653 38.94845
> 14 17222 BC NA NA -122.9464 38.36808
>
>
> If you try it with an "indexing vector", the NAs will trip you up:
>
> R> df[df$attr1 == 'BC',]
> coln attr1 attr2 attr3 LON LAT
> 2 17210 BC NA NA -122.9581 38.36304
> 4 17212 BC NA NA -123.0724 38.93073
> NA NA <NA> NA NA NA NA
> 8 17216 BC NA NA -122.9389 38.31551
> 13 17221 BC NA NA -123.0653 38.94845
> 14 17222 BC NA NA -122.9464 38.36808
> NA.1 NA <NA> NA NA NA NA
> NA.2 NA <NA> NA NA NA NA
> NA.3 NA <NA> NA NA NA NA
> NA.4 NA <NA> NA NA NA NA
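The extra rows appear for two reasons, shown below on a hypothetical three-row slice of the data. First, comparing NA with 'BC' yields NA rather than FALSE. Second, using NA as a row index returns a row filled with NAs. `subset()` avoids this because it treats missing values in its condition as FALSE, as documented in `?subset`.

```r
# Hypothetical three-row slice of the example data, with an NA in attr1
df <- data.frame(coln  = c(17213, 17214, 17216),
                 attr1 = c("C", NA, "BC"),
                 stringsAsFactors = FALSE)

idx <- df$attr1 == "BC"
print(idx)                          # FALSE NA TRUE: NA == "BC" is NA, not FALSE
print(df[idx, ])                    # an all-NA row appears for the NA entry
print(subset(df, attr1 == "BC"))    # subset() treats NA as FALSE
```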
>
> So you could do something like:
>
> R> df[df$attr1 == 'BC' & !is.na(df$attr1),]
> coln attr1 attr2 attr3 LON LAT
> 2 17210 BC NA NA -122.9581 38.36304
> 4 17212 BC NA NA -123.0724 38.93073
> 8 17216 BC NA NA -122.9389 38.31551
> 13 17221 BC NA NA -123.0653 38.94845
> 14 17222 BC NA NA -122.9464 38.36808
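Two other base-R idioms also drop the NA rows. They are offered here as alternatives, not as what the thread used. `which()` keeps only the positions that are TRUE, and `%in%` returns FALSE (not NA) when the left-hand value is NA. On the hypothetical data below, all three forms select the same rows.

```r
# Hypothetical four-row slice of the example data
df <- data.frame(coln  = c(17213, 17214, 17216, 17221),
                 attr1 = c("C", NA, "BC", "BC"),
                 stringsAsFactors = FALSE)

a <- df[df$attr1 == "BC" & !is.na(df$attr1), ]  # the form used above
b <- df[which(df$attr1 == "BC"), ]               # which() discards NA and FALSE
d <- df[df$attr1 %in% "BC", ]                    # NA %in% "BC" is FALSE
print(identical(a, b))
print(identical(a, d))
```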
>
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>