[R] problem selecting rows meeting a criterion
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Aug 11 21:26:40 CEST 2009
Hi,
See comments inline:
On Aug 11, 2009, at 2:45 PM, Jim Bouldin wrote:
>
> No problem John, thanks for your help, and also thanks to Dan and
> Patrick.
> Wasn't able to read or try anybody's suggestions yesterday. Here's what
> I've discovered in the meantime:
>
> What I did not include yesterday is that my original data frame, called
> "data", was this:
>
>    X Y       V3
> 1  1 1 0.000000
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 7  2 2 0.000000
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 13 3 3 0.000000
> 14 4 3 8.062258
> 15 5 3 5.099020
> 16 1 4 6.324555
> 17 2 4 2.236068
> 18 3 4 8.062258
> 19 4 4 0.000000
> 20 5 4 5.385165
> 21 1 5 5.000000
> 22 2 5 5.656854
> 23 3 5 5.099020
> 24 4 5 5.385165
> 25 5 5 0.000000
>
> To this data frame I applied the following command:
>
> data <- data[data$V3 >0,];data #to remove all rows where V3 = 0
>
> giving me this (the point from which I started yesterday):
>
>    X Y       V3
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 14 4 3 8.062258
> 15 5 3 5.099020
> 16 1 4 6.324555
> 17 2 4 2.236068
> 18 3 4 8.062258
> 20 5 4 5.385165
> 21 1 5 5.000000
> 22 2 5 5.656854
> 23 3 5 5.099020
> 24 4 5 5.385165
>
> So far so good. But when I then submit the command
>> data = data[X>Y,] #to select all rows where X > Y
This won't work in general. It probably only runs in this particular
case because you already have variables named X and Y defined somewhere
in your workspace.
Inside the brackets, X and Y are not taken from data$X and data$Y. R
looks them up as ordinary variables, so it uses the X and Y defined
elsewhere.
Instead of doing data[X > Y,], do:
data[data$X > data$Y,]
This should get you what you're expecting.
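To make that concrete, here is a small sketch that rebuilds the data from your message. I can't see your workspace, so it assumes X and Y are left over there as the original 25-element columns (built here with rep(); how you actually created them is a guess on my part). Under that assumption the 25-element logical vector gets matched by position against the 20 remaining rows, which would explain the rows you got:

```r
# Assumption: X and Y exist in the workspace as the ORIGINAL 25-element columns.
X <- rep(1:5, times = 5)
Y <- rep(1:5, each = 5)
V3 <- c(0.000000, 8.062258, 2.236068, 6.324555, 5.000000,
        8.062258, 0.000000, 9.486833, 2.236068, 5.656854,
        2.236068, 9.486833, 0.000000, 8.062258, 5.099020,
        6.324555, 2.236068, 8.062258, 0.000000, 5.385165,
        5.000000, 5.656854, 5.099020, 5.385165, 0.000000)

data <- data.frame(X = X, Y = Y, V3 = V3)
data <- data[data$V3 > 0, ]        # 20 rows are left

# Bare X and Y are the workspace vectors (length 25), not the columns (length 20),
# so the TRUE/FALSE values no longer line up with the rows they came from.
wrong <- data[X > Y, ]

# Referring to the columns explicitly compares each row's own X and Y.
right <- data[data$X > data$Y, ]

print(rownames(wrong))  # rows 3 4 5 6 10 11 12 17 18 24
print(rownames(right))  # rows 2 3 4 5 8 9 10 14 15 20
```

If you run rm(X, Y) first, data[X > Y,] should fail with an "object not found" error instead of silently returning the wrong rows.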
> I get the problem result already mentioned, namely:
>
>    X Y       V3
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 6  1 2 8.062258
> 10 5 2 5.656854
> 11 1 3 2.236068
> 12 2 3 9.486833
> 17 2 4 2.236068
> 18 3 4 8.062258
> 24 4 5 5.385165
>
> which is clearly wrong! It doesn't matter if I give a new name to the
> data frame at each step or not, or whether I use the name "data" or
> not. It always gives the same wrong answer.
>
> However, if I instead use the command:
> subset(data, X>Y), I get the right answer, namely:
>
>    X Y       V3
> 2  2 1 8.062258
> 3  3 1 2.236068
> 4  4 1 6.324555
> 5  5 1 5.000000
> 8  3 2 9.486833
> 9  4 2 2.236068
> 10 5 2 5.656854
> 14 4 3 8.062258
> 15 5 3 5.099020
> 20 5 4 5.385165
That's because subset(...) evaluates the condition inside the data frame
you pass it, so in that call X and Y do mean data$X and data$Y.
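As a rough illustration of that difference, the sketch below uses a tiny made-up data frame (the name d and its values are mine, not from your data) and selects the same rows three ways. subset() and with() both look the names up in the data frame first, while the $ form names the columns explicitly:

```r
# Hypothetical toy data, only for illustration.
d <- data.frame(X = c(2, 1, 5), Y = c(1, 2, 4))

by_subset <- subset(d, X > Y)        # X and Y are looked up in d
by_with   <- d[with(d, X > Y), ]     # with() evaluates X > Y inside d as well
by_dollar <- d[d$X > d$Y, ]          # columns named explicitly

print(by_subset)
print(all.equal(by_subset, by_dollar))
print(all.equal(by_with, by_dollar))
```

One difference worth knowing: subset() treats an NA in the condition as FALSE and drops that row, whereas plain bracket indexing with an NA gives you a row of NAs.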
> OK so the lesson so far is "use the subset function".
Hopefully you're learning a slightly different lesson now :-)
Does that clear things up at all?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact