[R] How to remove some rows from a data.frame

Don MacQueen macq at llnl.gov
Sun Dec 23 23:14:00 CET 2007


At 4:28 PM -0500 12/23/07, affy snp wrote:
>Hello list,
>
>I have a data frame M like:
>
>BAC                 chr    pos          s1   s2
>RP11-80G24    1    77465510    -1    0
>RP11-198H14    1    78696291    -1    0
>RP11-267M21    1    79681704    -1    0
>RP11-89A19      1    80950808    -1    0
>RP11-6B16        1    82255496    -1    0
>RP11-210E16    1    228801510    0    -1
>RP11-155C15    1    230957584    0    -1
>RP11-210F8      1    237932418    0    -1
>RP11-263L17     2    65724492    0    1
>RP11-340F16     2    65879898    0    1
>RP11-68A1        2    67718674    0    0
>RP11-474G23    2    68318411    0    0
>RP11-218N6      2    68454651    0    0
>CTD-2003M22    2    68567494    0    0
>.....
>
>how to remove those rows which have 0 for both of columns s1,s2?
>sth like M[!M$21=0&!M$s2=0]?

M[ !(M$s1==0 & M$s2==0) , ]
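
A minimal sketch of that filter on a small made-up data frame (the BAC names and values are hypothetical, for illustration only). It assumes s1 and s2 contain no NA values; with missing values the logical index would itself contain NA and yield NA rows, in which case subset() or which() may be safer.

```r
# Toy data frame; names and values are illustrative only
M <- data.frame(
  BAC = c("a", "b", "c", "d"),
  chr = c(1, 1, 2, 2),
  s1  = c(-1, 0, 0, 0),
  s2  = c(0, -1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep every row except those where both s1 and s2 are 0
kept <- M[ !(M$s1 == 0 & M$s2 == 0), ]

# Same filter written with subset(), which also drops rows
# whose condition evaluates to NA
kept2 <- subset(M, !(s1 == 0 & s2 == 0))
```

Note the trailing comma inside the brackets: without it, the logical vector would be used to select columns rather than rows.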

>
>Moreover, I want to get a list which could find a subset of rows which have
>the same pattern of data. For example, the first 8 rows in M can be
>clustered
>into 2 groups (represented below in 2 rows) and shown as:
>
>chr             Start       End             # of rows     Pattern
>1             77465510   82255496       5              (-1 0)
>1            228801510  237932418     3              (0 -1)
>
>Can anybody help me out of this? Thank you very much and happy holiday!

pat <- paste(M$s1,M$s2)

## the distinct patterns, in order of first appearance:
upat <- unique(pat)

## to find the first subset:
M[ pat == upat[1], ]

## to find the second subset:
M[ pat == upat[2], ]

## and so on, for however many unique patterns there are.

## also try
table(pat)
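
Instead of indexing one pattern at a time, all the subsets can probably be collected in one step with split() from base R. A sketch on a made-up data frame (the values are hypothetical):

```r
# Toy data frame; values are illustrative only
M <- data.frame(
  chr = c(1, 1, 1, 2),
  pos = c(100, 200, 300, 400),
  s1  = c(-1, -1, 0, 0),
  s2  = c(0, 0, -1, 1)
)
pat <- paste(M$s1, M$s2)

# A list with one data frame per distinct pattern,
# named by the pattern string
groups <- split(M, pat)

# Number of rows carrying each pattern
counts <- table(pat)
```

An individual subset is then available by name, for example groups[["-1 0"]].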

Of course, your example does more than just "find" the subsets: it
also summarizes them, which is a little more complicated. I might
start with the summarize() function in the Hmisc package, but there
are many other ways to do the summarizing.
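
One possible base R approach, not the Hmisc route mentioned above, is rle(), which finds runs of consecutive identical values. The sketch below uses a few rows taken from the question and assumes the data frame is already sorted by chr and pos, so that each group is a block of adjacent rows. The names key, runs, first, last and summary_tab are made up for this illustration.

```r
# A few rows from the question (BAC column omitted)
M <- data.frame(
  chr = c(1, 1, 1, 1, 1, 2, 2),
  pos = c(77465510, 78696291, 79681704,
          228801510, 230957584, 65724492, 65879898),
  s1  = c(-1, -1, -1, 0, 0, 0, 0),
  s2  = c(0, 0, 0, -1, -1, 1, 1)
)

# A run ends whenever the chromosome or the (s1, s2) pattern changes
key  <- paste(M$chr, M$s1, M$s2)
runs <- rle(key)

# Row index of the last and first row of each run
last  <- cumsum(runs$lengths)
first <- last - runs$lengths + 1

# One summary row per run
summary_tab <- data.frame(
  chr     = M$chr[first],
  Start   = M$pos[first],
  End     = M$pos[last],
  n_rows  = runs$lengths,
  Pattern = paste0("(", M$s1[first], " ", M$s2[first], ")")
)
```

Because rle() only merges adjacent rows, a pattern that reappears later on the same chromosome is reported as a separate run, which seems to match the table in the question.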

-Don

>Best,
>     Allen


-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov


