[R] How to remove some rows from a data.frame
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Dec 23 23:01:53 CET 2007
On Dec 23, 2007 4:28 PM, affy snp <affysnp at gmail.com> wrote:
> Hello list,
>
> I have a data frame M like:
>
> BAC chr pos s1 s2
> RP11-80G24 1 77465510 -1 0
> RP11-198H14 1 78696291 -1 0
> RP11-267M21 1 79681704 -1 0
> RP11-89A19 1 80950808 -1 0
> RP11-6B16 1 82255496 -1 0
> RP11-210E16 1 228801510 0 -1
> RP11-155C15 1 230957584 0 -1
> RP11-210F8 1 237932418 0 -1
> RP11-263L17 2 65724492 0 1
> RP11-340F16 2 65879898 0 1
> RP11-68A1 2 67718674 0 0
> RP11-474G23 2 68318411 0 0
> RP11-218N6 2 68454651 0 0
> CTD-2003M22 2 68567494 0 0
> .....
>
> How can I remove the rows that have 0 in both columns s1 and s2?
> Something like M[!M$s1=0&!M$s2=0]?
>
> Moreover, I want to get a list of the subsets of rows that share the
> same pattern of data. For example, the first 8 rows in M can be clustered
> into 2 groups (one row per group below), shown as:
>
> chr Start End # of rows Pattern
> 1 77465510 82255496 5 (-1 0)
> 1 228801510 237932418 3 (0 -1)
>
Using:
M <- structure(list(BAC = structure(c(13L, 3L, 8L, 14L, 12L, 4L, 2L,
5L, 7L, 9L, 11L, 10L, 6L, 1L), .Label = c("CTD-2003M22", "RP11-155C15",
"RP11-198H14", "RP11-210E16", "RP11-210F8", "RP11-218N6", "RP11-263L17",
"RP11-267M21", "RP11-340F16", "RP11-474G23", "RP11-68A1", "RP11-6B16",
"RP11-80G24", "RP11-89A19"), class = "factor"), chr = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), pos = c(77465510L,
78696291L, 79681704L, 80950808L, 82255496L, 228801510L, 230957584L,
237932418L, 65724492L, 65879898L, 67718674L, 68318411L, 68454651L,
68567494L), s1 = c(-1L, -1L, -1L, -1L, -1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), s2 = c(0L, 0L, 0L, 0L, 0L, -1L, -1L, -1L, 1L,
1L, 0L, 0L, 0L, 0L)), .Names = c("BAC", "chr", "pos", "s1", "s2"
), class = "data.frame", row.names = c(NA, -14L))
# Try this. In a logical context 0 is treated as FALSE and any other
# number as TRUE, so s1 | s2 keeps rows where at least one is non-zero.
subset(M, s1 | s2)
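
The same filter can also be written with explicit comparisons and bracket indexing, which is closer to what the question attempted. This is only a sketch: M_demo is a hypothetical three-row stand-in built from rows 1, 6 and 11 of the posted data, so substitute M for real use. One difference to keep in mind is that subset() drops rows where the condition is NA, while bracket indexing returns NA-filled rows for them.

```r
# Hypothetical stand-in for M (rows 1, 6 and 11 of the posted data)
M_demo <- data.frame(
  BAC = c("RP11-80G24", "RP11-210E16", "RP11-68A1"),
  chr = c(1L, 1L, 2L),
  pos = c(77465510L, 228801510L, 67718674L),
  s1  = c(-1L, 0L, 0L),
  s2  = c(0L, -1L, 0L)
)

# Keep rows where at least one of s1, s2 is non-zero.
# Note == / != for comparison (a single = is assignment) and the
# trailing comma, which selects rows rather than columns.
kept <- M_demo[M_demo$s1 != 0 | M_demo$s2 != 0, ]

# Equivalent form: drop rows where both are zero
kept2 <- M_demo[!(M_demo$s1 == 0 & M_demo$s2 == 0), ]
```
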
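# For the second question: summarize each (s1, s2) pattern in one row.
# Note that by() groups on the pattern alone, so this assumes each pattern
# forms a single run of rows, as in the sample data.
f <- function(x) with(x,
  c(start = pos[1], end = tail(pos, 1),
    chr = chr[1], nrow = NROW(x), s1 = s1[1], s2 = s2[1])
)
do.call(rbind, by(M, M[4:5], f))  # M[4:5] are the s1 and s2 columns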
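
If the same pattern can recur later in the data or on another chromosome, grouping on s1 and s2 alone would merge those rows into one group. A possible variant, sketched below, numbers consecutive runs instead. It assumes the rows are already sorted by chr and pos; M_demo (the first 10 rows of the posted data without the BAC column), key, run and runs are names introduced here for illustration only.

```r
# Hypothetical stand-in: first 10 rows of the posted data, BAC omitted
M_demo <- data.frame(
  chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  pos = c(77465510L, 78696291L, 79681704L, 80950808L, 82255496L,
          228801510L, 230957584L, 237932418L, 65724492L, 65879898L),
  s1  = c(-1L, -1L, -1L, -1L, -1L, 0L, 0L, 0L, 0L, 0L),
  s2  = c(0L, 0L, 0L, 0L, 0L, -1L, -1L, -1L, 1L, 1L)
)

# One key per row; a new run starts wherever the key differs from the
# previous row's key (the first row always starts a run).
key <- paste(M_demo$chr, M_demo$s1, M_demo$s2)
run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Summarize each run in one row
runs <- do.call(rbind, lapply(split(M_demo, run), function(x)
  data.frame(chr = x$chr[1], start = x$pos[1], end = x$pos[nrow(x)],
             nrows = nrow(x),
             pattern = paste0("(", x$s1[1], " ", x$s2[1], ")"))))
```

Including chr in the key keeps a run from spanning two chromosomes even when the pattern is unchanged across the boundary.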