[R] subset a defined row plus the aforegoing
arun
smartpink111 at yahoo.com
Thu Nov 1 18:50:03 CET 2012
Hello,
Your question is a bit confusing:
"I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the preceding type==DNase HS (which is not
necessarily row n-1 - assuming that the defined row is n)"
In the dataset, column "type" contains "Expresssion" (spelled with three s's), not "Expression". If you want to subset all the rows having "Expresssion" or "DNase HS":
res <- subset(df, type == "Expresssion" | type == "DNase HS")
head(res)
#  start.ens fc.trans        type end.ens peak end.grcm38 dpeak
#1   9191942   0.9379 Expresssion      NA   NA         NA    NA
#2   9191942   0.9741 Expresssion      NA   NA         NA    NA
#3   9191942   0.9748 Expresssion      NA   NA         NA    NA
#4   9195570       NA    DNase HS      NA   NA    9195792   109
#5   9579854       NA    DNase HS      NA   NA    9580110   131
#7  11113787       NA    DNase HS      NA   NA   11114262   279
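The same filter can also be written with %in%, which is easier to extend when more than two types are wanted. The sketch below rebuilds only two columns of the posted test data so that it runs on its own; the names wanted and res.in are illustrative, not from the original post.

```r
# Two columns of the test data, rebuilt from the dput() output in the post.
df <- data.frame(
  start.ens = c(9191942L, 9191942L, 9191942L, 9195570L, 9579854L,
                11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
                11455056L, 11461513L, 11462408L, 11462408L, 11489266L),
  type      = factor(c("Expresssion", "Expresssion", "Expresssion",
                       "DNase HS", "DNase HS", "p300", "DNase HS",
                       "Expresssion", "Expresssion", "DNase HS",
                       "DNase HS", "DNase HS", "Expresssion",
                       "Expresssion", "Expresssion"))
)

# Types to keep, spelled exactly as they appear in the data
wanted <- c("Expresssion", "DNase HS")

# Logical row index: TRUE where type is one of the wanted values
res.in <- df[df$type %in% wanted, ]

# Subsetting keeps the unused factor level "p300"; droplevels() removes it
res.in <- droplevels(res.in)
levels(res.in$type)
```

Because type is a factor, the unused level "p300" would otherwise still show up (with a count of 0) in table(res.in$type).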
If you don't want those rows:
subset(df, type != "Expresssion" & type != "DNase HS")
#  start.ens fc.trans type  end.ens peak end.grcm38 dpeak
#6  11088023       NA p300 11088523    7         NA    NA
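If what you want is instead every "Expresssion" row plus the nearest "DNase HS" row above it (which need not be row n-1), here is one possible sketch. This is only my reading of the question and may not be what was intended. It assumes df is already sorted by start.ens, as stated in the post; the names expr.idx, dnase.idx, prev.dnase, keep and res2 are illustrative.

```r
# Test data rebuilt from the dput() output in the original post
# (only the columns needed for this example).
df <- data.frame(
  start.ens = c(9191942L, 9191942L, 9191942L, 9195570L, 9579854L,
                11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
                11455056L, 11461513L, 11462408L, 11462408L, 11489266L),
  fc.trans  = c(0.9379, 0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904,
                NA, NA, NA, 1.0129, 1.0074, 1.0019),
  type      = factor(c("Expresssion", "Expresssion", "Expresssion",
                       "DNase HS", "DNase HS", "p300", "DNase HS",
                       "Expresssion", "Expresssion", "DNase HS",
                       "DNase HS", "DNase HS", "Expresssion",
                       "Expresssion", "Expresssion"))
)

expr.idx  <- which(df$type == "Expresssion")  # the "defined" rows
dnase.idx <- which(df$type == "DNase HS")     # candidate preceding rows

# For each Expresssion row, findInterval() returns how many DNase HS rows
# sit above it; 0 means no DNase HS row precedes it.
k <- findInterval(expr.idx, dnase.idx)
prev.dnase <- dnase.idx[k[k > 0]]

# Combine both sets of row numbers, dropping duplicates, in original order
keep <- sort(unique(c(expr.idx, prev.dnase)))
res2 <- df[keep, ]
res2
# keeps rows 1, 2, 3, 7, 8, 9, 12, 13, 14, 15
```

Rows 1 to 3 have no "DNase HS" row above them, so nothing is added for them; rows 8 and 9 pull in row 7, and rows 13 to 15 pull in row 12.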
A.K.
----- Original Message -----
From: Hermann Norpois <hnorpois at googlemail.com>
To: r-help at r-project.org
Cc:
Sent: Thursday, November 1, 2012 1:28 PM
Subject: [R] subset a defined row plus the aforegoing
Hello,
my data is sorted by start.ens (see below). Now I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the preceding type==DNase HS row (which is not
necessarily row n-1, assuming that the defined row is n). I don't know how
to add this to my subset command.
Is that possible?
Thanks Hermann
> df
   start.ens fc.trans        type  end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion       NA   NA         NA    NA
2    9191942   0.9741 Expresssion       NA   NA         NA    NA
3    9191942   0.9748 Expresssion       NA   NA         NA    NA
4    9195570       NA    DNase HS       NA   NA    9195792   109
5    9579854       NA    DNase HS       NA   NA    9580110   131
6   11088023       NA        p300 11088523    7         NA    NA
7   11113787       NA    DNase HS       NA   NA   11114262   279
8   11114744   0.9803 Expresssion       NA   NA         NA    NA
9   11114744   0.9904 Expresssion       NA   NA         NA    NA
10  11114850       NA    DNase HS       NA   NA   11115400   210
11  11455056       NA    DNase HS       NA   NA   11455381   175
12  11461513       NA    DNase HS       NA   NA   11462571   508
13  11462408   1.0129 Expresssion       NA   NA         NA    NA
14  11462408   1.0074 Expresssion       NA   NA         NA    NA
15  11489266   1.0019 Expresssion       NA   NA         NA    NA
My (test)data:
> dput (df)
structure(list(start.ens = c(9191942L, 9191942L, 9191942L, 9195570L,
9579854L, 11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
11455056L, 11461513L, 11462408L, 11462408L, 11489266L), fc.trans =
c(0.9379,
0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904, NA, NA, NA, 1.0129,
1.0074, 1.0019), type = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("DNase HS", "Expresssion",
"p300"), class = "factor"), end.ens = c(NA, NA, NA, NA, NA, 11088523L,
NA, NA, NA, NA, NA, NA, NA, NA, NA), peak = c(NA, NA, NA, NA,
NA, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA), end.grcm38 = c(NA,
NA, NA, 9195792L, 9580110L, NA, 11114262L, NA, NA, 11115400L,
11455381L, 11462571L, NA, NA, NA), dpeak = c(NA, NA, NA, 109L,
131L, NA, 279L, NA, NA, 210L, 175L, 508L, NA, NA, NA)), .Names =
c("start.ens",
"fc.trans", "type", "end.ens", "peak", "end.grcm38", "dpeak"), row.names =
c(NA,
-15L), class = "data.frame")