[R] Conditional editing of rows in a data frame
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Jan 28 14:34:56 CET 2010
If DF is your data frame then:
DF$xp.bg <- ave(DF$xp.norm, DF$gene, FUN = min)
will create a new column such that the entry in each row is the
minimum xp.norm of all rows with the same gene. ave does use split
internally, but I think it would be worth trying anyway since it's only
one short line of code.
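
Here is a minimal sketch of what that line does, using a made-up
four-row data frame (the values are illustrative only, not taken from
your table):

```r
# Toy data frame; gene labels and values are invented for illustration.
DF <- data.frame(
  gene    = c("A", "A", "B", "A"),
  xp.norm = c(1.2, 0.7, 1.5, 0.9),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each level of the grouping variable and
# returns a vector of the same length as the input, in the original
# row order.
DF$xp.bg <- ave(DF$xp.norm, DF$gene, FUN = min)

DF$xp.bg
# 0.7 0.7 1.5 0.7
```

Note that the fourth row gets 0.7 even though it is not adjacent to the
other "A" rows: the grouping is by gene over the whole data frame, not
by consecutive rows.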
See help(ave)
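
If only consecutive rows should be pooled (for example, so that the
scattered no_gene rows are not all grouped together), one possible
variation is to group on a run identifier built with rle() rather than
on gene itself. This is only a sketch under that assumption; run_id is
a hypothetical helper name, and the data are rows 136 to 142 of the
sample quoted below:

```r
# Rows 136 to 142 from the sample in the question.
DF <- data.frame(
  gene    = c("CAMTA1", "CAMTA1", "CAMTA1", "CAMTA1", "CAMTA1",
              "CAMTA1", "VAMP3"),
  xp.norm = c(1.02727430, 1.40312732, 1.52104538, 1.04853732,
              0.64794094, 1.23026086, 1.23026086),
  xp.top  = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give each run of identical consecutive values
# its own integer id, based on base R's rle().
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Minimum of xp.norm within each run of consecutive identical genes.
DF$xp.bg <- ave(DF$xp.norm, run_id(DF$gene), FUN = min)

# Treat NA in xp.top as "not TRUE" so that rle() sees a clean logical.
top <- !is.na(DF$xp.top) & DF$xp.top

# Minimum of xp.norm within each run of xp.top; NA outside TRUE runs.
DF$xp.n.top <- ifelse(top, ave(DF$xp.norm, run_id(top), FUN = min), NA)
```

On this subset the six CAMTA1 rows get xp.bg = 0.64794094, and xp.n.top
is 1.40312732 for row 137 and NA for rows 138 to 140. Rows 141 and 142
get 1.23026086 here only because the TRUE run is cut short in this
subset.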
On Thu, Jan 28, 2010 at 7:05 AM, Irene Gallego Romero <ig247 at cam.ac.uk> wrote:
> Dear R users,
>
> I have a dataframe (main.table) with ~30,000 rows and 6 columns, of
> which here are a few rows:
>
>       id chr window    gene    xp.norm xp.top
> 129 1_32   1     32  TAS1R1 1.28882115  FALSE
> 130 1_32   1     32  ZBTB48 1.28882115  FALSE
> 131 1_32   1     32  KLHL21 1.28882115  FALSE
> 132 1_32   1     32   PHF13 1.28882115  FALSE
> 133 1_33   1     33   PHF13 1.02727430  FALSE
> 134 1_33   1     33   THAP3 1.02727430  FALSE
> 135 1_33   1     33 DNAJC11 1.02727430  FALSE
> 136 1_33   1     33  CAMTA1 1.02727430  FALSE
> 137 1_34   1     34  CAMTA1 1.40312732   TRUE
> 138 1_35   1     35  CAMTA1 1.52104538  FALSE
> 139 1_36   1     36  CAMTA1 1.04853732  FALSE
> 140 1_37   1     37  CAMTA1 0.64794094  FALSE
> 141 1_38   1     38  CAMTA1 1.23026086   TRUE
> 142 1_38   1     38   VAMP3 1.23026086   TRUE
> 143 1_38   1     38    PER3 1.23026086   TRUE
> 144 1_39   1     39    PER3 1.18154967   TRUE
> 145 1_39   1     39    UTS2 1.18154967   TRUE
> 146 1_39   1     39 TNFRSF9 1.18154967   TRUE
> 147 1_39   1     39   PARK7 1.18154967   TRUE
> 148 1_39   1     39  ERRFI1 1.18154967   TRUE
> 149 1_40   1     40 no_gene 1.79796879  FALSE
> 150 1_41   1     41 SLC45A1 0.20193560  FALSE
>
> I want to create two new columns, xp.bg and xp.n.top, using the
> following criteria:
>
> If gene is the same in consecutive rows, xp.bg is the minimum value of
> xp.norm in those rows; if gene is not the same, xp.bg is simply the
> value of xp.norm for that row;
>
> Likewise, if there's a run of contiguous xp.top = TRUE values,
> xp.n.top is the minimum value in that range, and if xp.top is false or
> NA, xp.n.top is NA, or 0 (I don't care).
>
> So, in the above example,
> xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm
> for all other rows,
> xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and
> 0/NA for all other rows.
>
> Is there a way to combine indexing and if statements or some such to
> accomplish this? I want to do this without using split(main.table,
> main.table$gene), because there are about 20,000 unique entries for
> gene, and one of the entries, no_gene, is repeated throughout. I
> thought briefly of subsetting the rows where xp.top is TRUE, but I
> then don't know how to set the range for min, so that it only looks at
> what would originally have been consecutive rows, and searching the
> help has not proved particularly useful.
>
> Thanks in advance,
> Irene Gallego Romero
>
>
> --
> Irene Gallego Romero
> Leverhulme Centre for Human Evolutionary Studies
> University of Cambridge
> Fitzwilliam St
> Cambridge
> CB1 3QH
> UK
> email: ig247 at cam.ac.uk
>