[R] subsetting a data.frame based on a specific group of columns
Assa Yeroslaviz
frymor at gmail.com
Fri Nov 6 14:53:27 CET 2015
Sorry for the misunderstanding. Here is a more detailed description of
what I would like to achieve.
I have a data set of counts from an RNA-Seq experiment and would like to
filter out low counts. I don't want to set everything to 0
automatically.
I would like to set the counts of a categorical group (e.g. a condition) to 0
if and only if all replicates in that group together have fewer than 100 reads.
In my examples I used X and Y to represent the categories. Usually they
have more distinctive names like "control", "knockout1", "dKo" etc.
So what I would really like to do is to check whether the sum of all the
"control" samples is lower than 100. If so, set all control samples to 0.
I would like to check this *for each category* in every row of the data set.
I hope it is clearer now.
Thanks,
Assa
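
A minimal sketch of the per-category check described above, for column
names that are longer than one character. It assumes, purely for
illustration, that every sample column is named <category>_<replicate>
(e.g. "control_1"); the column names, the toy counts, and the variables
`counts`, `group`, `threshold`, and `filtered` are hypothetical and not
taken from the real data set. If the names do not follow such a pattern,
`group` could instead be written out by hand, one entry per column.

```r
# Toy counts reusing the numbers from the X/Y example;
# the column names are hypothetical stand-ins for real sample names.
counts <- data.frame(
  control_1 = c(1232,   0,   43),
  control_2 = c( 357,  71,  919),
  control_3 = c(  23,   9, 1111),
  dKo_1     = c(   0, 811,    0),
  dKo_2     = c(9871, 795,   76),
  dKo_3     = c(  72, 743,   14)
)

# Assumption: the category is the column name without the trailing
# "_<replicate number>", so "knockout1_2" would map to "knockout1".
group <- sub("_[0-9]+$", "", colnames(counts))

threshold <- 100  # minimum total reads per category within a row

filtered <- counts
for (g in unique(group)) {
  cols <- which(group == g)
  # drop = FALSE keeps a data frame even if a category has one sample
  low <- rowSums(counts[, cols, drop = FALSE]) < threshold
  # zero only this category's columns, and only in the low-count rows
  filtered[low, cols] <- 0
}
filtered
```

The loop runs once per category rather than once per row, so the
row-wise work stays inside the vectorised rowSums() call.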
On Fri, Nov 6, 2015 at 2:29 PM, jim holtman <jholtman at gmail.com> wrote:
> Is this what you want:
>
> > x <- read.table(text = "X1 X2 X3 Y1 Y2 Y3
> + 1232 357 23 0 9871 72
> + 0 71 9 811 795 743
> + 43 919 1111 0 76 14", header = TRUE)
> > x
> X1 X2 X3 Y1 Y2 Y3
> 1 1232 357 23 0 9871 72
> 2 0 71 9 811 795 743
> 3 43 919 1111 0 76 14
> >
> > # create indices of columns that start with the same character
> > indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> > names(indx) <- NULL # remove names so output not messed up
> >
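> > result <- lapply(indx, function(a){
> +     # drop = FALSE keeps a data frame even when a group has one column
> +     row_sum <- rowSums(x[, a, drop = FALSE])
> +     x[row_sum < 100, a] <- 0   # zero the group where its row total is low
> +     x[, a, drop = FALSE]
> + })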
> > # combine back together
> > do.call(cbind, result)
> X1 X2 X3 Y1 Y2 Y3
> 1 1232 357 23 0 9871 72
> 2 0 0 0 811 795 743
> 3 43 919 1111 0 0 0
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
>
>> Hi,
>>
>> I have a data frame with multiple columns that belong to several
>> groups, like this:
>> X1 X2 X3 Y1 Y2 Y3
>> 1232 357 23 0 9871 72
>> 0 71 9 811 795 743
>> 43 919 1111 0 76 14
>>
>> I would like to filter rows in which the sum within one group is lower
>> than a specific value. For example, I would like to set all the values in a
>> group of columns to zero if the sum in that group is less than 100.
>> In my example table I would like to set the values in the second row for
>> the three X columns to 0, so that the table looks like this:
>>
>> X1 X2 X3 Y1 Y2 Y3
>> 1232 357 23 0 9871 72
>> 0 0 0 811 795 743
>> 43 919 1111 0 0 0
>>
>> The same applies to the Y values in the last row.
>> Is there a more efficient way of doing this than going row by row and
>> using the apply function on each of the subgroups I have in the columns?
>>
>> thanks
>> Assa
>>
>
>