[R] Adding rows based on column value

Bansal, Vikas vikas.bansal at kcl.ac.uk
Fri Jul 15 23:29:36 CEST 2011



I tried the aggregate() command, but it gives this error:


> vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')
> vars
[1] "CaseA" "CaseC" "CaseG" "CaseT"

> aggregate(file[vars], by = file['Pos'], FUN = sum)

Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument

The thing is, I can't use plyr, because I want code that I can build into a tool.

Can you please tell me why the aggregate() function gives this error? I am confused.

Thanking you,
Warm Regards
Vikas Bansal
MSc Bioinformatics
King's College London
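A likely explanation, based on the read.table() call in the original message quoted further down: colClasses = "character" makes every column, including CaseA to CaseT, a character vector, and sum() refuses character input, which is what "invalid 'type' (character) of argument" is saying. A minimal sketch of one fix follows. It assumes the layout shown in the question and types four of its rows inline with read.table(text = ...) instead of reading file.txt. The idea is to convert the count columns to numeric before calling aggregate():

```r
# Four rows from the question, read the same way file.txt was read:
# colClasses = "character" turns every column into a character vector.
file <- read.table(text = c(
  "Chr Pos CaseA CaseC CaseG CaseT",
  "10 135344110 0.000000 24.000000 0.000000 0.000000",
  "10 135344110 0.000000 0.000000 24.000000 0.000000",
  "10 135344110 0.000000 0.000000 24.000000 0.000000",
  "10 135344113 0.000000 0.000000 24.000000 0.000000"
), colClasses = "character", header = TRUE)

vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')
sapply(file[vars], class)  # all "character", so sum() cannot add them

# Convert the count columns to numbers, then aggregate as before.
file[vars] <- lapply(file[vars], as.numeric)
res <- aggregate(file[vars], by = file['Pos'], FUN = sum)
res
```

Dropping colClasses = "character" from the read.table() call, so that R guesses the column types itself, should have the same effect. Note that in this sketch Pos stays character, so the groups are sorted as text; that matches numeric order only while all positions have the same number of digits.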
________________________________________
From: Dennis Murphy [djmuser at gmail.com]
Sent: Friday, July 15, 2011 7:38 PM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] Adding rows based on column value

Hi:

This seems to work (df here is your data frame, with the Case columns stored as numbers):

library(plyr)
# select the variables to summarize:
vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')

# Alternatively,
# vars <- names(df)[grep('Case', names(df))]

# One way: the ddply() function in package plyr in
# conjunction with the colwise() function

> ddply(df, .(Pos), colwise(sum, vars))
        Pos CaseA CaseC CaseG CaseT
1 135344110     0    24    48     0
2 135344113     0     0    24     0
3 135344114    48     0     0     0
4 135344116     0     0     0    24
5 135344118     0    24     0    24
6 135344122    24    24     0     0
7 135344123     0    48     0    24
8 135344126     0     0    24     0

The colwise() function applies the same function (here, sum) to each
variable in the variable list given by vars. The wrapper function
ddply() applies the colwise() function to each subset of the data
defined by a unique value of Pos.
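Since plyr is not an option for the tool, here is a minimal base-R sketch of the same split-apply-combine idea. It assumes df already has numeric Case columns; the six rows below are the first rows of the sample data in the question. It splits the Case columns by Pos, applies colSums() to each piece, and stacks the results. It illustrates the idea only and is not how ddply() is implemented internally.

```r
# First six rows of the sample data, with numeric Case columns.
df <- data.frame(
  Chr   = 10,
  Pos   = c(135344110, 135344110, 135344110, 135344113, 135344114, 135344114),
  CaseA = c(0, 0, 0, 0, 24, 24),
  CaseC = c(24, 0, 0, 0, 0, 0),
  CaseG = c(0, 24, 24, 24, 0, 0),
  CaseT = 0
)
vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')

# Split: one data frame of the Case columns per distinct Pos value.
pieces <- split(df[vars], df$Pos)
# Apply: column sums within each piece (the role colwise(sum, vars) plays).
sums <- lapply(pieces, colSums)
# Combine: stack the per-group sums and put Pos back as a column.
res <- data.frame(Pos = as.numeric(names(sums)), do.call(rbind, sums),
                  row.names = NULL)
res
```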

Another way is to use the aggregate() function from base R. The
following code, due to Bill Dunlap, comes from another thread on this
list in the past couple of days.

> aggregate(df[vars], by = df['Pos'], FUN = sum)
        Pos CaseA CaseC CaseG CaseT
1 135344110     0    24    48     0
2 135344113     0     0    24     0
3 135344114    48     0     0     0
4 135344116     0     0     0    24
5 135344118     0    24     0    24
6 135344122    24    24     0     0
7 135344123     0    48     0    24
8 135344126     0     0    24     0
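The output requested in the question also keeps the Chr column. One way to get that with aggregate() is to group by both Chr and Pos. The sketch below uses five rows of the sample data and assumes numeric Case columns. It also shows that the formula interface of aggregate() gives the same table. If the full file covers more than one chromosome, grouping on Chr as well also keeps equal positions on different chromosomes apart.

```r
# Five rows from the question, with numeric Case columns.
file <- data.frame(
  Chr   = 10,
  Pos   = c(135344118, 135344118, 135344123, 135344123, 135344123),
  CaseA = 0,
  CaseC = c(24, 0, 24, 24, 0),
  CaseG = 0,
  CaseT = c(0, 24, 0, 0, 24)
)
vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')

# Group on both Chr and Pos, so Chr is carried into the result.
res1 <- aggregate(file[vars], by = file[c('Chr', 'Pos')], FUN = sum)

# The same summary through the formula interface of aggregate().
res2 <- aggregate(cbind(CaseA, CaseC, CaseG, CaseT) ~ Chr + Pos,
                  data = file, FUN = sum)
res1
```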

HTH,
Dennis


2011/7/15 Bansal, Vikas <vikas.bansal at kcl.ac.uk>:
> Dear all,
>
> I have one problem and did not find any solution.
> I have also attached the question as a text file, because the spacing is sometimes lost in mail.
>
> I have a file (file.txt) attached to this mail. I am reading it with this code to make a data frame (file):
>
> file=read.table("file.txt",fill=T,colClasses = "character",header=T)
>
> file looks like this:
>
>  Chr       Pos    CaseA     CaseC    CaseG      CaseT
>  10 135344110  0.000000 24.000000  0.000000  0.000000
>  10 135344110  0.000000  0.000000 24.000000  0.000000
>  10 135344110  0.000000  0.000000 24.000000  0.000000
>  10 135344113  0.000000  0.000000 24.000000  0.000000
>  10 135344114 24.000000  0.000000  0.000000  0.000000
>  10 135344114 24.000000  0.000000  0.000000  0.000000
>  10 135344116  0.000000  0.000000  0.000000 24.000000
>  10 135344118  0.000000 24.000000  0.000000  0.000000
>  10 135344118  0.000000  0.000000  0.000000 24.000000
>  10 135344122 24.000000  0.000000  0.000000  0.000000
>  10 135344122  0.000000 24.000000  0.000000  0.000000
>  10 135344123  0.000000 24.000000  0.000000  0.000000
>  10 135344123  0.000000 24.000000  0.000000  0.000000
>  10 135344123  0.000000  0.000000  0.000000 24.000000
>  10 135344126  0.000000  0.000000 24.000000  0.000000
>
> Some of the values in column Pos are the same. For rows with the same position, I want to add up the values of columns 3:6.
> Here is an example.
> The output for the first row should be:
>
>  Chr       Pos     CaseA     CaseC     CaseG     CaseT
>   10 135344110  0.000000 24.000000 48.000000  0.000000
>
> because first three rows have same value in Pos column.
>
> So the whole output for the above input should be:
>
>  Chr       Pos     CaseA     CaseC     CaseG     CaseT
>   10 135344110  0.000000 24.000000 48.000000  0.000000
>   10 135344113  0.000000  0.000000 24.000000  0.000000
>   10 135344114 48.000000  0.000000  0.000000  0.000000
>   10 135344116  0.000000  0.000000  0.000000 24.000000
>   10 135344118  0.000000 24.000000  0.000000 24.000000
>   10 135344122 24.000000 24.000000  0.000000  0.000000
>   10 135344123  0.000000 48.000000  0.000000 24.000000
>   10 135344126  0.000000  0.000000 24.000000  0.000000
>
> Can you please help me.
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> MSc Bioinformatics
> King's College London
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>