[R] lapply with data frame

jim holtman jholtman at gmail.com
Sun Feb 28 04:06:26 CET 2010


> x <- read.table(textConnection("id    group    value
+ 1    A            3.2
+ 2    A            3.0
+ 3    A            3.1
+ 4    B            5.5
+ 5    B            6.0
+ 6    B            6.2"), header=TRUE)
> # lapply processes a data frame one column at a time
> lapply(x, c)
$id
[1] 1 2 3 4 5 6

$group
[1] 1 1 1 2 2 2

$value
[1] 3.2 3.0 3.1 5.5 6.0 6.2
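
Since the question also asks about rows, here is a minimal sketch of the difference, not the only way to do it. It rebuilds the same toy data with data.frame() and sets stringsAsFactors explicitly so the result does not depend on the R version. The names col_classes and row_labels are made up for this illustration.

```r
# Same toy data as in the session above.
x <- data.frame(id = 1:6,
                group = c("A", "A", "A", "B", "B", "B"),
                value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2),
                stringsAsFactors = FALSE)

# lapply() hands the function one column per call, so the result
# has one element per column, named after the columns.
col_classes <- lapply(x, class)

# To visit rows instead, iterate over the row indices and subset inside
# the function. Each x[i, ] is a one-row data frame.
row_labels <- lapply(seq_len(nrow(x)), function(i) {
  row <- x[i, ]
  paste(row$group, row$value)
})
```

For anything that can be written as whole-column arithmetic, the vectorized form is usually simpler and faster than a row-by-row loop like this.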

> # normalize by group
> x$norm <- ave(x$value, x$group, FUN=function(a) a / sum(a))
> x
  id group value      norm
1  1     A   3.2 0.3440860
2  2     A   3.0 0.3225806
3  3     A   3.1 0.3333333
4  4     B   5.5 0.3107345
5  5     B   6.0 0.3389831
6  6     B   6.2 0.3502825
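
If you specifically want lapply in the picture, the same normalization can be sketched with split / lapply / unsplit, which is roughly what ave does for you in one call. The column name norm2 and the variables pieces, scaled and group_sums are only for this illustration.

```r
# Same toy data as above.
x <- data.frame(id = 1:6,
                group = c("A", "A", "A", "B", "B", "B"),
                value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2),
                stringsAsFactors = FALSE)

# One numeric vector per group.
pieces <- split(x$value, x$group)

# Normalize within each group.
scaled <- lapply(pieces, function(a) a / sum(a))

# Put the pieces back in the original row order.
x$norm2 <- unsplit(scaled, x$group)

# Sanity check: each group's normalized values should sum to 1.
group_sums <- tapply(x$norm2, x$group, sum)
```

mclapply takes the same first two arguments as lapply, so it could be swapped in for the lapply call above. For a division this cheap, though, the overhead of forking would likely outweigh any gain.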


On Sat, Feb 27, 2010 at 9:49 PM, Noah Silverman <noah at smartmediacorp.com> wrote:
> I'm a bit confused on how to use lapply with a data.frame.
>
> For example.
>
> lapply(data, function(x) print(x))
>
> WHAT exactly is passed to the function?  Is it each ROW in the data frame,
> one by one, or each column, or the entire frame in one shot?
>
> What I want to do is apply a function to each row in the data frame.  Is lapply
> the right way?
>
> A second application is to normalize a column value by group.  For example,
> if I have the following table:
> id    group    value      norm
> 1    A            3.2
> 2    A            3.0
> 3    A            3.1
> 4    B            5.5
> 5    B            6.0
> 6    B            6.2
> etc...
>
> The long version would be:
> for (g in unique(data$group)) {
>    idx <- data$group == g   # rows belonging to this group
>    data$norm[idx] <- data$value[idx] / sum(data$value[idx])
> }
>
> There must be a faster way to do this with lapply.  (Ideally, I'd then use
> mclapply to run on multi-cores and really crank up the speed.)
>
> Any suggestions?
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


