[R] looping by grouping variable

jim holtman jholtman at gmail.com
Wed Aug 31 20:39:09 CEST 2011


You can use 'ave' to add a new column with the state average:

> id<-as.character(c(01001:01010, 02001:02010))
> st<-substr(id,1,1)
> cnty<-substr(id,2,5)
> tfr10<-rnorm(20)
>
> mydata<-data.frame(id,st,cnty,tfr10)
> mydata$stAvg <- ave(mydata$tfr10, mydata$st)
> print(mydata)
     id st cnty      tfr10      stAvg
1  1001  1  001  1.1896489 -0.3190678
2  1002  1  002 -1.0504707 -0.3190678
3  1003  1  003 -1.6130538 -0.3190678
4  1004  1  004 -1.1573924 -0.3190678
5  1005  1  005 -0.2013412 -0.3190678
6  1006  1  006  0.5176950 -0.3190678
7  1007  1  007 -1.3256951 -0.3190678
8  1008  1  008  0.4367956 -0.3190678
9  1009  1  009  0.2025659 -0.3190678
10 1010  1  010 -0.1894306 -0.3190678
11 2001  2  001 -0.9337906 -0.3842536
12 2002  2  002  0.2999035 -0.3842536
13 2003  2  003  0.5091345 -0.3842536
14 2004  2  004 -0.4787584 -0.3842536
15 2005  2  005 -1.6958660 -0.3842536
16 2006  2  006 -0.4430861 -0.3842536
17 2007  2  007  0.2100123 -0.3842536
18 2008  2  008 -1.7471779 -0.3842536
19 2009  2  009  0.1778717 -0.3842536
20 2010  2  010  0.2592210 -0.3842536
>
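
A reproducible variant of the session above, as a sketch only: the seed and the column names 'dev' and 'stMed' are assumptions for illustration and are not part of the original question. It shows that 'ave' returns one value per row, so the state average can go straight into a per-county calculation, and that the FUN argument swaps in a different summary.

```r
set.seed(42)  # assumed seed, only so the numbers are repeatable

id    <- as.character(c(1001:1010, 2001:2010))
st    <- substr(id, 1, 1)
cnty  <- substr(id, 2, 5)
tfr10 <- rnorm(20)
mydata <- data.frame(id, st, cnty, tfr10, stringsAsFactors = FALSE)

# ave() returns one value per row: the mean of tfr10 within that row's state
mydata$stAvg <- ave(mydata$tfr10, mydata$st)

# the per-row state mean can be used directly in further calculations,
# e.g. each county's deviation from its state mean (hypothetical column 'dev')
mydata$dev <- mydata$tfr10 - mydata$stAvg

# FUN must be passed by name, because unnamed arguments are
# treated as additional grouping variables
mydata$stMed <- ave(mydata$tfr10, mydata$st, FUN = median)

print(head(mydata))
```

Since 'dev' is a deviation from the group mean, it sums to zero (up to rounding) within each state, which is a quick check that the grouping did what was intended.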


On Wed, Aug 31, 2011 at 12:50 PM, jour4life <jour4life at gmail.com> wrote:
> Hello all,
>
> I hope something is not already posted regarding this exact problem I am
> trying to solve. I've read through the forums and previous postings and am
> still confused as to how to approach this. Basically, what I am trying to do
> is construct variables that use an average of a variable from a
> grouping, or higher order, variable. For instance, in my dataset I have
> variables, with each observation being a county. For those counties, we have
> an ID variable, and I have extracted further variables from substrings of
> the ID. Thus, I was able to extract a state variable, and I want to
> calculate averages at the state level and use those averages in another
> variable. I know this may be confusing, so I'm posting
> an example dataset here:
>
> id<-as.character(01001:01010)
> st<-substr(id,1,1)
> cnty<-substr(id,2,5)
> tfr10<-rnorm(10)
>
> mydata<-cbind(id,st,cnty,tfr10)
> print(mydata)
>     id     st  cnty  tfr10
>  [1,] "1001" "1" "001" "1.07505442756833"
>  [2,] "1002" "1" "002" "-0.882434417011687"
>  [3,] "1003" "1" "003" "2.29276525788035"
>  [4,] "1004" "1" "004" "-0.312320296652298"
>  [5,] "1005" "1" "005" "1.09001860766383"
>  [6,] "1006" "1" "006" "-0.781940988103414"
>  [7,] "1007" "1" "007" "-0.614135968631341"
>  [8,] "1008" "1" "008" "0.515142965880679"
>  [9,] "1009" "1" "009" "0.0274456168157293"
> [10,] "1010" "1" "010" "-0.538584996182184"
>
> What I want to do is get the average of the variable "tfr10" by state.
> Based on that, I will create another calculation that will output variables.
> In other words, for each observation, calculate a new variable using the
> average at the state level. Of course, this is a simple example; the real
> data will have 32 states, and I do not want to create a "mean variable" for
> each state by hand. I would rather do this using a loop.
>
> Or, I can potentially create a "mean" variable, but based on the
> observations at the state level using a loop. Whichever way is best and
> easiest. I hope that this example is understandable. Any help or direction
> would be greatly appreciated!!!
>
> Thanks,
>
> Carlos
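
For comparison with the loop the question asks about, here is a sketch of both routes on a small made-up data frame (the values and the column name 'stAvgLoop' are assumptions for illustration). The explicit loop over states fills the same column that 'ave' produces in one call; 'tapply' gives one mean per state when a summary is wanted instead of a per-row column.

```r
# small made-up example: two states, three counties each
mydata <- data.frame(
  st    = c("1", "1", "1", "2", "2", "2"),
  tfr10 = c(1, 2, 3, 10, 20, 30),
  stringsAsFactors = FALSE
)

# explicit loop over the distinct states
mydata$stAvgLoop <- NA_real_
for (s in unique(mydata$st)) {
  rows <- mydata$st == s
  mydata$stAvgLoop[rows] <- mean(mydata$tfr10[rows])
}

# the same column without a loop
mydata$stAvg <- ave(mydata$tfr10, mydata$st)

# one mean per state, with names taken from the state codes
stMeans <- tapply(mydata$tfr10, mydata$st, mean)
print(stMeans)
```

With 32 states the loop still works, but the 'ave' call stays a single line and does not need the column to be initialised first.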



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


