[R] Complicated For Loop (to me)

Tue Nov 10 05:00:48 CET 2009

On Nov 9, 2009, at 9:51 PM, agm. wrote:

>
> Sorry, I've been trying to work around this and just got back to
> check my email.
>
> dput wasn't working too well for me because the data set also has 450
> variables, and I needed more time to figure out how to properly show
> you all what you needed to know.

That's not a very convincing story. Nabble lets you put files where
they can be accessed. There are examples of Nabble users doing that
today.

> But to show you the idea, a very simple data set would be:
>
> NWEIGHT  ETHNIC  RACE  SLUNCH  DIVISION  ...
> 1234     0       1     1       1
> 2345     1       1     0       5
> 3243     0       3     1       3
> ...
>
> So basically, I already have the data subset by division and race.
> (I did that the inefficient way, by coding it by hand.)

Probably did not need to do that:

?split
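A minimal sketch of what ?split offers here, assuming the full data sit in one data frame (called `dat` here, a hypothetical name) with the columns from the example above. The three rows are just the toy values quoted above, not real data. Splitting on RACE and DIVISION together gives one list element per combination, which replaces the hand-coded subsets.

```r
# Toy data: the three example rows quoted above (not real survey data)
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243),
  ETHNIC   = c(0, 1, 0),
  RACE     = c(1, 1, 3),
  SLUNCH   = c(1, 0, 1),
  DIVISION = c(1, 5, 3)
)

# One data frame per RACE x DIVISION combination, named "RACE.DIVISION";
# drop = TRUE leaves out combinations that have no rows
subsets <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)
names(subsets)
```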

>
> But now I need to calculate the percentage of each division (by race)
> that participates in SLUNCH (a 0/1 variable).

?tapply
?by
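As a sketch of the ?tapply route, again assuming everything is still in one data frame (`dat`, a hypothetical name) holding the toy rows quoted above: because SLUNCH is 0/1, NWEIGHT * SLUNCH is the weight of participants only. Summing it within each RACE x DIVISION cell and dividing by that cell's total weight gives the weighted share, without any hand-made subsets.

```r
# Toy data: the three example rows quoted above (not real survey data)
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243),
  RACE     = c(1, 1, 3),
  SLUNCH   = c(1, 0, 1),
  DIVISION = c(1, 5, 3)
)

# Weight of SLUNCH participants, and total weight, per RACE x DIVISION cell
num <- tapply(dat$NWEIGHT * dat$SLUNCH, list(dat$RACE, dat$DIVISION), sum)
den <- tapply(dat$NWEIGHT, list(dat$RACE, dat$DIVISION), sum)

# Weighted percentage taking part; rows are RACE, columns are DIVISION,
# and NA marks combinations with no observations
pct <- 100 * num / den
pct
```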

> So I am trying to avoid writing out code such as:
>
> w.cd1.s <- sum(ifelse(white.cd1$SLUNCH == 1, white.cd1$NWEIGHT, 0)) /
>   sum(white.cd1$NWEIGHT)

Perhaps:

  by(white.cd1$NWEIGHT, white.cd1$SLUNCH, sum, na.rm = TRUE) /
    sum(white.cd1$NWEIGHT, na.rm = TRUE)
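Another way to avoid copying the expression for every subset is to write it once as a function. This is a minimal sketch: `weighted_share` is a hypothetical name, and the two-row `white.cd1` below reuses the NWEIGHT and SLUNCH values from the first two example rows purely for illustration.

```r
# Toy subset standing in for white.cd1 (illustrative values only)
white.cd1 <- data.frame(NWEIGHT = c(1234, 2345), SLUNCH = c(1, 0))

# Hypothetical helper: share of total NWEIGHT carried by rows where
# column `var` equals 1 (same quantity as the ifelse() version)
weighted_share <- function(d, var) {
  sum(d$NWEIGHT[d[[var]] == 1], na.rm = TRUE) / sum(d$NWEIGHT, na.rm = TRUE)
}

w.cd1.s <- weighted_share(white.cd1, "SLUNCH")
w.cd1.s
```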

> w.cd2.s <- sum(ifelse(white.cd2$SLUNCH == 1, white.cd2$NWEIGHT, 0)) /
>   sum(white.cd2$NWEIGHT)

> .... for all the variables.

?apply
?lapply
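Putting ?split and ?sapply together, here is a sketch that computes the weighted share for several 0/1 variables in every subset at once and returns a data frame. Assumptions: the toy rows quoted above, the hypothetical `weighted_share` helper, and ETHNIC used only as a stand-in for the poster's "l", "p", "r" variables, whose real column names are not given.

```r
# Toy data: the three example rows quoted above (not real survey data)
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243),
  ETHNIC   = c(0, 1, 0),
  RACE     = c(1, 1, 3),
  SLUNCH   = c(1, 0, 1),
  DIVISION = c(1, 5, 3)
)

# Hypothetical helper: weighted share of rows where `var` equals 1
weighted_share <- function(d, var) {
  sum(d$NWEIGHT[d[[var]] == 1], na.rm = TRUE) / sum(d$NWEIGHT, na.rm = TRUE)
}

# One data frame per RACE x DIVISION combination
subsets <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)

# 0/1 columns to summarise (ETHNIC is only a stand-in)
vars <- c("SLUNCH", "ETHNIC")

# sapply() gives a vars x subsets matrix; t() puts one subset per row
res <- as.data.frame(t(sapply(subsets, function(d) {
  sapply(vars, function(v) weighted_share(d, v))
})))
res
```

The row names ("1.1", "1.5", "3.3") identify the RACE.DIVISION subset, so the result is already in a data.frame.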

> One other method that I tried, which gets me the "names" I need but
> doesn't put them into a data frame (which I am currently trying to
> fix), uses this code:
>
>
> names <- c("white", "black", "hispanic", "asian")
> regions <- c("cd1", "cd2", "cd3", "cd4", "cd5", "cd6", "cd7", "cd8", "cd9")
> type <- c("l", "p", "r")
> name.region <- c()
> for (j in 1:length(names)) {
>   for (i in 1:length(regions)) {
>     for (k in 1:length(type)) {
>       # e.g. "white.cd1.l"
>       name.holder <- paste(names[j], regions[i], type[k], sep = ".")
>       name.region <- c(name.region, name.holder)
>     }
>   }
> }
>
> (The "l", "p", and "r" represent other variables for which I am trying
> to do the same thing as for SLUNCH.)
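For the name-building loop, a sketch using expand.grid(), which returns every combination as a data frame and varies its first argument fastest, so `type` changes fastest just as in the innermost loop. That data frame, one row per combination, is also a natural place to store the computed percentages as an extra column.

```r
names <- c("white", "black", "hispanic", "asian")
regions <- paste("cd", 1:9, sep = "")
type <- c("l", "p", "r")

# All 4 x 9 x 3 = 108 combinations, type varying fastest
grid <- expand.grid(type = type, region = regions, name = names,
                    stringsAsFactors = FALSE)

# Same strings, in the same order, as the nested loops produce
name.region <- paste(grid$name, grid$region, grid$type, sep = ".")
head(name.region)
```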
>
> From here I've been troubleshooting how to turn these named variables
> back into a data.frame.
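If the results already exist as separate variables named as in name.region, mget() can fetch them all into a list, which can then become a data frame. This is a minimal sketch: the three variables and their values below are placeholders, not real results.

```r
# Placeholder values standing in for results stored as separate variables
white.cd1.l <- 0.25
white.cd1.p <- 0.40
white.cd1.r <- 0.10
name.region <- c("white.cd1.l", "white.cd1.p", "white.cd1.r")

# mget() returns a named list of the objects whose names are given
vals <- mget(name.region)
out <- data.frame(name = names(vals),
                  value = unlist(vals, use.names = FALSE),
                  stringsAsFactors = FALSE)

# Split "white.cd1.l" into its group, region and type parts
parts <- do.call(rbind, strsplit(out$name, ".", fixed = TRUE))
out$group  <- parts[, 1]
out$region <- parts[, 2]
out$type   <- parts[, 3]
out
```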
>
> Everyone's help has been really appreciated! I've learned a lot today
> that will hopefully move me, slowly, from using for loops to more
> efficient functions. Unfortunately, I am still learning those: I have
> some knowledge of how to use loops but almost no knowledge of the more
> powerful functions like sapply, lapply, etc. (I'm waiting for MASS4 to
> be returned to the library so I can read it.)
>
>
> Thanks!
>
>
> John Kane-2 wrote:
>>
>> I think that we probably need a sample of your original data. A few
>> lines of the dataset would probably be enough as long as they were
>> fairly representative of the overall data set. See ?dput for a way to
>> conveniently supply a sample data set.
>>
>> Otherwise, off the top of my head, I would think that you could just
>> put all your subsets into a list and use lapply, but I'm simply
>> guessing without seeing the data.
>>
>> --- On Mon, 11/9/09, agm. <amurray at vt.edu> wrote:
>>
>>> From: agm. <amurray at vt.edu>
>>> Subject: Re: [R] Complicated For Loop (to me)
>>> To: r-help at r-project.org
>>> Received: Monday, November 9, 2009, 3:18 PM
>>>
>>> I've looked through ?split and run all of the code, but I am not sure
>>> that I can use it to do what I need. Another suggestion was using
>>> "lists", but again, although I am sure that approach can do what I
>>> need, I am not sure it would work with so many observations.
>>>
>>> I might have been too simple in my code. Let me try to explain it
>>> more clearly:
>>>
>>> I've got a data set of 4500 observations. I have already subset it
>>> by race/ethnicity (which I did by simple code). Now I needed to subset
>>> each race/ethnicity again into 9 separate regions. I again did this
>>> by simple code.
>>>
>>> The problem now is that I need to calculate a percentage for three
>>> different variables for all 9 regions for each race. I was trying to
>>> do this with a loop.
>>>
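>>> So a snippet of my code is:
>>>
>>> names <- c("white", "black", "asian", "hispanic")
>>> for (j in seq_along(names)) {
>>>   for (i in 1:9) {
>>>     # names[j].cd[i] is not valid R; get() looks up an existing
>>>     # subset such as white.cd1 from its name as a string
>>>     d <- get(paste(names[j], ".cd", i, sep = ""))
>>>     wash <- subset(d, SLUNCH == 1)
>>>     # assign() stores the share under a built name, e.g. es.cd1.white.w
>>>     assign(paste("es.cd", i, ".", names[j], ".w", sep = ""),
>>>            sum(wash$NWEIGHT) / sum(d$NWEIGHT))
>>>   }
>>> }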
>>>
>>>
>>> Maybe that makes it clearer. If not, I apologize. Thanks for the help
>>> that I have already received. It is greatly appreciated.
>>>
>>> Tony

David Winsemius, MD
Heritage Laboratories
West Hartford, CT