[R] Complicated For Loop (to me)
David Winsemius
dwinsemius at comcast.net
Tue Nov 10 05:00:48 CET 2009
On Nov 9, 2009, at 9:51 PM, agm. wrote:
>
> Sorry, I've been trying to work around this and just got back to
> check my email.
>
> dput wasn't working too well for me because the data set also has 450
> variables and I needed more time to figure out how to properly show
> you all what you needed to know.
That's not a very convincing story. Nabble lets you put files where
they can be accessed. There are examples of Nabble users doing that
today.
> But to show you the idea, a very simple data set would be:
>
> NWEIGHT  ETHNIC  RACE  SLUNCH  DIVISION  .......
>    1234       0     1       1         1
>    2345       1     1       0         5
>    3243       0     3       1         3
>       .       .     .       .         .
>
> So basically, I already have the data subset by division and race.
> (I did that the inefficient way by coding it by hand)
Probably did not need to do that:
?split
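A minimal sketch of what ?split offers here, assuming the full data sit in one data frame. The name `dat` and the toy values are invented for illustration; only the column names come from the post:

```r
# Toy stand-in for the real data; `dat` and its values are hypothetical.
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243, 1500),
  RACE     = c(1, 1, 3, 1),
  SLUNCH   = c(1, 0, 1, 1),
  DIVISION = c(1, 5, 3, 1)
)

# One data frame per RACE x DIVISION combination that actually occurs;
# drop = TRUE discards the empty combinations.
pieces <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)
names(pieces)
```

Each element of `pieces` is an ordinary data frame, so the hand-coded subsets could probably be replaced by this one list.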
>
> But now I need to calculate the percentage of each division (by
> race) that participates in SLUNCH (a 0 1 variable)
?tapply
?by
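A rough sketch of the tapply route, assuming the un-subsetted data are in a data frame (called `dat` here, a made-up name with made-up values) and that the percentage wanted is the NWEIGHT-weighted share with SLUNCH == 1:

```r
# Hypothetical toy data using the column names from the post.
dat <- data.frame(
  NWEIGHT  = c(100, 300, 200, 200, 50, 150),
  RACE     = c(1, 1, 1, 3, 3, 3),
  SLUNCH   = c(1, 0, 1, 1, 0, 0),
  DIVISION = c(1, 1, 5, 1, 5, 5)
)

grp <- list(RACE = dat$RACE, DIVISION = dat$DIVISION)

# Weighted share = (sum of weights where SLUNCH == 1) / (sum of all weights),
# computed for every RACE x DIVISION cell at once.
num   <- tapply(dat$NWEIGHT * (dat$SLUNCH == 1), grp, sum)
den   <- tapply(dat$NWEIGHT, grp, sum)
share <- num / den
share
```

The result is a RACE by DIVISION matrix of shares, so no per-subset variables are needed.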
> So I am trying to avoid writing out code such as:
>
> w.cd1.s <- sum(ifelse(white.cd1$SLUNCH==1, white.cd1$NWEIGHT,
> 0))/sum(white.cd1$NWEIGHT)
Perhaps:
by(white.cd1$NWEIGHT, white.cd1$SLUNCH, sum, na.rm=TRUE)/
sum(white.cd1$NWEIGHT, na.rm=TRUE)
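Note that the by() quotient has one element per level of SLUNCH, so the participation share is the element named "1". A small check on invented numbers (the contents of `white.cd1` below are hypothetical), with weighted.mean as one possible shortcut:

```r
# Hypothetical stand-in for one of the hand-made subsets.
white.cd1 <- data.frame(NWEIGHT = c(100, 300, 200, 400),
                        SLUNCH  = c(1, 0, 1, 0))

# The by() quotient: one weighted share per SLUNCH level ("0" and "1").
shares <- by(white.cd1$NWEIGHT, white.cd1$SLUNCH, sum, na.rm = TRUE) /
  sum(white.cd1$NWEIGHT, na.rm = TRUE)
share.by <- as.vector(shares)[names(shares) == "1"]

# The same number directly, as a weighted mean of the 0/1 indicator.
# (weighted.mean does not tolerate NA weights, unlike the sums above.)
share.wm <- weighted.mean(white.cd1$SLUNCH == 1, white.cd1$NWEIGHT)
```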
> w.cd2.s <- sum(ifelse(white.cd2$SLUNCH==1, white.cd2$NWEIGHT,
> 0))/sum(white.cd2$NWEIGHT)
> .... for all the variables.
?apply
?lapply
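If the subsets already exist, something along these lines might do, assuming they are first collected in a list. The list `subsets`, its toy contents, and the helper `slunch.share` are all made up for this sketch:

```r
# Hypothetical stand-ins for the hand-made subsets white.cd1, white.cd2, ...
subsets <- list(
  white.cd1 = data.frame(NWEIGHT = c(100, 300), SLUNCH = c(1, 0)),
  white.cd2 = data.frame(NWEIGHT = c(200, 200, 400), SLUNCH = c(1, 1, 0))
)

# Weighted share of SLUNCH == 1 within one subset.
slunch.share <- function(d) sum(d$NWEIGHT[d$SLUNCH == 1]) / sum(d$NWEIGHT)

# Apply the helper to every subset; sapply returns a named numeric vector.
shares <- sapply(subsets, slunch.share)
shares
```

The names of the list carry over to the result, which removes the need to build variable names by hand.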
> One other method that I tried, which gets me the "names" i need, but
> doesn't put them into a dataframe (which I am currently trying to
> fix) is by using this code:
>
>
> names <- c("white","black","hispanic","asian")
> regions <- c("cd1","cd2","cd3","cd4","cd5","cd6","cd7","cd8","cd9")
> type <- c("l", "p", "r")
> name.region <- c()
> for (j in 1:length(names)) {
>   for (i in 1:length(regions)) {
>     for (k in 1:length(type)) {
>       name.holder <- paste(names[j], ".", regions[i], ".", type[k], sep = "")
>       name.region <- c(name.region, name.holder)
>     }
>   }
> }
>
> (The "l", "p", "r" represent other variables that I am trying to do
> the same thing as SLUNCH)
>
> From here I've been trouble-shooting how to switch these named
> variables back into a data.frame context.
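One possible way to get the same labels already inside a data frame is expand.grid. This is only a sketch: `races` replaces the poster's `names` (to avoid masking the base function), and the `grid` and `label` names are invented here:

```r
races   <- c("white", "black", "hispanic", "asian")
regions <- paste("cd", 1:9, sep = "")
type    <- c("l", "p", "r")

# The first argument varies fastest, so type cycles inside region inside
# race, the same order the three nested loops produce.
grid <- expand.grid(type = type, region = regions, race = races,
                    stringsAsFactors = FALSE)
grid$label <- paste(grid$race, grid$region, grid$type, sep = ".")
head(grid$label, 4)
```

A column of computed percentages could then be added to `grid` row by row, instead of creating one variable per label.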
>
> Everyone's help has been really appreciated! I've learned a lot today
> that will hopefully move me slowly from using for loops to more
> efficient functions. I unfortunately am still learning those and have
> some knowledge about how to use loops compared to almost no knowledge
> of the more powerful functions like sapply, lapply, etc. (I'm waiting
> on MASS4 to be returned to the library to read it.)
>
>
> Thanks!
>
>
> John Kane-2 wrote:
>>
>> I think that we probably need a sample database of your original
>> data. A few lines of the dataset would probably be enough as long as
>> it was fairly representative of the overall data set. See ?dput for
>> a way of conveniently supplying a sample data set.
>>
>> Otherwise off the top of my head, I would think that you could just
>> put all your subsets into a list and use lapply but I'm simply
>> guessing without seeing the data.
>>
>> --- On Mon, 11/9/09, agm. <amurray at vt.edu> wrote:
>>
>>> From: agm. <amurray at vt.edu>
>>> Subject: Re: [R] Complicated For Loop (to me)
>>> To: r-help at r-project.org
>>> Received: Monday, November 9, 2009, 3:18 PM
>>>
>>> I've looked through ?split and run all of the code, but I am not
>>> sure that I can use it in such a way to make it do what I need.
>>> Another suggestion was using "lists", but again, I am sure that the
>>> process can do what I need, but I am not sure it would work with so
>>> many observations.
>>>
>>> I might have been too simple in my code. Let me try to explain it
>>> more clearly:
>>>
>>> I've got a data set of 4500 observations. I have already subset it
>>> into race/ethnicity (which I did by simple code). Now I needed to
>>> subset each race/ethnicity again into 9 separate regions. I again
>>> did this by simple code.
>>>
>>> The problem is now, I need to calculate a percentage for three
>>> different variables for all 9 regions for each race. I was trying
>>> to do this through a loop command.
>>>
>>> So a snippet of my code is :
>>>
>>> names <- c("white", "black", "asian", "hispanic")
>>> for(j in 1:length(names)){
>>> for(i in 1:9){
>>> names[j].cd[i].es.wash <- subset(names[j].cd[i],
>>> SLUNCH==1)
>>> es.cd[i].names.w <-
>>> sum(names.cd[i].es.wash$NWEIGHT)/sum(names.cd[i]$NWEIGHT)
>>> }
>>> }
>>>
>>>
>>> Maybe that makes it clearer. If not, I apologize. Thanks for the
>>> help that I have already received. It is greatly appreciated.
>>>
>>> Tony
David Winsemius, MD
Heritage Laboratories
West Hartford, CT