[R] apologes if you already saw this :efficiency question

markleeds at verizon.net markleeds at verizon.net
Thu Jul 6 07:52:53 CEST 2006

>From: jim holtman <jholtman at gmail.com>
>Date: Wed Jul 05 21:49:33 CDT 2006
>To: "markleeds at verizon.net" <markleeds at verizon.net>
>Cc: r-help at stat.math.ethz.ch
>Subject: Re: [R] apologes if you already saw this :efficiency question

jim : i don't want to take advantage of your kindness and
generosity but when you have time, could you think about
the following.

remember the function gabor gave me to pick out the column
of a dataframe ( among the same named columns ) that had
the most nonzero elements.

it was tapply(seq(DF), names(DF), f) where f was

function(x) x[which.max(colSums(DF[x] != 0))]
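
To make the behaviour concrete, here is a minimal sketch of that call on a made-up data frame. The data and the name idx are assumptions for illustration only, and seq_along(DF) stands in for seq(DF), which should give the same indices here.

```r
# Hypothetical data: two columns named "a" and two named "b"
DF <- as.data.frame(cbind(c(1, 0, 3), c(0, 0, 5), c(2, 2, 0), c(4, 1, 6)))
names(DF) <- c("a", "a", "b", "b")

# For a group of column indices x, return the index of the column
# with the most nonzero entries
f <- function(x) x[which.max(colSums(DF[x] != 0))]

# One column index per unique column name
idx <- tapply(seq_along(DF), names(DF), f)
print(idx)
```

Here the first "a" column has two nonzero entries against one, and the second "b" column has three against two, so idx should point at columns 1 and 4.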

I was hoping that it wouldn't be so difficult to change
the criteria to the following.

rather than pick out the column with the maximum number of nonzero
elements, I want to take the average of the same named columns, but
without including any zero valued elements. So the resultant matrix
would have the unique names as its columns, and each column would be
the average of the same named columns; if a column had a zero in one
of its rows, then that zero wouldn't be included in the average.
Basically, this is because in this case zero doesn't really mean 0.
It means leave it out because it's not involved.
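
One possible reading of this, sketched on made-up data: for each unique name, average across the same named columns row by row, skipping zeros. The data, the helper nonzero_row_means and the result name avg are all hypothetical, and the row-by-row interpretation is itself an assumption.

```r
# Hypothetical data: two columns named "a" and two named "b"
DF <- as.data.frame(cbind(c(1, 0, 3), c(3, 0, 5), c(2, 2, 0), c(4, 1, 6)))
names(DF) <- c("a", "a", "b", "b")

# Row-wise mean over the columns in cols, ignoring zeros
nonzero_row_means <- function(cols, df) {
  m <- as.matrix(df[cols])
  m[m == 0] <- NA            # a zero means "not involved", so drop it
  rowMeans(m, na.rm = TRUE)  # NaN where every value in the row was zero
}

# One column of averages per unique column name
avg <- sapply(split(seq_along(DF), names(DF)), nonzero_row_means, df = DF)
print(avg)
```

A row in which every same named column is zero comes out as NaN under this sketch; whether that should instead be 0 or NA depends on what is needed downstream.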

i'm sorry to bother you and it's not urgent and i won't
start bothering you all the time. i am very aware ( not in
the R sense but in other ways ) of how generosity can
get taken advantage of, so that's the last thing I want to do.
Thanks a lot. also, sometimes examples help, so
if you need one, i can definitely make one up. actually,
i will make one up and send it to you
in the next email. i want to send this now because if
i write too long an email my email dies and i lose it.


>Is this what you want to do?
>> x <- data.frame(a=paste(letters[1:10], 1:10), 
>+ b=paste(letters[11:20], 1:10), c=paste(LETTERS[1:10], 1:10))
>> x
>      a    b    c
>1   a 1  k 1  A 1
>2   b 2  l 2  B 2
>3   c 3  m 3  C 3
>4   d 4  n 4  D 4
>5   e 5  o 5  E 5
>6   f 6  p 6  F 6
>7   g 7  q 7  G 7
>8   h 8  r 8  H 8
>9   i 9  s 9  I 9
>10 j 10 t 10 J 10
>> (y <- as.vector(t(x[,1:2])))
> [1] "a 1"  "k 1"  "b 2"  "l 2"  "c 3"  "m 3"  "d 4"  "n 4"  "e 5"  "o 5"  "f 6"  "p 6"  "g 7"  "q 7" 
>[15] "h 8"  "r 8"  "i 9"  "s 9"  "j 10" "t 10"
>> gsub(" ", "", y)
> [1] "a1"  "k1"  "b2"  "l2"  "c3"  "m3"  "d4"  "n4"  "e5"  "o5"  "f6"  "p6"  "g7"  "q7"  "h8"  "r8" 
>[17] "i9"  "s9"  "j10" "t10"
>On 7/5/06, markleeds at verizon.net <markleeds at verizon.net> wrote:
>hi everyone : i'm not sure if my previous mail about
>this got sent. i was typing and
>erroneously hit a button and lost what i was typing.
>anyway, i have the code below ( it works ) in which i run through the rows of a dataframe, taking out the first two
>fields, which are character strings ( with some extra spacing, so
>i use gsub ), and appending these character strings to a list so that i can build one big list.
>there are 17,000 rows so i was hoping there might be a more efficient way to do this ( even just slightly; it doesn't have to be an incredible improvement ). I also think i remember someone saying that using the c command to make something bigger is not a good idea.
>the code is below. thanks.
>             for (paircounter in 1:nrow(tempdata)) {
>                firststock<-gsub(" ","",tempdata[paircounter,1])
>                secondstock<-gsub(" ","",tempdata[paircounter,2])
>                 if ( paircounter == 1 ) {
>                     stocklist<-c(firststock,secondstock)
>                  } else {
>                      stocklist<-c(stocklist,firststock,secondstock)
>                  }
>                }
>Jim Holtman
>Cincinnati, OH
>+1 513 646 9390 (Cell)
>+1 513 247 0281 (Home)
>What is the problem you are trying to solve?
