[R] Averaging within a range of values

doggysaywhat chwhite at ucsd.edu
Sat Jan 14 04:04:33 CET 2012


My apologies for the context problem.  I'll explain.  

df1 is a matrix of genes labeled g1 through g5 with start positions in the
START column and end positions in the END column.

df2 is a matrix of chromatin modification values at positions along the DNA.  

I want to average chromatin modification values for each gene from the start
to the end position.  So this would involve pulling out all values for
column C0 that are between pos 200 and 700 for the first gene and averaging
them.  Then, I would pull all values from 500 to 1000, and continue for each
gene.  
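A minimal sketch of the averaging described above. The column names (GENE, START, END, POS, C0) and every value are invented for illustration, not taken from the real df1 and df2, and both end positions are assumed to be inclusive.

```r
# Hypothetical stand-ins for df1 and df2; all values are made up.
df1 <- data.frame(GENE  = c("g1", "g2"),
                  START = c(200, 500),
                  END   = c(700, 1000))
df2 <- data.frame(POS = c(100, 300, 600, 900, 1200),
                  C0  = c(1, 2, 4, 6, 9))

# For each gene, average C0 over the rows whose position lies in [START, END].
gene_means <- sapply(seq_len(nrow(df1)), function(i) {
  in_range <- df2$POS >= df1$START[i] & df2$POS <= df1$END[i]
  mean(df2$C0[in_range])
})
names(gene_means) <- as.character(df1$GENE)
```

With these toy values g1 averages the rows at positions 300 and 600, and g2 the rows at 600 and 900. A gene with no positions inside its range would get NaN, since mean() of an empty vector is NaN.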

The example I gave previously was a short one, but I will be doing this for
around 1000 genes with different positions.  This is why just removing one
group will not work.

This is what I came up with so that I could use the start and end
positions.  Your advice to use cut() is working.  

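start <- df1[, 2]
end <- df1[, 3]

i <- 0   # the counter must be initialised before the loop
while (i < length(start)) {
    i <- i + 1
    # two break points give one bin, (start[i], end[i]]; other positions are NA
    print(cut(df2[, 1], c(start[i], end[i])))
}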

These were the results

 [1] <NA>      (200,700] <NA>      <NA>      <NA>      <NA>      <NA>     
 [8] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
[15] <NA>      <NA>      <NA>      <NA>      <NA>     
Levels: (200,700]
 [1] <NA>        <NA>        (500,1e+03] (500,1e+03] <NA>        <NA>       
 [7] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>       
[13] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>       
[19] <NA>       
Levels: (500,1e+03]
 [1] <NA>          <NA>          <NA>          <NA>          <NA>         
 [6] (2e+03,3e+03] (2e+03,3e+03] <NA>          <NA>          <NA>         
[11] <NA>          <NA>          <NA>          <NA>          <NA>         
[16] <NA>          <NA>          <NA>          <NA>         
Levels: (2e+03,3e+03]
 [1] <NA>          <NA>          <NA>          <NA>          <NA>         
 [6] <NA>          <NA>          <NA>          <NA>          (4e+03,6e+03]
[11] (4e+03,6e+03] (4e+03,6e+03] (4e+03,6e+03] <NA>          <NA>         
[16] <NA>          <NA>          <NA>          <NA>         
Levels: (4e+03,6e+03]
 [1] <NA>          <NA>          <NA>          <NA>          <NA>         
 [6] <NA>          <NA>          <NA>          <NA>          <NA>         
[11] <NA>          <NA>          <NA>          <NA>          <NA>         
[16] (7e+03,8e+03] (7e+03,8e+03] <NA>          <NA>         
Levels: (7e+03,8e+03]
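One detail worth keeping in mind when reading this output: by default cut() builds intervals that are open on the left and closed on the right, so a position exactly equal to the start value is returned as NA. A small sketch with made-up positions (not the data from this thread) shows what include.lowest = TRUE changes:

```r
pos <- c(200, 450, 700, 900)  # invented positions

# Default: the bin is (200,700], so 200 itself falls outside it.
default_bins <- cut(pos, c(200, 700))

# include.lowest = TRUE closes the left end: the bin becomes [200,700].
closed_bins <- cut(pos, c(200, 700), include.lowest = TRUE)

is.na(default_bins)  # TRUE FALSE FALSE TRUE
is.na(closed_bins)   # FALSE FALSE FALSE TRUE
```

Whether the start position should count as part of the gene is a decision for the analysis; the sketch only shows how to include it if it should.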


This produces the right bins for each gene, but I'm not sure how to put
the results into a data frame.  This is what I tried:


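start <- df1[, 2]
end <- df1[, 3]

i <- 0   # the counter must be initialised before the loop
while (i < length(start)) {
    i <- i + 1
    # bins is overwritten on every pass, so only the last gene survives
    bins <- cut(df2[, 1], c(start[i], end[i]))
}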

After the loop, the bins variable held only the last level.  
Is there a way to assign the results of the while statement to a
data frame?
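One possible way to keep every gene's result rather than only the last one, sketched under assumptions: the column names and values below are hypothetical, and storing each mean in a pre-allocated vector inside the loop is just one option (sapply or lapply would serve equally well).

```r
# Hypothetical stand-ins for df1 and df2; all values are made up.
df1 <- data.frame(GENE  = c("g1", "g2"),
                  START = c(200, 500),
                  END   = c(700, 1000))
df2 <- data.frame(POS = c(100, 300, 600, 900, 1200),
                  C0  = c(1, 2, 4, 6, 9))

start <- df1$START
end   <- df1$END

# Pre-allocate one slot per gene so each pass stores its own result.
means <- numeric(length(start))
for (i in seq_along(start)) {
  bins <- cut(df2$POS, c(start[i], end[i]), include.lowest = TRUE)
  # Rows outside the interval are NA in bins; average the remaining rows.
  means[i] <- mean(df2$C0[!is.na(bins)])
}

# One row per gene, with its average C0 value.
result <- data.frame(GENE = df1$GENE, MEAN_C0 = means)
```

The key change from the loop above is indexing the output by i, so each iteration writes to its own position instead of replacing a single variable.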

Many thanks



