[R] Averaging within a range of values

Sun Jan 15 00:42:15 CET 2012

doggysaywhat <chwhite at ucsd.edu> writes:

> My apologies for the context problem.  I'll explain.  
>
> df1 is a matrix of genes labeled g1 through g5 with start positions in the
> START column and end positions in the END column.
>
> df2 is a matrix of chromatin modification values at positions along the DNA.  
>
> I want to average chromatin modification values for each gene from the start
> to the end position.  So this would involve pulling out all values for
> column C0 that are between pos 200 and 700 for the first gene and averaging
> them.  Then, I would pull all values from 500 to 1000, and continue for each
> gene.  

This type of operation is what the IRanges and GenomicRanges packages
were developed for.

Suggest you install both (from bioconductor.org), then review 

http://www.bioconductor.org/help/course-materials/2011/CSAMA/Tuesday/Morning%20Talks/IRangesLecture.pdf

and the vignettes for those packages and the help page for
'findOverlaps'.
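
As a rough sketch of what that approach can look like: each position is treated as a width-1 range and matched to the genes it falls in. The gene coordinates and C0 values below are made up for illustration, since df1 and df2 themselves are not shown in this message, and intervals are treated as closed at both ends.

```r
# Requires the Bioconductor package IRanges.
library(IRanges)

# Hypothetical gene intervals (stand-in for the START/END columns of df1).
genes <- IRanges(start = c(200, 500, 2000),
                 end   = c(700, 1000, 3000),
                 names = c("g1", "g2", "g3"))

# Hypothetical positions and C0 values (stand-in for df2).
pos <- c(100, 300, 600, 900, 1500, 2500, 2800)
c0  <- c(1, 2, 4, 6, 8, 10, 20)

# Represent each position as a range of width 1.
sites <- IRanges(start = pos, width = 1)

# Find every (gene, position) pair where the position lies inside the gene.
hits <- findOverlaps(genes, sites)

# Average the C0 values of the matched positions, grouped by gene.
gene_means <- tapply(c0[subjectHits(hits)],
                     names(genes)[queryHits(hits)],
                     mean)
```

A gene with no positions inside it is simply absent from gene_means, and a position that lies in two overlapping genes (600 here) is counted for both.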

If that doesn't solve your problem, post to the bioconductor list.

HTH,

Chuck

>
> The example I gave previously was a short one, but I will be doing this for
> around 1000 genes with different positions.  This is why just removing one
> group is not enough.
>
> This was something I tried to come up with that allowed me to use start and
> end positions.  Your advice to use cut is working.
>
> start<-df1[,2]
> end<-df1[,3]
> i<-0
>
> while(i<length(start)){
>           i<-i+1
>           print(cut(df2[,1],c(start[i],end[i])))
> }
>
> These were the results
>
>  [1] <NA>      (200,700] <NA>      <NA>      <NA>      <NA>      <NA>     
>  [8] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
> [15] <NA>      <NA>      <NA>      <NA>      <NA>     
> Levels: (200,700]
>  [1] <NA>        <NA>        (500,1e+03] (500,1e+03] <NA>        <NA>       
>  [7] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>       
> [13] <NA>        <NA>        <NA>        <NA>        <NA>        <NA>       
> [19] <NA>       
> Levels: (500,1e+03]
>  [1] <NA>          <NA>          <NA>          <NA>          <NA>         
>  [6] (2e+03,3e+03] (2e+03,3e+03] <NA>          <NA>          <NA>         
> [11] <NA>          <NA>          <NA>          <NA>          <NA>         
> [16] <NA>          <NA>          <NA>          <NA>         
> Levels: (2e+03,3e+03]
>  [1] <NA>          <NA>          <NA>          <NA>          <NA>         
>  [6] <NA>          <NA>          <NA>          <NA>          (4e+03,6e+03]
> [11] (4e+03,6e+03] (4e+03,6e+03] (4e+03,6e+03] <NA>          <NA>         
> [16] <NA>          <NA>          <NA>          <NA>         
> Levels: (4e+03,6e+03]
>  [1] <NA>          <NA>          <NA>          <NA>          <NA>         
>  [6] <NA>          <NA>          <NA>          <NA>          <NA>         
> [11] <NA>          <NA>          <NA>          <NA>          <NA>         
> [16] (7e+03,8e+03] (7e+03,8e+03] <NA>          <NA>         
> Levels: (7e+03,8e+03]
>
>
> This is producing the right bins for each of the results, but I'm not sure
> how to put this into a data frame.  When I did this:
>
>
> start<-df1[,2]
> end<-df1[,3]
> i<-0
>
> while(i<length(start)){
>           i<-i+1
>           bins<-(cut(df2[,1],c(start[i],end[i])))
> }
>
> the bins variable contained only the last level.
> Is there a way to assign the results of the while statement to a
> data frame?
>
> Many thanks
>
>
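
On the narrower question of collecting the per-gene results into a data frame, a minimal base R sketch might look like the following. The df1 and df2 here are made-up stand-ins with assumed column names (GENE, START, END, POS, C0), and the interval is taken as closed at both ends, whereas cut() as used above excludes the left endpoint.

```r
# Hypothetical stand-ins for df1 (genes) and df2 (positions with C0 values).
df1 <- data.frame(GENE  = c("g1", "g2", "g3"),
                  START = c(200, 500, 2000),
                  END   = c(700, 1000, 3000))
df2 <- data.frame(POS = c(100, 300, 600, 900, 1500, 2500, 2800),
                  C0  = c(1, 2, 4, 6, 8, 10, 20))

# For each gene, average C0 over the positions inside [START, END].
# mapply walks START and END in parallel and returns one mean per gene.
gene_mean <- mapply(function(s, e) mean(df2$C0[df2$POS >= s & df2$POS <= e]),
                    df1$START, df1$END)

# Attach the means to the gene table, one row per gene.
result <- data.frame(df1, MEAN_C0 = gene_mean)
```

This avoids the loop overwriting its own result on each pass: every gene gets its own entry, and a gene with no positions in range gets NaN rather than being dropped.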

-- 
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at ucsd edu			    UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901