[R] Averaging within a range of values
cberry at tajo.ucsd.edu
Sun Jan 15 00:42:15 CET 2012
doggysaywhat <chwhite at ucsd.edu> writes:
> My apologies for the context problem. I'll explain.
>
> df1 is a matrix of genes labeled g1 through g5 with start positions in the
> START column and end positions in the END column.
>
> df2 is a matrix of chromatin modification values at positions along the DNA.
>
> I want to average chromatin modification values for each gene from the start
> to the end position. So this would involve pulling out all values for
> column C0 that are between pos 200 and 700 for the first gene and averaging
> them. Then, I would pull all values from 500 to 1000, and continue for each
> gene.
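A minimal base-R sketch of the per-gene averaging described above. It assumes df1 and df2 are data frames; the position column name `pos` and all C0 values are hypothetical, and the gene windows are taken from the cut() output later in the thread. It uses the right-closed interval (START, END] to match cut()'s default, which may or may not be the boundary convention you want.

```r
# Hypothetical stand-ins for the poster's df1 and df2 (values invented for illustration)
df1 <- data.frame(gene  = paste0("g", 1:5),
                  START = c(200, 500, 2000, 4000, 7000),
                  END   = c(700, 1000, 3000, 6000, 8000))
df2 <- data.frame(pos = c(100, 300, 800, 900, 1500, 2500, 2800, 3500, 3800, 4500,
                          5000, 5500, 5900, 6500, 6800, 7200, 7800, 8500, 9000),
                  C0  = 1:19)

# For each gene, average C0 over positions in (START, END],
# the same right-closed interval that cut() uses by default
gene_means <- vapply(seq_len(nrow(df1)), function(i) {
  in_gene <- df2$pos > df1$START[i] & df2$pos <= df1$END[i]
  mean(df2$C0[in_gene])
}, numeric(1))
names(gene_means) <- df1$gene
gene_means
```

Positions outside every gene are simply ignored; a gene that covers no positions gets NaN, because mean() of an empty vector is NaN.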
This type of operation is what the IRanges and GenomicRanges packages
were developed for.
Suggest you install both (from bioconductor.org), then review
http://www.bioconductor.org/help/course-materials/2011/CSAMA/Tuesday/Morning%20Talks/IRangesLecture.pdf
and the vignettes for those packages and the help page for
'findOverlaps'.
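findOverlaps() reports pairs of indices linking each query range to the subject ranges it overlaps. The sketch below is a swapped-in base-R imitation of that idea, not the IRanges API: it builds the gene-by-position hit pairs with outer() and then averages C0 within each gene. The data and the `pos` column name are hypothetical.

```r
# Hypothetical stand-ins for df1 and df2; `pos` is an assumed column name
df1 <- data.frame(gene  = paste0("g", 1:5),
                  START = c(200, 500, 2000, 4000, 7000),
                  END   = c(700, 1000, 3000, 6000, 8000))
df2 <- data.frame(pos = c(100, 300, 800, 900, 1500, 2500, 2800, 3500, 3800, 4500,
                          5000, 5500, 5900, 6500, 6800, 7200, 7800, 8500, 9000),
                  C0  = 1:19)

# Logical matrix: rows = genes, columns = positions;
# TRUE where the position falls in (START, END]
hit <- outer(df1$START, df2$pos, "<") & outer(df1$END, df2$pos, ">=")

# Two-column matrix of (gene index, position index) pairs,
# analogous to the query/subject hits that findOverlaps() returns
pairs <- which(hit, arr.ind = TRUE)

# Average C0 within each gene (genes with no hits do not get a value here)
gene_means <- tapply(df2$C0[pairs[, "col"]], df1$gene[pairs[, "row"]], mean)
gene_means
```

The logical matrix has one cell per gene/position pair, so this only suits modest inputs; the IRanges/GenomicRanges tools are built to handle large range sets without materialising it.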
If that doesn't solve your problem, post to the bioconductor list.
HTH,
Chuck
>
> The example I gave previously was a short one, but I will be doing this for
> around 1000 genes with different positions, which is why just removing one
> group won't work.
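With around 1000 genes a loop is workable, but a vectorised version is also possible. A minimal sketch, assuming the positions can be sorted once and using the same hypothetical data and `pos` column name: findInterval() counts how many positions lie at or below each START and END, and a cumulative sum of C0 turns each gene's total into the difference of two lookups.

```r
# Hypothetical stand-ins for df1 and df2; `pos` is an assumed column name
df1 <- data.frame(gene  = paste0("g", 1:5),
                  START = c(200, 500, 2000, 4000, 7000),
                  END   = c(700, 1000, 3000, 6000, 8000))
df2 <- data.frame(pos = c(100, 300, 800, 900, 1500, 2500, 2800, 3500, 3800, 4500,
                          5000, 5500, 5900, 6500, 6800, 7200, 7800, 8500, 9000),
                  C0  = 1:19)

# Sort positions once (findInterval() needs a non-decreasing vector)
ord  <- order(df2$pos)
pos  <- df2$pos[ord]
vals <- df2$C0[ord]
csum <- c(0, cumsum(vals))   # csum[k + 1] = sum of the first k values

# Count positions <= START and <= END; those in (START, END] are indices lo+1..hi
lo <- findInterval(df1$START, pos)
hi <- findInterval(df1$END, pos)

# Window sum divided by window size; NaN (0/0) for a gene covering no positions
gene_means <- (csum[hi + 1] - csum[lo + 1]) / (hi - lo)
names(gene_means) <- df1$gene
gene_means
```

The cost is one sort plus two binary searches per gene, so it scales to many genes and positions without a per-gene scan.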
>
> This is something I came up with that lets me use the start and
> end positions. Your advice to use cut() is working.
>
> start<-df1[,2]
> end<-df1[,3]
> i<-0  # initialise the counter before the loop
>
> while(i<length(start)){
> i<-i+1
> print(cut(df2[,1],c(start[i],end[i])))
> }
>
> These were the results
>
> [1] <NA> (200,700] <NA> <NA> <NA> <NA> <NA>
> [8] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> [15] <NA> <NA> <NA> <NA> <NA>
> Levels: (200,700]
> [1] <NA> <NA> (500,1e+03] (500,1e+03] <NA> <NA>
> [7] <NA> <NA> <NA> <NA> <NA> <NA>
> [13] <NA> <NA> <NA> <NA> <NA> <NA>
> [19] <NA>
> Levels: (500,1e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] (2e+03,3e+03] (2e+03,3e+03] <NA> <NA> <NA>
> [11] <NA> <NA> <NA> <NA> <NA>
> [16] <NA> <NA> <NA> <NA>
> Levels: (2e+03,3e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] <NA> <NA> <NA> <NA> (4e+03,6e+03]
> [11] (4e+03,6e+03] (4e+03,6e+03] (4e+03,6e+03] <NA> <NA>
> [16] <NA> <NA> <NA> <NA>
> Levels: (4e+03,6e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] <NA> <NA> <NA> <NA> <NA>
> [11] <NA> <NA> <NA> <NA> <NA>
> [16] (7e+03,8e+03] (7e+03,8e+03] <NA> <NA>
> Levels: (7e+03,8e+03]
>
>
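Each cut() result above is a factor that is NA outside the gene's window, so it can be used directly as a mask. A minimal sketch for the second gene's window (500, 1000], with hypothetical positions and C0 values:

```r
# Hypothetical positions and C0 values for illustration
pos <- c(100, 300, 800, 900, 1500)
C0  <- c(1, 2, 3, 4, 5)

bins    <- cut(pos, c(500, 1000))  # one interval, (500,1e+03]; other positions become NA
in_gene <- !is.na(bins)            # TRUE for positions inside the window

avg_mask   <- mean(C0[in_gene])         # average over the window: 3.5
avg_tapply <- tapply(C0, bins, mean)    # same value, labelled by the interval; NA group dropped
avg_mask
avg_tapply
```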
> This produces the right bin for each gene, but I'm not sure
> how to put the results into a data frame. When I did this:
>
>
> start<-df1[,2]
> end<-df1[,3]
> i<-0  # initialise the counter before the loop
>
> while(i<length(start)){
> i<-i+1
> bins<-(cut(df2[,1],c(start[i],end[i])))
> }
>
> the bins variable held only the result from the last iteration.
> Is there a way to store the results of the while loop in a
> data frame?
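Only the last gene survives because bins is reassigned on every pass. A minimal sketch, assuming df1 and df2 are data frames laid out as described (gene, START, END; position, C0) with hypothetical values: preallocate one slot per gene, fill it by index inside a for() loop, and build the data frame afterwards.

```r
# Hypothetical stand-ins for df1 and df2
df1 <- data.frame(gene  = paste0("g", 1:5),
                  START = c(200, 500, 2000, 4000, 7000),
                  END   = c(700, 1000, 3000, 6000, 8000))
df2 <- data.frame(pos = c(100, 300, 800, 900, 1500, 2500, 2800, 3500, 3800, 4500,
                          5000, 5500, 5900, 6500, 6800, 7200, 7800, 8500, 9000),
                  C0  = 1:19)

start <- df1[, 2]
end   <- df1[, 3]
avg   <- numeric(length(start))   # one slot per gene, filled inside the loop

for (i in seq_along(start)) {     # for() manages i, so there is no counter to initialise
  bins   <- cut(df2[, 1], c(start[i], end[i]))
  avg[i] <- mean(df2[!is.na(bins), 2])   # store this gene's average instead of overwriting
}

result <- data.frame(gene = df1[, 1], START = start, END = end, C0_mean = avg)
result
```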
>
> Many thanks
>
--
Charles C. Berry Dept of Family/Preventive Medicine
cberry at ucsd edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901