[R] Sliding window over irregular intervals
David Winsemius
dwinsemius at comcast.net
Mon Mar 30 16:49:28 CEST 2009
I would not call the window you describe sliding: the intervals are
regular (fixed width, non-overlapping), and it is the number of events
within each window that is irregular. One way would be to use the
result of trunc(pos/10000) as a factor with tapply.
(Related functions are floor() and round(). Your pos values appear to
be positive, so trunc() and floor() will agree; they differ only for
values below 0. round() would instead assign each position to the
nearest multiple of 10000, which shifts the window boundaries.)
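As a small illustration of how those three functions behave, here is a
sketch using the first two pos values from the posted excerpt (the
variable name x is just for this example):

```r
# First two pos values from the posted data, scaled to window units
x <- c(15795572, 15806401) / 10000   # 1579.5572 and 1580.6401

trunc(x)   # 1579 1580: drops the fractional part
floor(x)   # 1579 1580: same as trunc() for positive values
round(x)   # 1580 1581: nearest integer, so window edges move by half a width

# trunc() and floor() only disagree below zero
trunc(-0.5)  # 0
floor(-0.5)  # -1
```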
After creating a dataframe, dta, try something like:
> tapply(dta$xpehh, as.factor(trunc(dta$pos/10000)), min)
     1579      1580      1581      1582
-0.153413 -0.367296  0.302555  0.090302
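The question also asked for the maximum, the number of data points per
window, and separate handling of each chr value. A minimal sketch of
one way to get all of these with base R follows. It assumes the posted
excerpt is the whole data set; the column names win, xpehh_min,
xpehh_max and n are illustrative choices, not anything from the
original post.

```r
# Rebuild the posted excerpt as a data frame called dta
dta <- read.table(header = TRUE, text = "
id chr pos ihh1 ihh2 xpehh
rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
rs1981707 22 15809384 0.0299685 0.0176768 0.527892
rs1981708 22 15809434 0.0305465 0.0187227 0.489512
rs11914222 22 15810040 0.0307183 0.0172399 0.577633
rs4819923 22 15813210 0.02707 0.0159736 0.527491
rs5994105 22 15813888 0.025202 0.0141296 0.578651
rs5748760 22 15814084 0.0242894 0.0146486 0.505691
rs2385786 22 15816846 0.0173057 0.0107816 0.473199
rs1990483 22 15817310 0.0176641 0.0130525 0.302555
rs5994110 22 15821524 0.0178411 0.0129001 0.324267
rs17733785 22 15822154 0.0201797 0.0182093 0.102746
rs7287116 22 15823131 0.0201993 0.0179028 0.12069
rs5748765 22 15825502 0.0193195 0.0176513 0.090302
")

# Window index: non-overlapping bins of width 10000 along pos
dta$win <- trunc(dta$pos / 10000)

# Group on chr as well as win, so that equal pos ranges on different
# chromosomes are never pooled together
key <- list(chr = dta$chr, win = dta$win)

# One aggregate() call per summary; the grouping is identical each time,
# so the rows come back in the same order and can be combined directly
res <- aggregate(list(xpehh_min = dta$xpehh), by = key, FUN = min)
res$xpehh_max <- aggregate(list(v = dta$xpehh), by = key, FUN = max)$v
res$n <- aggregate(list(v = dta$xpehh), by = key, FUN = length)$v

res
```

On the excerpt this should give four rows (windows 1579 to 1582 on chr
22) holding 1, 4, 6 and 4 data points respectively. Windows that
contain no data points do not appear in the result at all.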
--
David Winsemius
On Mar 30, 2009, at 9:01 AM, Irene Gallego Romero wrote:
> Dear all,
>
> I have some very big data files that look something like this:
>
> id chr pos ihh1 ihh2 xpehh
> rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
> rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
> rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
> rs1981707 22 15809384 0.0299685 0.0176768 0.527892
> rs1981708 22 15809434 0.0305465 0.0187227 0.489512
> rs11914222 22 15810040 0.0307183 0.0172399 0.577633
> rs4819923 22 15813210 0.02707 0.0159736 0.527491
> rs5994105 22 15813888 0.025202 0.0141296 0.578651
> rs5748760 22 15814084 0.0242894 0.0146486 0.505691
> rs2385786 22 15816846 0.0173057 0.0107816 0.473199
> rs1990483 22 15817310 0.0176641 0.0130525 0.302555
> rs5994110 22 15821524 0.0178411 0.0129001 0.324267
> rs17733785 22 15822154 0.0201797 0.0182093 0.102746
> rs7287116 22 15823131 0.0201993 0.0179028 0.12069
> rs5748765 22 15825502 0.0193195 0.0176513 0.090302
>
> I'm trying to extract the maximum and minimum xpehh (last column)
> values within a sliding window (non overlapping), of width 10000
> (calculated relative to pos (third column)). However, as you can
> tell from the brief excerpt here, although all possible intervals
> will probably be covered by at least one data point, the number of
> data points will be variable (incidentally, if anyone knows of a way
> to obtain this number, that would be lovely), as will the spacing
> between them. Furthermore, values of chr (second column) will range
> from 1 to 22, and values of pos will be overlapping across them; I
> want to evaluate the window separately for each value of chr.
>
> I've looked at the help and FAQ on sliding windows, but I'm a
> relative newcomer to R and cannot find a way to do what I need to
> do. Everything I've managed to unearth so far seems geared towards
> smoother time series. Any help on this problem would be vastly
> appreciated.
>
> Thanks,
> Irene
>
> --
> Irene Gallego Romero
> Leverhulme Centre for Human Evolutionary Studies
> University of Cambridge
> Fitzwilliam St
> Cambridge
> CB2 1QH
> UK
David Winsemius, MD
Heritage Laboratories
West Hartford, CT