[BioC] extracting regions of consecutive values from dataframe
Sean Davis
sdavis2 at mail.nih.gov
Fri May 30 13:47:47 CEST 2008
On Fri, May 30, 2008 at 6:35 AM, Niels Høgslund <nj at birc.au.dk> wrote:
> Hi,
>
> I have a lot of data frames looking like this (SNP chromosome position and a
> local state ID):
>
> Position State
> 1 3088998 0
> 2 4215064 6
> 3 5034491 6
> 4 5211912 6
> 5 5697261 6
> 6 5809727 0
> 7 6818872 NA
> 8 6867391 0
> 9 7346904 1
> 10 7347824 1
> 11 7358232 1
> 12 7833686 1
> 13 8295795 0
> 14 10755448 0
> 15 10919778 NA
> 16 11217061 3
> 17 12463350 3
> 18 13678626 0
> 19 13892992 0
> 20 13965452 0
> 21 13969222 0
> ........
>
> Now, I want to collapse or summarize consecutive occurrences of a state into
> a region with a start+end position,
> i.e. something like this:
>
> Position State
> 2 4215064 6
> 5 5697261 6
> 9 7346904 1
> 12 7833686 1
> 16 11217061 3
> 17 12463350 3
>
> Can anyone help me with this?
The rle() function is one way to do this. You will need to write a
little wrapper function to do exactly what you want, but rle() should
get you going.
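
As a rough sketch of what such a wrapper might look like: the code below assumes that state 0 and NA are background and should be skipped, as the desired output in the question suggests. The name collapse_runs and the Start/End column names are made up for illustration and are not part of any package. rle() gives the run lengths, and cumsum() on those lengths gives the last row of each run.

```r
# Example data copied from the question (the 21 rows shown there).
snp <- data.frame(
  Position = c(3088998, 4215064, 5034491, 5211912, 5697261, 5809727,
               6818872, 6867391, 7346904, 7347824, 7358232, 7833686,
               8295795, 10755448, 10919778, 11217061, 12463350,
               13678626, 13892992, 13965452, 13969222),
  State = c(0, 6, 6, 6, 6, 0, NA, 0, 1, 1, 1, 1, 0, 0, NA, 3, 3, 0, 0, 0, 0)
)

# Hypothetical helper: returns one row per run of a non-zero, non-NA state.
# Assumes df has columns Position and State and is sorted by Position.
collapse_runs <- function(df) {
  runs <- rle(df$State)                    # each NA becomes its own run
  end_row <- cumsum(runs$lengths)          # last row index of each run
  start_row <- end_row - runs$lengths + 1  # first row index of each run
  # Assumption: state 0 and NA mark "no region" and are dropped.
  keep <- !is.na(runs$values) & runs$values != 0
  data.frame(
    State = runs$values[keep],
    Start = df$Position[start_row[keep]],
    End   = df$Position[end_row[keep]]
  )
}

regions <- collapse_runs(snp)
print(regions)
```

For the rows shown in the question this should give three regions (states 6, 1 and 3), with one row per region rather than separate start and end rows. If state 0 should be reported as a region too, drop the `runs$values != 0` condition.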
Sean