[BioC] extracting regions of consecutive values from dataframe

Niels Høgslund nj at birc.au.dk
Fri May 30 12:35:25 CEST 2008


Hi,

I have a lot of data frames looking like this (SNP chromosome position  
and a local state ID):

	Position	State
1	3088998	0
2	4215064	6
3	5034491	6
4	5211912	6
5	5697261	6
6	5809727	0
7	6818872	NA
8	6867391	0
9	7346904	1
10	7347824	1
11	7358232	1
12	7833686	1
13	8295795	0
14	10755448	0
15	10919778	NA
16	11217061	3
17	12463350	3
18	13678626	0
19	13892992	0
20	13965452	0
21	13969222	0
........

Now, I want to collapse or summarize consecutive occurrences of a state  
into a region with a start+end position,
i.e. something like this:

	Position	State	
2	4215064	6	
5	5697261	6	
9	7346904	1	
12	7833686	1	
16	11217061	3	
17	12463350	3	
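
For illustration, the collapsing described above could be sketched in base R with rle(), which finds runs of equal values. This is only a sketch under some assumptions: state 0 and NA are treated as "no region" (as the example output suggests), -1 is assumed never to occur as a real state ID, and collapse_runs() is a made-up helper name, not an existing function. It returns one row per region with start and end columns instead of two rows per region.

```r
# Example data copied from the first 21 rows of the table above.
snps <- data.frame(
  Position = c(3088998, 4215064, 5034491, 5211912, 5697261, 5809727,
               6818872, 6867391, 7346904, 7347824, 7358232, 7833686,
               8295795, 10755448, 10919778, 11217061, 12463350,
               13678626, 13892992, 13965452, 13969222),
  State = c(0, 6, 6, 6, 6, 0, NA, 0, 1, 1, 1, 1, 0, 0, NA, 3, 3, 0, 0, 0, 0)
)

# collapse_runs() is a hypothetical helper, named here for illustration only.
# 'skip' lists the state IDs that should not be reported as regions
# (assumption: state 0 means "no state").
collapse_runs <- function(df, skip = 0) {
  # rle() regards each NA as its own run, so recode NA to a sentinel first.
  # Assumption: -1 is never a real state ID.
  state <- ifelse(is.na(df$State), -1, df$State)
  runs  <- rle(state)
  last  <- cumsum(runs$lengths)        # row index where each run ends
  first <- last - runs$lengths + 1     # row index where each run starts
  keep  <- !(runs$values %in% c(-1, skip))
  data.frame(
    State    = runs$values[keep],
    StartRow = first[keep],
    EndRow   = last[keep],
    Start    = df$Position[first[keep]],
    End      = df$Position[last[keep]]
  )
}

regions <- collapse_runs(snps)
print(regions)
```

Since there are many data frames, the same helper could be applied to a list of them with lapply(). Note that a run of length one is reported with identical start and end positions.
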

Can anyone help me with this?

Thanks in advance.



Niels Høgslund
BiRC -Bioinformatics Research Center
Høegh-Guldbergs Gade 10
DK-8000 Århus C
Denmark
phone: +45 89423100
mail: nj at birc.au.dk


