[BioC] extracting regions of consecutive values from dataframe

Sat May 31 02:51:48 CEST 2008

Hi Niels,

To get one start/end region per state, you can do this:

   df0 <- data.frame(
            Position=c(2, 5, 8, 9, 15, 17, 20, 21, 24, 25),
            State=as.character(c(0, 6, 6, 6, 1, 1, 0, 3, 3, 2))
          )
   ## group the positions by state
   x <- split(df0$Position, df0$State)
   ## smallest and largest position for each state
   df1 <- data.frame(start=sapply(x, min), end=sapply(x, max), State=names(x))

Now 'df1' contains one row per state, with 'start' and 'end' being the
smallest and largest positions at which that state occurs:

   > df1
     start end State
   0     2  20     0
   1    15  17     1
   2    25  25     2
   3    21  24     3
   6     5   9     6

Note that state 0 seems to be special in your data: the positions at which
it occurs are interlaced with the positions at which other states occur, so
its start-end range (2 to 20 above) covers positions that belong to other
states.
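If what you want is one region per *consecutive* run of a state (as in the
example output in the question), base R's rle() can be used instead of
split(). The following is a minimal sketch, not a tested solution for your
full data: 'collapseRuns' is a hypothetical helper name, and dropping state 0
and NA is an assumption based on your example output. Each NA forms its own
run under rle(), so it simply gets filtered out.

```r
# Sketch: collapse runs of identical consecutive 'State' values into
# regions using base R's rle(). 'collapseRuns' is a hypothetical helper.
collapseRuns <- function(df) {
  # rle() rejects factors, so work on a character vector
  r <- rle(as.character(df$State))
  end_idx <- cumsum(r$lengths)            # last row of each run
  start_idx <- end_idx - r$lengths + 1L   # first row of each run
  data.frame(start = df$Position[start_idx],
             end = df$Position[end_idx],
             State = r$values,
             stringsAsFactors = FALSE)
}

df0 <- data.frame(
  Position = c(2, 5, 8, 9, 15, 17, 20, 21, 24, 25),
  State = as.character(c(0, 6, 6, 6, 1, 1, 0, 3, 3, 2)),
  stringsAsFactors = FALSE
)
runs <- collapseRuns(df0)
# Assumption: state 0 and NA are "background" and should be dropped
regions <- subset(runs, !is.na(State) & State != "0")
print(regions)
```

On the toy data this gives one region each for state 6 (5 to 9), state 1
(15 to 17), state 3 (21 to 24) and state 2 (25 to 25); unlike the split()
version, the two separate occurrences of state 0 would stay separate runs.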

Cheers,
H.

Niels Høgslund wrote:
> Hi,
> 
> I have a lot of data frames looking like this (SNP chromosome position 
> and a local state ID):
> 
>     Position    State
> 1    3088998    0
> 2    4215064    6
> 3    5034491    6
> 4    5211912    6
> 5    5697261    6
> 6    5809727    0
> 7    6818872    NA
> 8    6867391    0
> 9    7346904    1
> 10    7347824    1
> 11    7358232    1
> 12    7833686    1
> 13    8295795    0
> 14    10755448    0
> 15    10919778    NA
> 16    11217061    3
> 17    12463350    3
> 18    13678626    0
> 19    13892992    0
> 20    13965452    0
> 21    13969222    0
> ........
> 
> Now, I want to collapse or summarize consecutive occurrences of a state 
> into a region with a start+end position,
> i.e. something like this:
> 
>     Position    State   
> 2    4215064    6   
> 5    5697261    6   
> 9    7346904    1   
> 12    7833686    1   
> 16    11217061    3   
> 17    12463350    3   
> 
> Can anyone help me with this?
> 
> Thanks in advance.....
> 
> 
> 
> Niels Høgslund
> BiRC -Bioinformatics Research Center
> Høegh-Guldbergs Gade 10
> DK-8000 Århus C
> Denmark
> phone: +45 89423100
> mail: nj at birc.au.dk
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>