[R] How to Store the executed values in a dataframe & rle function

jim holtman jholtman at gmail.com
Wed Sep 28 17:37:44 CEST 2011


Here is one approach:

> x <- read.table(textConnection("Chr start end sample1 sample2
+ chr2 9896633 9896683 0 0
+ chr2 9896639 9896690 0 0
+ chr2 14314039 14314098 0 -0.35
+ chr2 14404467 14404502 0 -0.35
+ chr2 14421718 14421777 -0.43 -0.35
+ chr2 16031710 16031769 -0.43 -0.35
+ chr2 16036178 16036237 -0.43 -0.35
+ chr2 16048665 16048724 -0.43 -0.35
+ chr2 37491676 37491735 0 0
+ chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
> closeAllConnections()
>
> result <- lapply(c('sample1', 'sample2'), function(.samp){
+     # split by breaks in the values
+     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
+
+     # combine the list of dataframes
+     .range <- do.call(rbind, lapply(.grps, function(.set){
+         # create a dataframe of the results
+         data.frame(Sample = .samp
+                    , Chr = .set$Chr[1L]
+                    , Start = min(.set$start)
+                    , End = max(.set$end)
+                    , Values = .set[[.samp]][1L]
+                    , Probes = nrow(.set)
+                    )
+         }))
+     })
> # put the list of dataframes together
> result <- do.call(rbind, result)
> result
    Sample  Chr    Start      End Values Probes
0  sample1 chr2  9896633 14404502   0.00      4
1  sample1 chr2 14421718 16048724  -0.43      4
2  sample1 chr2 37491676 37703009   0.00      2
01 sample2 chr2  9896633  9896690   0.00      2
11 sample2 chr2 14314039 16048724  -0.35      6
21 sample2 chr2 37491676 37703009   0.00      2
>
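A caveat on the approach above: the groups are cut only where the sample value changes. Two adjacent runs with the same value on different chromosomes would therefore be merged, and the sample names are typed in by hand. The sketch below is one possible variant, not part of the original reply. It assumes that every column other than Chr, start and end is a sample, so a 150-sample file needs no edits. It also starts a new run when Chr changes and resets the row names that split() leaves behind.

```r
# Test data from the question (lower-case column names as in the file header).
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0")

# Assumption: every column except Chr, start and end is a sample column.
samples <- setdiff(names(x), c("Chr", "start", "end"))

result <- do.call(rbind, lapply(samples, function(.samp) {
  # TRUE where a new run starts: the value OR the chromosome changes,
  # so a run of equal values never spans two chromosomes.
  brk <- c(TRUE, x[[.samp]][-1L] != x[[.samp]][-nrow(x)] |
                 x$Chr[-1L] != x$Chr[-nrow(x)])
  .grps <- split(x, cumsum(brk))
  do.call(rbind, lapply(.grps, function(.set) {
    data.frame(Sample = .samp,
               Chr    = .set$Chr[1L],
               Start  = min(.set$start),
               End    = max(.set$end),
               Values = .set[[.samp]][1L],
               Probes = nrow(.set),
               stringsAsFactors = FALSE)
  }))
}))
rownames(result) <- NULL  # reset the row names built from the split() group labels
result
```

For the test data this gives the same six rows as above, numbered 1 to 6.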


On Mon, Sep 26, 2011 at 10:30 AM, sujitha <viritha.k at gmail.com> wrote:
> Hi group,
>
> This is what my test file looks like:
> Chr start end sample1 sample2
> chr2 9896633 9896683 0 0
> chr2 9896639 9896690 0 0
> chr2 14314039 14314098 0 -0.35
> chr2 14404467 14404502 0 -0.35
> chr2 14421718 14421777 -0.43 -0.35
> chr2 16031710 16031769 -0.43 -0.35
> chr2 16036178 16036237 -0.43 -0.35
> chr2 16048665 16048724 -0.43 -0.35
> chr2 37491676 37491735 0 0
> chr2 37702947 37703009 0 0
>
> This is the output that I am expecting:
> Sample Chr Start End Values Probes
> sample1 chr2 9896633 14404502 0 4
> sample1 chr2 14421718 16048724 -0.43 4
> sample1 chr2 37491676 37703009 0 2
> sample2 chr2 9896633 9896690 0 2
> sample2 chr2 14314039 16048724 -0.35 6
> sample2 chr2 37491676 37703009 0 2
>
> Here the Chr value is the same in every row, but it can be any other value as
> well, so Chr should be the value shared within each run of identical values.
> Start is the smallest start in the run (for the first line, the first 4 rows
> have the same value) and End is the largest end. Values is the common value of
> the run, and Probes is the number of rows in the run.
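The rules described above map directly onto rle(). The run lengths are the Probes, and the run values are the Values. Expanding the lengths into a run id per row makes it possible to take the minimum start and maximum end of each run. Here is a minimal sketch for a single column, assuming the lower-case column names from the test file (sample1_runs is just an illustrative name):

```r
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0")

r   <- rle(x$sample1)                       # lengths 4 4 2; values 0 -0.43 0
run <- rep(seq_along(r$lengths), r$lengths) # run id per row: 1 1 1 1 2 2 2 2 3 3

sample1_runs <- data.frame(
  Sample = "sample1",
  Chr    = as.vector(tapply(x$Chr, run, function(v) v[1L])), # Chr of the run
  Start  = as.vector(tapply(x$start, run, min)),             # smallest start in the run
  End    = as.vector(tapply(x$end, run, max)),               # largest end in the run
  Values = r$values,                                         # the shared value
  Probes = r$lengths,                                        # number of rows in the run
  stringsAsFactors = FALSE
)
sample1_runs
```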
>
> Code:
> > m <- read.table("test.txt", sep = '\t', header = TRUE,
> +                 colClasses = c('character', 'integer', 'integer', 'numeric', 'numeric'))
> # reading the test file
> > s <- data.frame(c(rle(m$sample1)[[2]], rle(m$sample2)[[2]]),
> +                 c(rle(m$sample1)[[1]], rle(m$sample2)[[1]]))
> # to get the last 2 columns
> > names(s) = c("Values", "Probes")
> > G = 1
> > for (i in 1:length(s$Probes)) {
> +   if (G == 1) {
> +     first <- unique(m$Chr[G:s$Probes[i]])
> +     second <- min(m$start[G:s$Probes[i]])
> +     third <- max(m$end[G:s$Probes[i]])
> +     c <- cbind(first, second, third, s$Values[i], s$Probes[i])
> +     print(c)
> +     G = (G + s$Probes[i])
> +   } else if ((G - 1) < length(m$sample1)) {
> +     first <- unique(m$Chr[G:(G + s$Probes[i] - 1)])
> +     second <- min(m$start[G:(G + s$Probes[i] - 1)])
> +     third <- max(m$end[G:(G + s$Probes[i] - 1)])
> +     c <- cbind(first, second, third, s$Values[i], s$Probes[i])
> +     print(c)
> +     G = (G + s$Probes[i])
> +   } else {
> +     G = 1
> +     first <- unique(m$Chr[G:s$Probes[i]])
> +     second <- min(m$start[G:s$Probes[i]])
> +     third <- max(m$end[G:s$Probes[i]])
> +     c <- cbind(first, second, third, s$Values[i], s$Probes[i])
> +     print(c)
> +     G = (G + s$Probes[i])
> +   }
> + }
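In the loop above, G only tracks the first row of the current run, and it has to be reset when the loop moves on to the next sample. The sketch below shows one alternative, assuming one column is processed at a time. cumsum() of the rle() lengths gives the last row of every run directly. Each summary row is stored in a preallocated list instead of being printed, and a single do.call(rbind, ...) turns the list into a data frame. Using data.frame() rather than cbind() also keeps Start, End and Values numeric. cbind() of a character Chr with numbers coerces everything to character, which is why the printed output below is quoted. The name sample2_runs is illustrative only.

```r
x <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Chr start end sample1 sample2
chr2 9896633 9896683 0 0
chr2 9896639 9896690 0 0
chr2 14314039 14314098 0 -0.35
chr2 14404467 14404502 0 -0.35
chr2 14421718 14421777 -0.43 -0.35
chr2 16031710 16031769 -0.43 -0.35
chr2 16036178 16036237 -0.43 -0.35
chr2 16048665 16048724 -0.43 -0.35
chr2 37491676 37491735 0 0
chr2 37702947 37703009 0 0")

r     <- rle(x$sample2)
last  <- cumsum(r$lengths)        # last row of each run: 2 8 10
first <- last - r$lengths + 1L    # first row of each run: 1 3 9

rows <- vector("list", length(r$lengths))   # one slot per run, filled instead of printed
for (i in seq_along(r$lengths)) {
  idx <- first[i]:last[i]
  rows[[i]] <- data.frame(Sample = "sample2",
                          Chr    = x$Chr[first[i]],
                          Start  = min(x$start[idx]),
                          End    = max(x$end[idx]),
                          Values = r$values[i],
                          Probes = r$lengths[i],
                          stringsAsFactors = FALSE)
}
sample2_runs <- do.call(rbind, rows)   # one data frame instead of separate printed matrices
sample2_runs
```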
> So the output is:
>     first  second    third
> [1,] "chr2" "9896633" "14404502" "0" "4"
>     first  second     third
> [1,] "chr2" "14421718" "16048724" "-0.43" "4"
>     first  second     third
> [1,] "chr2" "37491676" "37703009" "0" "2"
>     first  second    third
> [1,] "chr2" "9896633" "9896690" "0" "2"
>     first  second     third
> [1,] "chr2" "14314039" "16048724" "-0.35" "6"
>     first  second     third
> [1,] "chr2" "37491676" "37703009" "0" "2"
>
> I get almost the required output, but I need 3 modifications to this code:
> 1) This is just a small part of the file (with 2 samples); my actual file has
> 150 samples, so how do I write the rle step for all of them?
> 2) How do I store all the computed c values in a data frame (here I am just
> printing them)?
> 3) How do I include the sample name in each row?
> Waiting for your reply,
> Thanks,
> Suji
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


