[R] How to Store the executed values in a dataframe & rle function

jim holtman jholtman at gmail.com
Wed Sep 28 20:40:51 CEST 2011


The solution that I sent will handle the 150 different samples; just
list the column names in the argument to the top 'lapply'.  You don't
need the 'rle' in my approach.
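Here is a minimal sketch of reading the data by file name and collecting all sample column names automatically, so the 150 names do not have to be typed into the top-level 'lapply'. It assumes a tab-separated file whose first three columns are the chromosome, start and end positions and whose remaining columns are all samples. The temporary file written here only stands in for your own "test.txt".

```r
# Write a small demo file so the sketch is self-contained;
# with real data, just point read.table() at your own file name.
demo <- data.frame(
  Chr = c("chr2", "chr2", "chr2"),
  start = c(9896633L, 9896639L, 14314039L),
  end = c(9896683L, 9896690L, 14314098L),
  sample1 = c(0, 0, 0),
  sample2 = c(0, 0, -0.35),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".txt")
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file by name; as.is = TRUE keeps Chr as character
x <- read.table(path, sep = "\t", header = TRUE, as.is = TRUE)

# Assumption: columns 1-3 are Chr, start, end; every other column is a sample
sample_cols <- names(x)[-(1:3)]
sample_cols
# [1] "sample1" "sample2"
```

The resulting 'sample_cols' vector can then be passed as the first argument of the top-level 'lapply' in place of c('sample1', 'sample2').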

On Wed, Sep 28, 2011 at 2:13 PM, viritha k <viritha.k at gmail.com> wrote:
> Hi,
> This is the code that I wrote for 3 samples:
> code:
>> m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))
>> s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
>> names(s)=c("Values","Probes")
>> c<-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
>> G=1
>> n=4
>> for(i in 1:length(s$Probes)){
> + if(G==1){c[i,1]<-names(m[n])
> + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> + c[i,3]<-min(m$Start[G:s$Probes[i]])
> + c[i,4]<-max(m$End[G:s$Probes[i]])
> + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> + G=(G+s$Probes[i])}
> + else if((G-1) < length(m$Sample1)) {
> + c[i,1]<-names(m[n])
> + c[i,2]<-unique(m$Chr[G:(G+s$Probes[i]-1)])
> + c[i,3]<-min(m$Start[G:(G+s$Probes[i]-1)])
> + c[i,4]<-max(m$End[G:(G+s$Probes[i]-1)])
> + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> + G=(G+s$Probes[i])}
> + else {
> + G=1
> + n=n+1
> + c[i,1]<-names(m[n])
> + c[i,2]<-unique(m$Chr[G:s$Probes[i]])
> + c[i,3]<-min(m$Start[G:s$Probes[i]])
> + c[i,4]<-max(m$End[G:s$Probes[i]])
> + c[i,]<-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
> + G=(G+s$Probes[i])}}
>> c
>     Sample  Chr    Start      End Values Probes
> 1  Sample1 chr2  9896633 14404502      0      4
> 2  Sample1 chr2 14421718 16048724  -0.43      4
> 3  Sample1 chr2 37491676 37703009      0      2
> 4  Sample2 chr2  9896633  9896690      0      2
> 5  Sample2 chr2 14314039 16048724  -0.35      6
> 6  Sample2 chr2 37491676 37703009      0      2
> 7  Sample3 chr2  9896633 14314098      0      3
> 8  Sample3 chr2 14404467 16031769   0.32      3
> 9  Sample3 chr2 16036178 37491735   0.45      3
> 10 Sample3 chr2 37702947 37703009      0      1
>
>
> The problem I am facing is extending the rle approach to get Values and
> Probes for every sample.
> Your code definitely looks simpler, but I would like to read the file by
> just giving its name, as in my code, because my original file contains 150
> samples. How do I use lapply or rle for 150 such samples, if my file
> contains 150 samples similar to sample1 and sample2?
>
> waiting for your reply,
> Thanks,
> Suji
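A minimal sketch of extending the rle() route to any number of samples: one rle() call per sample column, each turned into a small data frame that also carries the sample name, then bound together. It produces the same Values/Probes pairs as the hand-built 's' data frame. It assumes the rows are already in positional order; the 'names(m)[-(1:3)]' selection in the comment is a hypothetical way to pick the sample columns of the full file.

```r
x <- data.frame(
  sample1 = c(0, 0, 0, 0, -0.43, -0.43, -0.43, -0.43, 0, 0),
  sample2 = c(0, 0, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, 0, 0)
)
sample_cols <- names(x)   # for the full file, e.g. names(m)[-(1:3)]

# One rle() per sample column: values become Values, lengths become Probes
runs <- lapply(sample_cols, function(col) {
  r <- rle(x[[col]])
  data.frame(Sample = col, Values = r$values, Probes = r$lengths,
             stringsAsFactors = FALSE)
})

# Stack the per-sample data frames into one
s <- do.call(rbind, runs)
s
```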
>
> On Wed, Sep 28, 2011 at 11:37 AM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Here is one approach:
>>
>> > x <- read.table(textConnection("Chr start end sample1 sample2
>> + chr2 9896633 9896683 0 0
>> + chr2 9896639 9896690 0 0
>> + chr2 14314039 14314098 0 -0.35
>> + chr2 14404467 14404502 0 -0.35
>> + chr2 14421718 14421777 -0.43 -0.35
>> + chr2 16031710 16031769 -0.43 -0.35
>> + chr2 16036178 16036237 -0.43 -0.35
>> + chr2 16048665 16048724 -0.43 -0.35
>> + chr2 37491676 37491735 0 0
>> + chr2 37702947 37703009 0 0"), header = TRUE, as.is = TRUE)
>> > closeAllConnections()
>> >
>> > result <- lapply(c('sample1', 'sample2'), function(.samp){
>> +     # split by breaks in the values
>> +     .grps <- split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
>> +
>> +     # combine the list of dataframes
>> +     .range <- do.call(rbind, lapply(.grps, function(.set){
>> +         # create a dataframe of the results
>> +         data.frame(Sample = .samp
>> +                    , Chr = .set$Chr[1L]
>> +                    , Start = min(.set$start)
>> +                    , End = max(.set$end)
>> +                    , Values = .set[[.samp]][1L]
>> +                    , Probes = nrow(.set)
>> +                    )
>> +         }))
>> +     })
>> > # put the list of dataframes together
>> > result <- do.call(rbind, result)
>> > result
>>    Sample  Chr    Start      End Values Probes
>> 0  sample1 chr2  9896633 14404502   0.00      4
>> 1  sample1 chr2 14421718 16048724  -0.43      4
>> 2  sample1 chr2 37491676 37703009   0.00      2
>> 01 sample2 chr2  9896633  9896690   0.00      2
>> 11 sample2 chr2 14314039 16048724  -0.35      6
>> 21 sample2 chr2 37491676 37703009   0.00      2
>> >
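The key step in the lapply solution is the grouping vector passed to split(). A small illustration on the sample1 values shows that it numbers each run of identical values, and that the group sizes equal the lengths rle() reports, which is why rle() is not needed there.

```r
v <- c(0, 0, 0, 0, -0.43, -0.43, -0.43, -0.43, 0, 0)  # sample1

# diff() is non-zero exactly where a new run begins;
# cumsum() turns those break points into a run number for every row
grp <- cumsum(c(0, diff(v) != 0))
grp
# [1] 0 0 0 0 1 1 1 1 2 2

# The group sizes match the run lengths from rle()
as.vector(table(grp))
rle(v)$lengths
```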
>>
>>
>> On Mon, Sep 26, 2011 at 10:30 AM, sujitha <viritha.k at gmail.com> wrote:
>> > Hi group,
>> >
>> > This is what my test file looks like:
>> > Chr start end sample1 sample2
>> > chr2 9896633 9896683 0 0
>> > chr2 9896639 9896690 0 0
>> > chr2 14314039 14314098 0 -0.35
>> > chr2 14404467 14404502 0 -0.35
>> > chr2 14421718 14421777 -0.43 -0.35
>> > chr2 16031710 16031769 -0.43 -0.35
>> > chr2 16036178 16036237 -0.43 -0.35
>> > chr2 16048665 16048724 -0.43 -0.35
>> > chr2 37491676 37491735 0 0
>> > chr2 37702947 37703009 0 0
>> >
>> > This is the output that I am expecting:
>> > Sample Chr Start End Values Probes
>> > sample1 chr2 9896633 14404502 0 4
>> > sample1 chr2 14421718 16048724 -0.43 4
>> > sample1 chr2 37491676 37703009 0 2
>> > sample2 chr2 9896633 9896690 0 2
>> > sample2 chr2 14314039 16048724 -0.35 6
>> > sample2 chr2 37491676 37703009 0 2
>> >
>> > Here the Chr value is the same throughout, but it can be any other value
>> > as well, so Chr should be the unique value within each run of identical
>> > values. Start is the smallest start among the rows whose values are
>> > identical (the first 4 rows for sample1), and End is the largest end among
>> > those rows. Values is the unique value shared by the run, and Probes is the
>> > number of rows in the run.
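The rules above (Chr of the run, smallest start, largest end, the run's value, and the run length) can be written for one sample with rle() run numbers and tapply(). This is a minimal sketch that assumes the rows are ordered by position and that Chr does not change inside a run, so the first Chr of each run is taken.

```r
x <- data.frame(
  Chr = rep("chr2", 10),
  start = c(9896633, 9896639, 14314039, 14404467, 14421718,
            16031710, 16036178, 16048665, 37491676, 37702947),
  end = c(9896683, 9896690, 14314098, 14404502, 14421777,
          16031769, 16036237, 16048724, 37491735, 37703009),
  sample2 = c(0, 0, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, 0, 0),
  stringsAsFactors = FALSE
)

r  <- rle(x$sample2)
id <- rep(seq_along(r$lengths), r$lengths)   # run number for every row

# One row per run; as.vector() drops the array structure tapply() returns
runs <- data.frame(
  Sample = "sample2",
  Chr    = as.vector(tapply(x$Chr, id, function(ch) ch[1])),
  Start  = as.vector(tapply(x$start, id, min)),
  End    = as.vector(tapply(x$end, id, max)),
  Values = r$values,
  Probes = r$lengths,
  stringsAsFactors = FALSE
)
runs
```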
>> >
>> > Code:
>> > > m<-read.table("test.txt",sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
>> > # reading the test file
>> > > s<-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]]))
>> > # to get the last 2 columns
>> > > names(s)=c("Values","Probes")
>> > > G=1
>> > > for(i in 1:length(s$Probes)){
>> > + if(G==1){first<-unique(m$Chr[G:s$Probes[i]])
>> > + second<-min(m$Start[G:s$Probes[i]])
>> > + third<-max(m$End[G:s$Probes[i]])
>> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
>> > + print (c)
>> > + G=(G+s$Probes[i])}
>> > + else if((G-1) < length(m$Sample1)) {
>> > + first<-unique(m$Chr[G:(G+s$Probes[i]-1)])
>> > + second<-min(m$Start[G:(G+s$Probes[i]-1)])
>> > + third<-max(m$End[G:(G+s$Probes[i]-1)])
>> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
>> > + print (c)
>> > + G=(G+s$Probes[i])}
>> > + else {
>> > + G=1
>> > + first<-unique(m$Chr[G:s$Probes[i]])
>> > + second<-min(m$Start[G:s$Probes[i]])
>> > + third<-max(m$End[G:s$Probes[i]])
>> > + c<-cbind(first,second,third,s$Values[i],s$Probes[i])
>> > + print (c)
>> > + G=(G+s$Probes[i])}
>> > + }
>> > so the output is:
>> >     first  second    third
>> > [1,] "chr2" "9896633" "14404502" "0" "4"
>> >     first  second     third
>> > [1,] "chr2" "14421718" "16048724" "-0.43" "4"
>> >     first  second     third
>> > [1,] "chr2" "37491676" "37703009" "0" "2"
>> >     first  second    third
>> > [1,] "chr2" "9896633" "9896690" "0" "2"
>> >     first  second     third
>> > [1,] "chr2" "14314039" "16048724" "-0.35" "6"
>> >     first  second     third
>> > [1,] "chr2" "37491676" "37703009" "0" "2"
>> >
>> > I get almost the required output, but I need three modifications to this
>> > code:
>> > 1) This is just a small part of the file (with 2 samples); my actual file
>> > has 150 samples, so how do I write the rle function for all of them?
>> > 2) How do I store all the computed c values as a data frame (here I am
>> > just printing them)?
>> > 3) How do I include the sample name in the output?
>> > Waiting for your reply,
>> > Thanks,
>> > Suji
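For questions 2 and 3, one way to keep the loop style is to build one small data frame per run, store it in a list instead of printing it, and bind the list once at the end with do.call(rbind, ...). In this sketch the G counter is replaced by first/last row indices derived from the rle() lengths. The column names Chr, start, end follow the test file shown in this message, so adjust them (for example to Start/End) if your file differs; 'names(m)[-(1:3)]' in the comment is a hypothetical way to select the sample columns.

```r
x <- data.frame(
  Chr = rep("chr2", 10),
  start = c(9896633, 9896639, 14314039, 14404467, 14421718,
            16031710, 16036178, 16048665, 37491676, 37702947),
  end = c(9896683, 9896690, 14314098, 14404502, 14421777,
          16031769, 16036237, 16048724, 37491735, 37703009),
  sample1 = c(0, 0, 0, 0, -0.43, -0.43, -0.43, -0.43, 0, 0),
  sample2 = c(0, 0, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, 0, 0),
  stringsAsFactors = FALSE
)

sample_cols <- c("sample1", "sample2")  # e.g. names(m)[-(1:3)] for the full file
out <- list()                           # collects one data frame per run

for (col in sample_cols) {
  r     <- rle(x[[col]])
  last  <- cumsum(r$lengths)            # last row of each run
  first <- last - r$lengths + 1L        # first row of each run
  for (k in seq_along(r$lengths)) {
    rows <- first[k]:last[k]
    out[[length(out) + 1L]] <- data.frame(
      Sample = col,                     # question 3: keep the sample name
      Chr    = x$Chr[first[k]],
      Start  = min(x$start[rows]),
      End    = max(x$end[rows]),
      Values = r$values[k],
      Probes = r$lengths[k],
      stringsAsFactors = FALSE
    )
  }
}

# Question 2: one data frame instead of printed pieces
result <- do.call(rbind, out)
result
```

Binding the list once at the end avoids growing a data frame row by row inside the loop.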
>> >
>> >
>> > --
>> > View this message in context:
>> > http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
>> > Sent from the R help mailing list archive at Nabble.com.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>
>





