[R] QA

Sun May 26 03:35:27 CEST 2013

I thought you want to compare between the rows of two columns even if their corresponding values fall in the same row.

fun3<- function(mat){
        indmat<-combn(seq_len(ncol(mat)),2)
    lst1<- lapply(seq_len(ncol(indmat)),function(i) {mat[,indmat[,i]]})   
    names(lst1)<-as.character(interaction(as.data.frame(t(indmat)),sep="_",drop=TRUE))       
    lst2<- lapply(lst1,function(x) {x1<- apply(x,2,fun1)})

    lst3<- lapply(lst2,function(x) expand.grid(lapply(x,function(y) y[,1])))
    lst4<-lapply(lst3,function(x) unlist(x[which.min(apply(x,1,function(y) abs(diff(y)))),]) )
    lst5<- lapply(lst4,function(x){
                if(abs(diff(x))>(nrow(mat)/2)){
                nrow(mat)-abs(diff(x))
                    }
                        else(abs(diff(x)))
                    })

    lst6<- lapply(seq_along(lst5),function(i) {
                x2<-lst1[[i]]
                if(lst5[[i]]==0) {
                    #indx1<- seq(length(x2[,2]))
                    #sum(abs(x2[,1]-x2[indx1,2]))
             0     ######################## set to zero                     
            }
                else{
                    lapply(seq(1+lst5[[i]]),function(j){x3<-x2[,2]
                            indx1<-seq(length(x3)-(j-1))
                            indx2<-c(setdiff(seq_along(x3),indx1),indx1)
                            sum(abs(x2[,1]-x2[indx2,2]))
                            })
                }
                })   

    names(lst6)<- names(lst1)
    lst7<-lapply(lst6,unlist)
    lst8<- lapply(lst7,function(x) {
            Seq1<-seq_along(x)
            if(length(Seq1)==1) x
            else if(length(Seq1)==2){
                        sum(abs(x[1]-x[2]))   
                        }
            else{
                ind<-rep(Seq1,each=2)[-1]
                ind1<-ind[-length(ind)]
                Reduce(`+`,lapply(split(ind1,(seq_along(ind1)-1)%/%2+1),function(i) {
                                    abs(diff(x[i]))
                                    }))                           
                }

                    }
            )
    lst9<-do.call(rbind,lst8)
    lst9       
    }

 fun3(mm)
#          [,1]
#1_2  2.5966051
#1_3  1.0267435
#1_4  0.0000000
#1_5  1.8489204
#1_6  0.0000000
#2_3  0.0000000
#2_4  1.9040790
#2_5  2.2874235
#2_6  5.1526016
#3_4  0.9726777
#3_5  2.1359229
#3_6  5.0221450
#4_5  0.9124638
#4_6  0.0000000
#5_6 14.0550864

xx
#      1  8  9 23 87 89
#[1,]  5  4  4  5  6 12
#[2,] 12 NA NA  9 NA NA
#[3,] NA NA NA 12 NA NA

According to xx, 1&4, 2&3, 4&6 (also 0 because both have 12)
A.K.
________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
Sent: Saturday, May 25, 2013 9:17 PM
Subject: RE: QA

thanks arun,
i dont think thANKyou is enough for wat u did. anyway, there is slight modification that i want to ask to understand the codes more efficiently. what if i want to consider the distance between the columns having atleast one peak in the same month equal to zero, instead of "initial value"?? 
more precisely The distance between column 2 and 3 should be zero instead of 4.2951411. similarly the distance between column 4 and 6 should be zero instead of 8.260419. 
Thats just for my own knowledge to understand the loop. i hope you wont mind.
The loop works absolutely well.

Elisa

> Date: Sat, 25 May 2013 18:03:33 -0700
> From: smartpink111 at yahoo.com
> Subject: Re: QA
> To: eliza_botto at hotmail.com
> CC: r-help at r-project.org
> 
> Hi,
> I hope this works for you.
> fun1<- function(x){
>     big<- x>0.8*max(x) 
>     n<- length(big)
>     startRunOfBigs<- which(c(big[1],!big[-n] & big[-1]))
>     endRunOfBigs<- which(c(big[-n] & !big[-1], big[n]))
>     index<- vapply(seq_along(startRunOfBigs),function(i) which.max(x[startRunOfBigs[i]:endRunOfBigs[i]])+startRunOfBigs[i]-1L,0L)
>      index<-ifelse(sum(is.na(match(index,c(1,12))))==0 & x[index]!=max(x[index]), NA,index)
>     data.frame(Index=index[!is.na(index)],Value=x[index[!is.na(index)]])
>      }
> 
> ##mm: data
> fun3<- function(mat){
>         indmat<-combn(seq_len(ncol(mat)),2)
>     lst1<- lapply(seq_len(ncol(indmat)),function(i) {mat[,indmat[,i]]})    
>     names(lst1)<-as.character(interaction(as.data.frame(t(indmat)),sep="_",drop=TRUE))        
>     lst2<- lapply(lst1,function(x) {x1<- apply(x,2,fun1)})
>     
>     lst3<- lapply(lst2,function(x) expand.grid(lapply(x,function(y) y[,1])))
>     lst4<-lapply(lst3,function(x) unlist(x[which.min(apply(x,1,function(y) abs(diff(y)))),]) )
>     lst5<- lapply(lst4,function(x){ 
>                 if(abs(diff(x))>(nrow(mat)/2)){
>                 nrow(mat)-abs(diff(x))
>                     }
>                         else(abs(diff(x)))
>                     })
>         
>     lst6<- lapply(seq_along(lst5),function(i) {
>                 x2<-lst1[[i]]
>                 if(lst5[[i]]==0) {
>                     indx1<- seq(length(x2[,2]))
>                     sum(abs(x2[,1]-x2[indx1,2]))
>                     }
>                 else{
>                     lapply(seq(1+lst5[[i]]),function(j){x3<-x2[,2]
>                             indx1<-seq(length(x3)-(j-1))
>                             indx2<-c(setdiff(seq_along(x3),indx1),indx1)
>                             sum(abs(x2[,1]-x2[indx2,2]))
>                             })
>                 }
>                 })    
>                 
>     names(lst6)<- names(lst1)
>     lst7<-lapply(lst6,unlist)
>     lst8<- lapply(lst7,function(x) {
>             Seq1<-seq_along(x)
>             if(length(Seq1)==1) x
>             else if(length(Seq1)==2){
>                         sum(abs(x[1]-x[2]))    
>                         }
>             else{ 
>                 ind<-rep(Seq1,each=2)[-1]
>                 ind1<-ind[-length(ind)]
>                 Reduce(`+`,lapply(split(ind1,(seq_along(ind1)-1)%/%2+1),function(i) {
>                                     abs(diff(x[i]))
>                                     }))                            
>                 }
> 
>                     }
>             )
>     do.call(rbind,lst8)    
>     }
> 
> fun3(mm)   #rownames represent the comparison between the particular columns
> #          [,1]
> #1_2  2.5966051
> #1_3  1.0267435
> #1_4  3.7387830
> #1_5  1.8489204
> #1_6  6.5233654
> #2_3  4.2951411
> #2_4  1.9040790
> #2_5  2.2874235
> #2_6  5.1526016
> #3_4  0.9726777
> #3_5  2.1359229
> #3_6  5.0221450
> #4_5  0.9124638
> #4_6  8.2604187
> #5_6 14.0550864
> 
> 
> A.K.
> 
> 
> 
> 
> 
> ________________________________
> From: eliza botto <eliza_botto at hotmail.com>
> To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
> Sent: Saturday, May 25, 2013 2:14 PM
> Subject: QA
> 
> 
> 
> 
> Dear Arun,
> [text file is attached]
> After your help on preparing loop for identifying peaks, here is my latest question which is linked with my first question. but this time i will try to make it more clear.
> 
> > dput(xx)
> structure(c(5L, 12L, NA, 4L, NA, NA, 4L, NA, NA, 5L, 9L, 12L, 
> 6L, NA, NA, 12L, NA, NA), .Dim = c(3L, 6L), .Dimnames = list(
>     NULL, c("1", "8", "9", "23", "87", "89")))
> > dput(mm)
> structure(c(0.706461987893674, 0.998391468394261, 0.72402995269242, 
> 1.70874688194537, 1.93906363083693, 0.89540353128442, 0.328327645695443, 
> 0.427434603701202, 0.591932250254601, 0.444627635494183, 1.44407704434405, 
> 1.79150336746345, 0.740380661614246, 1.39756784211974, 1.43602731683199, 
> 2.40482060634346, 1.61684982192949, 0.549848553223765, 0.245763715425745, 
> 0.315411788974968, 0.390626431538384, 0.369934560068472, 0.769100067815155, 
> 1.76366863411459, 0.480885978853889, 1.21441674507622, 2.50566408677391, 
> 3.27361599826255, 1.18508780425679, 0.465943778037697, 0.29380145690883, 
> 0.36356245877522, 0.373314458026047, 0.334849362386475, 0.882050057788756, 
> 0.626807814853613, 0.774295647517675, 0.853105130179133, 0.738085443815565, 
> 1.26063449947807, 1.57350832698427, 0.790095501697794, 0.510641105191147, 
> 0.874523657118082, 1.31257333325184, 0.882086374572265, 1.13881207205977, 
> 1.29163890813439, 0.0849732189580101, 0.070591276171845, 0.0926010253161898, 
> 0.362209761457517, 1.45769283057202, 3.16165004659667, 2.74903557756267, 
> 1.94633472878995, 1.19319875840883, 0.533232612926756, 0.225531074123974, 
> 0.122949089115578, 2.06195904001605, 1.41493262330451, 1.35748791897328, 
> 1.19490680241894, 0.702488756183322, 0.338258418490199, 0.123398398622741, 
> 0.138548982660226, 0.16170889185798, 0.414543218677095, 1.84629295875002, 
> 2.24547399004563), .Dim = c(12L, 6L))
> 
> 
> You can see that that there are two matrices. "mm" is the actual matrix and "xx" is the matrix indentifying the peaks of "mm".For being a peak a value has to either the maximum value or atleast 80% of the maximum value. you can see that the maximum value of coulmn 1 is in row number 5 and thats what it showed in matrix "xx" whereas, the 80% of the maximum value is in row number 12 therefore it considered it the second peak and row number was shown in "xx". i want to calculate the distance matrix of "mm" in the following way...
> The column are continous or cyclic.
> The subtraction should start from the peak and should end when the peaks of two columns are in the same row. The peaks are to be moved towrds eachother in the shortest possible way.
> For suppose the peak of colum 2 is in 4th row and the peak of column 6 is in 12th row. Now moving these two peak towwards eachother requires moving col 2 in reverse direction or column 6 in forward direction.
> 
> For example
> 
> Initial:
> 
> Col 2
> 
> 1 2 3 4(max) 5 6 7 8 9 10 11 12
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> a<-sum(abs(col2-col6))
> 
> step1:
> 
> Col 2
> 
> 2 3 4(max) 5 6 7 8 9 10 11 12 1
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> b<-sum(abs(col2-col6))
> 
> step2:
> 
> Col 2
> 
> 3 4(max) 5 6 7 8 9 10 11 12 1 2
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> c<-sum(abs(col2-col6))
> 
> step3:
> 
> Col 2
> 
> 4(max) 5 6 7 8 9 10 11 12 1 2 3
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> d<-sum(abs(col2-col6))
> 
> step4:
> 
> Col 2
> 
> 5 6 7 8 9 10 11 12 1 2 3 4(max)
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> e<-sum(abs(col2-col6))
> 
> total difference= abs(a-b)+abs(b-c)+abs(c-d)+abs(d-e)
> 
> 
> The dissimilarity is zero if the peaks are already in the same row. like for column 2 and 3 the distance is zero as peaks are under eachother. For column 1 and 4 the distance is onceagain zero. Although they have different nuber of peaks but as atleast one of their peaks is under eachother therefore distance is zero.
> 
> For Column 5 and 6 peaks can be moved in either direction as number of steps to be followed are same.
> 
> for column 1 and 2 following is the procedure
> 
> Col1 has two maximum values in row 5th and 12th and column two has only one maximum value at 4 row. As peak in 5th row of column one is closer to the peak of column 2 therefore we will move towards it and procedure should be
> 
> 
> Initial:
> 
> Col 1
> 
> 1 2 3 4 5(max) 6 7 8 9 10 11 12(max)
> 
> Col 8
> 
> 1 2 3 4(max) 5 6 7 8 9 10 11 12
> 
> a<-sum(abs(col1-col8))
> 
> Step1: 
> 
> Col 1
> 
> 1 2 3 4 5(max) 6 7 8 9 10 11 12(max)
> 
> Col 8
> 
> 12 1 2 3 4(max) 5 6 7 8 9 10 11
> 
> b<-sum(abs(col1-col8))
> 
> total difference=abs(a-b)
> 
> For column 4 and 5
> 
> Initial:
> 
> Col 4
> 
> 1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)
> 
> Col 5
> 
> 1 2 3 4 5 6(max) 7 8 9 10 11 12
> 
> a<-sum(abs(col4-col5))
> 
> Step 1
> 
> Col 4
> 
> 1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)
> 
> Col 5
> 
> 2 3 4 5 6(max) 7 8 9 10 11 12 1
> 
> b<-sum(abs(col4-col5))
> 
> Total Difference= abs(a-b)
> 
> If there is any point which i couldnt discuss please tell me...
> 
> 
> Elisa