[R] QA

Sun May 26 03:03:33 CEST 2013

Hi,
I hope this works for you.
fun1<- function(x){
    big<- x>0.8*max(x) 
    n<- length(big)
    startRunOfBigs<- which(c(big[1],!big[-n] & big[-1]))
    endRunOfBigs<- which(c(big[-n] & !big[-1], big[n]))
    index<- vapply(seq_along(startRunOfBigs),function(i) which.max(x[startRunOfBigs[i]:endRunOfBigs[i]])+startRunOfBigs[i]-1L,0L)
     index<-ifelse(sum(is.na(match(index,c(1,12))))==0 & x[index]!=max(x[index]), NA,index)
    data.frame(Index=index[!is.na(index)],Value=x[index[!is.na(index)]])
     }

##mm: data
fun3<- function(mat){
        indmat<-combn(seq_len(ncol(mat)),2)
    lst1<- lapply(seq_len(ncol(indmat)),function(i) {mat[,indmat[,i]]})    
    names(lst1)<-as.character(interaction(as.data.frame(t(indmat)),sep="_",drop=TRUE))        
    lst2<- lapply(lst1,function(x) {x1<- apply(x,2,fun1)})

    lst3<- lapply(lst2,function(x) expand.grid(lapply(x,function(y) y[,1])))
    lst4<-lapply(lst3,function(x) unlist(x[which.min(apply(x,1,function(y) abs(diff(y)))),]) )
    lst5<- lapply(lst4,function(x){ 
                if(abs(diff(x))>(nrow(mat)/2)){
                nrow(mat)-abs(diff(x))
                    }
                        else(abs(diff(x)))
                    })

    lst6<- lapply(seq_along(lst5),function(i) {
                x2<-lst1[[i]]
                if(lst5[[i]]==0) {
                    indx1<- seq(length(x2[,2]))
                    sum(abs(x2[,1]-x2[indx1,2]))
                    }
                else{
                    lapply(seq(1+lst5[[i]]),function(j){x3<-x2[,2]
                            indx1<-seq(length(x3)-(j-1))
                            indx2<-c(setdiff(seq_along(x3),indx1),indx1)
                            sum(abs(x2[,1]-x2[indx2,2]))
                            })
                }
                })    

    names(lst6)<- names(lst1)
    lst7<-lapply(lst6,unlist)
    lst8<- lapply(lst7,function(x) {
            Seq1<-seq_along(x)
            if(length(Seq1)==1) x
            else if(length(Seq1)==2){
                        sum(abs(x[1]-x[2]))    
                        }
            else{ 
                ind<-rep(Seq1,each=2)[-1]
                ind1<-ind[-length(ind)]
                Reduce(`+`,lapply(split(ind1,(seq_along(ind1)-1)%/%2+1),function(i) {
                                    abs(diff(x[i]))
                                    }))                            
                }

                    }
            )
    do.call(rbind,lst8)    
    }

fun3(mm)   #rownames represent the comparison between the particular columns
#          [,1]
#1_2  2.5966051
#1_3  1.0267435
#1_4  3.7387830
#1_5  1.8489204
#1_6  6.5233654
#2_3  4.2951411
#2_4  1.9040790
#2_5  2.2874235
#2_6  5.1526016
#3_4  0.9726777
#3_5  2.1359229
#3_6  5.0221450
#4_5  0.9124638
#4_6  8.2604187
#5_6 14.0550864

A.K.

________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
Sent: Saturday, May 25, 2013 2:14 PM
Subject: QA

Dear Arun,
[text file is attached]
After your help on preparing loop for identifying peaks, here is my latest question which is linked with my first question. but this time i will try to make it more clear.

> dput(xx)
structure(c(5L, 12L, NA, 4L, NA, NA, 4L, NA, NA, 5L, 9L, 12L, 
6L, NA, NA, 12L, NA, NA), .Dim = c(3L, 6L), .Dimnames = list(
    NULL, c("1", "8", "9", "23", "87", "89")))
> dput(mm)
structure(c(0.706461987893674, 0.998391468394261, 0.72402995269242, 
1.70874688194537, 1.93906363083693, 0.89540353128442, 0.328327645695443, 
0.427434603701202, 0.591932250254601, 0.444627635494183, 1.44407704434405, 
1.79150336746345, 0.740380661614246, 1.39756784211974, 1.43602731683199, 
2.40482060634346, 1.61684982192949, 0.549848553223765, 0.245763715425745, 
0.315411788974968, 0.390626431538384, 0.369934560068472, 0.769100067815155, 
1.76366863411459, 0.480885978853889, 1.21441674507622, 2.50566408677391, 
3.27361599826255, 1.18508780425679, 0.465943778037697, 0.29380145690883, 
0.36356245877522, 0.373314458026047, 0.334849362386475, 0.882050057788756, 
0.626807814853613, 0.774295647517675, 0.853105130179133, 0.738085443815565, 
1.26063449947807, 1.57350832698427, 0.790095501697794, 0.510641105191147, 
0.874523657118082, 1.31257333325184, 0.882086374572265, 1.13881207205977, 
1.29163890813439, 0.0849732189580101, 0.070591276171845, 0.0926010253161898, 
0.362209761457517, 1.45769283057202, 3.16165004659667, 2.74903557756267, 
1.94633472878995, 1.19319875840883, 0.533232612926756, 0.225531074123974, 
0.122949089115578, 2.06195904001605, 1.41493262330451, 1.35748791897328, 
1.19490680241894, 0.702488756183322, 0.338258418490199, 0.123398398622741, 
0.138548982660226, 0.16170889185798, 0.414543218677095, 1.84629295875002, 
2.24547399004563), .Dim = c(12L, 6L))

You can see that that there are two matrices. "mm" is the actual matrix and "xx" is the matrix indentifying the peaks of "mm".For being a peak a value has to either the maximum value or atleast 80% of the maximum value. you can see that the maximum value of coulmn 1 is in row number 5 and thats what it showed in matrix "xx" whereas, the 80% of the maximum value is in row number 12 therefore it considered it the second peak and row number was shown in "xx". i want to calculate the distance matrix of "mm" in the following way...
The column are continous or cyclic.
The subtraction should start from the peak and should end when the peaks of two columns are in the same row. The peaks are to be moved towrds eachother in the shortest possible way.
For suppose the peak of colum 2 is in 4th row and the peak of column 6 is in 12th row. Now moving these two peak towwards eachother requires moving col 2 in reverse direction or column 6 in forward direction.

For example

Initial:

Col 2

1 2 3 4(max) 5 6 7 8 9 10 11 12

Col 6

1 2 3 4 5 6 7 8 9 10 11 12(max)

a<-sum(abs(col2-col6))

step1:

Col 2

2 3 4(max) 5 6 7 8 9 10 11 12 1

Col 6

1 2 3 4 5 6 7 8 9 10 11 12(max)

b<-sum(abs(col2-col6))

step2:

Col 2

3 4(max) 5 6 7 8 9 10 11 12 1 2

Col 6

1 2 3 4 5 6 7 8 9 10 11 12(max)

c<-sum(abs(col2-col6))

step3:

Col 2

4(max) 5 6 7 8 9 10 11 12 1 2 3

Col 6

1 2 3 4 5 6 7 8 9 10 11 12(max)

d<-sum(abs(col2-col6))

step4:

Col 2

5 6 7 8 9 10 11 12 1 2 3 4(max)

Col 6

1 2 3 4 5 6 7 8 9 10 11 12(max)

e<-sum(abs(col2-col6))

total difference= abs(a-b)+abs(b-c)+abs(c-d)+abs(d-e)

The dissimilarity is zero if the peaks are already in the same row. like for column 2 and 3 the distance is zero as peaks are under eachother. For column 1 and 4 the distance is onceagain zero. Although they have different nuber of peaks but as atleast one of their peaks is under eachother therefore distance is zero.

For Column 5 and 6 peaks can be moved in either direction as number of steps to be followed are same.

for column 1 and 2 following is the procedure

Col1 has two maximum values in row 5th and 12th and column two has only one maximum value at 4 row. As peak in 5th row of column one is closer to the peak of column 2 therefore we will move towards it and procedure should be

Initial:

Col 1

1 2 3 4 5(max) 6 7 8 9 10 11 12(max)

Col 8

1 2 3 4(max) 5 6 7 8 9 10 11 12

a<-sum(abs(col1-col8))

Step1: 

Col 1

1 2 3 4 5(max) 6 7 8 9 10 11 12(max)

Col 8

12 1 2 3 4(max) 5 6 7 8 9 10 11

b<-sum(abs(col1-col8))

total difference=abs(a-b)

For column 4 and 5

Initial:

Col 4

1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)

Col 5

1 2 3 4 5 6(max) 7 8 9 10 11 12

a<-sum(abs(col4-col5))

Step 1

Col 4

1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)

Col 5

2 3 4 5 6(max) 7 8 9 10 11 12 1

b<-sum(abs(col4-col5))

Total Difference= abs(a-b)

If there is any point which i couldnt discuss please tell me...

Elisa