[BioC] function "uniqueframe"

Thu Nov 4 22:43:41 CET 2010

Dear Naima,

You are right, I must have missed this.
Please replace "ds <- rbind(ds, tmp)" with:
    ds <- rbind(ds, tmp[setdiff(rownames(tmp),rownames(ds)),])
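
The reason for the names ending in "1" is that rbind() on data frames makes duplicated row names unique, so a row that is already in "ds" comes back from "tmp" with a "1" appended to its name. Here is a minimal sketch with made-up toy objects standing in for the real "ds" and "tmp" inside uniqueframe (only the two probeset IDs and their values are taken from your example):

```r
# Toy stand-ins (assumption): "ds" already holds the best match for one probeset,
# and "tmp" holds the same row again plus one further row.
ds <- data.frame(HuGene = 8076569L, PercentU2G = 99.41,
                 row.names = "1552257_a_at")
tmp <- data.frame(HuGene = c(8076569L, 8074791L),
                  PercentU2G = c(99.41, 98.42),
                  row.names = c("1552257_a_at", "1552264_a_at"))

# Original line: the repeated row name is made unique by appending "1".
bad <- rbind(ds, tmp)
rownames(bad)

# Replacement line: only rows of "tmp" whose names are not yet in "ds" are added.
fixed <- rbind(ds, tmp[setdiff(rownames(tmp), rownames(ds)), ])
rownames(fixed)
```

Note that this only prevents the same row from being added twice; it is not meant as a complete check that every HuGene ID occurs once.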

However, please note that it does not matter, since later on I intersect 
the rownames with the rownames of the expression data. Furthermore, 
please note that this was only a trial to compare the three arrays and 
there is no warranty that "script4bestmatch.R" is correct. Other people 
might have better solutions to compare the three arrays based on the 
BestMatch.txt files of Affymetrix.
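
To illustrate why the extra rows are harmless after the intersection: an invented name such as "1552257_a_at1" does not occur among the rownames of the expression data, so it is dropped. This is a small sketch with made-up toy data; the object names "expr" and "map" and the expression values are hypothetical and not taken from "script4bestmatch.R":

```r
# Hypothetical expression data with the real probeset IDs as rownames.
expr <- matrix(c(7.1, 8.3), nrow = 2,
               dimnames = list(c("1552257_a_at", "1552264_a_at"), "sample1"))

# Mapping table that still contains the artificial "...1" row.
map <- data.frame(HuGene = c(8076569L, 8076569L, 8074791L),
                  row.names = c("1552257_a_at", "1552257_a_at1", "1552264_a_at"))

# Keep only probesets present in both objects; the artificial name disappears.
common <- intersect(rownames(map), rownames(expr))
map[common, , drop = FALSE]
```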

Best regards
Christian

On 11/4/10 11:28 AM, Naïma Oumouhou wrote:
> Dear Christian,
>
> I read your vignette « Introduction to the xps Package: Comparison to
> Affymetrix Power Tools » and I tried to compare 2 gene expression arrays:
> U133 Plus 2 and Human Gene ST 1.
>
> I followed your R instructions in the script “script4bestmatch.R”. But I
> noticed something strange in my output.
>
> I downloaded “U133PlusVsHuGene_BestMatch.txt” from the Affymetrix website.
>
> My instructions are:
>
> #Function "uniqueframe"
>
> uniqueframe <- function(ma) {
>     maxunique <- function(id, m) {
>         m <- m[which(m[,1] == id),];
>         m <- m[which(m[,2] == max(m[,2])),];
>         return(m[1,]);
>     }
>     dup <- duplicated(ma[,1])
>     uni <- unique(ma[dup,1])
>     ds <- NULL
>     for (i in uni) {ds <- rbind(ds, maxunique(i,ma))}
>     tmp <- ma[dup==F,]
>     ds <- rbind(ds, tmp)
>     ds <- ds[order(rownames(ds)),]
>     return(ds)
> }
>
> # Importation of "U133PlusVsHuGene_BestMatch.txt"
>
> up2hg<-read.delim("D:/Naima/CancerMoelleOsseuse_EFS/Analyse_Package_XPS/U133PlusVsHuGene_BestMatch.txt",row.names=3,comment.char="")
> dim(up2hg)
> [1] 29129    19
> up2hg<-up2hg[,5:6]
> up2hg_cor<-uniqueframe(up2hg)
> colnames(up2hg_cor)<-c("HuGene","PercentU2G")
> dim(up2hg_cor)
> [1] 25251     2
> write.csv2(up2hg_cor,"D:/Naima/CancerMoelleOsseuse_EFS/Outputs/Probesets_U133PlusVsHuGene.csv")
>
> The initial data frame “up2hg” contains 29129 rows, and after I apply
> “uniqueframe” the resulting data frame has 25251 rows. But the number of
> unique probesets for the Human Gene array is 17984.
>
> When I look at the output (Probesets_U133PlusVsHuGene.csv), there is
> something strange:
>
> For example:
>
> U1332P          HuGene    PercentU2G
> 1552257_a_at    8076569   99,41
> 1552257_a_at1   8076569   99,41
> 1552264_a_at    8074791   98,42
> 1552264_a_at1   8074791   98,42
>
> There are still duplicated probesets among the HuGene probesets, and new
> probeset names such as “1552257_a_at1” have been created in U1332P.
>
> Have I done something wrong?
>
> Thank you for your help.
>
> Naïma
>