[BioC] function "uniqueframe"
cstrato
cstrato at aon.at
Thu Nov 4 22:43:41 CET 2010
Dear Naima,
You are right, I must have missed this.
Please replace "ds <- rbind(ds, tmp)" with:
ds <- rbind(ds, tmp[setdiff(rownames(tmp),rownames(ds)),])
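To see what this replacement changes, here is a minimal sketch on toy data. The two small data frames are hypothetical stand-ins for "ds" and "tmp" inside uniqueframe(); only the probeset IDs and percentages are taken from your example. It assumes, as in your case, that the best match for a duplicated HuGene probeset is also its first occurrence, so the same row ends up in both "ds" and "tmp".

```r
# Toy stand-ins (hypothetical) for the objects inside uniqueframe():
# row names = U133 probesets, column 1 = HuGene probeset, column 2 = percent match.
ds  <- data.frame(HuGene = 8076569, Percent = 99.41,
                  row.names = "1552257_a_at")
tmp <- data.frame(HuGene = c(8076569, 8074791), Percent = c(99.41, 98.42),
                  row.names = c("1552257_a_at", "1552264_a_at"))

# Original line: the row already present in ds is appended a second time,
# and rbind() alters its row name to keep row names unique.
old <- rbind(ds, tmp)
print(rownames(old))

# Replacement line: only rows of tmp whose row names are not yet in ds
# are appended, so no row is added twice.
new <- rbind(ds, tmp[setdiff(rownames(tmp), rownames(ds)), ])
print(rownames(new))
```

Note that this only removes rows that would be added twice under the same row name; it is not meant as a general check of the script.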
However, please note that it does not matter, since later on I intersect
the rownames with the rownames of the expression data. Furthermore,
please note that this was only a trial to compare the three arrays and
there is no warranty that "script4bestmatch.R" is correct. Other people
might have better solutions to compare the three arrays based on the
BestMatch.txt files of Affymetrix.
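As a rough illustration of why the extra rows do not matter after the intersection, consider the following sketch. Both objects are hypothetical: "old" mimics the output of the unmodified uniqueframe(), and "expr" stands for an expression matrix that I assume has the U133 probeset IDs as row names.

```r
# Hypothetical stand-in for the unmodified uniqueframe() output,
# including one row name that rbind() created by appending "1".
old <- data.frame(HuGene  = c(8076569, 8076569, 8074791),
                  Percent = c(99.41, 99.41, 98.42),
                  row.names = c("1552257_a_at", "1552257_a_at1",
                                "1552264_a_at"))

# Hypothetical expression matrix (values are arbitrary placeholders).
expr <- matrix(c(7.1, 8.3, 6.9, 8.0), nrow = 2,
               dimnames = list(c("1552257_a_at", "1552264_a_at"),
                               c("sample1", "sample2")))

# Only row names present in both objects survive; the artificial
# "1552257_a_at1" has no counterpart in expr and is dropped.
common <- intersect(rownames(old), rownames(expr))
print(old[common, ])
```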
Best regards
Christian
On 11/4/10 11:28 AM, Naïma Oumouhou wrote:
> Dear Christian,
>
> I read your vignette « Introduction to the xps Package: Comparison to
> Affymetrix Power Tools » and I tried to compare 2 gene expression arrays:
> U133 Plus 2 and Human Gene ST 1.
>
> I followed your R instructions in the script “script4bestmatch.R”. But I
> noticed something strange in my output.
>
> I downloaded “U133PlusVsHuGene_BestMatch.txt” from the Affymetrix website.
>
> My instructions are:
>
> #Function "uniqueframe"
>
> uniqueframe <- function(ma) {
>    maxunique <- function(id, m) {
>       m <- m[which(m[,1] == id),];
>       m <- m[which(m[,2] == max(m[,2])),];
>       return(m[1,]);
>    }
>
>    dup <- duplicated(ma[,1])
>    uni <- unique(ma[dup,1])
>
>    ds <- NULL
>    for (i in uni) {ds <- rbind(ds, maxunique(i,ma))}
>
>    tmp <- ma[dup==F,]
>    ds <- rbind(ds, tmp)
>    ds <- ds[order(rownames(ds)),]
>
>    return(ds)
> }
>
> # Importation of "U133PlusVsHuGene_BestMatch.txt"
>
> up2hg<-read.delim("D:/Naima/CancerMoelleOsseuse_EFS/Analyse_Package_XPS/U133PlusVsHuGene_BestMatch.txt",row.names=3,comment.char="")
> dim(up2hg)
> [1] 29129    19
> up2hg<-up2hg[,5:6]
> up2hg_cor<-uniqueframe(up2hg)
> colnames(up2hg_cor)<-c("HuGene","PercentU2G")
> dim(up2hg_cor)
> [1] 25251     2
> write.csv2(up2hg_cor,"D:/Naima/CancerMoelleOsseuse_EFS/Outputs/Probesets_U133PlusVsHuGene.csv")
>
> The initial data frame “up2hg” contains 29129 lines, and after I apply
> “uniqueframe” the resulting data frame has 25251 lines. But the number
> of unique probesets for the Human Gene array is 17984.
>
> When I see the output (Probesets_U133PlusVsHuGene.csv), there is
> something strange:
>
> For example:
>
> U1332P          HuGene    PercentU2G
> 1552257_a_at    8076569   99,41
> 1552257_a_at1   8076569   99,41
> 1552264_a_at    8074791   98,42
> 1552264_a_at1   8074791   98,42
>
> There are still duplicated probesets among the HuGene probesets, and new
> probesets such as “1552257_a_at1” are created in U1332P.
>
> Have I done something wrong?
>
> Thank you for your help.
>
> Naïma
>