[BioC] Question about sequence data results

Hrishikesh Deshmukh d_hrishikesh at yahoo.com
Mon Feb 7 18:53:35 CET 2005


Dear Bioconductorians,

I have written a script which reads in Affymetrix
data, filters based on intensity values and then pulls
sequence data for the probes which satisfy the
intensity-based filter. I notice that the results given by
the script are different from the sequence data files
which Affymetrix supplies! 

Here are the top three sequences from my result, and
below them are the sequences for the same probe IDs
from the Affymetrix sequence file. Why do I get
different sequences? I am including the script below
for your perusal; your help is appreciated.

h.pbn    my.in.seq
1000_at  TTGCGCTACAGCTAGGCCGCATGCT
1000_at  TGGAAGCCAGGAAGGCCTATGTGAA
100_g_at TTCCCTGAAGGAACATTCCTTAGTC

>probe:HG-U95Av2:1000_at:399:559; 
TCTCCTTTGCTGAGGCCTCCAGCTT
>probe:HG-U95Av2:1000_at:544:185; 
AGGCCTCCAGCTTCAGGCAGGCCAA
>probe:HG-U95Av2:1000_at:530:505; 
CCAGCTTCAGGCAGGCCAAGGCCTT
>probe:HG-U95Av2:1000_at:617:349; 
AGCTCAGGTGGCCCCAGTTCAATCT
>probe:HG-U95Av2:1000_at:459:489; 
AGTTCTGGAATGGAAGGGTTCTGGC
>probe:HG-U95Av2:1000_at:408:545; 
TAGGGACTCAGGGCCATGCCTGCCC
>probe:HG-U95Av2:1000_at:484:311; 
TTCCCTGAAGGAACATTCCTTAGTC
>probe:HG-U95Av2:1000_at:548:333; 
GAAGGAACATTCCTTAGTCTCAAGG
>probe:HG-U95Av2:1000_at:578:369; 
CTTAGTCTCAAGGGCTAGCATCCCT
>probe:HG-U95Av2:1000_at:498:465; 
CTCAAGGGCTAGCATCCCTGAGGAG
>probe:HG-U95Av2:1000_at:503:441; 
GGCTAGCATCCCTGAGGAGCCAGGC
>probe:HG-U95Av2:1000_at:482:439; 
CTGTCAAAGCTGTCACTTCGCGTGC
>probe:HG-U95Av2:1000_at:397:545; 
AAGCTGTCACTTCGCGTGCCCTCGC
>probe:HG-U95Av2:1000_at:352:465; 
CGCGTGCCCTCGCTGCTTCTGTGTG
>probe:HG-U95Av2:1000_at:253:495; 
CCCTCGCTGCTTCTGTGTGTGGTGA
>probe:HG-U95Av2:1000_at:228:631; 
CTGCTTCTGTGTGTGGTGAGCAGAA
++++++++++++++++++++++++++++++++++++++++
>probe:HG-U95Av2:100_g_at:497:273; 
CATCTGGAACAGCTGCTCTTGGTCA
>probe:HG-U95Av2:100_g_at:208:557; 
AACAGCTGCTCTTGGTCACCCATCT
>probe:HG-U95Av2:100_g_at:495:355; 
GCTGCTCTTGGTCACCCATCTTGAC
>probe:HG-U95Av2:100_g_at:478:371; 
TTGAGGTGCTGCAGGCCAGTGATAA
>probe:HG-U95Av2:100_g_at:612:429; 
CTACCCCGGCTGCAGGAGCTGCTAC
>probe:HG-U95Av2:100_g_at:563:317; 
GCAGGAGCTGCTACTGTGCAACAAC
>probe:HG-U95Av2:100_g_at:223:559; 
GCAGCCTGCAGTGCTCCAGCCTCTT
>probe:HG-U95Av2:100_g_at:523:575; 
GTCCTCCTCAACCTGCAGGGTAACC
>probe:HG-U95Av2:100_g_at:551:445; 
AGGGTAACCCGCTGTGCCAAGCGGT
>probe:HG-U95Av2:100_g_at:509:475; 
GCATCTTGGAGCAACTGGCTGAACT
>probe:HG-U95Av2:100_g_at:576:249; 
AGCAACTGGCTGAACTGCTGCCTTC
>probe:HG-U95Av2:100_g_at:568:349; 
CTGGCTGAACTGCTGCCTTCAGTTA
>probe:HG-U95Av2:100_g_at:523:441; 
GCTGCCTTCAGTTAGCAGCGTCCTC
>probe:HG-U95Av2:100_g_at:562:421; 
CCTTCAGTTAGCAGCGTCCTCACCT
>probe:HG-U95Av2:100_g_at:622:473; 
AGTTAGCAGCGTCCTCACCTAAGAG
>probe:HG-U95Av2:100_g_at:567:607; 
GCCCTTTAACTTATTGGGACTGAAT

library(affy)
library(hgu95av2probe)
data(hgu95av2probe)
summary(hgu95av2probe)

Data <- ReadAffy()
pmi <- pm(Data)
mmi <- mm(Data)
pbn <- probeNames(Data)
rng.pmi <- apply(pmi,1,range)
rng.mmi <- apply(mmi,1,range)


in.boundspm <- ((rng.pmi[1,] >= 200) & (rng.pmi[1,] <= 20000))
in.boundsmm <- ((rng.mmi[1,] >= 200) & (rng.mmi[1,] <= 20000))

in.bounds <- (in.boundspm & in.boundsmm)

length(pmi[,1])
ac1 <- seq_len(nrow(pmi))  # row indices of pmi (201800 rows for this data set)
ac2 <- ac1[in.bounds]

h.pbn <- pbn[ac2]
h.pmi <- pmi[ac2,]
h.mmi <- mmi[ac2,]

my.in.seqpm <- hgu95av2probe$sequence[in.boundspm]
my.in.seqmm <- hgu95av2probe$sequence[in.boundsmm]

my.in.seq <- hgu95av2probe$sequence[in.bounds]
seq.data<- cbind(h.pbn,my.in.seq)
write.table(seq.data, file = "SeqData.txt", quote = FALSE,
            row.names = FALSE, col.names = TRUE, sep = " ")
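
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the filter in.bounds is built in the row order of pm(Data), but it is then used to index hgu95av2probe$sequence by position. If the probe table is stored in a different row order, probe IDs and sequences would be paired incorrectly. The toy example below uses made-up IDs, coordinates and sequences (not real HG-U95Av2 values), and the data frames intens and probe_tab are hypothetical stand-ins for the two sources. It shows the mismatch and a key-based lookup with match() that avoids it.

```r
# Toy illustration with made-up data (not real HG-U95Av2 values).
# 'intens' stands in for the rows of pm(Data); 'probe_tab' stands in for a
# probe table holding the same probes in a different row order.
intens <- data.frame(
  id  = c("A_at", "A_at", "B_at", "B_at"),
  x   = c(1, 2, 3, 4),
  y   = c(10, 20, 30, 40),
  min = c(150, 500, 900, 100),
  stringsAsFactors = FALSE
)
probe_tab <- data.frame(
  id       = c("B_at", "A_at", "B_at", "A_at"),
  x        = c(3, 1, 4, 2),
  y        = c(30, 10, 40, 20),
  sequence = c("TTTT", "AAAA", "GGGG", "CCCC"),
  stringsAsFactors = FALSE
)

# Logical filter computed in the row order of 'intens'.
keep <- intens$min >= 200

# Positional indexing: applies the filter to rows that are ordered differently,
# so IDs end up next to sequences that belong to other probes.
by_position <- cbind(id = intens$id[keep],
                     sequence = probe_tab$sequence[keep])

# Key-based lookup: build a key from the ID and the (x, y) position,
# then use match() to find the corresponding rows of 'probe_tab'.
key_intens <- paste(intens$id, intens$x, intens$y, sep = ":")
key_probe  <- paste(probe_tab$id, probe_tab$x, probe_tab$y, sep = ":")
idx <- match(key_intens[keep], key_probe)
by_key <- cbind(id = intens$id[keep],
                sequence = probe_tab$sequence[idx])

print(by_position)
print(by_key)
```

In the toy data the correct sequences for the two retained probes are CCCC and TTTT; only the key-based lookup recovers them. Whether the same reordering affects the real data would need to be checked, for example by comparing probeNames(Data) against the probe set names stored in the probe table.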

Eagerly waiting for your reply.

Thanks in advance.

Hrishi


