what is "not correct"?  working directly from the Affymetrix annotation
probe_tab text file, we have
    PROBESET_ID PROBE_X_POS PROBE_Y_POS PROBE_INTERROGATION_POSITION
1 SNP_A-1780270        2380        1757                            3
2 SNP_A-1780270        2381        1757                            3
3 SNP_A-1780270        2626         421                            3
4 SNP_A-1780270        2627         421                            3
5 SNP_A-1780270         540        1827                            3
6 SNP_A-1780270         541        1827                            3
7 SNP_A-1780270         694         338                            3
8 SNP_A-1780270         695         338                            3
             PROBE_SEQUENCE TARGET_STRANDEDNESS PROBE_TYPE ALLELE
1 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
2 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
3 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
4 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
5 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
6 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C
7 TTGTTAAGCAAGTGACTTATTTTAT                   f         PM      G
8 TTGTTAAGCAAGTGAGTTATTTTAT                   f         PM      C

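there are 4 replicates of each sequence, one sequence per allele (G and C),
so the annotation file itself has only 2 unique PM sequences for this probeset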
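
a minimal sketch of that count in base R, in case it helps. the data frame
below is typed in by hand from the excerpt above (coordinates, sequence and
allele columns only); the object names probe_tab, replicates and n_unique are
just illustrative, not part of any package

```r
# Rows copied by hand from the probe_tab excerpt above.
probe_tab <- data.frame(
  PROBE_X_POS = c(2380, 2381, 2626, 2627, 540, 541, 694, 695),
  PROBE_Y_POS = c(1757, 1757, 421, 421, 1827, 1827, 338, 338),
  PROBE_SEQUENCE = rep(c("TTGTTAAGCAAGTGACTTATTTTAT",
                         "TTGTTAAGCAAGTGAGTTATTTTAT"), times = 4),
  ALLELE = rep(c("G", "C"), times = 4),
  stringsAsFactors = FALSE
)

# Cross-tabulate sequence against allele to see the replicate counts.
replicates <- table(probe_tab$PROBE_SEQUENCE, probe_tab$ALLELE)
print(replicates)

# Number of distinct PM sequences for this probeset.
n_unique <- length(unique(probe_tab$PROBE_SEQUENCE))
print(n_unique)
```

each sequence pairs with exactly one allele, and each pair occurs 4 times.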


checking the pd.genomewidesnp.6 package, following your code snippet, we have


> dbGetQuery(kao, "select man_fsetid, fsetid from featureSet where
man_fsetid = 'SNP_A-1780270'")
     man_fsetid fsetid
1 SNP_A-1780270 326067
> dbGetQuery(kao, "select * from pmfeature where fsetid = '326067'")
      fid strand allele fsetid pos    x    y
1  906535      0      1 326067   6  694  338
2  906536      0      0 326067   5  695  338
3 1130907      0      1 326067   8 2626  421
4 1130908      0      0 326067   7 2627  421
5 4711141      0      1 326067   4 2380 1757
6 4711142      0      0 326067   3 2381 1757
7 4896901      0      1 326067   2  540 1827
8 4896902      0      0 326067   1  541 1827

now we know the fids of the probes we looked at in the original data, so we
can look up their sequences

> dbGetQuery(kao, "select * from sequence where fid = '4711141'")
      fid offset tstrand tallele                       seq
1 4711141      3       f       G TTGTTAAGCAAGTGACTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '4711142'")
      fid offset tstrand tallele                       seq
1 4711142      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '1130907'")
      fid offset tstrand tallele                       seq
1 1130907      3       f       G TTGTTAAGCAAGTGACTTATTTTAT
> dbGetQuery(kao, "select * from sequence where fid = '1130908'")
      fid offset tstrand tallele                       seq
1 1130908      3       f       C TTGTTAAGCAAGTGAGTTATTTTAT
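
the same comparison can be sketched in base R by matching on the (x, y)
coordinates. this is only an illustration: the three data frames are typed in
by hand from the output above rather than read from the package, and the names
annot, pmfeature_tab, sequence_tab, pkg and both are made up for the example

```r
# Annotation-file rows for the four probes queried above.
annot <- data.frame(
  x = c(2380, 2381, 2626, 2627),
  y = c(1757, 1757, 421, 421),
  annot_seq = c("TTGTTAAGCAAGTGACTTATTTTAT", "TTGTTAAGCAAGTGAGTTATTTTAT",
                "TTGTTAAGCAAGTGACTTATTTTAT", "TTGTTAAGCAAGTGAGTTATTTTAT"),
  stringsAsFactors = FALSE
)

# Matching rows of the package's pmfeature table (fid and coordinates).
pmfeature_tab <- data.frame(
  fid = c(1130907, 1130908, 4711141, 4711142),
  x = c(2626, 2627, 2380, 2381),
  y = c(421, 421, 1757, 1757)
)

# Matching rows of the package's sequence table (fid and sequence).
sequence_tab <- data.frame(
  fid = c(4711141, 4711142, 1130907, 1130908),
  seq = c("TTGTTAAGCAAGTGACTTATTTTAT", "TTGTTAAGCAAGTGAGTTATTTTAT",
          "TTGTTAAGCAAGTGACTTATTTTAT", "TTGTTAAGCAAGTGAGTTATTTTAT"),
  stringsAsFactors = FALSE
)

# Attach sequences to features by fid, then line up with the annotation by x/y.
pkg <- merge(pmfeature_tab, sequence_tab, by = "fid")
both <- merge(annot, pkg, by = c("x", "y"))

# TRUE where the package sequence equals the annotation-file sequence.
both$match <- both$annot_seq == both$seq
print(both[, c("fid", "x", "y", "match")])
```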

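for these four probes the sequence, strand, allele and offset in the package
match the annotation file.  what is incorrect?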




On Wed, Jan 7, 2009 at 2:22 PM, LiGang <luzifer.li@gmail.com> wrote:

>  Dear List,
>
> I used the following R code to extract sequence information of a particular
> probeset for PM probes of Affymetrix SNP6 array. However, for 100 probesets
> I tested there were only 2 unique PM sequences for each probeset.  It
> appears that the PM sequences were not correctly catalogued.
>
> ===============
> library(pd.genomewidesnp.6)
>
> db(pd.genomewidesnp.6)->kao
>
> dbGetQuery(kao,"SELECT * from featureSet limit
> 100")$"man_fsetid"->probesets
> result<-vector("list",length(probesets))
> names(result)<-probesets
>
> for(ind in 1:length(result)){
>
> dbGetQuery(kao,paste("SELECT * from featureSet where
> man_fsetid='",names(result)[ind],"'",sep=""))$fsetid->fsetid
>
> dbGetQuery(kao, paste("select * from pmfeature where
> fsetid=",fsetid,sep=""))->pm.100
>
> c(pm.100$fid)->totiao
> paste("fid=",paste(totiao,collapse=" or fid="),sep="")->totiao
> paste("SELECT * from sequence where ",totiao,sep="")->totiao
> dbGetQuery(kao,totiao)->seq
>
> result[[ind]]<-seq
>
> }
>
>
> sapply(result, function(xxx) length(unique(xxx$seq)))
>
> ===================
>
> sessionInfo()
>
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] pd.genomewidesnp.6_0.4.2 oligoClasses_1.4.0
> Biobase_2.2.1            RSQLite_0.7-1
> [5] DBI_0.2-4
> ==============
>
>
> LiGang
>


