[BioC] PAIR files -- feature set table

Thu Jun 6 08:26:51 CEST 2013

Dear Dr. Carvalho,

Many thanks for the reply.

Actually, this is what my .ndf file looks like:
> head(ndf)
  PROBE_DESIGN_ID   CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
1  7552_0343_0009 Duplicate_1                                      
2  7552_0345_0009 Duplicate_2                                      
3  7552_0347_0009 Duplicate_1                                      
4  7552_0349_0009 Duplicate_2                                      
5  7552_0351_0009 Duplicate_2                                      
6  7552_0353_0009 Duplicate_1                                      
                                               PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
1  cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca        0    64535488   64535488       9     343            
2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg        0    64799310   64799310       9     345            
3          agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca        0    64476989   64476989       9     347            
4          ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa        0    64862794   64862794       9     349            
5          gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg        0    64832726   64832726       9     351            
6          ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc        0    64435686   64435686       9     353            
                      PROBE_ID POSITION DESIGN_ID   X Y
1    Contig19819_1_f_28_10_535        0      7552 343 9
2 Malus_CN899188_2_f_147_1_755        0      7552 345 9
3  Contig20738_8_r_1179_2_1432        0      7552 347 9
4 Malus_CN880097_2_r_336_2_536        0      7552 349 9
5 Malus_CN918117_2_f_632_1_781        0      7552 351 9
6     Contig1991_1_f_71_2_1239        0      7552 353 9

The pair files (.532 pair files only, since these are one-color arrays) contain only the probe ID and signal, after some header text at the top describing the experiment. My real issue is this: I can further normalize and analyze the RMA files with sva, limma, etc., but I cannot annotate the probes without the array annotation, because the ndf file contains duplicates that have been removed from the RMA.pair files available on NCBI/GEO. As a result, the probes do not match in any annotation package I have tried to build.
So I tried to go back and start from the raw pair files... this custom array really is a "custom" array, without NimbleScan.
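
To illustrate the duplicate problem described above, here is a minimal sketch that counts how often each PROBE_ID occurs in an NDF-like table and keeps one row per probe so it can be joined to probe-level values. The toy `ndf` data frame, the `signal` table, and the probe names are hypothetical stand-ins and not the contents of GPL11164.ndf. Keeping the first copy of each probe is only one possible choice.

```r
# Toy stand-in for the NDF table; only the columns used here are included.
ndf <- data.frame(
  PROBE_ID  = c("probeA", "probeB", "probeA", "probeC"),
  CONTAINER = c("Duplicate_1", "Duplicate_1", "Duplicate_2", "Duplicate_2"),
  X = c(343L, 345L, 347L, 349L),
  Y = c(9L, 9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical probe-level summary: one value per PROBE_ID.
signal <- data.frame(
  PROBE_ID = c("probeA", "probeB", "probeC"),
  PM = c(1200.5, 830.0, 455.25),
  stringsAsFactors = FALSE
)

# How many features does each probe occupy on the array?
copies <- table(ndf$PROBE_ID)
dup_ids <- names(copies)[copies > 1]

# Keep the first row per PROBE_ID so the annotation has one row per probe.
ndf_unique <- ndf[!duplicated(ndf$PROBE_ID), ]

# Join annotation and signal on PROBE_ID.
annotated <- merge(ndf_unique, signal, by = "PROBE_ID")
```

If the duplicated features need to stay distinct (for example, to build a feature-level annotation), the join key would have to be the X/Y position rather than PROBE_ID.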

Cheers,
Franklin 

Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt

________________________________________
From: Benilton Carvalho [beniltoncarvalho at gmail.com]
Sent: Wednesday, June 05, 2013 6:42 PM
To: FRANKLIN JOHNSON [guest]
Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
Subject: Re: [BioC] PAIR files -- feature set table

It's an unfortunate mistake to have the pairFile *argument* in the
call (not in the slots section, but I see your point). :-( I'll make
sure that this is fixed.

You need to convert the PAIR files to XYS...
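
A minimal sketch of such a conversion in R follows. It assumes the PAIR file starts with a single "#" header line and has X, Y and PM columns, and that an XYS file is that header line followed by a tab-delimited table with columns X, Y, SIGNAL and COUNT. The function name `pair_to_xys` and the toy file are hypothetical, and the layout should be checked against the actual files. If a PAIR file lacks X and Y, they would first have to be looked up in the NDF.

```r
# Write a tiny PAIR-like file so the sketch runs on its own (toy data).
pair_file <- tempfile(fileext = ".pair")
writeLines(c(
  "# software=NimbleScan\tversion=toy",
  "PROBE_ID\tX\tY\tPM",
  "probeA\t343\t9\t1200.5",
  "probeB\t345\t9\t830",
  "probeC\t347\t9\t455.25"
), pair_file)

# Hypothetical helper: copy the header line, then write X/Y/SIGNAL/COUNT.
pair_to_xys <- function(pair_file, xys_file) {
  header <- readLines(pair_file, n = 1)        # assumed "#" header line
  pair <- read.delim(pair_file, comment.char = "#",
                     stringsAsFactors = FALSE)
  xys <- data.frame(X = pair$X, Y = pair$Y,
                    SIGNAL = pair$PM,          # PM intensity becomes SIGNAL
                    COUNT = 1L)                # one measurement per feature
  xys <- xys[order(xys$Y, xys$X), ]            # sort by position on the array
  con <- file(xys_file, "w")
  on.exit(close(con))
  writeLines(header, con)
  write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(xys)
}

xys_file <- tempfile(fileext = ".xys")
xys <- pair_to_xys(pair_file, xys_file)
```

Running the helper over every PAIR file in a directory (for instance with `list.files()` and `lapply()`) would give one XYS file per array.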

Some refs that should help you in the process:

https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547

b

2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>
> Dear Maintainer,
>
> I downloaded the available NimbleGen 'single channel' 532.PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to make the annotation for this platform using pdInfoBuilder.
>
> In the pdInfoBuilder Reference Manual (June 5, 2013), under the NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although showClasses("Ngs..") does not show a slot for this, only XYS. Thus, I changed the .pair file extension to .xys.
>
> (ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>
> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>
> But, doing this resulted in an error message:
> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>
> makePdInfoPackage(seed, destDir = getwd())
> ============================================================================================================================================
> Building annotation package for Nimblegen Expression Array
> NDF: GPL11164.ndf
> XYS: GSM618107_14418002_532.xys
> ============================================================================================================================================
> Parsing file: GPL11164.ndf... OK
> Parsing file: GSM618107_14418002_532.xys... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
> In addition: Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
>   is.na() applied to non-(list or vector) of type 'NULL'
>
> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It seems .xys has a different arrangement than .pair, thus .ndf is not applicable to annotate the .pair file? Any suggestions?
> Hope to hear from you soon.
> Franklin
>
>  -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
>  [1] tcltk     grid      parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] pdInfoBuilder_1.24.0 oligo_1.24.0         oligoClasses_1.22.0  affxparser_1.32.1    RSQLite_0.11.4       DBI_0.2-7
>  [7] Mfuzz_2.18.0         DynDoc_1.38.0        widgetTools_1.38.0   e1071_1.6-1          class_7.3-7          gplots_2.11.0.1
> [13] KernSmooth_2.23-10   caTools_1.14         gdata_2.12.0.2       gtools_2.7.1         timecourse_1.32.0    MASS_7.3-26
> [19] Biobase_2.20.0       BiocGenerics_0.6.0   limma_3.16.5         ggplot2_0.9.3.1      BiocInstaller_1.10.1
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.28.0         Biostrings_2.28.0     bit_1.1-10            bitops_1.0-5          codetools_0.2-8       colorspace_1.2-2
>  [7] dichromat_2.0-0       digest_0.6.3          ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4  gtable_0.1.2
> [13] IRanges_1.18.1        iterators_1.0.6       labeling_0.1          marray_1.38.0         munsell_0.4           plyr_1.8
> [19] preprocessCore_1.22.0 proto_0.3-10          RColorBrewer_1.0-5    reshape2_1.2.2        scales_0.2.3          splines_3.0.1
> [25] stats4_3.0.1          stringr_0.6.2         tkWidgets_1.38.0      tools_3.0.1           zlibbioc_1.6.0
>>