[BioC] PAIR files -- feature set table
Johnson, Franklin Theodore
franklin.johnson at email.wsu.edu
Thu Jun 6 08:26:51 CEST 2013
Dear Dr. Carvalho,
Muchas gracias for the reply.
Actually, this is what my .ndf file looks like:
> head(ndf)
PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
1 7552_0343_0009 Duplicate_1
2 7552_0345_0009 Duplicate_2
3 7552_0347_0009 Duplicate_1
4 7552_0349_0009 Duplicate_2
5 7552_0351_0009 Duplicate_2
6 7552_0353_0009 Duplicate_1
PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
PROBE_ID POSITION DESIGN_ID X Y
1 Contig19819_1_f_28_10_535 0 7552 343 9
2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
3 Contig20738_8_r_1179_2_1432 0 7552 347 9
4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
6 Contig1991_1_f_71_2_1239 0 7552 353 9
The pair files (.532 pair files only, since these are one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. My real issue is this: I can further normalize and analyze the RMA files with sva, limma, etc. However, I cannot annotate the probes without the array annotation, because the ndf file contains duplicates that are removed in the RMA.pair files available on NCBI/GEO. So they do not match in any of the annotation packages I have tried (and failed) to build.
So I tried to go back and start from the raw pair files... this custom array really is a "custom" array, without NimbleScan.
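A minimal sketch of how the duplicated NDF entries could be inspected and collapsed before joining them to the RMA values. The data frames are made-up stand-ins (only the column names PROBE_ID, CONTAINER, X and Y come from the head(ndf) output above), and keeping the first row per PROBE_ID is an assumption that only holds if the duplicates are replicate spots of the same probe.

```r
# Made-up stand-in for the NDF table; the real one would be read with
# something like read.delim("GPL11164.ndf", stringsAsFactors = FALSE).
ndf <- data.frame(
  PROBE_ID  = c("probe_A", "probe_A", "probe_B", "probe_C"),
  CONTAINER = c("Duplicate_1", "Duplicate_2", "Duplicate_1", "Duplicate_2"),
  X         = c(343, 345, 347, 349),
  Y         = c(9, 9, 9, 9),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the normalized (RMA) values: one row per probe.
rma <- data.frame(
  PROBE_ID = c("probe_A", "probe_B", "probe_C"),
  SIGNAL   = c(10.2, 8.7, 11.5),
  stringsAsFactors = FALSE
)

# Which probe IDs occur more than once in the NDF?
dup_ids <- unique(ndf$PROBE_ID[duplicated(ndf$PROBE_ID)])

# Assumption: duplicates are replicate spots, so one row per probe suffices.
ndf_unique <- ndf[!duplicated(ndf$PROBE_ID), ]

# Left join: every RMA probe keeps its row; unmatched probes would get NA.
annotated <- merge(rma, ndf_unique, by = "PROBE_ID", all.x = TRUE)
```

Rows of annotated with NA in the NDF columns would point to probe IDs that are present in the RMA file but absent from the NDF.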
Salud,
Franklin
Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt
________________________________________
From: Benilton Carvalho [beniltoncarvalho at gmail.com]
Sent: Wednesday, June 05, 2013 6:42 PM
To: FRANKLIN JOHNSON [guest]
Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
Subject: Re: [BioC] PAIR files -- feature set table
It's an unfortunate mistake to have the pairFile *argument* in the
call (not in the slots section, but I see your point). :-( I'll make
sure that this is fixed.
You need to convert the PAIR files to XYS...
Some refs that should help you in the process:
https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
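A rough sketch of such a conversion, not taken from the linked threads. It assumes the PAIR file starts with a single '#' header line followed by a tab-delimited table that includes X, Y and PM columns; if a given PAIR file lacks the coordinates, they would have to be looked up in the NDF first. The function name pair_to_xys, the demo file contents and the placeholder COUNT value of 1 are all hypothetical.

```r
# Hypothetical helper: write an XYS-style file (X, Y, SIGNAL, COUNT)
# from a NimbleGen PAIR file. Assumes X, Y and PM columns are present.
pair_to_xys <- function(pair_file, xys_file) {
  # Keep the first line only if it is a '#' header, so it can be copied over.
  header <- readLines(pair_file, n = 1)
  header <- header[grepl("^#", header)]

  # comment.char = "#" makes read.delim skip the header line.
  pair <- read.delim(pair_file, comment.char = "#", stringsAsFactors = FALSE)

  missing_cols <- setdiff(c("X", "Y", "PM"), names(pair))
  if (length(missing_cols) > 0) {
    stop("PAIR file lacks column(s): ", paste(missing_cols, collapse = ", "))
  }

  # COUNT is a placeholder (assumption): PAIR files carry no pixel counts.
  xys <- data.frame(X = pair$X, Y = pair$Y, SIGNAL = pair$PM, COUNT = 1L)

  con <- file(xys_file, open = "wt")
  on.exit(close(con))
  writeLines(header, con)
  write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(xys)
}

# Demo on a tiny made-up PAIR file (values are illustrative only).
demo_pair <- tempfile(fileext = ".pair")
demo_xys  <- tempfile(fileext = ".xys")
writeLines(c(
  "# software=NimbleScan designname=demo",
  paste(c("IMAGE_ID", "SEQ_ID", "PROBE_ID", "X", "Y", "PM", "MM"), collapse = "\t"),
  paste(c("demo_532", "seq1", "probe_A", "343", "9", "1234.5", "0"), collapse = "\t"),
  paste(c("demo_532", "seq2", "probe_B", "345", "9", "987.25", "0"), collapse = "\t")
), demo_pair)

pair_to_xys(demo_pair, demo_xys)
```

Each of the 24 PAIR files would be passed through the same function, and one of the resulting .xys files could then be supplied as xysFile when building the seed.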
b
2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>
> Dear Maintainer,
>
> I downloaded the available NimbleGen 'single channel' 532.PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to make the annotation package for this platform using pdInfoBuilder.
>
> In the pdInfoBuilder Reference Manual (June 5, 2013), under the NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although showClasses("Ngs..") does not show a slot for this, only for XYS. Thus, I changed the .pair file extension to .xys.
>
> (ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>
> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>
> But, doing this resulted in an error message:
> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>
> makePdInfoPackage(arrays, destDir = getwd())
> ============================================================================================================================================
> Building annotation package for Nimblegen Expression Array
> NDF: GPL11164.ndf
> XYS: GSM618107_14418002_532.xys
> ============================================================================================================================================
> Parsing file: GPL11164.ndf... OK
> Parsing file: GSM618107_14418002_532.xys... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
> In addition: Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>
> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It seems .xys has a different arrangement than .pair, thus .ndf is not applicable to annotate the .pair file? Any suggestions?
> Hope to hear from you soon.
> Franklin
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tcltk grid parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
>>
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor