[BioC] PAIR files -- feature set table
Benilton Carvalho
beniltoncarvalho at gmail.com
Fri Jun 7 05:11:49 CEST 2013
You will need to merge the PAIR and NDF files, using the PROBE_ID column
as the key. This will give you the X/Y coordinates needed to
create the XYS files, as described in the other messages.
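
A minimal sketch of that merge in R, using toy data frames in place of the
real files. The signal column name (PM), the COUNT placeholder of 1, the
header line, and the output file name are assumptions for illustration and
are not taken from the files on GEO:

```r
# Toy stand-ins for the real data. In practice these would come from
# read.delim() on the .ndf file and on each .pair file (with
# comment.char = "#" to skip the descriptive header of the PAIR file).
ndf <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Malus_CN899188_2_f_147_1_755",
               "Contig20738_8_r_1179_2_1432"),
  X = c(343L, 345L, 347L),
  Y = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical signal values; the column name PM is an assumption.
pair <- data.frame(
  PROBE_ID = c("Malus_CN899188_2_f_147_1_755",
               "Contig19819_1_f_28_10_535",
               "Contig20738_8_r_1179_2_1432"),
  PM = c(1520.5, 310.25, 87),
  stringsAsFactors = FALSE
)

# Merge on PROBE_ID, keeping every NDF feature; features with no
# signal in the PAIR file end up with NA.
merged <- merge(ndf[, c("PROBE_ID", "X", "Y")],
                pair[, c("PROBE_ID", "PM")],
                by = "PROBE_ID", all.x = TRUE)

# Arrange the columns as X, Y, SIGNAL, COUNT.
# COUNT = 1 is a placeholder (assumption), NA where the signal is missing.
xys <- data.frame(X = merged$X,
                  Y = merged$Y,
                  SIGNAL = merged$PM,
                  COUNT = ifelse(is.na(merged$PM), NA_integer_, 1L))
xys <- xys[order(xys$Y, xys$X), ]

# Write a one-line "#" header followed by the tab-delimited table.
# The header content shown here is an assumption.
out <- file.path(tempdir(), "example.xys")
con <- file(out, "w")
writeLines("# designname=GPL11164", con)
write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)
```

If PROBE_ID is repeated in either table, merge() returns one row per matching
combination, so it is worth checking anyDuplicated(ndf$PROBE_ID) and
anyDuplicated(pair$PROBE_ID) before relying on the result.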
Regarding annotation, you may need to contact NimbleGen to request
this information directly from them...
benilton
2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dear Dr. Carvalho,
>
> Many thanks for the reply.
>
> Actually, this is what my .ndf file looks like:
>> head(ndf)
> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
> 1 7552_0343_0009 Duplicate_1
> 2 7552_0345_0009 Duplicate_2
> 3 7552_0347_0009 Duplicate_1
> 4 7552_0349_0009 Duplicate_2
> 5 7552_0351_0009 Duplicate_2
> 6 7552_0353_0009 Duplicate_1
> PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
> PROBE_ID POSITION DESIGN_ID X Y
> 1 Contig19819_1_f_28_10_535 0 7552 343 9
> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9
> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
> 6 Contig1991_1_f_71_2_1239 0 7552 353 9
>
> The pair files (.532 pair files only, since these are one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. My real issue is that I can further normalize and analyze the RMA files with sva, limma, etc. However, I cannot annotate the probes without the array annotation, as there are duplicates in the ndf file which are removed in the RMA.pair files available on NCBI/GEO. So they do not match in any of the annotation packages I have tried, and failed, to build.
> So I tried to go back and start from the raw pair files... this custom array is really a "custom" array without
> NimbleScan.
>
> Salud,
> Franklin
>
> Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 05, 2013 6:42 PM
> To: FRANKLIN JOHNSON [guest]
> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
> Subject: Re: [BioC] PAIR files -- feature set table
>
> It's an unfortunate mistake to have the pairFile *argument* in the
> call (not in the slots section, but I see your point). :-( I'll make
> sure that this is fixed.
>
> You need to convert the PAIR files to XYS...
>
> Some refs that should help you in the process:
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>
> b
>
> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>
>> Dear Maintainer,
>>
>> I downloaded the available NimbleGen 'single channel' 532.PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to make the annotation package for this platform using pdInfoBuilder.
>>
>> In the pdInfoBuilder Reference Manual (June 5, 2013), under the NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile. However, showClasses("Ngs..") does not show a slot for this, only for XYS. Thus, I changed the .pair file extension to .xys.
>>
>> (ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>
>> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>
>> But, doing this resulted in an error message:
>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>
>> makePdInfoPackage(seed, destDir = getwd())
>> ============================================================================================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: GSM618107_14418002_532.xys
>> ============================================================================================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: GSM618107_14418002_532.xys... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>> In addition: Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>> is.na() applied to non-(list or vector) of type 'NULL'
>>
>> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It seems the .xys format is arranged differently from .pair, so the .ndf cannot be used to annotate the .pair files? Any suggestions?
>> Hope to hear from you soon.
>> Franklin
>>
>> -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] tcltk grid parallel stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0 affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
>> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0 e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
>> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2 gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
>> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5 ggplot2_0.9.3.1 BiocInstaller_1.10.1
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10 bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
>> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
>> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1 marray_1.38.0 munsell_0.4 plyr_1.8
>> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3 splines_3.0.1
>> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0 tools_3.0.1 zlibbioc_1.6.0
>>>
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor