[BioC] PAIR files -- feature set table

Johnson, Franklin Theodore franklin.johnson at email.wsu.edu
Fri Jun 7 19:39:42 CEST 2013


Resending to bioconductor message thread:

Dear Dr. Carvalho,
Thanks for the response.
As you suggested, I will look into the merge function using "PROBE_ID" as the key.
After reading in the data, I will start here: merge(dataset1, dataset2, by = "PROBE_ID").
Best Regards,
Franklin 

Great minds discuss ideas. Average minds discuss events. Small minds discuss people. -Eleanor Roosevelt

________________________________________
From: Benilton Carvalho [beniltoncarvalho at gmail.com]
Sent: Thursday, June 06, 2013 8:11 PM
To: Johnson, Franklin Theodore
Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
Subject: Re: [BioC] PAIR files -- feature set table

You will need to merge the PAIR and the NDF using the PROBE_ID column
as key. This will allow you to get the X/Y coordinates needed to
create the XYS as described on the other messages.
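
A minimal sketch of that merge in R, using toy data frames in place of the real files. The PAIR signal column name (PM), the probe names, the signal values, and the COUNT placeholder are all assumptions for illustration; only the PROBE_ID, CONTAINER, X and Y column names come from the NDF shown in this thread. As far as I know, an XYS file holds X, Y, SIGNAL and COUNT columns below a header comment line, so please check the other messages for the exact header before writing anything to disk.

```r
# Toy stand-in for the NDF: one row per feature on the array.
# A PROBE_ID can appear more than once (e.g. in Duplicate_1 and Duplicate_2).
ndf <- data.frame(
  PROBE_ID  = c("probeA", "probeB", "probeA", "probeC"),  # hypothetical IDs
  CONTAINER = c("Duplicate_1", "Duplicate_2", "Duplicate_2", "Duplicate_1"),
  X         = c(343L, 345L, 347L, 349L),
  Y         = c(9L, 9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# Toy stand-in for one PAIR file that only has probe ID and signal.
# The column name PM is an assumption; use whatever your file calls it.
pair <- data.frame(
  PROBE_ID = c("probeA", "probeB", "probeC"),
  PM       = c(1520.5, 310.0, 98.25),                     # made-up signals
  stringsAsFactors = FALSE
)

# Left join on the NDF so every feature keeps its X/Y position.
# Features without a matching signal would get NA in PM.
# Caution: if PROBE_ID is also duplicated in the PAIR data, merge() returns
# every pairwise combination, so check for duplicates on that side first.
merged <- merge(ndf, pair, by = "PROBE_ID", all.x = TRUE)

# Arrange the result in the X, Y, SIGNAL, COUNT layout.
# COUNT = 1 is only a placeholder, not a real pixel count.
xys <- data.frame(
  X      = merged$X,
  Y      = merged$Y,
  SIGNAL = merged$PM,
  COUNT  = ifelse(is.na(merged$PM), NA_integer_, 1L)
)
xys <- xys[order(xys$Y, xys$X), ]
rownames(xys) <- NULL

print(xys)
```

With real data, the same steps would be repeated for each of the PAIR files, always merging against the same NDF so the X/Y layout is identical across arrays.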

Regarding annotation, you may need to contact NimbleGen to request
this information directly from them...

benilton

2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dear Dr. Carvalho,
>
> Many thanks for the reply.
>
> Actually, this is what my .ndf file looks like:
>> head(ndf)
>   PROBE_DESIGN_ID   CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
> 1  7552_0343_0009 Duplicate_1
> 2  7552_0345_0009 Duplicate_2
> 3  7552_0347_0009 Duplicate_1
> 4  7552_0349_0009 Duplicate_2
> 5  7552_0351_0009 Duplicate_2
> 6  7552_0353_0009 Duplicate_1
>                                                PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
> 1  cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca        0    64535488   64535488       9     343
> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg        0    64799310   64799310       9     345
> 3          agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca        0    64476989   64476989       9     347
> 4          ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa        0    64862794   64862794       9     349
> 5          gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg        0    64832726   64832726       9     351
> 6          ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc        0    64435686   64435686       9     353
>                       PROBE_ID POSITION DESIGN_ID   X Y
> 1    Contig19819_1_f_28_10_535        0      7552 343 9
> 2 Malus_CN899188_2_f_147_1_755        0      7552 345 9
> 3  Contig20738_8_r_1179_2_1432        0      7552 347 9
> 4 Malus_CN880097_2_r_336_2_536        0      7552 349 9
> 5 Malus_CN918117_2_f_632_1_781        0      7552 351 9
> 6     Contig1991_1_f_71_2_1239        0      7552 353 9
>
> The pair files (.532 pair files only, as these are one-color arrays) contain only the probe ID and signal, after some text at the top describing the experiment. My real issue is that I can further normalize and analyze the RMA files with sva, limma, etc., but I cannot annotate the probes without the array annotation: the ndf file contains duplicates that were removed from the RMA.pair files available on NCBI/GEO, so the probes do not match in any annotation package I have tried.
> So I tried to go back and start from the raw pair files... this custom array is really a "custom" array without
> NimbleScan.
>
> Cheers,
> Franklin
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 05, 2013 6:42 PM
> To: FRANKLIN JOHNSON [guest]
> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder Maintainer
> Subject: Re: [BioC] PAIR files -- feature set table
>
> It's an unfortunate mistake to have the pairFile *argument* in the
> call (not in the slots section, but I see your point). :-( I'll make
> sure that this is fixed.
>
> You need to convert the PAIR files to XYS...
>
> Some refs that should help you in the process:
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>
> b
>
> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>
>> Dear Maintainer,
>>
>> I downloaded the available NimbleGen 'single channel' 532.PAIR files for a custom-built expression microarray from NCBI/GEO (GPL11164). However, I get an error message when I try to build the annotation package for this platform using pdInfoBuilder.
>>
>> In the pdInfoBuilder Reference Manual (June 5, 2013), under the NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although showClasses("Ngs..") does not show a slot for this, only for XYS. Thus, I changed the .pair file extension to .xys.
>>
>> (ndf<- list.files(getwd(), pattern=".ndf", full.names=TRUE)) # read annotation file
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>
>> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>
>> But, doing this resulted in an error message:
>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys, author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>
>> makePdInfoPackage(seed, destDir = getwd())
>> ============================================================================================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: GSM618107_14418002_532.xys
>> ============================================================================================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: GSM618107_14418002_532.xys... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... Error in `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>> In addition: Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>>   is.na() applied to non-(list or vector) of type 'NULL'
>>
>> The only files available from NCBI/GEO are 24 PAIR files and 1 NDF file. It seems .xys has a different layout than .pair, so the .ndf cannot be used directly to annotate the .pair files? Any suggestions?
>> Hope to hear from you soon.
>> Franklin
>>
>>  -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>>
>> attached base packages:
>>  [1] tcltk     grid      parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] pdInfoBuilder_1.24.0 oligo_1.24.0         oligoClasses_1.22.0  affxparser_1.32.1    RSQLite_0.11.4       DBI_0.2-7
>>  [7] Mfuzz_2.18.0         DynDoc_1.38.0        widgetTools_1.38.0   e1071_1.6-1          class_7.3-7          gplots_2.11.0.1
>> [13] KernSmooth_2.23-10   caTools_1.14         gdata_2.12.0.2       gtools_2.7.1         timecourse_1.32.0    MASS_7.3-26
>> [19] Biobase_2.20.0       BiocGenerics_0.6.0   limma_3.16.5         ggplot2_0.9.3.1      BiocInstaller_1.10.1
>>
>> loaded via a namespace (and not attached):
>>  [1] affyio_1.28.0         Biostrings_2.28.0     bit_1.1-10            bitops_1.0-5          codetools_0.2-8       colorspace_1.2-2
>>  [7] dichromat_2.0-0       digest_0.6.3          ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4  gtable_0.1.2
>> [13] IRanges_1.18.1        iterators_1.0.6       labeling_0.1          marray_1.38.0         munsell_0.4           plyr_1.8
>> [19] preprocessCore_1.22.0 proto_0.3-10          RColorBrewer_1.0-5    reshape2_1.2.2        scales_0.2.3          splines_3.0.1
>> [25] stats4_3.0.1          stringr_0.6.2         tkWidgets_1.38.0      tools_3.0.1           zlibbioc_1.6.0
>>>
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

