[BioC] PAIR files -- feature set table
Benilton Carvalho
beniltoncarvalho at gmail.com
Fri Jun 14 01:43:34 CEST 2013
Don't worry about that particular warning. Just install the package
and try to read your XYS files.
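
For context, here is a minimal sketch of where that warning appears to come from. The warning text says that ndfdata[["SIGNAL"]] is NULL, which is what [[ ]] returns when a data frame has no column with that name. The data frame below is a hypothetical stand-in with made-up contents, not the table pdInfoBuilder actually builds.

```r
# Hypothetical stand-in for the merged NDF/XYS table (assumption: the real
# table has many more columns; only the missing SIGNAL column matters here).
ndfdata <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535", "Malus_CN899188_2_f_147_1_755"),
  SEQ_ID   = c("Duplicate_1", "Duplicate_2"),
  stringsAsFactors = FALSE
)

# Check for the column by name before using it.
has_signal <- "SIGNAL" %in% names(ndfdata)

# Extracting a column that does not exist gives NULL rather than an error.
signal <- ndfdata[["SIGNAL"]]

print(has_signal)
print(is.null(signal))
```

Calling is.na() on that NULL is what produced the warning under R 3.0.1. The check with %in% is one way to see whether the column exists before relying on it.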
2013/6/13 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
> Dr. Carvalho,
>
> Yes. I see what you mean.
> Switching the columns helped: loading the featureSet table now inserted
> more than 2 rows:
>
> Inserting 198661 rows into table featureSet... OK
> However, the warning message did print again.
>
>
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>
> Below is the output + sessionInfo(), as I upgraded to R 3.0.1.
>
> #Begin R command line code:
>
>> makePdInfoPackage(arrays, destDir = getwd(), unlink=TRUE)
> ==============================================================================================================================================================
>
>
> Building annotation package for Nimblegen Expression Array
> NDF: pdinfo_GPL11164.ndf.txt <-new .ndf file with PROBE_ID<->SEQ_ID
> XYS: XYS.txt
> ==============================================================================================================================================================
> Parsing file: pdinfo_GPL11164.ndf.txt... OK
>
> Parsing file: XYS.txt... OK
> Merging NDF and XYS files... OK
> Preparing contents for featureSet table... OK
> Preparing contents for bgfeature table... OK
> Preparing contents for pmfeature table... OK
> Creating package in E:/RANDOM/Test/Yanmin's Microarray Paper/Yanmin
> Microarray RAW/pd.pdinfo.gpl11164.ndf.txt
> Inserting 198661 rows into table featureSet... OK
> Inserting 770599 rows into table pmfeature... OK
>
> Counting rows in featureSet
> Counting rows in pmfeature
> Creating index idx_pmfsetid on pmfeature... OK
> Creating index idx_pmfid on pmfeature... OK
> Creating index idx_fsfsetid on featureSet... OK
> Saving DataFrame object for PM.
> Done.
> Warning message:
> In is.na(ndfdata[["SIGNAL"]]) :
> is.na() applied to non-(list or vector) of type 'NULL'
>
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0
> affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
> Biobase_2.20.0
> [8] BiocGenerics_0.6.0 BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10
> codetools_0.2-8 ff_2.2-11 foreach_1.4.1
> GenomicRanges_1.12.4
> [8] IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0
> splines_3.0.1 stats4_3.0.1 tools_3.0.1
> zlibbioc_1.6.0
>
>
>
>>q()
>
>
>
> The pdInfo package built in destDir is identical to the one in my previous message.
>
> However the featureSet table now has more than 2 rows...
>
> Lastly, I tried multiple combinations, as my merged file has (X.x, Y.x), which
> seem to be identifiers for the 'probe IDs' on the array, as well as (X.y, Y.y),
> which seem to be the sequence identifiers for the "SEQ_ID". I used X.x, Y.x and
> PM, which gave the result I pasted above. All others gave errors. I'm close, but
> that warning message is annoying...
>
>
>
> Regards,
>
> Franklin
>
>
> Great minds discuss ideas. Average minds discuss events. Small minds discuss
> people. -Eleanor Roosevelt
>
>
>
>
> ________________________________________
> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
> Sent: Wednesday, June 12, 2013 8:25 PM
>
> To: Johnson, Franklin Theodore
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] PAIR files -- feature set table
>
> That does not look ok.
>
> The problem is the count for the featureSet table... This table stores
> the information for "genes" (or whatever the target for this
> particular array is)... so, it is unlikely that you have a microarray
> with only 2 "target units"... I'd expect something around the
> thousands...
>
> pdInfoBuilder uses the information in SEQ_ID (in the NDF) to get the
> target information (i.e., the contents for featureSet).
>
> Given that this is a custom array, I believe that the best idea is to
> contact the person who designed it and ask more details about the
> design (in particular, how many probesets and average number of probes
> per probeset)...
>
> I've seen some designs in which the information that was expected to
> be in SEQ_ID was actually stored in PROBE_ID (in such cases, the user
> needs to create a backup copy of the NDF, and then move the contents
> of PROBE_ID to SEQ_ID - and vice-versa).
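
A minimal sketch of that swap in base R, on a toy table rather than the real NDF. The values are hypothetical, modeled on the head(ndf) output quoted further down in this thread, and the sketch assumes that SEQ_ID holds only the two labels Duplicate_1 and Duplicate_2 and that the NDF is tab-delimited. The output path is a temporary stand-in.

```r
# Toy NDF (hypothetical values; the real file has many more columns and rows).
ndf <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535",
               "Malus_CN899188_2_f_147_1_755",
               "Contig20738_8_r_1179_2_1432"),
  SEQ_ID   = c("Duplicate_1", "Duplicate_2", "Duplicate_1"),
  stringsAsFactors = FALSE
)

# Keep an untouched copy before changing anything.
ndf_backup <- ndf

# Swap the contents of the two columns; column names and order stay the same.
ndf[c("PROBE_ID", "SEQ_ID")] <- ndf_backup[c("SEQ_ID", "PROBE_ID")]

# The number of distinct SEQ_ID values is what ends up in featureSet.
n_targets_before <- length(unique(ndf_backup$SEQ_ID))
n_targets_after  <- length(unique(ndf$SEQ_ID))
print(c(before = n_targets_before, after = n_targets_after))

# Write the modified table under a new name so the original NDF is preserved.
out_file <- tempfile(fileext = ".ndf")
write.table(ndf, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Counting the distinct SEQ_ID values before and after is a quick sanity check: a count of 2 would match the "Inserting 2 rows into table featureSet" line in the build log.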
>
> b
>
> 2013/6/12 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>> Dear Dr. Carvalho,
>>
>> Recently, we had correspondence regarding makePdInfoPackage for a NimbleGen
>> apple microarray.
>> I was able to merge the ndf design and XYS files using PROBE_ID.
>> As a reminder this is a custom array, and there are no SIGNAL==NAs for
>> control probes.
>> It seemed to work:
>>> makePdInfoPackage(seed, destDir(""))
>>
>> ============================================================================================================================================================
>> Building annotation package for Nimblegen Expression Array
>> NDF: GPL11164.ndf
>> XYS: XYS.txt
>>
>> ============================================================================================================================================================
>> Parsing file: GPL11164.ndf... OK
>> Parsing file: XYS.txt... OK
>> Merging NDF and XYS files... OK
>> Preparing contents for featureSet table... OK
>> Preparing contents for bgfeature table... OK
>> Preparing contents for pmfeature table... OK
>> Creating package in
>> C:/Users/franklin.johnson.PW50-WEN/Desktop/Test/Yanmin's Microarray
>> Paper/Yanmin Microarray RAW/pd.gpl11164
>> Inserting 2 rows into table featureSet... OK
>> Inserting 765524 rows into table pmfeature... OK
>> Inserting 5075 rows into table bgfeature... OK
>> Counting rows in bgfeature
>> Counting rows in featureSet
>> Counting rows in pmfeature
>> Creating index idx_bgfsetid on bgfeature... OK
>> Creating index idx_bgfid on bgfeature... OK
>> Creating index idx_pmfsetid on pmfeature... OK
>> Creating index idx_pmfid on pmfeature... OK
>> Creating index idx_fsfsetid on featureSet... OK
>> Saving DataFrame object for PM.
>> Saving DataFrame object for BG.
>> Done.
>> Warning message:
>> In is.na(ndfdata[["SIGNAL"]]) :
>> is.na() applied to non-(list or vector) of type 'NULL'
>>>
>>
>> Despite this warning message, a pdInfo package directory was created in my
>> destination folder. It has 4 subdirectories ("data", "inst", "man", "R"),
>> two subdirectories inside "inst" ("extdata" and "Unit Tests"), and two text
>> files in the main directory ("DESCRIPTION" and "NAMESPACE").
>> Before using "oligo", I wanted to confirm with you, if possible, that this
>> package is viable to use with "oligo" even though a warning message, which
>> may not pertain to my custom-designed microarray, was printed.
>>
>> Regards,
>> Franklin
>>
>> ________________________________________
>> From: Johnson, Franklin Theodore
>> Sent: Friday, June 07, 2013 10:39 AM
>> To: Benilton Carvalho
>> Cc: bioconductor at r-project.org
>> Subject: RE: [BioC] PAIR files -- feature set table
>>
>> Resending to bioconductor message thread:
>>
>> Dear Dr. Carvalho,
>> Thanks for the response.
>> As you suggested, I will look into the merge function using "PROBE_ID".
>> After reading in the data, I will start here: merge(dataset1, dataset2,
>> by = "key").
>> Best Regards,
>> Franklin
>>
>> ________________________________________
>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>> Sent: Thursday, June 06, 2013 8:11 PM
>> To: Johnson, Franklin Theodore
>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu
>> Subject: Re: [BioC] PAIR files -- feature set table
>>
>> You will need to merge the PAIR and the NDF using the PROBE_ID column
>> as key. This will allow you to get the X/Y coordinates needed to
>> create the XYS as described on the other messages.
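
A minimal sketch of that merge in base R, using toy tables with hypothetical values. It assumes the PAIR data has been read into a data frame with PROBE_ID and PM columns, that the NDF supplies X and Y, and that an XYS table consists of the four columns X, Y, SIGNAL and COUNT. Setting COUNT to 1 is a placeholder assumption, and the header line of a real XYS file is not written here.

```r
# Toy PAIR table (hypothetical signals): probe identifier and PM intensity.
pair <- data.frame(
  PROBE_ID = c("Contig19819_1_f_28_10_535", "Malus_CN899188_2_f_147_1_755"),
  PM       = c(1234.5, 678.9),
  stringsAsFactors = FALSE
)

# Toy NDF subset: probe identifier and its X/Y position on the array.
ndf <- data.frame(
  PROBE_ID = c("Malus_CN899188_2_f_147_1_755", "Contig19819_1_f_28_10_535"),
  X        = c(345L, 343L),
  Y        = c(9L, 9L),
  stringsAsFactors = FALSE
)

# Duplicated keys would multiply rows in the merge, so count them first.
n_dup <- sum(duplicated(ndf$PROBE_ID))

# Join on PROBE_ID to attach coordinates to each signal.
merged <- merge(ndf, pair, by = "PROBE_ID")

# Assemble the XYS-style table; COUNT = 1 is a placeholder assumption.
xys <- data.frame(X = merged$X, Y = merged$Y,
                  SIGNAL = merged$PM, COUNT = 1L)
xys <- xys[order(xys$Y, xys$X), ]
print(xys)
```

If n_dup is not zero, as may be the case for an NDF with duplicated probes, the merged table will have more rows than the PAIR file and needs checking before it is written out.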
>>
>> Regarding annotation, you may need to contact NimbleGen to request
>> this information directly from them...
>>
>> benilton
>>
>> 2013/6/6 Johnson, Franklin Theodore <franklin.johnson at email.wsu.edu>:
>>> Dear Dr. Carvalho,
>>>
>>> Many thanks for the reply.
>>>
>>> Actually, this is what my .ndf file looks like:
>>>> head(ndf)
>>> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID
>>> 1 7552_0343_0009 Duplicate_1
>>> 2 7552_0345_0009 Duplicate_2
>>> 3 7552_0347_0009 Duplicate_1
>>> 4 7552_0349_0009 Duplicate_2
>>> 5 7552_0351_0009 Duplicate_2
>>> 6 7552_0353_0009 Duplicate_1
>>> PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
>>> 1 cttgactcttctaagttcaaaggtaactcaagtgaagctgtcagatatgatccttcca 0 64535488 64535488 9 343
>>> 2 cccaagcattaaaccttactcatatacttataatgcagccatcaagagtttgtgcaagg 0 64799310 64799310 9 345
>>> 3 agggaggctgaaagagagagtgaatggtccagctgggcataattgctgca 0 64476989 64476989 9 347
>>> 4 ttgttggtgggggtgttgcccttagtaccccagaccttgaagcagttaaa 0 64862794 64862794 9 349
>>> 5 gtgtggggccccctttctttaactggaacctttctttgaagcaatttggg 0 64832726 64832726 9 351
>>> 6 ttgtccaattccaacatgccgagacggcagggattgtgatcgtgttgttc 0 64435686 64435686 9 353
>>> PROBE_ID POSITION DESIGN_ID X Y
>>> 1 Contig19819_1_f_28_10_535 0 7552 343 9
>>> 2 Malus_CN899188_2_f_147_1_755 0 7552 345 9
>>> 3 Contig20738_8_r_1179_2_1432 0 7552 347 9
>>> 4 Malus_CN880097_2_r_336_2_536 0 7552 349 9
>>> 5 Malus_CN918117_2_f_632_1_781 0 7552 351 9
>>> 6 Contig1991_1_f_71_2_1239 0 7552 353 9
>>>
>>> The pair files (.532 pair files only, as these are one-color arrays) contain
>>> only the probe ID and signal, after some text at the top describing the
>>> experiment. My real issue is that I can further normalize and analyze the RMA
>>> files with sva and limma, etc. However, I cannot annotate the probes without
>>> the array annotation, as there are duplicates in the ndf file which are
>>> removed in the RMA.pair files available on NCBI/GEO. So they do not match in
>>> any annotation package I have tried.
>>> So, I tried to go back and start from the raw pair files... this custom
>>> array is really a "custom" array without NimbleScan.
>>>
>>> Cheers,
>>> Franklin
>>>
>>> ________________________________________
>>> From: Benilton Carvalho [beniltoncarvalho at gmail.com]
>>> Sent: Wednesday, June 05, 2013 6:42 PM
>>> To: FRANKLIN JOHNSON [guest]
>>> Cc: bioconductor at r-project.org; franklin.johnson at wsu.edu; pdInfoBuilder
>>> Maintainer
>>> Subject: Re: [BioC] PAIR files -- feature set table
>>>
>>> It's an unfortunate mistake to have the pairFile *argument* in the
>>> call (not in the slots section, but I see your point). :-( I'll make
>>> sure that this is fixed.
>>>
>>> You need to convert the PAIR files to XYS...
>>>
>>> Some refs that should help you in the process:
>>>
>>> https://stat.ethz.ch/pipermail/bioconductor/2012-January/043186.html
>>>
>>> http://comments.gmane.org/gmane.science.biology.informatics.conductor/27547
>>>
>>> b
>>>
>>> 2013/6/5 FRANKLIN JOHNSON [guest] <guest at bioconductor.org>:
>>>>
>>>> Dear Maintainer,
>>>>
>>>> I downloaded available NimbleGen 'single channel' 532.PAIR files for a
>>>> custom built expression microarray from NCBI/GEO (GPL11164). However, I get
>>>> an error message when I try to make the annotation for this platform using
>>>> pdInfoBuilder.
>>>>
>>>> In pdInfoBuilder Reference Manual (June 5, 2013), under the
>>>> NgsExpressionPDInfoPkgSeed method, there is a slot for pairFile, although,
>>>> showClasses("Ngs.."), does not show a slot for this, only, XYS. Thus, I
>>>> changed the .pair file extension to .xys.
>>>>
>>>> (ndf <- list.files(getwd(), pattern = ".ndf", full.names = TRUE)) # locate the annotation file
>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray
>>>> Paper/Yanmin Microarray RAW/GPL11164.ndf"
>>>>
>>>> (xys <- list.files(getwd(), pattern = ".xys", full.names = TRUE)[1])
>>>> [1] "C:/Users/franklin.johnson.PW99-WEN/Desktop/Test/Yanmin's Microarray
>>>> Paper/Yanmin Microarray RAW/GSM618107_14418002_532.xys"
>>>>
>>>> But, doing this resulted in an error message:
>>>> seed <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndf, xysFile = xys,
>>>> author = "FJ", organism = "Apple", species = "Malus x Domestica cv.GD")
>>>>
>>>> makePdInfoPackage(arrays, destDir = getwd())
>>>>
>>>> ============================================================================================================================================
>>>> Building annotation package for Nimblegen Expression Array
>>>> NDF: GPL11164.ndf
>>>> XYS: GSM618107_14418002_532.xys
>>>>
>>>> ============================================================================================================================================
>>>> Parsing file: GPL11164.ndf... OK
>>>> Parsing file: GSM618107_14418002_532.xys... OK
>>>> Merging NDF and XYS files... OK
>>>> Preparing contents for featureSet table... Error in
>>>> `[.data.frame`(ndfdata, , colsFS) : undefined columns selected
>>>> In addition: Warning message:
>>>> In is.na(ndfdata[["SIGNAL"]]) :
>>>> is.na() applied to non-(list or vector) of type 'NULL'
>>>>
>>>> The only files available from NCBI/GEO are 24 PAIR files and 1 ndf. It
>>>> seems .xys has a different arrangement than .pair, thus .ndf is not
>>>> applicable to annotate the .pair file? Any suggestions?
>>>> Hope to hear from you soon.
>>>> Franklin
>>>>
>>>> -- output of sessionInfo():
>>>>
>>>>> sessionInfo()
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252 LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C LC_TIME=English_United
>>>> States.1252
>>>>
>>>> attached base packages:
>>>> [1] tcltk grid parallel stats graphics grDevices utils
>>>> datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] pdInfoBuilder_1.24.0 oligo_1.24.0 oligoClasses_1.22.0
>>>> affxparser_1.32.1 RSQLite_0.11.4 DBI_0.2-7
>>>> [7] Mfuzz_2.18.0 DynDoc_1.38.0 widgetTools_1.38.0
>>>> e1071_1.6-1 class_7.3-7 gplots_2.11.0.1
>>>> [13] KernSmooth_2.23-10 caTools_1.14 gdata_2.12.0.2
>>>> gtools_2.7.1 timecourse_1.32.0 MASS_7.3-26
>>>> [19] Biobase_2.20.0 BiocGenerics_0.6.0 limma_3.16.5
>>>> ggplot2_0.9.3.1 BiocInstaller_1.10.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.28.0 Biostrings_2.28.0 bit_1.1-10
>>>> bitops_1.0-5 codetools_0.2-8 colorspace_1.2-2
>>>> [7] dichromat_2.0-0 digest_0.6.3 ff_2.2-11
>>>> foreach_1.4.0 GenomicRanges_1.12.4 gtable_0.1.2
>>>> [13] IRanges_1.18.1 iterators_1.0.6 labeling_0.1
>>>> marray_1.38.0 munsell_0.4 plyr_1.8
>>>> [19] preprocessCore_1.22.0 proto_0.3-10 RColorBrewer_1.0-5
>>>> reshape2_1.2.2 scales_0.2.3 splines_3.0.1
>>>> [25] stats4_3.0.1 stringr_0.6.2 tkWidgets_1.38.0
>>>> tools_3.0.1 zlibbioc_1.6.0
>>>>>
>>>>
>>>>
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>