[BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays

Thu Aug 6 19:32:19 CEST 2009

Yes, and thank you, Benilton, for all your help on this. The structure of an .xys file does not seem to be clearly documented anywhere. If you are deriving your own .xys files from .pair files, some precise details might be useful:

The file must be sorted first on Y and then on X. Every combination of X and Y, from 1 up to the maximum in your design file, must be present, with "NA" for any coordinate that is not in the .pair file. The first line of the .xys file should be a comment line that includes the designname.

Here is how the start of an .xys file might look:

# software=NimbleScan version=1.0.0 designname=XXXXXX_Exp 
X       Y       SIGNAL  COUNT
1       1       NA      NA
2       1       NA      NA
3       1       NA      NA
4       1       NA      NA
5       1       NA      NA
6       1       NA      NA
7       1       NA      NA
8       1       NA      NA
9       1       NA      NA
10      1       NA      NA
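A minimal R sketch of the .pair to .xys conversion described above. It assumes the .pair data has already been read into a data frame with integer X and Y columns and a numeric PM column holding the signal; the PM column name is an assumption about the .pair layout, so adjust it to your file. pairToXys() and writeXys() are hypothetical helpers, not part of oligo or pdInfoBuilder. Setting COUNT to 1 for measured features is also an assumption, based on the str() output of a working .xys later in this thread, and the "software=NimbleScan version=1.0.0" part of the comment line is simply copied from the example above.

```r
# Hypothetical helpers (not part of oligo or pdInfoBuilder).
# Assumes `pair` is a data.frame with X and Y columns and a numeric
# PM column holding the signal; rename PM if your .pair file differs.
pairToXys <- function(pair, maxX, maxY) {
  # Every (X, Y) coordinate from 1 up to the design maximum
  grid <- expand.grid(X = seq_len(maxX), Y = seq_len(maxY))
  # Left join: coordinates missing from the .pair data get SIGNAL = NA
  xys <- merge(grid, pair[, c("X", "Y", "PM")],
               by = c("X", "Y"), all.x = TRUE)
  names(xys)[names(xys) == "PM"] <- "SIGNAL"
  # Assumption: COUNT is 1 for measured features and NA otherwise
  xys$COUNT <- ifelse(is.na(xys$SIGNAL), NA_integer_, 1L)
  # Sort first on Y, then on X (X varies fastest, as in the example)
  xys <- xys[order(xys$Y, xys$X), ]
  rownames(xys) <- NULL
  xys
}

writeXys <- function(xys, path, designname) {
  con <- file(path, open = "wt")
  on.exit(close(con))
  # First line: a comment carrying the design name
  writeLines(sprintf("# software=NimbleScan version=1.0.0 designname=%s",
                     designname), con)
  # Tab-delimited body with a header row and literal "NA" for gaps
  write.table(xys, con, sep = "\t", quote = FALSE,
              row.names = FALSE, na = "NA")
}
```

With the NDF read as elsewhere in this thread (ndf <- read.delim(ndfFile, stringsAsFactors = FALSE)), the calls would look something like xys <- pairToXys(pair, max(ndf$X), max(ndf$Y)) followed by writeXys(xys, xysFile, "XXXXXX_Exp"). Building the full grid and left-joining onto it is what guarantees the NA rows; sorting afterwards keeps the ordering rule independent of how merge() returns its rows.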

Jack

-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu] 
Sent: Thursday, August 06, 2009 10:23 AM
To: Jack Schonbrun
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays

Just to close this thread:

Jack regenerated the XYS files ensuring that: A) the data are sorted  
by X/Y coordinates and B) the resulting file contains NA on the  
coordinates not listed in the PAIR file.

b
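One way to confirm those two conditions (sorted order, and one row for every coordinate) before calling makePdInfoPackage() is a quick check in R. This is a sketch: checkXys() is a hypothetical helper, and it assumes the .xys has been read with read.delim(xysFile, comment.char = "#") and that maxX and maxY are max(ndf$X) and max(ndf$Y) from the NDF.

```r
# Hypothetical checker (not part of oligo or pdInfoBuilder).
# TRUE when `xys` lists every (X, Y) with 1 <= X <= maxX and
# 1 <= Y <= maxY exactly once, sorted first on Y and then on X.
checkXys <- function(xys, maxX, maxY) {
  # In the required order, X varies fastest within each Y
  expectedX <- rep(seq_len(maxX), times = maxY)
  expectedY <- rep(seq_len(maxY), each = maxX)
  # Compare the row count first so the element-wise tests never recycle
  isTRUE(nrow(xys) == maxX * maxY &&
           all(xys$X == expectedX) &&
           all(xys$Y == expectedY))
}
```

The unsorted file shown in the July 14 str() output (X = 209, 228, 43 in the first three rows) would fail this check immediately.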

On Jul 14, 2009, at 7:05 PM, Jack Schonbrun wrote:

>> xys <- read.delim(xysFile, comment='#', nrow=3)
>> str(xys)
> 'data.frame':   3 obs. of  4 variables:
> $ X     : int  209 228 43
> $ Y     : int  203 52 257
> $ SIGNAL: num  203 146 159
> $ COUNT : int  1 1 1
>
> -----Original Message-----
> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
> Sent: Tuesday, July 14, 2009 3:03 PM
> To: Jack Schonbrun
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays
>
> how about?
>
> xys <- read.delim(xysFile, comment="#", nrow=100)
> str(xys)
>
> b
>
> On Jul 14, 2009, at 6:58 PM, Jack Schonbrun wrote:
>
>> Here's what I get:
>>
>>> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrow=100)
>>> str(ndf)
>> 'data.frame':   100 obs. of  17 variables:
>> $ PROBE_DESIGN_ID   : chr  "6531_0301_0005" "6531_0311_0005" "6531_0331_0005" "6531_0333_0005" ...
>> $ CONTAINER         : chr  "SACCHAROMYCES1" "SACCHAROMYCES1" "NGS_CONTROLS" "NGS_CONTROLS" ...
>> $ DESIGN_NOTE       : chr  "rank_selected" "rank_selected" "upper right fiducial" "" ...
>> $ SELECTION_CRITERIA: chr  "rank:03;score:379;uniq:14;count:37;freq:01;rules:1;tm:82.4" "rank:05;score:046;uniq:14;count:1110;freq:30;rules:1;tm:78.3" "bright" "" ...
>> $ SEQ_ID            : chr  "SCER070900001885" "SCER070900001596" "FIDUCIAL_UPPER_RIGHT" "CROSSHYBE" ...
>> $ PROBE_SEQUENCE    : chr  "GTCAACCCTGCAAGATCTCTGGGTGCCGCCGTTGCTGCCAGATATTTCCCTCATTACCAC" "TCAGTTGGAACGCCTCTGAGCACTCCATCACCTGAGTCAGGTAATACATTTACTGATTCA" "TGAGTTGTTTGATAGGATTATTCATAGAGGTCATTACAGCGAGAGGAANNNNNNNNN" "CGATGCGACGCGAACTAAGCAGTTCGGCGCAGTCGACTAGTATAACAGNNNNNNNN" ...
>> $ MISMATCH          : int  0 0 0 0 0 0 0 0 0 0 ...
>> $ MATCH_INDEX       : int  72062965 72061238 2000207 70654015 70652179 65069272 65069273 65069274 65069275 65069276 ...
>> $ FEATURE_ID        : int  72062965 72061238 71722817 71722819 71722820 71722824 71722825 71722826 71722827 71722828 ...
>> $ ROW_NUM           : int  5 5 5 5 6 6 6 6 6 6 ...
>> $ COL_NUM           : int  301 311 331 333 1 5 6 7 8 9 ...
>> $ PROBE_CLASS       : chr  "experimental" "experimental" "fiducial" "control:crosshybe" ...
>> $ PROBE_ID          : chr  "SCER070900001885P00271" "SCER070900001596P00406" "CPK6" "XENOTRACK48P02" ...
>> $ POSITION          : int  271 406 0 2 0 0 5 0 6 0 ...
>> $ DESIGN_ID         : int  6531 6531 6531 6531 6531 6531 6531 6531 6531 6531 ...
>> $ X                 : int  301 311 331 333 1 5 6 7 8 9 ...
>> $ Y                 : int  5 5 5 5 6 6 6 6 6 6 ...
>>>
>>
>> -----Original Message-----
>> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
>> Sent: Tuesday, July 14, 2009 2:56 PM
>> To: Jack Schonbrun
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays
>>
>> what do you get if you run the following (assuming ndfFile is a
>> variable that holds the file name)?
>>
>> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrows=100)
>> str(ndf)
>>
>> thanks,
>>
>> b
>>
>> On Jul 14, 2009, at 6:49 PM, Jack Schonbrun wrote:
>>
>>> Benilton,
>>>
>>> Thanks for your suggestions.
>>>
>>> By every means I have tested, the file is tab delimited. The
>>> first row contains the headers; all other rows contain data.
>>>
>>> Here is how the first (header) row looks:
>>> PROBE_DESIGN_ID	CONTAINER	DESIGN_NOTE	SELECTION_CRITERIA	SEQ_ID	PROBE_SEQUENCE	MISMATCH	MATCH_INDEX	FEATURE_ID	ROW_NUM	COL_NUM	PROBE_CLASS	PROBE_ID	POSITION	DESIGN_ID	X	Y
>>>
>>> Any other details on how the ndf is expected to look?
>>>
>>> Thanks again,
>>> Jack
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
>>> Sent: Tuesday, July 14, 2009 1:34 PM
>>> To: Jack Schonbrun
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays
>>>
>>> Jack,
>>>
>>> it looks like your NDF isn't as expected.
>>>
>>> When it shows "inserting 0 rows into table 'featureSet'", it makes me
>>> wonder what the SEQ_ID column in the NDF looks like.
>>>
>>> But, instead of looking at the columns' contents right now, please
>>> make sure the delimiters of the NDF are tabs. It doesn't appear that's
>>> the case. Note the warning "In max(ndfdata[["X"]]): no non-missing
>>> arguments to max; returning -Inf"... It suggests that ndfdata[["X"]]
>>> is NULL.
>>>
>>> Another thing: ensure the first line of the NDF is the header (column
>>> names) and the data start on the 2nd line.
>>>
>>> Please let me know how it goes.
>>>
>>> b
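A quick way to test the two points above in R, sketched with a hypothetical helper ndfLooksTabDelimited(): every line should split into the same number of tab-separated fields as the header, and the header should contain the SEQ_ID, X and Y columns mentioned here. This is not a pdInfoBuilder function, and it checks nothing else about the NDF.

```r
# Hypothetical check (not a pdInfoBuilder function).
ndfLooksTabDelimited <- function(ndfFile) {
  # Column names from the first line, split on tabs only
  header <- scan(ndfFile, what = "", sep = "\t", nlines = 1,
                 quote = "", quiet = TRUE)
  # Number of tab-separated fields on every (non-blank) line
  nf <- count.fields(ndfFile, sep = "\t", quote = "", comment.char = "")
  # A space-delimited file yields one field per line and no X/Y columns
  all(nf == length(header)) && all(c("SEQ_ID", "X", "Y") %in% header)
}
```

If this returns FALSE, max(ndfdata[["X"]]) failing with -Inf is the expected symptom, because no column named X was found.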
>>>
>>> On Jul 14, 2009, at 3:57 PM, Jack Schonbrun wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to use the oligo package to run the RMA algorithm on
>>>> Nimblegen expression arrays.  To that end, I am attempting to
>>>> construct an annotation package using makePdInfoPackage().
>>>>
>>>> I have followed the pattern in the "Building Annotation Packages
>>>> with pdInfoBuilder for Use with the oligo Package" vignette:
>>>>
>>>> ----------------
>>>>
>>>>> ndfFile.test <- "test.ndf"
>>>>> xysFile.test <- "test.xys"
>>>>> seed.test <- new("NgsExpressionPDInfoPkgSeed", ndfFile = ndfFile.test, xysFile = xysFile.test)
>>>>> makePdInfoPackage(seed.test, destDir = "./Annotation")
>>>> ===================================================================
>>>> Building annotation package for Nimblegen Expression Array
>>>> NDF:  test.ndf
>>>> XYS:  test.xys
>>>> ===================================================================
>>>> Parsing file: test.ndf ... OK
>>>> Parsing file: test.xys ... OK
>>>> Merging NDF and XYS files ...OK
>>>> Preparing contents for featureSet table ...OK
>>>> Preparing contents for bgfeature table ...OK
>>>> Preparing contents for pmfeature table ...OK
>>>> Creating package in ./Annotation/pd.test
>>>> Inserting 0 rows into table "featureSet"... Error in sqliteExecStatement(con, statement, bind.data) :
>>>>   RS-DBI driver: (incomplete data binding: expected 2 parameters, got 0)
>>>> In addition: Warning messages:
>>>> 1: In max(ndfdata[["Y"]]) :
>>>> no non-missing arguments to max; returning -Inf
>>>> 2: In max(ndfdata[["X"]]) :
>>>> no non-missing arguments to max; returning -Inf
>>>> 3: In sqliteExecStatement(con, statement, bind.data) :
>>>> ignoring zero-row bind.data
>>>>
>>>> ------------------
>>>>
>>>> Any help on why it inserts only 0 rows, or on any of the other
>>>> messages, would be greatly appreciated. It does create some files
>>>> in destDir, but does not run to completion. A listing of this
>>>> directory is available if it would help.
>>>>
>>>> I am running on Windows XP SP 2.  sessionInfo follows.
>>>>
>>>>> sessionInfo()
>>>> R version 2.9.1 (2009-06-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] pdInfoBuilder_1.8.1      affxparser_1.16.0        RSQLite_0.7-1            DBI_0.2-4                makePlatformDesign_1.8.0 oligo_1.8.1
>>>> [7] preprocessCore_1.6.0     oligoClasses_1.6.0       Biobase_2.4.1            affyio_1.12.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biostrings_2.12.7 IRanges_1.2.3     splines_2.9.1     tools_2.9.1
>>>>
>>>>
>>>> ===========================
>>>> Jack Schonbrun Ph.D.
>>>> Software Developer
>>>> Amyris Biotech
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>