[BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays
Jack Schonbrun
schonbrun at amyris.com
Thu Aug 6 19:32:19 CEST 2009
Yes, and thank you Benilton for all your help on this. The structure of an .xys file does not seem to be clearly documented anywhere. If you are deriving your own .xys files from .pair files, some precise details might be useful:
The file needs to be sorted first on Y and then on X. All values of X and Y from 1 to the maximum in your design file must be present, with "NA" for any coordinates that are not in the .pair file. The first line of the .xys file should be a comment line that includes the designname.
Here is how the start of an .xys file might look:
# software=NimbleScan version=1.0.0 designname=XXXXXX_Exp
X Y SIGNAL COUNT
1 1 NA NA
2 1 NA NA
3 1 NA NA
4 1 NA NA
5 1 NA NA
6 1 NA NA
7 1 NA NA
8 1 NA NA
9 1 NA NA
10 1 NA NA
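The rules above could be sketched in R roughly as follows. This is only an illustration, not the code actually used: make_xys() and write_xys() are hypothetical helper names, the intensity column name "PM" is an assumption about the .pair file, COUNT is simply set to 1 for observed probes, and the header line just copies the example shown above.

```r
# Hypothetical helper: build an .xys-style table from .pair-style data.
# `pair` is assumed to be a data.frame with integer columns X and Y and a
# numeric intensity column (called "PM" here, which is an assumption).
make_xys <- function(pair, max_x, max_y, signal_col = "PM") {
  # Full grid of coordinates. expand.grid() varies the first factor
  # fastest, so rows come out ordered by Y first and then by X.
  grid <- expand.grid(X = seq_len(max_x), Y = seq_len(max_y))
  # Look up each grid cell in the pair data; cells not found stay NA.
  idx <- match(paste(grid$X, grid$Y), paste(pair$X, pair$Y))
  grid$SIGNAL <- pair[[signal_col]][idx]
  # Assumption: one measurement per observed probe, NA elsewhere.
  grid$COUNT <- ifelse(is.na(idx), NA_integer_, 1L)
  grid
}

# Hypothetical helper: write the table with a leading comment line that
# carries the designname, followed by the tab-delimited data.
write_xys <- function(xys, path, designname) {
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines(sprintf("# software=NimbleScan version=1.0.0 designname=%s",
                     designname), con)
  write.table(xys, con, sep = "\t", quote = FALSE, row.names = FALSE)
}
```

With the design file read into a data frame ndf, max_x and max_y would be max(ndf$X) and max(ndf$Y), so that every coordinate in the design appears in the output even when the .pair file omits it.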
Jack
-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
Sent: Thursday, August 06, 2009 10:23 AM
To: Jack Schonbrun
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays
Just to close this thread:
Jack regenerated the XYS files ensuring that: A) the data are sorted
by X/Y coordinates and B) the resulting file contains NA on the
coordinates not listed in the PAIR file.
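A quick way to check both conditions on an existing file might look like the sketch below. check_xys() is a hypothetical helper, not part of oligo or pdInfoBuilder; it assumes the file was read with read.delim(xysFile, comment.char = "#"), that the grid starts at 1, and that the sort order is Y first and then X.

```r
# Hypothetical checker for an .xys-style data.frame with columns X and Y.
# Returns TRUE only if every coordinate from 1 to the maximum is present
# exactly once and the rows are ordered by Y first and then by X.
check_xys <- function(xys) {
  if (anyNA(xys$X) || anyNA(xys$Y)) return(FALSE)
  # expand.grid() varies X fastest, which is the expected row order.
  expected <- expand.grid(X = seq_len(max(xys$X)), Y = seq_len(max(xys$Y)))
  nrow(xys) == nrow(expected) &&
    all(xys$X == expected$X) &&
    all(xys$Y == expected$Y)
}
```

Note that this uses the maximum X and Y found in the file itself; comparing against the maxima in the NDF would be the stricter test.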
b
On Jul 14, 2009, at 7:05 PM, Jack Schonbrun wrote:
>> xys <- read.delim(xysFile, comment='#', nrow=3)
>> str(xys)
> 'data.frame': 3 obs. of 4 variables:
> $ X : int 209 228 43
> $ Y : int 203 52 257
> $ SIGNAL: num 203 146 159
> $ COUNT : int 1 1 1
>
> -----Original Message-----
> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
> Sent: Tuesday, July 14, 2009 3:03 PM
> To: Jack Schonbrun
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with
> oligo on Nimblegen Expression Arrays
>
> how about?
>
> xys <- read.delim(xysFile, comment="#", nrow=100)
> str(xys)
>
> b
>
> On Jul 14, 2009, at 6:58 PM, Jack Schonbrun wrote:
>
>> Here's what I get:
>>
>>> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrow=100)
>>> str(ndf)
>> 'data.frame': 100 obs. of 17 variables:
>> $ PROBE_DESIGN_ID : chr "6531_0301_0005" "6531_0311_0005"
>> "6531_0331_0005" "6531_0333_0005" ...
>> $ CONTAINER : chr "SACCHAROMYCES1" "SACCHAROMYCES1"
>> "NGS_CONTROLS" "NGS_CONTROLS" ...
>> $ DESIGN_NOTE : chr "rank_selected" "rank_selected" "upper
>> right fiducial" "" ...
>> $ SELECTION_CRITERIA: chr "rank:03;score:379;uniq:14;count:37;freq:
>> 01;rules:1;tm:82.4" "rank:05;score:046;uniq:14;count:1110;freq:
>> 30;rules:1;tm:78.3" "bright" "" ...
>> $ SEQ_ID : chr "SCER070900001885" "SCER070900001596"
>> "FIDUCIAL_UPPER_RIGHT" "CROSSHYBE" ...
>> $ PROBE_SEQUENCE : chr
>> "GTCAACCCTGCAAGATCTCTGGGTGCCGCCGTTGCTGCCAGATATTTCCCTCATTACCAC"
>> "TCAGTTGGAACGCCTCTGAGCACTCCATCACCTGAGTCAGGTAATACATTTACTGATTCA"
>> "TGAGTTGTTTGATAGGATTATTCATAGAGGTCATTACAGCGAGAGGAANNNNNNNNN"
>> "CGATGCGACGCGAACTAAGCAGTTCGGCGCAGTCGACTAGTATAACAGNNNNNNNN" ...
>> $ MISMATCH : int 0 0 0 0 0 0 0 0 0 0 ...
>> $ MATCH_INDEX : int 72062965 72061238 2000207 70654015
>> 70652179 65069272 65069273 65069274 65069275 65069276 ...
>> $ FEATURE_ID : int 72062965 72061238 71722817 71722819
>> 71722820 71722824 71722825 71722826 71722827 71722828 ...
>> $ ROW_NUM : int 5 5 5 5 6 6 6 6 6 6 ...
>> $ COL_NUM : int 301 311 331 333 1 5 6 7 8 9 ...
>> $ PROBE_CLASS : chr "experimental" "experimental" "fiducial"
>> "control:crosshybe" ...
>> $ PROBE_ID : chr "SCER070900001885P00271"
>> "SCER070900001596P00406" "CPK6" "XENOTRACK48P02" ...
>> $ POSITION : int 271 406 0 2 0 0 5 0 6 0 ...
>> $ DESIGN_ID : int 6531 6531 6531 6531 6531 6531 6531 6531
>> 6531 6531 ...
>> $ X : int 301 311 331 333 1 5 6 7 8 9 ...
>> $ Y : int 5 5 5 5 6 6 6 6 6 6 ...
>>>
>>
>> -----Original Message-----
>> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
>> Sent: Tuesday, July 14, 2009 2:56 PM
>> To: Jack Schonbrun
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with
>> oligo on Nimblegen Expression Arrays
>>
>> what do you get if you run the following (assuming ndfFile is a
>> variable that holds the file name)?
>>
>> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrows=100)
>> str(ndf)
>>
>> thanks,
>>
>> b
>>
>> On Jul 14, 2009, at 6:49 PM, Jack Schonbrun wrote:
>>
>>> Benilton,
>>>
>>> Thanks for your suggestions.
>>>
>>> By every means I have tested, the file is tab delimited. And the
>>> first row is headers, all other data.
>>>
>>> Here is how the first (header) row looks:
>>> PROBE_DESIGN_ID CONTAINER DESIGN_NOTE
>>> SELECTION_CRITERIA SEQ_ID PROBE_SEQUENCE MISMATCH
>>> MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS
>>> PROBE_ID POSITION DESIGN_ID X Y
>>>
>>> Any other details on how the ndf is expected to look?
>>>
>>> Thanks again,
>>> Jack
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
>>> Sent: Tuesday, July 14, 2009 1:34 PM
>>> To: Jack Schonbrun
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with
>>> oligo on Nimblegen Expression Arrays
>>>
>>> Jack,
>>>
>>> it looks like your NDF isn't as expected.
>>>
>>> When it shows: "inserting 0 rows into table 'featureSet'", it makes me
>>> wonder what the SEQ_ID column in the NDF looks like.
>>>
>>> But, instead of looking at the columns' contents right now, please
>>> make sure the delimiters of the NDF are tabs. It doesn't appear that's
>>> the case. Note the warning "In max(ndfdata[["X"]]): no non-missing
>>> arguments to max; returning -Inf"... It suggests that ndfdata[["X"]]
>>> is NULL.
>>>
>>> Another thing: ensure the first line of the NDF is the header (column
>>> names) and the data start on the 2nd line.
>>>
>>> Please let me know how it goes.
>>>
>>> b
>>>
>>> On Jul 14, 2009, at 3:57 PM, Jack Schonbrun wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to use the oligo package to run the RMA algorithm on
>>>> Nimblegen expression arrays. To that end, I am attempting to
>>>> construct an annotation package using makePdInfoPackage().
>>>>
>>>> I have followed the pattern in the "Building Annotation Packages
>>>> with pdInfoBuilder
>>>> for Use with the oligo Package" vignette:
>>>>
>>>> ----------------
>>>>
>>>>> ndfFile.test <- "test.ndf"
>>>>> xysFile.test <- "test.xys"
>>>>> seed.test <- new("NgsExpressionPDInfoPkgSeed", ndfFile =
>>>>> ndfFile.test, xysFile = xysFile.test)
>>>>> makePdInfoPackage(seed.test, destDir = "./Annotation")
>>>> ===================================================================
>>>> Building annotation package for Nimblegen Expression Array
>>>> NDF: test.ndf
>>>> XYS: test.xys
>>>> ===================================================================
>>>> Parsing file: test.ndf ... OK
>>>> Parsing file: test.xys ... OK
>>>> Merging NDF and XYS files ...OK
>>>> Preparing contents for featureSet table ...OK
>>>> Preparing contents for bgfeature table ...OK
>>>> Preparing contents for pmfeature table ...OK
>>>> Creating package in ./Annotation/pd.test
>>>> Inserting 0 rows into table "featureSet"... Error in sqliteExecStatement(con, statement, bind.data) :
>>>> RS-DBI driver: (incomplete data binding: expected 2 parameters, got 0)
>>>> In addition: Warning messages:
>>>> 1: In max(ndfdata[["Y"]]) :
>>>> no non-missing arguments to max; returning -Inf
>>>> 2: In max(ndfdata[["X"]]) :
>>>> no non-missing arguments to max; returning -Inf
>>>> 3: In sqliteExecStatement(con, statement, bind.data) :
>>>> ignoring zero-row bind.data
>>>>
>>>> ------------------
>>>>
>>>> Any help on why it would only be inserting 0 rows, or any of the
>>>> other messages would be greatly appreciated. It does make some
>>>> files in the destDir, but does not run to completion. Listing of
>>>> this directory available if it would help.
>>>>
>>>> I am running on Windows XP SP 2. sessionInfo follows.
>>>>
>>>>> sessionInfo()
>>>> R version 2.9.1 (2009-06-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods
>>>> base
>>>>
>>>> other attached packages:
>>>> [1] pdInfoBuilder_1.8.1 affxparser_1.16.0
>>>> RSQLite_0.7-1 DBI_0.2-4
>>>> makePlatformDesign_1.8.0 oligo_1.8.1
>>>> [7] preprocessCore_1.6.0 oligoClasses_1.6.0
>>>> Biobase_2.4.1 affyio_1.12.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biostrings_2.12.7 IRanges_1.2.3 splines_2.9.1
>>>> tools_2.9.1
>>>>
>>>>
>>>> ===========================
>>>> Jack Schonbrun Ph.D.
>>>> Software Developer
>>>> Amyris Biotech
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>