[Bioc-devel] NAs into accessory table

Marc Carlson mcarlson at fhcrc.org
Mon Jan 25 21:31:23 CET 2010


Hi Robert,

I am having trouble reproducing this.  When I run this I get the
behavior that I described before.  But perhaps you need to make sure
that the flat file is tab-delimited?  Annotation packages have always
been generated from tab-delimited files, as described in the SQLForge
vignette.  It was a long time ago, so I no longer recall whether I was
typing a pipe to represent a tab abstractly, whether that was how the
thread was discussing it, or whether my email client somehow converted
\t into |.  Hopefully I have not confused you into thinking these files
should be pipe-delimited.  Anyhow, I suspect that is not actually the
problem, so it would be helpful if you could send me a small example of
the file that you are trying to use, and also include your
sessionInfo().
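
As a quick sanity check on the delimiter, something like the following
minimal Python sketch could be used.  The function name
check_tab_delimited and the two-column layout are assumptions made for
illustration; they are not part of SQLForge or AnnotationDbi.  It just
reports the lines that do not split into exactly two tab-separated
fields:

```python
def check_tab_delimited(lines, n_fields=2):
    """Return (line_number, line) pairs that do not split into n_fields on tabs."""
    bad = []
    for i, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row:
            continue  # ignore completely empty lines
        if len(row.split("\t")) != n_fields:
            bad.append((i, row))
    return bad


# Hypothetical sample: three tab-delimited records and one pipe-delimited one.
sample = ["anykey\tID1\n", "anykey2\t\n", "anykey3\tID2\n", "whateverkey|\n"]
bad_lines = check_tab_delimited(sample)
print(bad_lines)
```

A record with a blank value, such as "anykey2" followed by a tab, still
counts as two fields; only the pipe-delimited line is reported.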

my input file looks like this:

anykey\tID1
anykey2\t
anykey3\tID2

etc.
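
To illustrate the difference between an empty string and a null value in
the database, here is a minimal sketch using Python's built-in sqlite3
module.  The table names demo_map and strict_map, the column name value,
and the helper parse_rows are all hypothetical, and the value column of
demo_map is deliberately declared without NOT NULL.  The sketch only
shows what ends up stored in SQLite; it does not exercise AnnotationDbi
itself:

```python
import sqlite3

# Same layout as the input file above: key, tab, optional value.
rows_text = "anykey\tID1\nanykey2\t\nanykey3\tID2\n"


def parse_rows(text):
    """Split tab-delimited lines, turning a blank value into None (SQL NULL)."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        key, value = line.split("\t")
        rows.append((key, value if value != "" else None))
    return rows


conn = sqlite3.connect(":memory:")
# Assumption: the value column allows nulls (no NOT NULL constraint).
conn.execute(
    "CREATE TABLE demo_map (probe_id VARCHAR(80) NOT NULL, value VARCHAR(32))"
)
conn.executemany("INSERT INTO demo_map VALUES (?, ?)", parse_rows(rows_text))

null_keys = [r[0] for r in conn.execute(
    "SELECT probe_id FROM demo_map WHERE value IS NULL")]
empty_keys = [r[0] for r in conn.execute(
    "SELECT probe_id FROM demo_map WHERE value = ''")]
print(null_keys, empty_keys)

# A column declared NOT NULL cannot hold a null at all.
conn.execute(
    "CREATE TABLE strict_map (probe_id VARCHAR(80) NOT NULL, "
    "value VARCHAR(32) NOT NULL)"
)
try:
    conn.execute("INSERT INTO strict_map VALUES (?, ?)", ("anykey2", None))
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True
print(not_null_rejected)
```

In this sketch the blank field is stored as a null only because the
parsing step converts it explicitly; an import path that stores the
blank as an empty string would need a similar conversion afterwards.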

  Marc



Robert Castelo wrote:
> dear Marc and the rest of the bioc-devel list,
>
> this is about a message from a few months ago where you explained to me
> how to introduce NAs as values associated to valid probe identifiers of
> an accessory table with extra information about the probes of a custom
> array platform that would form part of an annotation package. you told
> me to simply put blank values in the flat file but at that time i
> couldn't try it out because at that moment i could not put my hands on
> re-doing the package, now i'm redoing it to make it work for the latest
> devel version of R and BioC and i'm finding that it doesn't work or at
> least i didn't completely understand your suggestion, please see below.
>
> On Wed, 2009-11-04 at 12:12 -0800, Marc Carlson wrote:
>   
>> And to answer Robert's question:  To get an NA back (in R) from one of
>> these accessory tables you are adding, you should only have to have null
>> values in the relevant fields after the import.  No need to put NA
>> strings into your input files as that will result in NA strings being
>> stored in the DB.  Just leaving those portions of the input table blank
>> should result in null values in your database table, which should give
>> you the results you want when you look at those from R using
>> AnnotationDbi (meaning NAs). 
>>
>> So basically your database table should look like this when you query it:
>>
>> whateverkey|
>>
>>
>> Please let me know if I need to clarify that.
>>     
>
> i've generated a flat file with
>
> whateverkey|\n
>
> where '\n' is the newline character so that the associated value is
> blank (i'm interpreting blank as the empty string). however, once the
> package is built i get the following output for cases like the ones i
> show here where the probe IDs refer to control probes for which this
> extra information does not exist and the package should provide NAs:
>
> tail(unlist(as.list(myCustomPkgPROBESTART)))
> 43334 43335 43336 43337 43338 43339 
>    ""    ""    ""    ""    ""    "" 
>
> so the empty string is provided (which is in fact what i put into the
> flat file) instead of NAs.
>
> the SQL code that creates such a table is the following:
>
> CREATE TABLE prbStart (
>           probe_id VARCHAR(80) NOT NULL,
>           probe_start VARCHAR(32) NOT NULL,
>           FOREIGN KEY (probe_id) REFERENCES probes (probe_id) ) ;
>
> just in case this issue needs to be addressed in the table creation
> statements. any idea will be very appreciated.
>
> thanks!
> robert.
>
>
>   
>>   Marc
>>
>>
>>
>>
>> Robert Castelo wrote:
>>     
>>> hi Seth and rest of the list,
>>>
>>> never thought about this till i saw your email and in particular this
>>> clarification:
>>>
>>>   
>>>       
>>>> When you call get() or otherwise retrieve a value from an annotation 
>>>> package object using a key, like a probe ID, there are three situations:
>>>>
>>>> 1. The probe ID is valid and maps to a value in the given object.
>>>> 2. The probe ID is valid, but does not map to a value so NA is returned.
>>>> 3. The probe ID is not valid, an error is raised.
>>>>     
>>>>         
>>> i've built an annotation package for a custom array to which i've added
>>> a few new sql tables to provide additional mappings to various
>>> non-standard annotations on the probes (following section 3 -how to add
>>> extra data into your packages- from vignette SQLForge of AnnotationDbi).
>>> the way in which i add these data is by creating a flat file with the
>>> "records" and importing them into the SQL database of the package
>>> through the unix shell with
>>>
>>> sqlite3 dbName << EOF
>>> .import newdata.txt newtable
>>> .exit
>>> EOF
>>>
>>> where dbName should be the .sqlite file created by popXXXCHIPDB(),
>>> newdata.txt is the flat file with the data of this new mapping and
>>> newtable is the SQL table i've specifically created on the .sqlite file
>>> to support the mapping in my annotation package.
>>>
>>> however, in this way i don't know how to implement the second situation
>>> you describe. i tried to associate NA's to valid keys having lines
>>>
>>> ..
>>> whateverkey|NA
>>> ..
>>>
>>> in the flat file that is imported later but then this NA is not
>>> interpreted as an NA value but as a string "NA". then i concluded that
>>> the way i had to do it was to remove those lines and have the user
>>> specify the parameter ifnotfound=NA in their get/mget commands.
>>>
>>>
>>> so now my question would be (either for you or for whoever in the list
>>> knows about this). how do i introduce a new mapping into my annotation
>>> package such that a key is valid but it does not map to a value so that
>>> NA is returned?
>>>
>>>
>>> thanks!
>>> robert.
>>>
>>> _______________________________________________
>>> Bioc-devel at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>   
>>>       
>>     
>
>
>


