[BioC] Decomposing and recomposing BioConductor data sets

Jarle Snertingdalen snerting at stud.ntnu.no
Fri Jan 9 17:00:36 MET 2004


Hi

We're working on a project which uses R and other similar
programs/languages for all sorts of computing. The project is written in
Python, and we use RPy to communicate with R. However, data sets (like
swirl) from BioConductor are not compatible with RPy. Through some
research we found that they are made up of empty lists with a range of
attributes attached to them.

I have to take apart the data sets, create 'clones' in Python and then
reconstruct them in R. The solution I have come up with so far is to
recursively retrieve the attributes using 'attributes(foo)' and the
'foo$bar' syntax, place the retrieved attributes in a nested list, and
convert the list to Python with RPy.
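
To illustrate the idea, here is a minimal sketch in plain Python of the
recursive decompose/recompose step. It makes no RPy calls: the RLike class
is a hypothetical stand-in for an R object (a payload plus named
attributes), and the names decompose, recompose, "value" and "attrs" are
assumptions made up for this example, not part of RPy or BioConductor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RLike:
    """Hypothetical stand-in for an R object: a payload plus named attributes."""
    value: Any = None
    attrs: Dict[str, Any] = field(default_factory=dict)


def decompose(obj):
    """Recursively turn an RLike tree into plain nested dicts and lists."""
    if isinstance(obj, RLike):
        return {
            "value": decompose(obj.value),
            "attrs": {name: decompose(a) for name, a in obj.attrs.items()},
        }
    if isinstance(obj, (list, tuple)):
        # Tuples are flattened to lists, so they do not survive a round trip.
        return [decompose(x) for x in obj]
    return obj


def recompose(data):
    """Rebuild the RLike tree from the nested structure made by decompose."""
    # Assumption: a dict with exactly the keys "value" and "attrs" always
    # marks a decomposed object, never ordinary data.
    if isinstance(data, dict) and set(data) == {"value", "attrs"}:
        return RLike(
            recompose(data["value"]),
            {name: recompose(a) for name, a in data["attrs"].items()},
        )
    if isinstance(data, list):
        return [recompose(x) for x in data]
    return data


# An "empty list with attributes attached", one of which has its own
# attributes. The attribute names are invented for illustration.
example = RLike(
    value=[],
    attrs={
        "class": "exampleClass",
        "intensities": RLike([1.0, 2.0], {"dim": [2, 1]}),
    },
)
nested = decompose(example)
restored = recompose(nested)
```

In the real setting the decomposition would run on the R side using
attributes(foo), and the reconstruction would set the attributes again in
R; the sketch only shows the shape of the recursion and of the nested
structure passed in between.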

My questions are: Why isn't the information put in nested lists in the
first place? Is my suggested approach valid for all BioConductor data
sets? And is it the easiest way to decompose the data sets generically?


regards,

Jarle Snertingdalen
Software Developer http://www.zherlock.org
NTNU Norway (Norwegian University of Science and Technology)


PS: Some background information about the project: it is called SciCraft
(formerly known as Zherlock), and is a graphical data analysis tool using
third-party software (like R and Octave) for computation. A typical use of
the program could be to read data in R format, apply some Octave function
to the data, then send the data back into R for further work, and
finally plot the results with some plotting tool and perhaps export them
to some format. Whether a function is run in R or Octave is invisible to
the user.


