[Bioc-devel] ANN: BioC Developer's Meeting/agenda/discussion

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Wed Jul 6 22:11:20 CEST 2005

I didn't see any discussion of this, and the upcoming
meeting is approaching rapidly.  So here are some comments.

> We propose the following agenda items.
> 1. Who do we want as our audience? Lab biologists? Practical
>    Bioinformaticians?  Statistics consultants/academics? What are the
>    implications for our projects.

It seems to me that the project has hit a reasonable target.
Methodological developers want, in many cases, to build on the conventions
and infrastructure of Bioconductor.  It gives them exposure and a grain
of plausibility.  Lab biologists want to use open
tools when those tools are convenient, valid, and critically acceptable.
If the biologists weren't interested, we'd probably have less interest
from developers.  If the developers weren't interested, we wouldn't
have the biologists.

I don't think we have to alter our audience.  How we serve those
two components does warrant discussion.  For developers, we could
provide more guidelines/guidance.  For biologists, we could reduce
system complexity.  For example, we could work to reduce the number
of packages that must be manipulated to carry out a given task.

Following the example of MLInterfaces, we could use namespaces to
get access to code without explicit package loading.  The major
activities (e.g., biocImport, biocPreproc, biocDiffExp, biocMachLearn,
biocReportGen) would be functions in Biobase.  Biobase maintainers
would define the API that a package must conform to in order for its
code to be available through these wrappers.  This need not be very
stringent, as we see with MLInterfaces.  I believe that this approach
would take the user from a situation of navigating various packages
and functions within packages to a situation of navigating options
within one function within one package.
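
A minimal sketch of the wrapper idea in the paragraph above, not a
proposed Biobase API.  The function name clusterWrapper and the class
name "clusterResult" are hypothetical, and the stats package stands in
for contributed packages so that the example runs on a plain R
installation.  With a contributed package, a call of the form pkg::fun
loads that package's namespace on first use without attaching it to the
search path.

```r
# Hypothetical wrapper: one entry point, several back ends reached through
# namespace-qualified calls, so the user never calls library() on them.
# The stats package is only a stand-in for contributed packages.
clusterWrapper <- function(x, k, method = c("kmeans", "hclust")) {
  method <- match.arg(method)
  labels <- switch(method,
    kmeans = stats::kmeans(x, centers = k)$cluster,
    hclust = stats::cutree(stats::hclust(stats::dist(x)), k = k)
  )
  # Uniform output container, regardless of which back end ran.
  structure(
    list(method = method, k = k, labels = unname(as.integer(labels))),
    class = "clusterResult"
  )
}

# Toy data: two well-separated groups of ten points each.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))

res1 <- clusterWrapper(x, k = 2, method = "kmeans")
res2 <- clusterWrapper(x, k = 2, method = "hclust")
```

Whichever back end is selected, the caller gets back an object of the
same shape, so choosing a method means choosing an option within one
function.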

(N.B.  MLInterfaces is no panacea.  The current design with a generic
for every machine learning method is challenging to maintain and
document.  But the underlying idea seems good:  a simple input container
structure (exprSet) and uniform output container structures (MLOutput and
subclasses for clustering and classification).  The external software
is acquired at run time through namespace qualifications in function calls.)

> 2. Integration of efforts (packages) that are currently duplicating
>    (or competing with) each other.

Yes, the task view catalog that I have established will help identify
these competitions more conveniently.  I will shortly post links to this catalog.

> 3. Do we need a 3-day BioC developers retreat to go into more depth
>    than what is possible on 1 day, if yes, settle on when, where (some
>    isolated but attractive place in US or Europe), and who's
>    organizing it.

I think we need more time to discuss strategies in the current core.
A joint retreat has to be more useful than three days of writ^H^H^H^Hfixing
code separately.

> 4. Short-term and medium-term goals for BioC?

Short term:  1) Agree on a set of task views.  I will give an example
soon and there will probably be disagreements.  2) Create a reference
card.  I started modifying the R reference card by Tom Short
(CRAN/Documentation/Contributed) and will check in a package
bcRefcard so that we can do this jointly.

Medium term:  1) Can we simplify the big picture of the project?  I
have described it as a set of containers and workflow components for
statistics of high-throughput experiments.  The container set is
somewhat large (marrayRaw, marrayInfo, AffyBatch, RGList,
exprSet, environment, ...) and the workflow components are very
diverse and redundant in various cases.  I don't think we have
to eliminate containers or components for a while, but we should
be able to identify smaller sets of very durable component designs
that we try to get people to invest in.  2) Some complexity arises
because various packages assume roles for files on disk (e.g., targets
files, gal files).  In many cases one could more readily create
useful information in an R data frame than in a disk file, but the R data
frame cannot be used directly.  A possible principle:  whenever a disk
file plays a given role, it should be possible for an R object with equivalent
information to play the same role.  3) Can we identify a
good role for RDBMS?  4) Can we take advantage of compute clusters more
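
A minimal sketch of the principle in item 2 above, that an R object
should be able to play the role of a disk file.  The helper name
asTargets and the required column names (FileName, Cy3, Cy5) are
assumptions made for illustration and are not taken from any particular
package.

```r
# Hypothetical helper: accept either the path to a tab-delimited targets
# file or a data frame that already holds the same information.
asTargets <- function(x) {
  if (is.data.frame(x)) {
    targets <- x
  } else if (is.character(x) && length(x) == 1L && file.exists(x)) {
    targets <- utils::read.delim(x, stringsAsFactors = FALSE)
  } else {
    stop("'x' must be a data frame or the path to an existing file")
  }
  required <- c("FileName", "Cy3", "Cy5")  # assumed column names
  absent <- setdiff(required, names(targets))
  if (length(absent) > 0L) {
    stop("missing column(s): ", paste(absent, collapse = ", "))
  }
  targets
}

# The same information, once as an R object and once as a disk file.
inMemory <- data.frame(FileName = c("a.gpr", "b.gpr"),
                       Cy3 = c("wt", "mu"),
                       Cy5 = c("mu", "wt"),
                       stringsAsFactors = FALSE)
path <- tempfile(fileext = ".txt")
utils::write.table(inMemory, path, sep = "\t",
                   quote = FALSE, row.names = FALSE)

fromObject <- asTargets(inMemory)
fromFile <- asTargets(path)
```

Downstream code that calls such a helper first would not need to know
whether the information started life on disk or in the session.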

Another agenda item: Should we survey users formally to determine
how their needs are met and what needs are unmet?  Likewise with

Another agenda item: Forthcoming developments.  In bioc, how do
we communicate to people when new resources are available and
should be used, like eSet?  In R, how do we get up-to-date information
on evolution of S4 and take advantage of this in coding and planning?

I suppose we could start maintaining a document like "Writing R
extensions" that is focused on "Writing Bioconductor Packages".

> Whether or not you will be able to attend, we encourage you to
> contribute suggestions for additional agenda items (send them to me
> or, better yet, post to bioc-devel).
> Details on specific meeting location and times will follow, but we
> wanted to put out this invitation as soon as possible to allow for
> those interested to arrange their schedules.
> Note that the timing is between the R and BioC conferences being held
> in Seattle.  Information on accommodations and airfare discounts
> available here: http://www.bioconductor.org/meeting05/
> Best Wishes,
> + seth
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
