[Bioc-devel] Invitation to the bioC developers Meeting in Seattle Mon 15 Aug
rgentlem at fhcrc.org
Thu Aug 4 18:18:01 CEST 2005
Gordon Smyth wrote:
> I wish to propose an agenda topic:
> - Is Bioconductor's primary aim to provide a focused repository of
> packages, aiming to attract software implementing cutting edge
> bioinformatics research from as many quality labs around the world as
> possible? Or is it to produce a set of packages implementing more or less a
> single integrated application?
These are very interesting ideas, but I am a bit worried about what
appears to be an either-or discussion when my own view seems to be
neither. But I would like to hear more, as I could well be missing the
point. Perhaps a more open ended question, with a fixed time for
discussion would be the best strategy - we have limited time and it
would be good not to diverge too far during the meeting (unless that is
what we want).
Bioconductor is a community, a core, developers, and users. It is
software, design strategies, data and a place (web + discussion lists)
to discuss computational biology and bioinformatics. It is both open
source and open development. Anyone who wants to can contribute. I
would not be too comfortable with much of a change to that.
What we can do from here, and try to do, is some of the mundane
stuff. Like building meta-data packages, testing packages. We can help
inexperienced developers learn more about R. One of John Chambers's basic
ideas about the S language is that users should slowly become
programmers and programmers should slowly become developers. I
personally share that view.
I would also like to help people decide on common data structures.
The exprSet (now eSet) class for annotated data sets is one example, but
the same applies in other areas such as flow cytometry, protein mass spec,
array CGH and, soon, ways of integrating data on genes and data on genomic
coordinates.
In my view the field is still too young and too rapidly developing
to adopt a view that we know the answer and it is time to write the
"right code". And even if that does happen (it may soon for
microarrays), then I am of the opinion that some commercial operation is
much better suited to deliver that. And for me there will be new
technologies and new research challenges.
> - If the primary aim is that of a repository, would it be worthwhile to
> spin off a much smaller set of packages with a smaller developer team to
> try to develop towards an integrated application?
> Background. Although Bioconductor might have some characteristics of both a
> repository and an integrated application, one of these two paradigms needs
> to take precedence I think. To make an analogy, is it the aim of
> Bioconductor to be the software analog of a research journal, or is it to
> be the software analog of a monograph?
I think my view is that it is neither, but I am sure happy to hear
other points of view. As I said above, the monograph analogy does not
work too well for me, as it suggests we know all the answers - and I do
not. It also suggests exclusion of the developer community, and I would
prefer to take all comers and let the users vote with their feet (as it
were). And I do not mean to be trite here. I see your point, and it
might well be worth trying to spin off a monograph-like project or two.
I keep hoping we will get organized enough to start to encourage some
rethinking of a few of the areas where we have multiple packages that do
similar things and encouraging the developers to use common data
structures so that users can easily move between packages and do not
become locked in to one system for all of their analyses.
I would be happy to see how we can help to support different
development models. Basically limma has taken a more comprehensive
approach than most. We could, and probably should, start to produce
binaries of R for Windows and Mac with more things installed aimed at
particular user groups. So that users download the whole thing and some
form of GUI is preinstalled and will work simply. However, that, like
everything else costs money - and we simply do not have a lot of
financial support for the project.
In this vein, there is nothing to prevent anyone from using BioC as
a basis for grant applications to pursue their own research, and many
already do. Also, if anyone UK-based has not noticed: the Wellcome Trust
has an RFA for software development.
> Under the first model, the development of different packages providing
> different approaches to the same problem, i.e., competing with one another,
> is to be expected and even encouraged. The aim is to promote a stimulating
> environment for the development and dissemination of new techniques. The
> obvious down-side, however, is that a research journal provides a very steep
> learning curve for non-statistical users. A research journal can provide
> occasional review articles for a wider audience.
Yes, very good points - the learning curve is steep.
> Under the second model, it is not reasonable to expect every lab in the
> world to participate. Instead, one needs to select a smaller team of close
> collaborators. Co-authors on research monographs are normally collaborators
> who are also co-authors on associated research papers. Also, it is not
> realistic to expect a monograph to keep up with the pace of a research
> journal in terms of development of new techniques. So this model will move
> more slowly and be less inclusive, but will be easier to present as an
> integrated solution to a non-specialist audience.
I do want lots of contributors - many of my very good ideas have
come at unexpected times and from unexpected sources.
> I think that one could view R itself, meaning the set of packages in the
> default distribution, as being an example of the second model. It seems
> to me that this is appropriate considering that the statistical methodology
> implemented by the standard distribution of R is reasonably
> well-established, mostly part of the canonical core of the statistical
> discipline. On the other hand, the research problems being addressed by
> Bioconductor, almost without exception, do not yet have generally accepted
> solutions. On the contrary, the race is very much on to explore what is
> possible and what is best. This situation makes the contrast between a
> research journal and a monograph unusually marked, with the latter at risk
> of being dated unusually quickly.
Seems like we agree - but I think BioC can have a role in helping to
put comparisons on a fair and reasonable footing. My current pet peeve
is with the protein mass spec community, where comparison of methods is
the exception and not the rule. At least in the gene expression world,
we are moving towards substantive comparisons of methods and users
having a better idea of the capabilities of the different models.
I won't comment too much on base R, but my own view does differ a bit
from yours. Some stuff is there for historical reasons, and it would be
a mistake to take that as a wholesale endorsement, or to take absence as
the lack of an endorsement.
BioC is and was what the members of the community make it. The core,
the developers and the users all have voices and I think it has been
remarkable in how well received it has been and how much interaction
there is between these groups. While I hate to be so mundane: it is what
we can get funding to make it or are willing to do for free. And some
things are fundable - others are not. I am starting to believe that the
only model that will work going forward, for the base project, is as a
community service. That does not mean that there cannot be many
different journal and monograph projects supported by the base project
(and I think that is the most likely outcome) but it is many and not
one. Computational biology and bioinformatics are both way too big to
imagine one solution - but they do require a lot of common
infrastructure and the more of that we can provide the better. It is
that shared infrastructure that allows developers to construct small
interoperable packages to solve novel problems (blending different data
types with different machine learning algorithms). That is my own
private bias; but I will say it has served me well.
We might do very well to look at other projects, bioPerl, bioPython
etc and see if they have had strategies that worked better for
some things. If there are any volunteers to collect info on other
projects it might be good to try and coordinate that (Wolfgang, are you
willing to organize this?).
> Best regards
> At 03:44 AM 4/08/2005, Wolfgang Huber wrote:
>>Hi bioC developers,
>>this is an invitation to the bioconductor developers meeting in
>>Seattle on Mon 15 Aug.
>>It will be 13.00-17.00h at the FHCRC (http://www.fhcrc.org/about/maps)
>>in room M2-A823. Visitors will need to present themselves at reception and
>>someone will come down and greet them.
>>Agenda topics are (you are encouraged to raise additional ones!) :
>>- Overview of downloads, contributors ("annual report") - Seth, 15 min.
>>- Overview over Task views - Vince
>>- Whom do we want as our audience? Lab biologists? Practical
>>Bioinformaticians? Statistics consultants/academics? All of them? What
>>are the implications for how we design our projects?
>>- How can we identify duplication or 'synergies' and how can we encourage
>>integration of efforts (packages) that are currently duplicating (or
>>competing with) each other?
>>- Are we happy with the way the project moves? Which people have we
>>lost, which ones do we want to welcome more?
>>- Do we need a 3-day bioC developers retreat to go into more depth? If
>>yes, settle on when, where, and who's organizing it.
>>- Short- and medium term goals for bioC?
>> * * *
>>For those who are interested, I propose we can also have an informal
>>follow up over drinks and perhaps food afterwards.
> Bioc-devel at stat.math.ethz.ch mailing list