[Bioc-devel] Invitation to the bioC developers Meeting in Seattle Mon 15 Aug

Thu Aug 4 18:18:01 CEST 2005

Gordon Smyth wrote:
> I wish to propose an agenda topic:
> 
> - Is Bioconductor's primary aim to provide a focused repository of 
> packages, aiming to attract software implementing cutting edge 
> bioinformatics research from as many quality labs around the world as 
> possible? Or is it to produce a set of packages implementing more or less a 
> single integrated application?
Hi,
   These are very interesting ideas, but I am a bit worried about what 
appears to be an either-or discussion when my own view seems to be 
neither. But I would like to hear more, as I could well be missing the 
point. Perhaps a more open ended question, with a fixed time for 
discussion would be the best strategy - we have limited time and it 
would be good not to diverge too far during the meeting (unless that is 
what we want).

   Bioconductor is a community, a core, developers, and users. It is 
software, design strategies, data and a place (web + discussion lists) 
to discuss computational biology and bioinformatics. It is both open 
source and open development. Anyone that wants to can contribute. I 
would not be too comfortable with much of a change to that.

    What we can do from here, and try to do, is some of the mundane 
stuff, like building meta-data packages and testing packages. We can help 
inexperienced developers learn more about R. One of John Chambers' basic 
ideas about the S language is that users should slowly become 
programmers and programmers should slowly become developers. I 
personally share that view.

    I would also like to help people decide on common data structures. 
The exprSet (now eSet) class for annotated data sets is one example, but 
there are other areas too, such as flow cytometry, protein mass spec, 
array CGH and, soon, ways of integrating data on genes with data on 
genomic coordinates.

    In my view the field is still too young and too rapidly developing 
to adopt a view that we know the answer and it is time to write the 
"right code". And even if that does happen (it may soon for 
microarrays), then I am of the opinion that some commercial operation is 
much better suited to deliver that. And for me there will be new 
technologies and new research challenges.

> 
> - If the primary aim is that of a repository, would it be worthwhile to 
> spin off a much smaller set of packages with a smaller developer team to 
> try to develop towards an integrated application?
> 
> Background. Although Bioconductor might have some characteristics of both a 
> repository and an integrated application, one of these two paradigms needs 
> to take precedence I think. To make an analogy, is it the aim of 
> Bioconductor to be the software analog of a research journal, or is it to 
> be the software analog of a monograph?
> 

   I think my view is that it is neither, but I am sure happy to hear 
other points of view. As I said above, the monograph analogy does not 
work too well for me, as it suggests we know all the answers - and I do 
not. It also suggests exclusion of the developer community, and I would 
prefer to take all comers and let the users vote with their feet (as it 
were). And I do not mean to be trite here. I see your point, and it 
might well be worth trying to spin off a monograph-like project or two. 
I keep hoping we will get organized enough to start to encourage some 
rethinking of a few of the areas where we have multiple packages that do 
similar things and encouraging the developers to use common data 
structures so that users can easily move between packages and do not 
become locked in to one system for all of their analyses.

   I would be happy to see how we can help to support different 
development models. Basically limma has taken a more comprehensive 
approach than most. We could, and probably should, start to produce 
binaries of R for Windows and Mac with more things installed aimed at 
particular user groups. So that users download the whole thing and some 
form of GUI is preinstalled and will work simply. However, that, like 
everything else costs money - and we simply do not have a lot of 
financial support for the project.

   In this vein, there is nothing to prevent people from using BioC as 
a basis for grant applications to follow their own research programs, 
and many already do.
Also, if anyone UK-based has not noticed: the Wellcome Trust has an RFA 
for software development.

> Under the first model, the development of different packages providing 
> different approaches to the same problem, i.e., competing with one another, 
> is to be expected and even encouraged. The aim is to promote a stimulating 
> environment for the development and dissemination of new techniques. The 
> obvious down-side, however, is that a research journal provides a very steep 
> learning curve for non-statistical users. A research journal can provide 
> occasional review articles for a wider audience.

   Yes, very good points - the learning curve is steep.

> 
> Under the second model, it is not reasonable to expect every lab in the 
> world to participate. Instead, one needs to select a smaller team of close 
> collaborators. Co-authors on research monographs are normally collaborators 
> who are also co-authors on associated research papers. Also, it is not 
> realistic to expect a monograph to keep up with the pace of a research 
> journal in terms of development of new techniques. So this model will move 
> more slowly and be less inclusive, but will be easier to present as an 
> integrated solution to a non-specialist audience.

    I do want lots of contributors - many of my very good ideas have 
come at unexpected times and from unexpected sources.

> 
> I think that one could view R itself, meaning the set of packages in the 
> default distribution, as being an example of the second model. It seems 
> to me that this is appropriate considering that the statistical methodology 
> implemented by the standard distribution of R is reasonably 
> well-established, mostly part of the canonical core of the statistical 
> discipline. On the other hand, the research problems being addressed by 
> Bioconductor, almost without exception, do not yet have generally accepted 
> solutions. On the contrary, the race is very much on to explore what is 
> possible and what is best. This situation makes the contrast between a 
> research journal and a monograph unusually marked, with the latter at risk 
> of being dated unusually quickly.

   Seems like we agree - but I think BioC can have a role in helping to 
put comparisons on a fair and reasonable footing. My current pet peeve 
is with the protein mass spec community, where comparison of methods is 
the exception and not the rule. At least in the gene expression world, 
we are moving towards substantive comparisons of methods and users 
having a better idea of the capabilities of the different models.

   I won't comment too much on base R, but my own view does differ a bit 
from yours. Some stuff is there for historical reasons, and it would be 
a mistake to take that as a wholesale endorsement, nor absence as the 
lack of an endorsement.

   BioC is and was what the members of the community make it. The core, 
the developers and the users all have voices and I think it has been 
remarkable in how well received it has been and how much interaction 
there is between these groups. While I hate to be so mundane: it is what 
we can get funding to make it or are willing to do for free. And some 
things are fundable - others are not. I am starting to believe that the 
only model that will work going forward, for the base project, is as a 
community service. That does not mean that there cannot be many 
different journal and monograph projects supported by the base project 
(and I think that is the most likely outcome) but it is many and not 
one. Computational biology and bioinformatics are both way too big to 
imagine one solution - but they do require a lot of common 
infrastructure and the more of that we can provide the better. It is 
that shared infrastructure that allows developers to construct small 
interoperable packages to solve novel problems (blending different data 
types with different machine learning algorithms). That is my own 
private bias; but I will say it has served me well.

   We might do very well to look at other projects, bioPerl, bioPython 
etc., and see if they have had strategies that worked better for 
some things. If there are any volunteers to collect info on other 
projects it might be good to try and coordinate that (Wolfgang, are you 
willing to organize this?).

  Best wishes,
    Robert

> 
> Best regards
> Gordon
> 
> At 03:44 AM 4/08/2005, Wolfgang Huber wrote:
> 
>>Hi bioC developers,
>>
>>this is an invitation to the bioconductor developers meeting in
>>Seattle on Mon 15 Aug.
>>
>>It will be 13.00-17.00h at the FHCRC (http://www.fhcrc.org/about/maps)
>>in room M2-A823. Visitors will need to present themselves at reception and
>>someone will come down and greet them.
>>
>>Agenda topics are (you are encouraged to raise additional ones!) :
>>
>>- Overview of downloads, contributors ("annual report") - Seth, 15 min.
>>
>>- Overview of Task views - Vince
>>
>>- Whom do we want as our audience? Lab biologists? Practical
>>Bioinformaticians?  Statistics consultants/academics? All of them? What
>>are the implications for how we design our projects?
>>
>>- How can we identify duplication or 'synergies' and how can we encourage
>>integration of efforts (packages) that are currently duplicating (or
>>competing with) each other?
>>
>>- Are we happy with the way the project moves? Which people have we
>>lost, which ones do we want to welcome more?
>>
>>- Do we need a 3-day bioC developers retreat to go into more depth? If
>>yes, settle on when, where, and who's organizing it.
>>
>>- Short- and medium term goals for bioC?
>>
>>
>>
>>        *   *   *
>>
>>For those who are interested, I propose we can also have an informal
>>follow up over drinks and perhaps food afterwards.
>>
>>Best regards
>>   Wolfgang
> 
> 