[Bioc-devel] Invitation to the bioC developers Meeting in Seattle Mon 15 Aug

Fri Aug 5 14:44:46 CEST 2005

Is it not possible to support both models proposed by Gordon?

I think the current situation follows the first model, in which we have
different competing packages and it is fairly easy for people to write
and contribute new packages. This model suits people at the cutting
edge of research and/or those willing to spend time programming.

The second model could integrate the more "popular" packages into more
coherent and stable modules, perhaps even with nice GUIs. This would be
attractive to non-programmers. For the second model to come into
existence, there would need to be funding and a dedicated team.

An added advantage of the second model is that this team could work on
some of the more difficult, specialised or luxurious aspects of the
software, such as optimisations and dynamic/interactive graphs. At some
point, these features could be fed back into the first model.

I am aware of the S+ArrayAnalyzer module, which satisfies some of the
description of the second model (for microarrays only), but I am not sure
if they are willing to channel some of the profits back into BioConductor.

Here are two examples of ways to obtain funds that I can think of:

a) Red Hat, SUSE and Mandrake offer "enhanced services" for a fee, but
their basic products are free.

b) The R Foundation, to which people can contribute money to show their
appreciation for the existing product without requiring extra features.

I am speaking from my limited knowledge of R, BioConductor, Linux and
life in general, so please take this with a lot of salt.

Regards, Adai

On Thu, 2005-08-04 at 09:18 -0700, Robert Gentleman wrote:
> 
> Gordon Smyth wrote:
> > I wish to propose an agenda topic:
> > 
> > - Is Bioconductor's primary aim to provide a focused repository of 
> > packages, aiming to attract software implementing cutting edge 
> > bioinformatics research from as many quality labs around the world as 
> > possible? Or is it to produce a set of packages implementing more or less a 
> > single integrated application?
> Hi,
>    These are very interesting ideas, but I am a bit worried about what 
> appears to be an either-or discussion when my own view seems to be 
> neither. But I would like to hear more, as I could well be missing the 
> point. Perhaps a more open ended question, with a fixed time for 
> discussion would be the best strategy - we have limited time and it 
> would be good not to diverge too far during the meeting (unless that is 
> what we want).
> 
>    Bioconductor is a community, a core, developers, and users. It is 
> software, design strategies, data and a place (web + discussion lists) 
> to discuss computational biology and bioinformatics. It is both open 
> source and open development. Anyone who wants to can contribute. I 
> would not be too comfortable with much of a change to that.
> 
>     What we can do from here, and try to do, is some of the mundane 
> stuff, like building metadata packages and testing packages. We can help 
> inexperienced developers learn more about R. One of John Chambers' basic 
> ideas about the S language is that users should slowly become 
> programmers and programmers should slowly become developers. I 
> personally share that view.
> 
>     I would also like to help people decide on common data structures, 
> such as the exprSet (now eSet) class for annotated data sets, but also in 
> other areas such as flow cytometry, protein mass spec, array CGH and, soon, ways 
> of integrating data on genes and data on genomic coordinates.
> 
>     In my view the field is still too young and too rapidly developing 
> to adopt a view that we know the answer and it is time to write the 
> "right code". And even if that does happen (it may soon for 
> microarrays), then I am of the opinion that some commercial operation is 
> much better suited to deliver that. And for me there will be new 
> technologies and new research challenges.
> 
> 
> > 
> > - If the primary aim is that of a repository, would it be worthwhile to 
> > spin off a much smaller set of packages with a smaller developer team to 
> > try to develop towards an integrated application?
> > 
> > Background. Although Bioconductor might have some characteristics of both a 
> > repository and an integrated application, one of these two paradigms needs 
> > to take precedence, I think. To make an analogy, is it the aim of 
> > Bioconductor to be the software analog of a research journal, or is it to 
> > be the software analog of a monograph?
> > 
> 
>    I think my view is that it is neither, but I am certainly happy to hear 
> other points of view. As I said above, the monograph analogy does not 
> work too well for me, as it suggests we know all the answers - and I do 
> not. It also suggests exclusion of the developer community, and I would 
> prefer to take all comers and let the users vote with their feet (as it 
> were). And I do not mean to be trite here. I see your point, and it 
> might well be worth trying to spin off a monograph-like project or two. 
> I keep hoping we will get organized enough to start encouraging some 
> rethinking of a few of the areas where we have multiple packages that do 
> similar things, and to encourage the developers to use common data 
> structures so that users can easily move between packages and do not 
> become locked into one system for all of their analyses.
> 
>    I would be happy to see how we can help to support different 
> development models. Basically, limma has taken a more comprehensive 
> approach than most. We could, and probably should, start to produce 
> binaries of R for Windows and Mac with more things installed, aimed at 
> particular user groups, so that users download the whole thing and some 
> form of GUI is preinstalled and simply works. However, that, like 
> everything else, costs money - and we simply do not have a lot of 
> financial support for the project.
> 
>    In this vein, there is nothing to prevent people from using BioC as a 
> basis for grant applications to pursue their own research programs, and 
> many are doing so.
> Also, if anyone UK-based has not noticed: the Wellcome Trust has an RFA 
> for software development.
> 
> > Under the first model, the development of different packages providing 
> > different approaches to the same problem, i.e., competing with one another, 
> > is to be expected and even encouraged. The aim is to promote a stimulating 
> > environment for the development and dissemination of new techniques. The 
> > obvious downside is that a research journal presents a very steep 
> > learning curve for non-statistical users. A research journal can provide 
> > occasional review articles for a wider audience.
> 
>    Yes, very good points - the learning curve is steep.
> 
> > 
> > Under the second model, it is not reasonable to expect every lab in the 
> > world to participate. Instead, one needs to select a smaller team of close 
> > collaborators. Co-authors on research monographs are normally collaborators 
> > who are also co-authors on associated research papers. Also, it is not 
> > realistic to expect a monograph to keep up with the pace of a research 
> > journal in terms of development of new techniques. So this model will move 
> > more slowly and be less inclusive, but will be easier to present as an 
> > integrated solution to a non-specialist audience.
> 
> 
>     I do want lots of contributors - many of my very good ideas have 
> come at unexpected times and from unexpected sources.
> 
> > 
> > I think that one could view R itself, meaning the set of packages in the 
> > default distribution, as being an example of the second model. This seems 
> > to me appropriate, considering that the statistical methodology 
> > implemented by the standard distribution of R is reasonably 
> > well-established, mostly part of the canonical core of the statistical 
> > discipline. On the other hand, the research problems being addressed by 
> > Bioconductor, almost without exception, do not yet have generally accepted 
> > solutions. On the contrary, the race is very much on to explore what is 
> > possible and what is best. This situation makes the contrast between a 
> > research journal and a monograph unusually marked, with the latter at risk 
> > of being dated unusually quickly.
> 
> 
>    Seems like we agree - but I think BioC can have a role in helping to 
> put comparisons on a fair and reasonable footing. My current pet peeve 
> is with the protein mass spec community, where comparison of methods is 
> the exception and not the rule. At least in the gene expression world, 
> we are moving towards substantive comparisons of methods and users 
> having a better idea of the capabilities of the different models.
> 
>    I won't comment too much on base R, but my own view does differ a bit 
> from yours. Some stuff is there for historical reasons, and it would be 
> a mistake to take its presence as a wholesale endorsement, or its absence 
> as a lack of endorsement.
> 
>    BioC is and was what the members of the community make it. The core, 
> the developers and the users all have voices and I think it has been 
> remarkable in how well received it has been and how much interaction 
> there is between these groups. While I hate to be so mundane: it is what 
> we can get funding to make it, or what we are willing to do for free. And some 
> things are fundable - others are not. I am starting to believe that the 
> only model that will work going forward, for the base project, is as a 
> community service. That does not mean that there cannot be many 
> different journal and monograph projects supported by the base project 
> (and I think that is the most likely outcome) but it is many and not 
> one. Computational biology and bioinformatics are both way too big to 
> imagine one solution - but they do require a lot of common 
> infrastructure and the more of that we can provide the better. It is 
> that shared infrastructure that allows developers to construct small 
> interoperable packages to solve novel problems (blending different data 
> types with different machine learning algorithms). That is my own 
> private bias; but I will say it has served me well.
> 
>    We might do very well to look at other projects, bioPerl, bioPython 
> etc. and see if they have had strategies that worked better for 
> some things. If there are any volunteers to collect info on other 
> projects, it might be good to try and coordinate that (Wolfgang, are you 
> willing to organize this?).
> 
>   Best wishes,
>     Robert
> 
> > 
> > Best regards
> > Gordon
> > 
> > At 03:44 AM 4/08/2005, Wolfgang Huber wrote:
> > 
> >>Hi bioC developers,
> >>
> >>this is an invitation to the bioconductor developers meeting in
> >>Seattle on Mon 15 Aug.
> >>
> >>It will be 13.00-17.00h at the FHCRC (http://www.fhcrc.org/about/maps)
> >>in room M2-A823. Visitors will need to present themselves at the reception and
> >>someone will come down and greet them.
> >>
> >>Agenda topics are (you are encouraged to raise additional ones!) :
> >>
> >>- Overview of downloads, contributors ("annual report") - Seth, 15 min.
> >>
> >>- Overview of Task views - Vince
> >>
> >>- Whom do we want as our audience? Lab biologists? Practical
> >>bioinformaticians? Statistics consultants/academics? All of them? What
> >>are the implications for how we design our projects?
> >>
> >>- How can we identify duplication or 'synergies', and how can we encourage
> >>integration of efforts (packages) that are currently duplicating (or
> >>competing with) each other?
> >>
> >>- Are we happy with the way the project moves? Which people have we
> >>lost, which ones do we want to welcome more?
> >>
> >>- Do we need a 3-day bioC developers retreat to go into more depth? If
> >>yes, settle on when, where, and who's organizing it.
> >>
> >>- Short- and medium term goals for bioC?
> >>
> >>
> >>
> >>        *   *   *
> >>
> >>For those who are interested, I propose we also have an informal
> >>follow-up over drinks and perhaps food afterwards.
> >>
> >>Best regards
> >>   Wolfgang
> > 
> > 
> > _______________________________________________
> > Bioc-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >